Add Random Forest Model and Binary Length of Stay Task by jordynhayden · Pull Request #1029 · sunlabuiuc/PyHealth

jordynhayden · 2026-04-19T22:51:58Z

Contributor

Name: Jordyn Hayden
Net ID: jhayden3
Email: jhayden3@illinois.edu

Type of Contribution

Model + Task (Option 2 + Option 3)

Original Paper

Park, Y.; and Ho, J. C. 2020. CaliForest: Calibrated Random Forest for Health Data. Proc ACM Conf Health Inference Learn (2020), 2020: 40–50.

(https://pmc.ncbi.nlm.nih.gov/articles/PMC8299436/)

Description

Adds a RandomForest model and a length-of-stay (LOS) threshold binary prediction task to PyHealth. This contribution is motivated by the paper referenced above, “CaliForest: Calibrated Random Forest for Health Data”. Its authors evaluate a calibrated random forest model (CaliForest) that uses out-of-bag samples against a baseline uncalibrated random forest model on various binary prediction tasks, including LOS > 3 and LOS > 7 days. PyHealth currently lacks both the baseline random forest model and the corresponding binary LOS prediction task. This PR adds a PyHealth RandomForest model, a wrapper around sklearn’s random forest models that integrates with standard PyHealth pipelines, along with a LOS threshold prediction task.
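To make the LOS threshold label concrete, here is a minimal sketch of how a binary "LOS > X days" label could be derived from admission and discharge timestamps. The function name `los_exceeds_threshold` and the use of fractional days are assumptions for illustration only; the PR's actual task in pyhealth/tasks/length_of_stay_prediction.py may compute LOS differently.

```python
from datetime import datetime


def los_exceeds_threshold(
    admit_time: datetime,
    discharge_time: datetime,
    threshold_days: float = 3.0,
) -> int:
    """Return 1 if the stay lasted strictly longer than threshold_days, else 0.

    Hypothetical helper: LOS is measured in fractional days here, which is
    an assumption and not necessarily how the PR's task measures it.
    """
    los_days = (discharge_time - admit_time).total_seconds() / 86400.0
    return int(los_days > threshold_days)


admit = datetime(2020, 1, 1, 8, 0)
long_stay = los_exceeds_threshold(admit, datetime(2020, 1, 5, 8, 0))   # 4-day stay
short_stay = los_exceeds_threshold(admit, datetime(2020, 1, 3, 8, 0))  # 2-day stay
print(long_stay, short_stay)
```

Changing `threshold_days` to 7.0 would give the LOS > 7 variant mentioned in the paper.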

An example study on the MIMIC-III dataset demonstrates pipeline usage of the random forest model and LOS threshold task, including hyperparameter tuning (which varies parameters beyond those the paper analyzed) and evaluation on the LOS > 3 days task. Validation results during hyperparameter tuning are largely consistent with the paper (validation AUROC of approximately 0.70–0.77 vs. the paper's reported 0.73 for a baseline random forest), but the PyHealth example showed a lower test AUROC (approximately 0.60), which may be due to overfitting on a smaller dataset.
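The tune-on-validation, report-on-test workflow described above can be sketched with plain scikit-learn. This is an illustration on synthetic data, not the PR's example script: the grid values, the split sizes, and the helper `tune_random_forest` are all assumptions, and it does not use MIMIC-III or the PyHealth wrapper.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the extracted feature matrix and binary LOS labels.
X, y = make_classification(
    n_samples=600, n_features=20, n_informative=5, random_state=0
)
X_train, X_tmp, y_train, y_tmp = train_test_split(
    X, y, test_size=0.4, random_state=0, stratify=y
)
X_val, X_test, y_val, y_test = train_test_split(
    X_tmp, y_tmp, test_size=0.5, random_state=0, stratify=y_tmp
)


def tune_random_forest(grid):
    """Fit one forest per setting and keep the one with the best validation AUROC."""
    best = None
    for params in grid:
        model = RandomForestClassifier(random_state=0, **params)
        model.fit(X_train, y_train)
        val_auc = roc_auc_score(y_val, model.predict_proba(X_val)[:, 1])
        if best is None or val_auc > best[0]:
            best = (val_auc, params, model)
    return best


# Hypothetical grid; the PR's actual search space is not listed in this description.
grid = [{"n_estimators": n, "max_depth": d} for n in (50, 100) for d in (3, None)]
best_val_auc, best_params, best_model = tune_random_forest(grid)

# The test split is scored once, after selection, so it is not used for tuning.
test_auc = roc_auc_score(y_test, best_model.predict_proba(X_test)[:, 1])
print(f"best params: {best_params}")
print(f"validation AUROC: {best_val_auc:.3f}, test AUROC: {test_auc:.3f}")
```

A gap between the selected validation AUROC and the test AUROC, like the one reported above, is what this kind of comparison surfaces; on a small dataset the validation score of the winning setting tends to be optimistic.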

File Guide

examples/length_of_stay/mimic3_length_of_stay_random_forest.py - Example pipeline usage of the random forest model and LOS > 3 binary prediction task, including hyperparameter tuning
pyhealth/models/random_forest.py - Random Forest Model Implementation
pyhealth/models/utils.py - Updated to include dataloader to numpy matrices utility class
pyhealth/tasks/length_of_stay_prediction.py - Updated to include a length of stay binary prediction task that predicts whether a patient's LOS exceeds X days. This task handles the exclusion of minor patients, which the other LOS tasks in the file note as a TODO.
tests/core/test_data_loader_to_numpy_util.py - Tests for the dataloader to numpy matrices utility class added to models/utils.py
tests/core/test_mimic3_threshold_los.py - Tests for the length of stay threshold binary prediction task
tests/core/test_random_forest.py - Tests for the random forest model
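Since sklearn estimators expect in-memory arrays rather than batched loaders, the dataloader-to-NumPy step listed above can be pictured roughly as follows. This is a sketch under stated assumptions: the function `batches_to_numpy`, the batch keys `"features"` and `"label"`, and the dict-per-batch layout are hypothetical and are not taken from pyhealth/models/utils.py.

```python
import numpy as np


def batches_to_numpy(batches, feature_key="features", label_key="label"):
    """Stack an iterable of dict batches into a 2-D feature matrix and 1-D labels.

    Hypothetical helper: each batch is assumed to map feature_key to an
    array-like of shape (batch_size, ...) and label_key to batch_size labels.
    """
    xs, ys = [], []
    for batch in batches:
        xs.append(np.asarray(batch[feature_key], dtype=np.float32))
        ys.append(np.asarray(batch[label_key]).reshape(-1))
    X = np.concatenate(xs, axis=0)
    y = np.concatenate(ys, axis=0)
    # Flatten any trailing feature dimensions so each sample is one row.
    return X.reshape(len(X), -1), y


# Two toy batches standing in for what a data loader would yield.
toy_batches = [
    {"features": [[1.0, 0.0], [0.0, 1.0]], "label": [1, 0]},
    {"features": [[1.0, 1.0]], "label": [1]},
]
X_all, y_all = batches_to_numpy(toy_batches)
print(X_all.shape, y_all.shape)
```

The resulting `X_all` and `y_all` can be passed directly to an sklearn estimator's `fit` method.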

Add Random Forest Model and LOS Task Contribution for CS598 Final Pro…

1037758

jordynhayden wants to merge 1 commit into sunlabuiuc:master from jordynhayden:model-random-forest