
Add Random Forest Model and Binary Length of Stay Task #1029

Open
jordynhayden wants to merge 1 commit into sunlabuiuc:master from jordynhayden:model-random-forest


Name: Jordyn Hayden
Net ID: jhayden3
Email: jhayden3@illinois.edu

Type of Contribution

Model + Task (Option 2 + Option 3)

Original Paper

Park, Y.; and Ho, J. C. 2020. CaliForest: Calibrated Random Forest for Health Data. Proc ACM Conf Health Inference Learn (2020), 2020: 40–50.

(https://pmc.ncbi.nlm.nih.gov/articles/PMC8299436/)

Description

Adds a RandomForest model and a length of stay (LOS) threshold binary prediction task to PyHealth. The contribution is motivated by the paper referenced above, “CaliForest: Calibrated Random Forest for Health Data”. Its authors evaluate a calibrated random forest model (CaliForest) that uses out-of-bag samples against an uncalibrated baseline random forest on several binary prediction tasks, including LOS > 3 days and LOS > 7 days. PyHealth currently lacks both the baseline random forest model and the corresponding binary LOS prediction task. This PR adds a PyHealth RandomForest model, a wrapper around sklearn’s random forest models that integrates with standard PyHealth pipelines, along with a LOS threshold prediction task.
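To illustrate why a wrapper is needed, here is a minimal sketch of the general idea: tree models in sklearn expect a fixed-width numeric matrix, while visit-level samples usually carry variable-length code lists. The sample layout (`codes`, `label`), the multi-hot encoding, and the helper `to_multi_hot` are assumptions made for illustration; they are not taken from the PR's `random_forest.py` or `utils.py`.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical toy samples: a list of codes per visit plus a binary label.
samples = [
    {"codes": ["A", "B"], "label": 1},
    {"codes": ["B"], "label": 0},
    {"codes": ["A", "C"], "label": 1},
    {"codes": ["C"], "label": 0},
    {"codes": ["A"], "label": 1},
    {"codes": ["B", "C"], "label": 0},
]

# Build a vocabulary so every code maps to one column of the feature matrix.
vocab = sorted({code for s in samples for code in s["codes"]})
index = {code: i for i, code in enumerate(vocab)}


def to_multi_hot(codes):
    """Encode a variable-length code list as a fixed-width 0/1 vector."""
    row = np.zeros(len(vocab), dtype=np.float32)
    for code in codes:
        row[index[code]] = 1.0
    return row


X = np.stack([to_multi_hot(s["codes"]) for s in samples])
y = np.array([s["label"] for s in samples])

# Plain (uncalibrated) random forest, i.e. the baseline kind of model.
model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(X, y)

# Probability of the positive class for each sample.
proba = model.predict_proba(X)[:, 1]
print(X.shape, proba.shape)
```

The actual wrapper in this PR may encode features differently; the sketch only shows the shape of the conversion step between sample dictionaries and an sklearn estimator.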

An example study on the MIMIC-III dataset demonstrates the random forest model and LOS threshold task in a pipeline, including hyperparameter tuning (which varies parameters beyond those analyzed in the paper) and evaluation on the LOS > 3 days task. Validation results during hyperparameter tuning are largely consistent with the paper (validation AUROC of approximately 0.70–0.77 vs. the paper's reported 0.73 for a baseline random forest), but the PyHealth example showed a lower test AUROC (approximately 0.60), which may be due to overfitting on a smaller dataset.
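The tune-on-validation, report-on-test workflow described above can be sketched with sklearn alone. This is a minimal sketch under stated assumptions: the data is synthetic (from `make_classification`, not MIMIC-III), and the parameter grid and split sizes are illustrative choices, not the ones used in the example script.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for featurized patient data (assumption, not MIMIC-III).
X, y = make_classification(
    n_samples=600, n_features=20, n_informative=5, random_state=0
)

# 60% train, 20% validation, 20% test.
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, test_size=0.4, random_state=0, stratify=y
)
X_val, X_test, y_val, y_test = train_test_split(
    X_rest, y_rest, test_size=0.5, random_state=0, stratify=y_rest
)

# Illustrative grid; the PR's example tunes its own set of parameters.
param_grid = [
    {"n_estimators": 50, "max_depth": 3},
    {"n_estimators": 50, "max_depth": None},
    {"n_estimators": 200, "max_depth": 5},
]

best_auc, best_params, best_model = -1.0, None, None
for params in param_grid:
    model = RandomForestClassifier(random_state=0, **params)
    model.fit(X_train, y_train)
    # Select on validation AUROC only; the test split stays untouched.
    val_auc = roc_auc_score(y_val, model.predict_proba(X_val)[:, 1])
    if val_auc > best_auc:
        best_auc, best_params, best_model = val_auc, params, model

# Single final evaluation on the held-out test split.
test_auc = roc_auc_score(y_test, best_model.predict_proba(X_test)[:, 1])
print(f"best params: {best_params}")
print(f"validation AUROC: {best_auc:.3f}, test AUROC: {test_auc:.3f}")
```

Because the configuration is chosen by validation AUROC, the validation score is optimistically biased, so a gap between validation and test AUROC is expected to some degree and tends to widen on small datasets.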

File Guide

  • examples/length_of_stay/mimic3_length_of_stay_random_forest.py - Example pipeline using the random forest model and the LOS > 3 binary prediction task, including hyperparameter tuning
  • pyhealth/models/random_forest.py - Random Forest Model Implementation
  • pyhealth/models/utils.py - Updated to include a utility class that converts a dataloader to NumPy matrices
  • pyhealth/tasks/length_of_stay_prediction.py - Updated to include a length of stay binary prediction task that predicts whether a patient's LOS exceeds X days. This task also excludes patients who are minors, which the other LOS tasks in the file note as a TODO.
  • tests/core/test_data_loader_to_numpy_util.py - Tests for the dataloader to numpy matrices utility class added to models/utils.py
  • tests/core/test_mimic3_threshold_los.py - Tests for the length of stay threshold binary prediction task
  • tests/core/test_random_forest.py - Tests for the random forest model
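The labeling rule behind the LOS threshold task can be sketched as follows. This is a minimal illustration, not the code in `length_of_stay_prediction.py`: the function name `los_exceeds_threshold`, its parameters, the age cutoff of 18, and measuring LOS in fractional days are all assumptions.

```python
from datetime import datetime
from typing import Optional


def los_exceeds_threshold(
    admit_time: datetime,
    discharge_time: datetime,
    birth_date: datetime,
    threshold_days: float = 3,
    min_age: int = 18,  # assumed cutoff for excluding minors
) -> Optional[int]:
    """Return 1 if LOS > threshold_days, 0 otherwise, or None to skip a minor."""
    # Age in whole years at admission (subtract 1 if the birthday has not
    # yet occurred in the admission year).
    had_birthday = (admit_time.month, admit_time.day) >= (
        birth_date.month,
        birth_date.day,
    )
    age = admit_time.year - birth_date.year - (0 if had_birthday else 1)
    if age < min_age:
        return None  # excluded from the task

    # Length of stay in fractional days.
    los_days = (discharge_time - admit_time).total_seconds() / 86400
    return int(los_days > threshold_days)


adult_long = los_exceeds_threshold(
    datetime(2020, 1, 1), datetime(2020, 1, 6), datetime(1970, 6, 15)
)
adult_short = los_exceeds_threshold(
    datetime(2020, 1, 1), datetime(2020, 1, 3), datetime(1970, 6, 15)
)
minor = los_exceeds_threshold(
    datetime(2020, 1, 1), datetime(2020, 1, 6), datetime(2010, 3, 1)
)
print(adult_long, adult_short, minor)
```

Passing `threshold_days=7` gives the LOS > 7 days variant mentioned in the paper; returning `None` for excluded patients lets a caller drop those visits before building the dataset.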
