Add Random Forest Model and Binary Length of Stay Task #1029
Open
jordynhayden wants to merge 1 commit into sunlabuiuc:master
Contributor
Name: Jordyn Hayden
Net ID: jhayden3
Email: jhayden3@illinois.edu
Type of Contribution
Model + Task (Option 2 + Option 3)
Original Paper
Park, Y.; and Ho, J. C. 2020. CaliForest: Calibrated Random Forest for Health Data. Proc ACM Conf Health Inference Learn (2020), 2020: 40–50.
(https://pmc.ncbi.nlm.nih.gov/articles/PMC8299436/)
Description
Adds a RandomForest model and a length of stay (LOS) threshold binary prediction task to PyHealth. The contribution is motivated by the paper referenced above, “CaliForest: Calibrated Random Forest for Health Data”. Its authors evaluate a calibrated random forest model (CaliForest), which uses out-of-bag samples, against a baseline uncalibrated random forest on several binary prediction tasks, including LOS > 3 days and LOS > 7 days. PyHealth currently lacks both the baseline random forest model and the corresponding binary LOS prediction task. This PR adds a PyHealth RandomForest model, a wrapper around sklearn’s random forest models that integrates with standard PyHealth pipelines, along with a LOS threshold prediction task.
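To make the labeling rule concrete, here is a minimal sketch of a LOS threshold label using only the standard library. The function name `los_exceeds_threshold` and the fractional-day LOS computation are assumptions for illustration; the task added in this PR may compute LOS and name things differently.

```python
from datetime import datetime


def los_exceeds_threshold(admit_time, discharge_time, threshold_days=3):
    """Return 1 if the stay is strictly longer than threshold_days, else 0.

    Hypothetical helper: LOS is measured here in fractional days, which is
    an assumption and not necessarily how the PR's task measures it.
    """
    if discharge_time < admit_time:
        raise ValueError("discharge_time precedes admit_time")
    los_days = (discharge_time - admit_time).total_seconds() / 86400.0
    return int(los_days > threshold_days)


admit = datetime(2020, 1, 1, 8, 0)
short_stay = datetime(2020, 1, 3, 8, 0)   # 2.0 days after admission
long_stay = datetime(2020, 1, 9, 20, 0)   # 8.5 days after admission

# Labels for the two thresholds studied in the paper (LOS > 3 and LOS > 7).
labels_3 = [los_exceeds_threshold(admit, d, 3) for d in (short_stay, long_stay)]
labels_7 = [los_exceeds_threshold(admit, d, 7) for d in (short_stay, long_stay)]
```

A strict inequality is used so that a stay of exactly 3 days is labeled 0 for the LOS > 3 task.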
An example study on the MIMIC-III dataset demonstrates pipeline usage of the random forest model and LOS threshold task, including hyperparameter tuning (which varies parameters beyond those the paper analyzed) and evaluation on the LOS > 3 days task. Validation results during hyperparameter tuning are largely consistent with the paper (validation AUROC of approximately 0.70–0.77 vs. the paper's reported 0.73 for a baseline random forest), but the PyHealth example showed a lower test AUROC (approximately 0.60), which may be due to overfitting on a smaller dataset.
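The tune-on-validation, report-on-test workflow described here can be sketched with plain scikit-learn. This is not the example script from the PR: it uses synthetic data from `make_classification` instead of MIMIC-III, a small hypothetical parameter grid, and calls sklearn's `RandomForestClassifier` directly instead of the PyHealth wrapper.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import ParameterGrid, train_test_split

# Synthetic binary classification data standing in for LOS > 3 features/labels.
X, y = make_classification(
    n_samples=600, n_features=20, n_informative=5, random_state=0
)

# 60% train, 20% validation, 20% test.
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, test_size=0.4, stratify=y, random_state=0
)
X_val, X_test, y_val, y_test = train_test_split(
    X_rest, y_rest, test_size=0.5, stratify=y_rest, random_state=0
)

# Hypothetical grid; the PR's example may tune different parameters.
grid = ParameterGrid({"n_estimators": [50, 100], "max_depth": [3, None]})

best_model, best_params, best_val_auroc = None, None, -1.0
for params in grid:
    model = RandomForestClassifier(random_state=0, **params)
    model.fit(X_train, y_train)
    # AUROC is computed from the predicted probability of the positive class.
    val_auroc = roc_auc_score(y_val, model.predict_proba(X_val)[:, 1])
    if val_auroc > best_val_auroc:
        best_model, best_params, best_val_auroc = model, params, val_auroc

# The test split is touched only once, after model selection.
test_auroc = roc_auc_score(y_test, best_model.predict_proba(X_test)[:, 1])
print(f"best params: {best_params}")
print(f"validation AUROC: {best_val_auroc:.3f}, test AUROC: {test_auroc:.3f}")
```

Because the best configuration is chosen by its validation score, the validation AUROC is an optimistic estimate; a gap between validation and test AUROC like the one reported above is the usual symptom, and it tends to widen on small datasets.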
File Guide
examples/length_of_stay/mimic3_length_of_stay_random_forest.py - Example pipeline usage of the random forest model and the LOS > 3 binary prediction task, including hyperparameter tuning
pyhealth/models/random_forest.py - Random forest model implementation
pyhealth/models/utils.py - Updated to include a dataloader-to-numpy-matrices utility class
pyhealth/tasks/length_of_stay_prediction.py - Updated to include a length of stay binary prediction task that predicts whether a patient's LOS exceeds X days. This task handles the exclusion of minor patients, which other LOS tasks in the file note as a TODO.
tests/core/test_data_loader_to_numpy_util.py - Tests for the dataloader-to-numpy-matrices utility class added to models/utils.py
tests/core/test_mimic3_threshold_los.py - Tests for the length of stay threshold binary prediction task
tests/core/test_random_forest.py - Tests for the random forest model