Predicting Hospital Length of Stay

Implemented with Microsoft Machine Learning Services

Template Contents


The following is the directory structure for this template:

  • Data: Contains a copy of the input data.
  • R: Contains the R code to simulate the input datasets, pre-process them, create the analytical datasets, train the models, and score the data.
  • Python: Contains the Python code to simulate the input datasets, pre-process them, create the analytical datasets, train the models, and score the data.
  • Resources: Contains other resources for the solution package.
  • SQLR: Contains the T-SQL code with R to pre-process the datasets, train the models, identify the champion model, and provide recommendations. It also contains a PowerShell script that automates the entire process, including loading the data into the database (a step not included in the T-SQL code).
  • SQLPy: Contains the T-SQL code with Python to pre-process the datasets, train the models, identify the champion model, and provide recommendations. It also contains a PowerShell script that automates the entire process, including loading the data into the database (a step not included in the T-SQL code).

Copy of Input Datasets


File                                    Description
.\Data\LengthOfStay.csv                 Synthetic data modeled after real-world hospital inpatient records

Model Development in R


File                                    Description
Hospital_Length_Of_Stay_Notebook.ipynb  Jupyter Notebook that runs all the .R scripts
SQL_connection.R                        Connection details for SQL Server, used by all other scripts
step1_data_preprocessing.R              Loads the data and handles missing values
step2_feature_engineering.R             Standardizes measures
step3_training_evaluation.R             Trains and scores a regression Random Forest (rxDForest) and a gradient boosted trees model (rxFastTrees)

Model Development in Python


File                                    Description
Hospital_Length_Of_Stay_Notebook.ipynb  Jupyter Notebook that runs all the .py scripts
SQL_connection.py                       Connection details for SQL Server, used by all other scripts
step1_data_preprocessing.py             Loads the data and handles missing values
step2_feature_engineering.py            Standardizes measures
step3_training_evaluation.py            Trains and scores a regression Random Forest (rxDForest) and a gradient boosted trees model (rxFastTrees)
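
To make the first two steps concrete, here is a minimal sketch in plain Python. It is not the template's code: the source does not say how missing values are handled or how measures are standardized, so mean imputation and z-score scaling are assumptions, and the function names and sample values are hypothetical.

```python
from statistics import mean, pstdev


def fill_missing(values):
    """Replace None entries with the mean of the observed values.

    Mean imputation is an assumed strategy, used here only for illustration.
    """
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant column carries no spread; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical numeric measure with one missing entry.
measure = [140.0, None, 150.0, 160.0]
filled = fill_missing(measure)
scaled = standardize(filled)
```

In the template itself these steps run against SQL Server tables; the sketch works on an in-memory list only to show the shape of each transformation.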

Operationalize in SQL R


File                                    Description
.\SQLR\Length_Of_Stay.ps1               Automates execution of all .sql files and creates stored procedures
.\SQLR\execute_yourself.sql             Used in Length_Of_Stay.ps1
.\SQLR\load_data.ps1                    Used in Length_Of_Stay.ps1
.\SQLR\step0_create_table.sql           Creates the initial LengthOfStay table
.\SQLR\step1_data_processing.sql        Handles missing data
.\SQLR\step2_feature_engineering.sql    Standardizes measures and creates number_of_issues and lengthofstay_bucket
.\SQLR\step3a_splitting.sql             Splits the data into training and test sets
.\SQLR\step3b_training.sql              Trains and scores a gradient boosted trees model (rxFastTrees) or a Random Forest (rxDForest)
.\SQLR\step3c_testing_evaluating.sql    Scores and evaluates the regression Random Forest
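
As a language-neutral illustration of steps 2 and 3a, the following Python sketch shows one way such features and a split could be computed. The definitions are assumptions: the source names number_of_issues and lengthofstay_bucket but does not define them, so the flag-counting rule, the bucket cut points, and the 70/30 split below are hypothetical.

```python
import random


def number_of_issues(flags):
    """Count how many binary issue flags are set for one record (assumed definition)."""
    return sum(1 for f in flags if f)


def lengthofstay_bucket(days, edges=(3, 6, 9)):
    """Map a stay in days to a 1-based bucket index.

    The cut points in `edges` are hypothetical, not taken from the template.
    """
    for index, upper in enumerate(edges, start=1):
        if days <= upper:
            return index
    return len(edges) + 1


def split_train_test(ids, train_percent=70, seed=0):
    """Shuffle record ids reproducibly and split them into train and test lists."""
    rng = random.Random(seed)
    shuffled = list(ids)
    rng.shuffle(shuffled)
    cut = len(shuffled) * train_percent // 100
    return shuffled[:cut], shuffled[cut:]


train_ids, test_ids = split_train_test(range(10))
```

Fixing the seed keeps the split reproducible between runs, which matters when several models are compared on the same test set.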

Operationalize in SQL Python


File                                    Description
.\SQLPy\Length_Of_Stay.ps1              Automates execution of all .sql files and creates stored procedures
.\SQLPy\execute_yourself.sql            Used in Length_Of_Stay.ps1
.\SQLPy\load_data.ps1                   Used in Length_Of_Stay.ps1
.\SQLPy\step0_create_table.sql          Creates the initial LengthOfStay table
.\SQLPy\step1_data_processing.sql       Handles missing data
.\SQLPy\step2_feature_engineering.sql   Standardizes measures and creates number_of_issues and lengthofstay_bucket
.\SQLPy\step3a_splitting.sql            Splits the data into training and test sets
.\SQLPy\step3b_training.sql             Trains and scores a gradient boosted trees model (rx_btrees) or a Random Forest (rx_dforest)
.\SQLPy\step3c_testing_evaluating.sql   Scores and evaluates the models
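
The evaluation and champion-selection idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the template's procedure: the source does not name its evaluation metrics or its selection rule, so MAE, RMSE, the lowest-RMSE rule, and all values below are hypothetical.

```python
from math import sqrt


def mean_absolute_error(actual, predicted):
    """Average absolute difference between observed and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def root_mean_squared_error(actual, predicted):
    """Square root of the average squared prediction error."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Hypothetical observed lengths of stay (days) and predictions from two models.
actual = [3, 5, 2, 8]
predictions = {
    "random_forest": [4, 5, 1, 6],
    "boosted_trees": [3, 6, 2, 7],
}

# Score each model, then pick the one with the lowest RMSE (assumed rule).
rmse_by_model = {
    name: root_mean_squared_error(actual, predicted)
    for name, predicted in predictions.items()
}
champion = min(rmse_by_model, key=rmse_by_model.get)
```

RMSE penalizes large misses more heavily than MAE, so the two metrics can rank models differently; whichever rule is used should be applied to the same held-out test set for every model.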

Resources for the Solution Package


File                                    Description
.\Resources\create_user.sql             Used during initial SQL Server setup to create the user and password and grant permissions
.\Resources\Data_Dictionary.xlsx        Description of all variables in the LengthOfStay.csv data file
.\Resources\Images\                     Directory of images used for the Readme.md in this package
