Template Contents
The following is the directory structure for this template:
- Data: Contains a copy of the input data.
- R: Contains the R code to simulate the input datasets, pre-process them, create the analytical datasets, train the models, and score the data.
- Python: Contains the Python code to simulate the input datasets, pre-process them, create the analytical datasets, train the models, and score the data.
- Resources: Contains other resources for the solution package.
- SQLR: Contains the T-SQL code with R to pre-process the datasets, train the models, identify the champion model, and provide recommendations. It also contains a PowerShell script that automates the entire process, including loading the data into the database (a step the T-SQL code does not cover).
- SQLPy: Contains the T-SQL code with Python to pre-process the datasets, train the models, identify the champion model, and provide recommendations. It also contains a PowerShell script that automates the entire process, including loading the data into the database (a step the T-SQL code does not cover).
File | Description |
.\Data\LengthOfStay.csv | Synthetic data modeled after real-world hospital inpatient records |
Model Development in R
File | Description |
Hospital_Length_Of_Stay_Notebook.ipynb | Contains the Jupyter Notebook file that runs all the .R scripts. |
SQL_connection.R | Contains details of connection to SQL Server used in all other scripts. |
step1_data_preprocessing.R | Loads the data and handles missing values |
step2_feature_engineering.R | Standardizes the measures |
step3_training_evaluation.R | Trains and scores a regression random forest (rxDForest) and a gradient boosted trees model (rxFastTrees) |
Model Development in Python
File | Description |
Hospital_Length_Of_Stay_Notebook.ipynb | Contains the Jupyter Notebook file that runs all the .py scripts. |
SQL_connection.py | Contains details of connection to SQL Server used in all other scripts. |
step1_data_preprocessing.py | Loads the data and handles missing values |
step2_feature_engineering.py | Standardizes the measures |
step3_training_evaluation.py | Trains and scores a regression random forest (rx_dforest) and a gradient boosted trees model (rx_btrees) |
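As a rough illustration of what the first two steps describe (handling missing values, then standardizing measures), here is a minimal sketch using only the Python standard library. It is not the code in step1_data_preprocessing.py or step2_feature_engineering.py: the helper names, the mean-fill strategy, and the `glucose` column are assumptions made for the example.

```python
from statistics import mean, pstdev

def fill_missing(rows, column, fill_value=None):
    """Replace None in `column` with fill_value, or with the column mean if no value is given."""
    present = [r[column] for r in rows if r[column] is not None]
    value = mean(present) if fill_value is None else fill_value
    return [{**r, column: value if r[column] is None else r[column]} for r in rows]

def standardize(rows, column):
    """Rescale `column` to zero mean and unit (population) standard deviation."""
    values = [r[column] for r in rows]
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        # A constant column carries no spread; map every value to 0.0.
        return [{**r, column: 0.0} for r in rows]
    return [{**r, column: (r[column] - mu) / sigma} for r in rows]

# Hypothetical records; the column name is illustrative only.
records = [
    {"id": 1, "glucose": 100.0},
    {"id": 2, "glucose": None},
    {"id": 3, "glucose": 140.0},
]

cleaned = fill_missing(records, "glucose")   # step 1: handle missing values
scaled = standardize(cleaned, "glucose")     # step 2: standardize the measure
```

Both helpers return new lists rather than mutating their input, so the cleaned and standardized versions can be inspected side by side.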
Operationalize in SQL R
File | Description |
.\SQLR\Length_Of_Stay.ps1 | Automates execution of all .sql files and creates stored procedures |
.\SQLR\execute_yourself.sql | Used in Length_Of_Stay.ps1 |
.\SQLR\load_data.ps1 | Used in Length_Of_Stay.ps1 |
.\SQLR\step0_create_table.sql | Creates initial LengthOfStay table |
.\SQLR\step1_data_processing.sql | Handles missing data |
.\SQLR\step2_feature_engineering.sql | Standardizes measures and creates number_of_issues and lengthofstay_bucket |
.\SQLR\step3a_splitting.sql | Splits data into train and test |
.\SQLR\step3b_training.sql | Trains and scores a gradient boosted trees model (rxFastTrees) or Random Forest (rxDForest) |
.\SQLR\step3c_testing_evaluating.sql | Scores and evaluates the regression random forest (RF) |
Operationalize in SQL Python
File | Description |
.\SQLPy\Length_Of_Stay.ps1 | Automates execution of all .sql files and creates stored procedures |
.\SQLPy\execute_yourself.sql | Used in Length_Of_Stay.ps1 |
.\SQLPy\load_data.ps1 | Used in Length_Of_Stay.ps1 |
.\SQLPy\step0_create_table.sql | Creates initial LengthOfStay table |
.\SQLPy\step1_data_processing.sql | Handles missing data |
.\SQLPy\step2_feature_engineering.sql | Standardizes measures and creates number_of_issues and lengthofstay_bucket |
.\SQLPy\step3a_splitting.sql | Splits data into train and test |
.\SQLPy\step3b_training.sql | Trains and scores a gradient boosted trees model (rx_btrees) or Random Forest (rx_dforest) |
.\SQLPy\step3c_testing_evaluating.sql | Scores and evaluates models |
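The feature engineering and splitting steps listed above can be pictured with a short standard-library sketch. This is an assumption-laden illustration, not the logic in the .sql files: the flag column names, the bucket boundaries, and the split fraction are all hypothetical, and only the names number_of_issues and lengthofstay_bucket come from the table.

```python
import random

# Hypothetical binary indicator columns; the real dataset's columns may differ.
ISSUE_FLAGS = ["flag_a", "flag_b", "flag_c"]

def number_of_issues(row):
    """Count how many of the indicator columns are set for one record."""
    return sum(int(row[flag]) for flag in ISSUE_FLAGS)

def lengthofstay_bucket(days):
    """Map a stay length in days to a coarse bucket (boundaries are assumed)."""
    if days <= 3:
        return "1-3"
    if days <= 6:
        return "4-6"
    return "7+"

def split_train_test(rows, train_fraction=0.7, seed=42):
    """Shuffle a copy of the rows reproducibly and cut it into train and test parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical records for illustration.
rows = [
    {"id": i, "flag_a": i % 2, "flag_b": 0, "flag_c": 1, "lengthofstay": i + 1}
    for i in range(8)
]
for row in rows:
    row["number_of_issues"] = number_of_issues(row)
    row["lengthofstay_bucket"] = lengthofstay_bucket(row["lengthofstay"])

train, test = split_train_test(rows, train_fraction=0.75)
```

Fixing the seed keeps the split reproducible between runs, which helps when comparing models trained on the same partition.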
Resources for the Solution Package
File | Description |
.\Resources\create_user.sql | Used during initial SQL Server setup to create the user and password and grant permissions. |
.\Resources\Data_Dictionary.xlsx | Description of all variables in the LengthOfStay.csv data file |
.\Resources\Images\ | Directory of images used for the Readme.md in this package. |