Predicting Hospital Length of Stay

Implemented with Microsoft Machine Learning Services

Typical Workflow for On-Premises Deployment


This solution builds a predictive model for Length of Stay for in-hospital admissions. Length of Stay (LOS) is defined as the number of days from the initial admit date to the date that the patient is discharged from any given hospital facility.
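As a minimal sketch of that definition, LOS can be computed as a date difference in Python. The function name and the sample dates are illustrative assumptions, not part of the solution's scripts.

```python
from datetime import date


def length_of_stay(admit_date: date, discharge_date: date) -> int:
    """Return LOS as the number of whole days from admission to discharge."""
    if discharge_date < admit_date:
        raise ValueError("discharge_date cannot precede admit_date")
    return (discharge_date - admit_date).days


# Illustrative dates only: admitted March 1, discharged March 6.
print(length_of_stay(date(2017, 3, 1), date(2017, 3, 6)))  # 5
```

A same-day discharge yields an LOS of 0 under this definition.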

Predicting LOS at the time of admission can greatly enhance the quality of care and operational workload efficiency. It also helps with accurate discharge planning, which in turn can improve other quality measures, for example by lowering readmissions.

To demonstrate a typical workflow, we’ll introduce you to a few personas. You can follow along by performing the same steps for each persona.

NOTE: If you’re just interested in the outcomes of this process, we have also created a fully automated solution that simulates the data and then trains and scores the models by executing PowerShell scripts. This is the fastest way to deploy the solution on your machine. See PowerShell Instructions for this deployment.

If you want to follow along and have not run the PowerShell script, you will need to first create a database table in your SQL Server. You will then need to replace the connection_string at the top of each R file with your database and login information.
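The same idea can be sketched in Python. The example below assembles an ODBC-style connection string from its parts. The helper name, the driver name, the server, and the login values are all placeholders chosen for illustration; substitute the values for your own SQL Server.

```python
def build_connection_string(server, database, user, password,
                            driver="ODBC Driver 17 for SQL Server"):
    """Assemble an ODBC-style connection string (hypothetical helper)."""
    parts = {
        "DRIVER": "{" + driver + "}",  # assumed driver; use the one installed locally
        "SERVER": server,
        "DATABASE": database,
        "UID": user,
        "PWD": password,
    }
    return ";".join(f"{key}={value}" for key, value in parts.items())


# Placeholder credentials, not real logins from this solution.
connection_string = build_connection_string(
    "localhost", "Hospital_Py", "example_user", "example_password"
)
print(connection_string)
```

Keeping the pieces as separate arguments makes it easier to swap the database or login without editing a long literal string.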

Step 1: Server Setup and Configuration with Danny the DB Analyst


Let me introduce you to Danny, the Database Analyst. Danny is the main contact for anything regarding the SQL Server database that stores all the patient data at our hospitals.

Danny was responsible for installing and configuring the SQL Server. He has added a user with all the necessary permissions to execute R and Python scripts on the server and modify the Hospital_R and Hospital_Py databases.

You can see an example of creating a user in the Hospital/Resources/exampleuser.sql query.

You can perform these steps in your environment by using the instructions in START HERE.

Step 2: Data Prep and Modeling with Debra the Data Scientist (Code from R IDE)


Now let’s meet Debra, the Data Scientist. Debra will make use of past admission data to create models that will predict LOS. Debra might use R or Python for this task; we’ll show you examples of both. She uses SQL Server 2017 Machine Learning Services, which supports both R and Python for in-database analytics.


Debra would work on her own machine, using R Client to execute these R scripts. She will need to install and configure an R IDE to use with R Client.

Now that Debra’s environment is set up, she opens her IDE and creates a Project. To follow along with her, open the Hospital/R directory. There you will see three files with the name Predicting Hospital Length of Stay.

  • If you are using Visual Studio, double click on the “Visual Studio SLN” file.
  • If you are using RStudio, double click on the “R Project” file.

  1. First she’ll develop scripts to prepare the data. To view the scripts she writes, open the files mentioned below. If you are using Visual Studio, you will see these files in the Solution Explorer tab on the right. In RStudio, the files can be found in the Files tab, also on the right.

    • step1_data_preprocessing.R
    • step2_feature_engineering.R

    You can run these scripts if you wish, but you may also skip them if you want to get right to the modeling. The data that these scripts create already exists in the SQL database.

    In both Visual Studio and RStudio, there are multiple ways to execute the code from the R Script window. The fastest way for both IDEs is to use Ctrl-Enter on a single line or a selection. Learn more about R Tools for Visual Studio or RStudio.

  2. After running the step1 and step2 scripts, Debra goes to SQL Server Management Studio to log in and view the results of these steps by running the following query:

    
      SELECT TOP 1000 *    
      FROM [Hospital_R].[dbo].[LoS]
    
  3. Now she is ready to train the models. She creates and executes the following script to train and score a regression Random Forest (rxDForest) and a gradient boosted trees model (rxFastTrees) on the training set. This uses the new MicrosoftML package for Microsoft R Server (version 9.0.1 or higher). Both models will predict LOS. When she looks at the metrics of both models, she notices that the rxFastTrees model is both faster and has lower error, so she decides to use this model for prediction.

    • step3_training_evaluation.R

  4. Debra will now use Power BI to visualize the predictions created from her model. She creates the Power BI Dashboard, which you can find in the Hospital directory. If you want to refresh data in your Power BI Dashboard, make sure to follow these instructions to provide the necessary information.


  5. A summary of this process and all the files involved is described in more detail here.

Debra would work on her own machine, using Machine Learning Services with Python to execute these Python scripts. In case you want to run the code from the VM, ML Services Python has already been installed.

The Python code is present in the Hospital/Python directory.

OPTIONAL: You can execute the Python code on your local computer if you wish, but you must first prepare both the VM and your computer.

Follow these instructions to view and execute the Python code with the Jupyter Notebook on the VM. You can also execute the Python code with an IDE. Both PyCharm and Visual Studio are installed on your VM. For each, you must first configure the Python interpreter to use C:\Program Files\Microsoft\ML Server\PYTHON_SERVER\python.exe.

  1. First she’ll develop scripts to prepare the data. To view the scripts she writes, open the files mentioned below, or see the Data Preparation and Feature Engineering sections in the Jupyter Notebook.

    • step1_data_preprocessing.py
    • step2_feature_engineering.py

    You can run these scripts if you wish, but you may also skip them if you want to get right to the modeling. The data that these scripts create already exists in the SQL database.

  2. After running the step1 and step2 scripts, Debra goes to SQL Server Management Studio to log in and view the results of these steps by running the following query:

    
      SELECT TOP 1000 *    
      FROM [Hospital_Py].[dbo].[LoS]
        
  3. Now she is ready to train the models. She creates and executes the following script to train and score a regression Random Forest (rx_dforest) and a gradient boosted trees model (rx_fast_trees) on the training set. This uses the new MicrosoftML package for Python. Both models will predict LOS. When she looks at the metrics of both models, she notices that the rx_fast_trees model is both faster and has lower error, so she decides to use this model for prediction.

    • step3_training_evaluation.py

  4. Debra will now use Power BI to visualize the predictions created from her model. She creates the Power BI Dashboard, which you can find in the Hospital_Py directory. If you want to refresh data in your Power BI Dashboard, make sure to follow these instructions to provide the necessary information.

  5. A summary of this process and all the files involved is described in more detail here.
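The comparison Debra makes in item 3 comes down to computing error metrics for each model on the same held-out data and keeping the model with the lower error. The sketch below shows that idea in plain Python with mean absolute error and root mean squared error. The metric choice, the helper names, and the toy numbers are assumptions for illustration; they are not the solution's actual metrics or results.

```python
import math


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted LOS."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def root_mean_squared_error(actual, predicted):
    """Square root of the average squared difference."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


def pick_best(actual, predictions_by_model):
    """Return the name of the model with the lowest RMSE."""
    return min(
        predictions_by_model,
        key=lambda name: root_mean_squared_error(actual, predictions_by_model[name]),
    )


# Toy values for illustration only, not output from the solution's models.
actual_los = [3, 5, 2, 8]
predictions = {
    "random_forest": [4, 4, 3, 6],
    "boosted_trees": [3, 5, 3, 7],
}

for name, predicted in predictions.items():
    print(name, mean_absolute_error(actual_los, predicted))
print(pick_best(actual_los, predictions))  # boosted_trees
```

RMSE penalizes large misses more heavily than MAE, which matters when a badly underestimated stay is costlier than several small errors.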

Step 3: Operationalize with Debra and Danny


Debra has completed her tasks. She has connected to the SQL database and executed code that pushed (in part) execution to the SQL machine. She has scored data, created LOS predictions, and also created a summary dashboard which she will hand off to Caroline and Chris (see below).

These models will be used daily on new data as new patients are admitted. Instead of going back to Debra each time, Danny can operationalize the code in T-SQL files, which can then be scheduled to run daily.

Debra hands over her scripts to Danny who adds the code to the database as stored procedures, using both SQL queries and embedded R code (in the Hospital_R database) or embedded Python code (in the Hospital_Py database).

Danny also creates a production pipeline, which uploads the daily data and then cleans it, performs feature engineering, and scores and saves predictions into a new table.
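Danny's pipeline lives in stored procedures, but its shape (clean, engineer features, score, save) can be sketched in a few lines of Python. Everything below is hypothetical: the column names, the mean-fill cleaning rule, the feature, and the stub scoring function are stand-ins for illustration and are not taken from the solution's procedures. A plain list stands in for the predictions table.

```python
from statistics import mean


def clean(rows):
    """Replace missing glucose readings (None) with the mean of observed ones."""
    observed = [r["glucose"] for r in rows if r["glucose"] is not None]
    fill = mean(observed) if observed else 0.0
    return [
        {**r, "glucose": fill if r["glucose"] is None else r["glucose"]}
        for r in rows
    ]


def engineer_features(rows):
    """Add a hypothetical flag for patients with two or more prior admissions."""
    return [{**r, "frequent_admitter": r["prior_admissions"] >= 2} for r in rows]


def run_daily_pipeline(new_rows, model, predictions_table):
    """Clean, featurize, score, and append predictions to the output table."""
    for row in engineer_features(clean(new_rows)):
        predictions_table.append(
            {"patient_id": row["patient_id"], "predicted_los": model(row)}
        )
    return predictions_table


def stub_model(row):
    # Placeholder scoring rule, NOT the trained model from this solution.
    return 5 if row["frequent_admitter"] else 2


# Made-up daily admissions for illustration.
daily_rows = [
    {"patient_id": 1, "glucose": 100.0, "prior_admissions": 0},
    {"patient_id": 2, "glucose": None, "prior_admissions": 3},
]
daily_predictions = run_daily_pipeline(daily_rows, stub_model, [])
print(daily_predictions)
```

Passing the model in as an argument keeps the pipeline steps independent of whichever trained model is currently in production.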

You can explore these stored procedures by logging into SSMS and opening the Programmability>Stored Procedures section of the Hospital_R or Hospital_Py database.

You can find this script in the SQLR or SQLPY directory, and execute it yourself by following the PowerShell Instructions. As noted earlier, this is the fastest way to execute all the code included in this solution. (This will re-create the same set of tables and models as the above R/Python scripts.)

Step 4: Deploy and Visualize with Caroline and Chris


Now that the models are in place and the dashboard is built, we will meet our last two personas - Caroline the CMIO and Chris the Care Line Manager.

Caroline the CMIO

Caroline will use the predictions to determine if resources are being allocated appropriately in her hospital network. Her dashboard will help her determine not only which facilities are being overtaxed, but also which resources at those facilities may need to be bolstered. For example, using the dashboard provided in this solution, Caroline is able to determine which facilities will not be discharging patients at the rate that they are coming in. Using this knowledge, she can then make recommendations to others to transfer or re-route incoming patients to facilities that are experiencing less burden.

Additionally, Caroline will make recommendations on re-routing specific resources and personnel given demands. By using length of stay predictions, she is able to see which disease conditions are most prevalent in patients that will be staying in care facilities long term. For example, on seeing that heart failure patients are predicted to spend longer amounts of time in a specific facility, she will recommend that additional heart failure resources be diverted to that facility.

Chris the Care Line Manager

Chris is directly involved with the care of patients. His role requires monitoring individual patient statuses as well as ensuring that staff is available to meet their patients’ specific care requirements. Additionally, Chris plans for the discharge of patients, determining whether a patient will be discharged during a low staff time (such as weekends).

Length of stay prediction allows Chris to better plan for his patients’ care. In the provided dashboard, Chris is able to see the number of patients under his care by selecting his facility at the top of the page. He can then see all the patients in that facility today and the predicted number of days each has left until discharge. This allows him to allocate appropriate resources for his patient population. He can also see when patients are projected to leave on a Saturday or Sunday and can either ensure that discharge occurs or plan for additional days of care in the inpatient setting. Additionally, the vitals and condition breakdowns allow the care line manager to closely monitor the status of those patients projected to be in the hospital for longer periods of time in order to ensure additional complications do not arise during their stay.
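The arithmetic behind those dashboard views is simple date math. As a rough sketch, assuming the predicted LOS has been rounded to whole days, the projected discharge date, the days remaining, and a weekend flag could be derived as follows. The function names and sample dates are illustrative and do not come from the dashboard itself.

```python
from datetime import date, timedelta


def projected_discharge(admit_date: date, predicted_los_days: int) -> date:
    """Admission date plus the predicted LOS (whole days assumed)."""
    return admit_date + timedelta(days=predicted_los_days)


def days_remaining(admit_date: date, predicted_los_days: int, today: date) -> int:
    """Days left until projected discharge, never negative."""
    remaining = (projected_discharge(admit_date, predicted_los_days) - today).days
    return max(remaining, 0)


def is_weekend(day: date) -> bool:
    """weekday() counts Monday as 0, so 5 and 6 are Saturday and Sunday."""
    return day.weekday() >= 5


# Illustrative patient: admitted Wednesday, March 1, 2017, predicted LOS of 3 days.
discharge = projected_discharge(date(2017, 3, 1), 3)
print(discharge, is_weekend(discharge))                      # 2017-03-04 True
print(days_remaining(date(2017, 3, 1), 3, date(2017, 3, 2)))  # 2
```

A patient flagged this way is one for whom Chris would either arrange weekend discharge staffing or plan for extra inpatient days.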

Remember that before the data in this dashboard can be refreshed to use your scored data, you must configure the dashboard as Debra did in step 2 of this workflow.

Step 5: Use the Model during Admission

The predicted LOS might also be displayed during a patient’s admission. An example of such a display is available in a sample webpage included in this solution.

To try out this example site, you must first start the lightweight web server for the site. Open a terminal window or PowerShell window and type the following commands:

cd C:\Solutions\Hospital\Website
npm start

You should see the following response:

The website is running at http://localhost:3000
Tedious-Connection-Pool: filling pool with 2
Tedious-Connection-Pool: creating connection: 1
...

Leave this window open, then open the URL http://localhost:3000 in your browser.

This site is set up to mimic a hospital dashboard. Click on one of the first two patients to view their details. Select the Admit Patient button to trigger the LOS prediction. The predicted length of stay will appear below the button.

You can view the model values by opening the Console window on your browser.

For Edge or Internet Explorer: Press F12 to open Developer Tools, then click on the Console tab. For Firefox or Chrome: Press Ctrl-Shift-I to open Developer Tools, then click on the Console tab.

For more details about this example, see For the Web Developer.