Predicting Hospital Length of Stay

Implemented with SQL Server R Services

This solution enables a predictive model for Length of Stay for in-hospital admissions. Length of Stay (LOS) is defined in number of days from the initial admit date to the date that the patient is discharged from any given hospital facility.

Advanced LOS prediction at the time of admission can greatly enhance the quality of care as well as operational workload efficiency and help with accurate planning for discharges resulting in lowering of various other quality measures such as readmissions.

This guide assumes you have deployed the Predicting Hospital Length of Stay solution from the Cortana Intelligence Gallery.

If you are using your own SQL Server for this solution, use this guide instead.

To demonstrate a typical workflow, we’ll introduce you to a few personas. You can follow along by performing the same steps for each persona. While each persona would be working on a different computer, for simplicity, your Virtual Machine (VM) has all the tools each persona would use on the same machine.

Step 1: Server Setup and Configuration with Danny the DB Analyst

Let me introduce you to Danny, the Database Analyst. Danny is the main contact for anything regarding the SQL Server database that stores all the patient data at our hospitals.

Danny was responsible for installing and configuring the SQL Server. He has added a user with all the necessary permissions to execute R scripts on the server and modify the Hospital database. You can see an example of creating a user in the Hospital/Resources/exampleuser.sql query.

This has already been done on your deployed Cortana Intelligence Gallery VM.

Step 2: Data Prep and Modeling with Debra the Data Scientist (Code from R IDE)

Now let’s meet Debra, the Data Scientist. Debra’s will make use of past admission data to create model(s) that will predict LOS. Debra’s preferred language for developing the models is using R and SQL. She  uses Microsoft R Services with SQL Server 2016 as it provides the capability to run large datasets and also is not constrained by memory restrictions of Open Source R. 

Debra would work on her own machine, using R Client to execute these R scripts. In case you want to run the code from the VM, R Client has already been installed.

Debra uses an IDE to run R. On your VM, R Studio is installed. R Tools for Visual Studio is also installed, but you will have to either log in or create a new account to use this tool.

OPTIONAL: You can execute the R code on your local computer if you wish, but you must first prepare both the VM and your computer. Or you can view and execute the R code in a Jupyter Notebook on the VM.

Now that Debra’s environment is set up, she opens her IDE and creates a Project. To follow along with her, open the Hospital/R directory on the VM desktop, (or wherever you put this folder on your local machine). There you will see three files with the name Predicting Hospital Length of Stay.

  • If you are using Visual Studio, double click on the “Visual Studio SLN” file.
  • If you are using RStudio, double click on the “R Project” file.
  1. First she’ll develop R scripts to prepare the data. To view the scripts she writes, open the files mentioned below. If you are using Visual Studio, you will see these file in the Solution Explorer tab on the right. In RStudio, the files can be found in the Files tab, also on the right.

    • step1_data_preprocessing.R
    • step2_feature_engineering.R

    You can run these scripts if you wish, but you may also skip them if you want to get right to the modeling. The data that these scripts create already exists in the SQL database.

    In both Visual Studio and RStudio, there are multiple ways to execute the code from the R Script window. The fastest way for both IDEs is to use Ctrl-Enter on a single line or a selection. Learn more about R Tools for Visual Studio or RStudio.

  2. After running the step1 and step2 scripts, Debra goes to SQL Server Management Studio to log in and view the results of these steps by running the following query:

    SELECT TOP 1000 *    FROM [Hospital].[dbo].[LoS]
  3. Now she is ready for training the models. She creates and executes the following script to train and score a regression Random Forest (rxDForest) and a gradient boosted trees model (rxFastTrees) on the training set. This uses the new MicrosoftML package for Microsoft R Server (version 9.0.1). Both models will predict LOS. When she looks at the metrics of both models, she notices that along with a faster performance time, the rxFastTrees model also performs with lower error, so she decides to use this model for prediction.

    • step3_training_evaluation
  4. Debra will now use PowerBI to visualize the predictions created from her model. She creates the PowerBI Dashboard which you can find in the Hospital directory. If you want to refresh data in your PowerBI Dashboard, make sure to follow these instructions to provide the necessary information.

  5. A summary of this process and all the files involved is described in more detail here.

Step 3: Operationalize with Debra and Danny

Debra has completed her tasks. She has connected to the SQL database, executed code from her R IDE that pushed (in part) execution to the SQL machine. She has scored data, created LOS predictions, and also created a summary dashboard which she will hand off to Caroline and Chris - see below.

These models will be used daily on new data as new patients are admitted. Instead of going back to Debra each time, Danny can operationalize the code in TSQL files which he can then be scheduled to run daily.

Debra hands over her scripts to Danny who adds the code to the database as stored procedures, using embedded R code, or SQL queries.

Danny also creates a production pipeline, which uploads the daily data and then cleans it, performs feature engineering, and scores and saves predictions into a new table.

You can create the production pipeline using the following commands in a PowerShell window:

  1. Click on the windows key on your keyboard. Type the words PowerShell. Right click on Windows Powershell to and select Run as administrator to open the PowerShell window.

  2. In the Powershell command window, type the following command:

    Set-ExecutionPolicy Unrestricted -Scope Process

    Answer y to the prompt to allow the following scripts to execute.

  3. Now CD to the Hospital/SQLR directory and run one of the two following commands, inserting your server name (or “.” if you are on the same machine as the SQL server), database name, username, and password.

    • Run with no prompts:

        .\Length_Of_Stay.ps1 -ServerName "Server Name" -DBName "Database Name" -username "" -password "" -is_production "Y" -uninterrupted "Y"  
    • Run with prompts:

        .\Length_Of_Stay.ps1 -ServerName "Server Name" -DBName "Database Name" -username "" -password "" -is_production "Y" -uninterrupted "N"  
    • For example, uninterrupted mode for some user “rdemo” user with a password of “D@tascience”, the command would be:

        .\Length_Of_Stay.ps1 -ServerName "localhost" -DBName "Hospital" -username "rdemo" -password "D@tascience" -is_production "Y" -uninterrupted "Y"  

You can explore these stored procedures by logging into SSMS and opening the Programmability>Stored Procedures section of the Hospital database.

You can find this script in the SQLR directory, and execute it yourself by following the PowerShell Instructions. As noted earlier, this was already executed when your VM was first created.

Step 4: Deploy and Visualize with Caroline and Chris

Now that the models are in place and the dashboard is built, we will meet our last two personas - Caroline the CMIO and Chris the Care Line Manager.

Caroline the CMIO

Caroline will use the predictions to determine if resources are being allocated appropriately in her hospital network. Her dashboard will help her to not only make determinations about what facilities are being overtaxed, but also what resources at those facilities may need to be bolstered. For example, using the dashboard provided in this solution, Caroline is able to determine which facilities will not be discharging patients at the rate that they are coming in. Using this knowledge, she can then make recommendations to others to transfer and or re-route incoming patients to facilities that are experiencing less burden.

Additionally, Caroline will make recommendations on re-routing specific resources and personnel given demands. By using length of stay predictions, she is able to see which disease conditions are most prevalent in patients that will be staying in care facilities long term. For example, in seeing that heart failure patients are being predicted to spend a longer amounts of time in a specific facility, she will recommend additional heart failure resources be diverted to that facility.

Chris the Care Line Manager

Chris is directly involved with the care of patients. His role requires monitoring individual patient statuses as well as ensuring that staff is avilable to meet their patients’ specific care requirements. Additionally, Chris plans for the discharge of patients; determining if the patient will be discharged during a low staff time (such as weekends).

Length of stay prediction allows Chris to better plan for his patients’ care. In the provided dashboard, Chris is able to see the number of patients under his care by selecting his facility at the top of the page. He can then see all the patients in that facility today and a predicted number of days each has left until their discharge. This allows him to allocated appropriate resources be available for his patient population. He can also see when patients might be projected to leave on a Saturday or Sunday and can either ensure discharge occurs, or to plan for additional days of care in the inpatient setting. Additionally, the vitals and condition breakdowns allow the care line manager to closely monitor the status of those patients projected to be in the hospital for longer periods of time in order to ensure additional complications do not arise during their stay.

Remember that before the data in this dashboard can be refreshed to use your scored data, you must configure the dashboard as Debra did in step 2 of this workflow.