Predicting Hospital Length of Stay

Implemented with SQL Server R Services

Typical Workflow for On-Premises Deployment


This solution enables a predictive model for Length of Stay for in-hospital admissions. Length of Stay (LOS) is defined in number of days from the initial admit date to the date that the patient is discharged from any given hospital facility.

Advanced LOS prediction at the time of admission can greatly enhance the quality of care as well as operational workload efficiency and help with accurate planning for discharges resulting in lowering of various other quality measures such as readmissions.

This guide assumes you are using an on premises SQL Server for this solution.

If you have deployed the Predicting Hospital Length of Stay solution from the Cortana Intelligence Gallery you should instead use this guide.

To demonstrate a typical workflow, we’ll introduce you to a few personas. You can follow along by performing the same steps for each persona.

NOTE: If you’re just interested in the outcomes of this process we have also created a fully automated solution that simulates the data, trains and scores the models by executing PowerShell scripts. This is the fastest way to deploy the solution on your machine. See PowerShell Instructions for this deployment.

If you want to follow along and have not run the PowerShell script, you will need to first create a database table in your SQL Server. You will then need to replace the connection_string at the top of each R file with your database and login information.

Step 1: Server Setup and Configuration with Danny the DB Analyst


Let me introduce you to Danny, the Database Analyst. Danny is the main contact for anything regarding the SQL Server database that stores all the patient data at our hospitals.

Danny was responsible for installing and configuring the SQL Server. He has added a user named ‘rdemo’ with all the necessary permissions to execute R scripts on the server and modify the Hospital database. This was done through the create_user.sql file.

You can perform these steps in your environment by using the instructions in START HERE.

Step 2: Data Prep and Modeling with Debra the Data Scientist


Now let’s meet Debra, the Data Scientist. Debra’s will make use of past admission data to create model(s) that will predict LOS. Debra’s preferred language for developing the models is using R and SQL. She  uses Microsoft R Services with SQL Server 2016 as it provides the capability to run large datasets and also is not constrained by memory restrictions of Open Source R. 

Debra will work on her own machine, using R Client to execute these R scripts. She will need to install and configure an R IDE to use with R Client.

Now that Debra’s environment is set up, she opens her IDE and creates a Project. To follow along with her, open the Hospital/R directory. There you will see three files with the name Predicting Hospital Length of Stay.

  • If you use Visual Studio, double click on the Visual Studio SLN file.
  • If you use RStudio, double click on the “R Project” file.
  1. First she’ll develop R scripts to prepare the data. To view the scripts she writes, open the files mentioned below. If you are using Visual Studio, you will see these file in the Solution Explorer tab on the right. In RStudio, the files can be found in the Files tab, also on the right.

    • step1_data_preprocessing.R
    • step2_feature_engineering.R

    You can run these scripts if you wish, but you may also skip them if you want to get right to the modeling. The data that these scripts create already exists in the SQL database.

    In both Visual Studio and RStudio, there are multiple ways to execute the code from the R Script window. The fastest way for both IDEs is to use Ctrl-Enter on a single line or a selection. Learn more about R Tools for Visual Studio or RStudio.

  2. If you are following along, if you have modified any of the default values created by this solution package you will need to replace the connection string in the SQL_connection.R file with details of your login and database name.

    connection_string <- "Driver=SQL Server;Server=localhost;Database=Hospital;UID=rdemo;PWD=D@tascience"
    
    Make sure there are no spaces around the "=" in the connection string - it will not work correctly when spaces are present.

    If you are creating a new database by using these scripts, you must first create the database name in SSMS. Once it exists it can be referenced in the connection string. (Log into SSMS using the same username/password you supply in the connection string, or rdemo, D@tascience if you haven’t changed the default values.)

    This connection string contains all the information necessary to connect to the SQL Server from inside the R session. As you can see in the script, this information is then used in the RxInSqlServer() command to setup a sql string. The sql string is in turn used in the rxSetComputeContext() to execute code directly in-database. You can see this in the SQL_connection.R file:

    connection_string <- "Driver=SQL Server;Server=localhost;Database=Hospital;UID=rdemo;PWD=D@tascience"
    sql <- RxInSqlServer(connectionString = connection_string)
    rxSetComputeContext(sql)
    
  3. After running the step1 and step2 scripts, Debra goes to SQL Server Management Studio to log in and view the results of these steps by running the following query:

    SELECT TOP 1000 * FROM [Hospital].[dbo].[LoS]

  4. Now she is ready for training the models. She creates and executes the following script to train and score a regression Random Forest (rxDForest) and a gradient boosted trees model (rxFastTrees) on the training set. This uses the new MicrosoftML package for Microsoft R Server (version 9.0.1). Both models will predict LOS. When she looks at the metrics of both models, she notices that along with a faster performance time, the rxFastTrees model also performs with lower error, so she decides to use this model for prediction.

    • step3_training_evaluation
  5. Debra will now use PowerBI to visualize the predictions created from her model. She creates the PowerBI Dashboard which you can find in the Hospital directory. If you want to refresh data in your PowerBI Dashboard, make sure to follow these instructions to provide the necessary information.

  6. A summary of this process and all the files involved is described in more detail here.

Step 3: Operationalize with Debra and Danny


Debra has completed her tasks. She has connected to the SQL database, executed code from her R IDE that pushed (in part) execution to the SQL machine. She has scored data, created LOS predictions, and also created a summary dashboard which she will hand off to Caroline and Chris - see below.

These models will be used daily on new data as new patients are admitted. Instead of going back to Debra each time, Danny can operationalize the code in TSQL files which he can then be scheduled to run daily.

Debra hands over her scripts to Danny who adds the code to the database as stored procedures, using embedded R code, or SQL queries.

Danny also creates a production pipeline, which uploads the daily data and then cleans it, performs feature engineering, and scores and saves predictions into a new table.

You can create the production pipeline using the following commands in a PowerShell window:

  1. Click on the windows key on your keyboard. Type the words PowerShell. Right click on Windows Powershell to and select Run as administrator to open the PowerShell window.

  2. In the Powershell command window, type the following command:

    Set-ExecutionPolicy Unrestricted -Scope Process
    

    Answer y to the prompt to allow the following scripts to execute.

  3. Now CD to the Hospital/SQLR directory and run one of the two following commands, inserting your server name (or “.” if you are on the same machine as the SQL server), database name, username, and password.

    • Run with no prompts:

        .\Length_Of_Stay.ps1 -ServerName "Server Name" -DBName "Database Name" -username "" -password "" -is_production "Y" -uninterrupted "Y"  
      
    • Run with prompts:

        .\Length_Of_Stay.ps1 -ServerName "Server Name" -DBName "Database Name" -username "" -password "" -is_production "Y" -uninterrupted "N"  
      
    • For example, uninterrupted mode for the rdemo user created by the create_user.sql script on your local machine, the command would be:

        .\Length_Of_Stay.ps1 -ServerName "localhost" -DBName "Hospital" -username "rdemo" -password "D@tascience" -is_production "Y" -uninterrupted "Y"  
      

You can explore these stored procedures by logging into SSMS and opening the Programmability>Stored Procedures section of the Hospital database. Log into SSMS using the rdemo user with SQL Server Authentication - the default password upon creating the solution was D@tascience, unless you changed this password.

You can find this script in the SQLR directory, and execute it yourself by following the PowerShell Instructions. As noted earlier, this is the fastest way to execute all the code included in this solution. (This will re-create the same set of tables and models as the above R scripts.)

Step 4: Deploy and Visualize with Caroline and Chris


Now that the models are in place and the dashboard is built, we will meet our last two personas - Caroline the CMIO and Chris the Care Line Manager.

Caroline the CMIO

Caroline will use the predictions to determine if resources are being allocated appropriately in her hospital network. Her dashboard will help her to not only make determinations about what facilities are being overtaxed, but also what resources at those facilities may need to be bolstered. For example, using the dashboard provided in this solution, Caroline is able to determine which facilities will not be discharging patients at the rate that they are coming in. Using this knowledge, she can then make recommendations to others to transfer and or re-route incoming patients to facilities that are experiencing less burden.

Additionally, Caroline will make recommendations on re-routing specific resources and personnel given demands. By using length of stay predictions, she is able to see which disease conditions are most prevalent in patients that will be staying in care facilities long term. For example, in seeing that heart failure patients are being predicted to spend a longer amounts of time in a specific facility, she will recommend additional heart failure resources be diverted to that facility.

Chris the Care Line Manager

Chris is directly involved with the care of patients. His role requires monitoring individual patient statuses as well as ensuring that staff is avilable to meet their patients’ specific care requirements. Additionally, Chris plans for the discharge of patients; determining if the patient will be discharged during a low staff time (such as weekends).

Length of stay prediction allows Chris to better plan for his patients’ care. In the provided dashboard, Chris is able to see the number of patients under his care by selecting his facility at the top of the page. He can then see all the patients in that facility today and a predicted number of days each has left until their discharge. This allows him to allocated appropriate resources be available for his patient population. He can also see when patients might be projected to leave on a Saturday or Sunday and can either ensure discharge occurs, or to plan for additional days of care in the inpatient setting. Additionally, the vitals and condition breakdowns allow the care line manager to closely monitor the status of those patients projected to be in the hospital for longer periods of time in order to ensure additional complications do not arise during their stay.

Remember that before the data in this dashboard can be refreshed to use your scored data, you must configure the dashboard as Debra did in step 2 of this workflow.