Loan ChargeOff Prediction

Typical Workflow


Lending institutions benefit in several ways from having loan chargeoff prediction data. Charging off a loan is the last resort a bank takes on a severely delinquent loan. With predictive data at hand, a loan officer can offer personalized incentives, such as a lower interest rate or a longer repayment period, to help customers keep making loan payments and thus prevent the loan from being charged off. To obtain this type of prediction data, credit unions and banks often handcraft the data manually from customers' past payment history and perform simple statistical regression analysis. This method is highly prone to data compilation errors and is not statistically sound.

This solution template demonstrates an end-to-end solution that runs predictive analytics on loan data and produces chargeoff probability scores. A PowerBI report also walks through the analysis and trends of credit loans and the predicted chargeoff probability.

To demonstrate a typical workflow, we’ll introduce you to a few personas. You can follow along by performing the same steps for each persona.

Step 1: Server Setup and Configuration with Danny the DB Analyst


Let me introduce you to Danny, the Database Analyst. Danny is the main contact for SQL Server database administration and application integration. Danny was responsible for installing and configuring SQL Server. He has added a user with all the necessary permissions to execute R scripts on the server and modify the LoanChargeOff database. This was done through the createuser.sql file. This step has already been done on the VM you deployed using the 'Deploy to Azure' button on the Quick start page. Alternatively, Danny could run LoanChargeOff.ps1, which executes the end-to-end workflow: setting up the SQL Server user login, importing raw data into SQL Server tables, creating views, training and testing the models, and scoring predictions.

This step has already been done on your 'Deploy to Azure' VM.

Step 2: Data Prep and Modeling with Debra the Data Scientist


Now let’s meet Debra, the Data Scientist. Debra’s job is to use loan payment data to predict loan chargeoff risk. Debra prefers to develop her models in R and SQL. She uses Microsoft ML Services with SQL Server 2017 because it can process large datasets and is not constrained by the memory limits of open source R.

After analyzing the data, she opted to create multiple models and choose the best one. She will create five machine learning models and compare them, use the one she prefers to compute a prediction for each loan, and then select the loan with the highest probability of chargeoff.
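
The compare-then-score flow described above can be sketched in a few lines. This is only an illustration in Python, not the R scripts that ship with the template: the model names, loan IDs, and probabilities are made up, only two models are shown instead of five for brevity, and the use of AUC as the comparison metric is an assumption (the template may rank models differently).

```python
from typing import Dict, List, Sequence, Tuple


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve by pairwise comparison (ties count as 0.5)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def pick_best_model(labels: Sequence[int],
                    predictions: Dict[str, Sequence[float]]) -> str:
    """Return the name of the model with the highest AUC on the test labels."""
    return max(predictions, key=lambda name: auc(labels, predictions[name]))


def rank_loans(scores: Dict[str, float]) -> List[Tuple[str, float]]:
    """Sort loans from highest to lowest predicted chargeoff probability."""
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)


# Hypothetical held-out test labels (1 = charged off) and each model's
# predicted chargeoff probabilities on those same test loans.
test_labels = [0, 0, 1, 1, 0, 1]
test_predictions = {
    "model_a": [0.10, 0.40, 0.35, 0.80, 0.20, 0.70],
    "model_b": [0.30, 0.20, 0.60, 0.90, 0.10, 0.75],
}

# Hypothetical scores each model assigns to loans that need a prediction.
new_loan_scores = {
    "model_a": {"L-1001": 0.25, "L-1002": 0.66, "L-1003": 0.51},
    "model_b": {"L-1001": 0.12, "L-1002": 0.83, "L-1003": 0.47},
}

best = pick_best_model(test_labels, test_predictions)
ranked = rank_loans(new_loan_scores[best])
highest_risk_loan = ranked[0]

print("best model:", best)
print("highest-risk loan:", highest_risk_loan)
```

Ranking the full list rather than returning a single loan keeps the sketch usable when a loan officer wants to review more than one high-risk loan.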

Debra will work on her own machine, using R Client to execute these R scripts. R Client is already installed on the VM. She will also use an IDE to run R.

On your VM, R Tools for Visual Studio is installed. You will, however, have to either log in or create a new account to use this tool. If you prefer, you can download and install RStudio on your VM instead.