Loan ChargeOff Prediction

Unable to connect to your Virtual Machine? See this important information on how to resolve the issue.

A charged-off loan is a loan that a creditor (usually a lending institution) has declared unlikely to be collected, usually because the debtor is severely delinquent on repayment. Because a high charge-off rate has a negative impact on a lending institution's year-end financials, lending institutions often monitor charge-off risk very closely to prevent loans from being charged off. Using Azure HDInsight ML Server, a lending institution can use machine learning to predict the likelihood of loans being charged off and run a report on the results stored in HDFS and Hive tables.

Select the platform you wish to explore:

On the VM created for you using the 'Deploy to Azure' button on the Quick start page, the SQL Server 2017 database LoanChargeOff_R contains all the data and results of the end-to-end modeling process.
For customers who prefer an on-premises solution, the implementation with SQL Server ML Services is a great option that takes advantage of the powerful combination of SQL Server and the R language. For convenience, a Windows PowerShell script is provided that invokes the SQL scripts to execute the end-to-end modeling process. Note that you may need to upgrade ML Services to at least version 9.0.1, the earliest version that shipped with the MicrosoftML package. Instructions to upgrade ML Services on SQL Server 2016 are here.
This solution shows how to pre-process data (cleaning and feature engineering), train prediction models, and perform scoring on the HDInsight Spark cluster with Microsoft ML Server deployed using the 'Deploy to Azure' button on the Quick start page.

HDInsight Spark cluster billing starts once a cluster is created and stops when the cluster is deleted. See these instructions for important information about deleting a cluster and re-using your files on a new cluster.

We have modeled the steps in the template after a realistic team collaboration on a data science process. Data scientists do the data preparation, model training, and evaluation from their favorite R IDE or, on the HDInsight cluster, using the Open Source Edition of RStudio Server on the cluster edge node. DBAs can take care of the deployment using SQL stored procedures with embedded R code. We show how each of these steps can be executed in a SQL Server client environment such as SQL Server Management Studio. Scoring is implemented with ML Server Operationalization. Finally, a Power BI report is used to visualize the deployed results.
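
To make the modeling steps above concrete (data preparation, training, evaluation, and scoring), here is a minimal sketch in plain Python. It is not the template's implementation, which uses R with ML Server. The two features (`days_past_due`, `utilization`), the synthetic data generator, and the choice of logistic regression trained by gradient descent are all assumptions made for illustration only.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def make_synthetic_loans(n, seed=0):
    """Generate made-up loans as ((days_past_due, utilization), charged_off)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        days_past_due = rng.uniform(0, 120)
        utilization = rng.uniform(0, 1)
        # Hypothetical ground truth: risk rises with delinquency and utilization.
        p = sigmoid(0.05 * (days_past_due - 60) + 2.0 * (utilization - 0.5))
        rows.append(((days_past_due, utilization), 1 if rng.random() < p else 0))
    return rows

def fit_scaler(rows):
    """Data preparation: return a function that standardizes each feature."""
    k = len(rows[0][0])
    means = [sum(x[j] for x, _ in rows) / len(rows) for j in range(k)]
    stds = [
        math.sqrt(sum((x[j] - means[j]) ** 2 for x, _ in rows) / len(rows)) or 1.0
        for j in range(k)
    ]
    return lambda x: tuple((x[j] - means[j]) / stds[j] for j in range(k))

def train(rows, scale, epochs=200, lr=0.1):
    """Training: fit logistic regression weights by batch gradient descent."""
    k = len(rows[0][0])
    w, b = [0.0] * k, 0.0
    data = [(scale(x), y) for x, y in rows]
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * k, 0.0
        for x, y in data:
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j in range(k):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / len(data) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(data)
    return w, b

def score(model, scale, x):
    """Scoring: predicted probability that a loan will be charged off."""
    w, b = model
    return sigmoid(sum(wi * xi for wi, xi in zip(w, scale(x))) + b)

loans = make_synthetic_loans(500)
train_rows, test_rows = loans[:400], loans[400:]
scale = fit_scaler(train_rows)
model = train(train_rows, scale)

# Evaluation: share of held-out loans classified correctly at a 0.5 threshold.
accuracy = sum(
    (score(model, scale, x) >= 0.5) == (y == 1) for x, y in test_rows
) / len(test_rows)

low_risk = score(model, scale, (5.0, 0.1))     # nearly current, low utilization
high_risk = score(model, scale, (110.0, 0.9))  # severely delinquent
print(f"accuracy={accuracy:.2f} low={low_risk:.2f} high={high_risk:.2f}")
```

In the template itself, the equivalent steps run in R on SQL Server or on the Spark cluster, and the scores are written to tables that the Power BI report reads. The sketch only shows the shape of the pipeline.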