Campaign Optimization - Predicting How and When to Contact Leads

For the Data Scientist - Develop with R


SQL Server ML Services takes advantage of the power of SQL Server and RevoScaleR (Microsoft ML Server package) by allowing R to run on the same server as the database. It includes a database service that runs outside the SQL Server process and communicates securely with the R runtime.

This solution package shows how to pre-process data (cleaning and feature engineering), train prediction models, and perform scoring on the SQL Server machine.
HDInsight is a cloud Spark and Hadoop service for the enterprise. HDInsight is also the only managed cloud Hadoop solution with integration to Microsoft ML Server.

This solution shows how to pre-process data (cleaning and feature engineering), train prediction models, and perform scoring on an HDInsight Spark cluster with Microsoft ML Server.
Data scientists who are testing and developing solutions can work from the convenience of their R IDE on their client machine, while setting the computation context to SQL (see R folder for code). They can also deploy the completed solutions to SQL Server 2016 by embedding calls to R in stored procedures (see SQLR folder for code). These solutions can then be further automated with SQL Server Integration Services and SQL Server Agent: a PowerShell script (.ps1 file) automates the running of the SQL code.
Data scientists who are testing and developing solutions can work from the browser-based Open Source Edition of RStudio Server on the HDInsight Spark cluster edge node, while using a compute context to control whether computation will be performed locally on the edge node, or whether it will be distributed across the nodes in the HDInsight Spark cluster.
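The compute-context switch described above might look like the following sketch; the server name, connection string, and database name are placeholders, not part of the solution:

```r
# Sketch: switching compute contexts from an R client or edge node.
library(RevoScaleR)

# SQL Server: push computation into the database.
# (Connection string values below are illustrative placeholders.)
connStr <- "Driver=SQL Server;Server=myserver;Database=Campaign;Trusted_Connection=yes"
rxSetComputeContext(RxInSqlServer(connectionString = connStr))

# HDInsight: distribute computation across the Spark cluster instead.
# rxSetComputeContext(RxSpark())

# Return to local computation on the client / edge node.
rxSetComputeContext(RxLocalSeq())
```

Switching the compute context changes where `rx*` functions execute without changing the analysis code itself.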

Campaign Optimization


This template is focused on marketing campaign optimization. In particular, customer-oriented businesses can learn patterns from their data to intelligently design acquisition campaigns and convert the highest possible number of customers.

Among the key variables to learn from data are the best communication channel (e.g. SMS, Email, Call), the day of the week, and the time of day at which a given potential customer should be targeted by a marketing campaign.

In this solution, the final scored database table in SQL Server gives the recommendations for how and when to contact each lead. This data is then visualized in PowerBI.

In this solution, an Apache Hive table will be created to show predicted recommendations for how and when to contact each lead. This data is then visualized in PowerBI.

To try this out yourself, visit the Quick Start page.

Below is a more detailed description of what happens in each of the steps: dataset creation, model development, recommendations, and deployment.

Analytical Dataset Preprocessing and Feature Engineering


This part simulates input data and performs preprocessing and feature engineering to create the analytical dataset.

The R code to perform these steps can be run from an R client.
One of the key functions used is rxExec. It is similar to the foreach function in open source R, but under the Spark compute context it can parallelize computation across different nodes. In this implementation, we split the whole data set into multiple subsets by Lead_Id and apply the transformation functions to each subset with rxExec. Computation time is significantly reduced because the work is fully parallelized across the nodes and cores of the cluster.
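In outline, the rxExec pattern looks like the sketch below; `leadData` and `featurizeSubset` are hypothetical stand-ins for the solution's own data set and transformation functions:

```r
# Sketch of splitting by Lead_Id and transforming each shard in parallel.
library(RevoScaleR)

# Toy data standing in for the real lead-level data set.
leadData <- data.frame(Lead_Id = 1:1000,
                       Age     = sample(20:70, 1000, replace = TRUE))

featurizeSubset <- function(subset) {
  # ... per-subset cleaning / feature engineering would go here ...
  subset
}

# Split the data into shards by Lead_Id, one rxExec task per shard.
shards <- split(leadData, leadData$Lead_Id %% 8)

# Under a Spark compute context each task runs on a different node/core;
# rxElemArg passes a different shard to each invocation of the function.
results  <- rxExec(featurizeSubset, subset = rxElemArg(shards))
combined <- do.call(rbind, results)
```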

The script campaign_main.R is used to run each of the scripts step1*-step4*, described in the sections below.

step1_data_processing.R

This script exports the 4 input data sets to SQL tables, merges them, and then performs missing value treatment on the raw table in-database.
This script reads the 4 input data sets, merges them, and then performs missing value treatment on the raw table in-database.
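In outline, the merge and missing-value treatment could look like this base-R sketch on toy versions of two of the input tables; table and column names are illustrative, not the solution's actual schema:

```r
# Toy stand-ins for two of the four input data sets.
leads  <- data.frame(Lead_Id = 1:3, Age = c(30, NA, 45))
market <- data.frame(Lead_Id = 1:3, Channel = c("SMS", NA, "Email"))

# Merge the inputs on the shared key.
merged <- merge(leads, market, by = "Lead_Id")

# Missing value treatment: mean for numeric columns,
# mode (most frequent level) for categorical columns.
merged$Age[is.na(merged$Age)] <- mean(merged$Age, na.rm = TRUE)
merged$Channel[is.na(merged$Channel)] <-
  names(which.max(table(merged$Channel)))
```

In the deployed solution the equivalent logic runs in-database (or on the cluster) rather than on in-memory data frames.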

step2_feature_engineering.R

This script performs feature engineering in-database to generate the Analytical Dataset:

  1. SMS_Count, Call_Count, Email_Count: number of times every customer (Lead_Id) has been contacted through every Channel.

  2. Previous_Channel: the previous channel used towards every customer for every campaign activity.

Finally, only the latest campaign activity for every customer is kept.
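The three steps above can be sketched in base R on a toy campaign-activity table (column names are illustrative):

```r
# Toy campaign-activity history: one row per contact attempt.
activity <- data.frame(
  Lead_Id       = c(1, 1, 1, 2, 2),
  Channel       = c("SMS", "Email", "Call", "SMS", "SMS"),
  Activity_Time = c(1, 2, 3, 1, 2)
)

# 1. Per-lead contact counts by channel (columns ~ SMS_Count, etc.).
counts <- as.data.frame.matrix(table(activity$Lead_Id, activity$Channel))

# 2. Previous channel used for each activity (NA for the first contact).
activity <- activity[order(activity$Lead_Id, activity$Activity_Time), ]
activity$Previous_Channel <- ave(activity$Channel, activity$Lead_Id,
                                 FUN = function(x) c(NA, head(x, -1)))

# 3. Keep only the latest campaign activity per lead.
latest <- activity[!duplicated(activity$Lead_Id, fromLast = TRUE), ]
```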

The corresponding SQL stored procedures can be run manually after loading the data into tables with PowerShell. They can be found in the scripts step1_data_processing.sql and step2_feature_engineering.sql.

Model Development


Two models, Random Forest and Gradient Boosted Trees, are developed to model Campaign Responses. The R code to develop these models is included in the step3_training_evaluation.R script.

step3_training_evaluation.R

In this step, after splitting the analytical data set into a training and a testing set, two prediction models are built (Random Forest and Gradient Boosted Trees) on the training set. Once the models are trained, the AUC of both models is calculated using the testing set.
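A minimal sketch of this step with RevoScaleR, assuming the analytical set has already been split into `train` and `test` data frames with a binary `Conversion_Flag` label (the formula and column names are illustrative):

```r
# Train the two candidate models and compare them on AUC.
library(RevoScaleR)

form <- Conversion_Flag ~ Age + SMS_Count + Call_Count + Email_Count

forest <- rxDForest(form, data = train, nTree = 40)
btrees <- rxBTrees(form, data = train, lossFunction = "bernoulli", nTree = 40)

# Score the test set; carry the actual label along for the ROC computation.
scores <- rxPredict(forest, data = test, type = "prob",
                    extraVarsToWrite = "Conversion_Flag")

# The predicted-probability column name depends on the factor levels
# (e.g. "1_prob" for the positive class); adjust predVarNames accordingly.
roc <- rxRoc(actualVarName = "Conversion_Flag",
             predVarNames  = "1_prob",
             data = scores)
rxAuc(roc)
```

The same scoring and `rxRoc`/`rxAuc` comparison is repeated for the boosted-trees model, and the larger area decides the champion.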

The R script draws the ROC (Receiver Operating Characteristic) curve for each prediction model. It shows the performance of the model in terms of true positive rate and false positive rate as the decision threshold varies.

The AUC is a number between 0 and 1 corresponding to the area under the ROC curve. It is a performance metric that reflects how well the model separates the two classes (converted vs. non-converted clients) over all possible decision thresholds. The closer the AUC is to 1, the better the model. Because we are not looking for a single optimal decision threshold, the AUC is more representative of the prediction performance than the accuracy (which depends on the threshold).

The model with the best AUC is selected as the champion model and will be used for recommendations.

The corresponding SQL stored procedures can be run manually. They can be found in the scripts step3a_splitting.sql, step3b_train_model.sql, and step3c_test_model.sql. Note that the SQL stored procedures use a normalized version of the analytical dataset in order to speed up the computations. The normalization step can be found in step3_normalization.sql, and should be run before steps 3a, 3b, and 3c.

Computing Recommendations


The champion model is used to provide recommendations about how and when to contact each customer. The R code to provide the recommendations is included in the step4_campaign_recommendations.R script.

step4_campaign_recommendations.R

This script creates a full table with 63 rows for each customer, corresponding to the 63 possible combinations of day, time, and channel (7 Days x 3 Times x 3 Channels = 63). Each combination is scored with the champion model and, for each customer, the combination with the highest conversion probability becomes the recommendation for that lead. Results from this step are stored in the Recommendations Hive database table.
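The scoring-grid construction can be sketched as follows; the day/time/channel levels are illustrative, and `runif` stands in for the champion model's `rxPredict` scores:

```r
# Build the 7 x 3 x 3 = 63 combinations of day, time, and channel.
combos <- expand.grid(
  Day_Of_Week = 1:7,
  Time_Of_Day = c("Morning", "Afternoon", "Evening"),
  Channel     = c("SMS", "Email", "Call")
)
stopifnot(nrow(combos) == 63)

# Cross-join with the leads: 63 candidate rows per customer.
leads <- data.frame(Lead_Id = c("L1", "L2"))
grid  <- merge(leads, combos)

# Placeholder scores; in the solution rxPredict with the champion
# model fills this column instead.
grid$Conversion_Prob <- runif(nrow(grid))

# Keep the highest-probability combination per lead as the recommendation.
best <- do.call(rbind, lapply(split(grid, grid$Lead_Id),
                              function(g) g[which.max(g$Conversion_Prob), ]))
```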

The corresponding SQL stored procedures can be run manually in step4_campaign_recommendations.sql.
The script campaign_scoring.R can be used to score new data - it goes through step1_data_processing, step2_feature_engineering, and step4_campaign_recommendations to create recommendations for the new data.

Deploy and Visualize Results


The deployed data resides in a newly created database table, showing recommendations for each lead. The final step of this solution visualizes these recommendations, and once the new campaigns have been completed we can also visualize a summary of how well the model worked.

Deploy

The script campaign_deployment.R creates and tests an analytic web service. The web service can then be used from another application to score future data. The file web_scoring.R can be downloaded to invoke this web service locally on any computer with Microsoft ML Server 9.0.1 or later installed.
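With the mrsdeploy package, publishing and consuming such a service might look like the sketch below; the endpoint, credentials, service name, and the `campaign_scoring_fn` / `newData` objects are hypothetical placeholders:

```r
# Sketch of publishing and consuming an analytic web service.
library(mrsdeploy)

# Authenticate against the ML Server operationalization endpoint
# (URL and credentials below are placeholders).
remoteLogin("http://myserver:12800",
            username = "admin", password = "***", session = FALSE)

# Publish a scoring function as a versioned web service.
api <- publishService(
  "campaign_recommendations",
  code    = campaign_scoring_fn,          # hypothetical scoring function
  inputs  = list(newData = "data.frame"),
  outputs = list(recommendations = "data.frame"),
  v       = "v1.0.0"
)

# From any client with ML Server installed, retrieve and invoke it;
# the service exposes the published function under its own name.
svc    <- getService("campaign_recommendations", "v1.0.0")
result <- svc$campaign_scoring_fn(newData)
```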

Visualize

The final step of this solution visualizes these recommendations.

You can access this dashboard in either of the following ways:

System Requirements

The following are required to run the scripts in this solution:

Template Contents


View the contents of this solution template.

To try this out yourself:
