Campaign Optimization - Predicting How and When to Contact Leads

Template Contents


The following is the directory structure for this template:

  • Data This contains the copy of the simulated input data with 100K unique customers.
  • R This contains the R code to simulate the input datasets, pre-process them, create the analytical datasets, train the models, identify the champion model and provide recommendations.
  • Resources This directory contains other resources for the solution package.
  • SQLR This contains T-SQL code to pre-process the datasets, train the models, identify the champion model and provide recommendations. It also contains a PowerShell script to automate the entire process, including loading the data into the database (not included in the T-SQL code).
  • RSparkCluster This contains the R code to pre-process the datasets, train the models, identify the champion model and provide recommendations on a Spark cluster.

In this template with SQL Server ML Services, two versions of the SQL implementation and another version for HDInsight implementation:

  1. Model Development in R IDE . Run the R code in R IDE (e.g., RStudio, R Tools for Visual Studio).
  2. Operationalize in SQL. Run the SQL code in SQL Server using SQLR scripts from SSMS or from the PowerShell script.
  3. HDInsight Solution on Spark Cluster. Run this R code in RStudio on the edge node of the Spark cluster.

Copy of Input Datasets


File Description
Campaign_Detail.csv Raw data about each marketing campaign that occurred
Lead_Demography.csv Raw demographics and financial data about each customer
Market_Touchdown.csv Raw channel-day-time data used for every customer of Lead_Demography in every campaign he was targeted
Product.csv Raw data about the product marketed in each campaign

Model Development in R


These files in the R directory for the SQL solution.

File Description
Campaign Optimization R Notebook.ipynb Contains the Jupyter Notebook file that runs all the .R scripts
sql_connection.R Contains details of connection to SQL Server used in all other scripts
step0_data_generation.R Simulates the 4 input datasets, not needed unless you wish to regenerate data
step1_data_processing.R Uploads .csv files to SQL and performs data preprocessing steps such as inner joins and missing value treatment
step2_feature_engineering.R Performs Feature Engineering and creates the Analytical Dataset
step3_training_evaluation.R Builds the Random Forest & Gradient Boosting models, identifies the champion model
step4_campaign_recommendations.R Builds final recommendations from scoring 63 combinations per lead and selecting combo with highest conversion probability

Operationalize in SQL 2017


These files are in the SQLR directory.

File Description
step0_create_tables.sql SQL Script to create empty tables in SQL. PowerShell script should be used to load the input data
step1_data_processing.sql Replaces Missing values in dataset with the modes
step2_feature_engineering.sql Performs Feature Engineering and creates the Analytical Dataset
step3a_splitting.sql Splits the analytical dataset into Train and Test
step3b_train_model.sql Trains either RF or GBT model, depending on input parameter
step3c_test_evaluate_models.sql Tests both RF and GBT models
step4_campaign_recommendations.sql Scores data with best model and outputs recommendations
execute_yourself.sql Executes every stored procedure after running all the other .sql files
Campaign_Optimization.ps1 Loads the input data into the SQL server and automates the running of all .sql files
Readme.md Describes the stored procedures in more detail

HDInsight Solution on Spark Cluster


These files are in the RSparkCluster directory.

File Description
Copy_Dev2Prod.R Copies a model from the dev folder to the prod folder for production use
Create_LeadDemo_MarketTouch.R Generates Lead_Demography and Market_touchdown tables, used from step0. Not needed unless you wish to regenerate data
campaign_deployment.R Publishes the scoring function as an analytic web service
campaign_main.R Runs all steps of the solution
campaign_scoring.R Scores new data using a model developed from campaign_main.R
campaign_web_scoring.R Uses the scoring funtion created by campaign_deployment.R.
step0_data_generation.R Simulates the 4 input datasets, not needed unless you wish to regenerate data
step1_data_processing.R Uploads .csv files to SQL and performs data preprocessing steps such as inner joins and missing value treatment
step2_feature_engineering.R Performs Feature Engineering and creates the Analytical Dataset
step3_training_evaluation.R Builds the Random Forest & Gradient Boosting models, identifies the champion model
step4_campaign_recommendations.R Builds final recommendations from scoring 63 combinations per lead and selecting combo with highest conversion probability
step5_create_hive_table.R Stores recommendations in a Hive table for use in PowerBI

Resources for the Solution Package


File Description
createuser.sql Used during initial SQL Server setup to create the user and password and grant permissions
Campaign_Data_Dictionnary.xlsx Schema and description of the 4 input tables and variables
Images Directory of images used for the Readme.md in this package

< Home