Template Contents
The following is the directory structure for this template:
- Data: A copy of the simulated input data, with 100K unique customers.
- R: The R code to simulate the input datasets, pre-process them, create the analytical datasets, train the models, identify the champion model and provide recommendations.
- Resources: Other resources for the solution package.
- SQLR: T-SQL code to pre-process the datasets, train the models, identify the champion model and provide recommendations. It also contains a PowerShell script to automate the entire process, including loading the data into the database (which is not covered by the T-SQL code).
- RSparkCluster: The R code to pre-process the datasets, train the models, identify the champion model and provide recommendations on a Spark cluster.
This template with SQL Server ML Services provides two versions of the SQL implementation, plus a version for HDInsight:
- Model Development in R IDE. Run the R code in an R IDE (e.g., RStudio, R Tools for Visual Studio).
- Operationalize in SQL. Run the SQL code in SQL Server using the SQLR scripts, either from SSMS or from the PowerShell script.
- HDInsight Solution on Spark Cluster. Run the R code in RStudio on the edge node of the Spark cluster.
Copy of Input Datasets
| File | Description |
|---|---|
| Campaign_Detail.csv | Raw data about each marketing campaign that occurred |
| Lead_Demography.csv | Raw demographics and financial data about each customer |
| Market_Touchdown.csv | Raw channel, day, and time data for each customer in Lead_Demography, for every campaign in which they were targeted |
| Product.csv | Raw data about the product marketed in each campaign |
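Step 1 of each implementation inner-joins these four files into a single dataset. The sketch below is illustrative only: the join keys (`Lead_Id`, `Campaign_Id`, `Product_Id`) are assumptions; see Campaign_Data_Dictionnary.xlsx for the actual schema.

```r
# Illustrative only -- join keys are assumptions, not the template's exact schema
campaign  <- read.csv("Data/Campaign_Detail.csv")
lead      <- read.csv("Data/Lead_Demography.csv")
touchdown <- read.csv("Data/Market_Touchdown.csv")
product   <- read.csv("Data/Product.csv")

# Inner joins, mirroring the pre-processing described for step1_data_processing
joined <- merge(touchdown, lead,     by = "Lead_Id")
joined <- merge(joined,    campaign, by = "Campaign_Id")
joined <- merge(joined,    product,  by = "Product_Id")
```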
Model Development in R
These files are in the R directory for the SQL solution.
| File | Description |
|---|---|
| Campaign Optimization R Notebook.ipynb | Contains the Jupyter Notebook file that runs all the .R scripts |
| sql_connection.R | Contains details of connection to SQL Server used in all other scripts |
| step0_data_generation.R | Simulates the 4 input datasets, not needed unless you wish to regenerate data |
| step1_data_processing.R | Uploads .csv files to SQL and performs data preprocessing steps such as inner joins and missing value treatment |
| step2_feature_engineering.R | Performs Feature Engineering and creates the Analytical Dataset |
| step3_training_evaluation.R | Builds the Random Forest & Gradient Boosting models, identifies the champion model |
| step4_campaign_recommendations.R | Builds final recommendations by scoring 63 combinations per lead and selecting the combination with the highest conversion probability |
- See For the Data Scientist for more details about these files.
- See Typical Workflow for more information about executing these scripts.
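Because sql_connection.R centralizes the connection details reused by all the other steps, the pattern looks roughly like the following sketch with RevoScaleR. The server, database, and credentials shown are placeholders, not the template's actual values.

```r
library(RevoScaleR)

# Placeholders -- substitute your own server, database, and credentials
connection_string <- paste0("Driver=SQL Server;Server=localhost;",
                            "Database=Campaign;Uid=rdemo;Pwd=<password>")

# Run subsequent rx* functions inside SQL Server rather than locally
sql_cc <- RxInSqlServer(connectionString = connection_string)
rxSetComputeContext(sql_cc)
```

With the compute context set, the training and scoring steps execute in-database without further connection handling in each script.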
Operationalize in SQL Server 2017
These files are in the SQLR directory.
| File | Description |
|---|---|
| step0_create_tables.sql | SQL Script to create empty tables in SQL. PowerShell script should be used to load the input data |
| step1_data_processing.sql | Replaces Missing values in dataset with the modes |
| step2_feature_engineering.sql | Performs Feature Engineering and creates the Analytical Dataset |
| step3a_splitting.sql | Splits the analytical dataset into Train and Test |
| step3b_train_model.sql | Trains either RF or GBT model, depending on input parameter |
| step3c_test_evaluate_models.sql | Tests both RF and GBT models |
| step4_campaign_recommendations.sql | Scores data with best model and outputs recommendations |
| execute_yourself.sql | Executes every stored procedure after running all the other .sql files |
| Campaign_Optimization.ps1 | Loads the input data into SQL Server and automates the running of all .sql files |
| Readme.md | Describes the stored procedures in more detail |
- See For the Database Analyst for more information about these files.
- Follow the PowerShell Instructions to execute the PowerShell script which automates the running of all these .sql files.
HDInsight Solution on Spark Cluster
These files are in the RSparkCluster directory.
| File | Description |
|---|---|
| Copy_Dev2Prod.R | Copies a model from the dev folder to the prod folder for production use |
| Create_LeadDemo_MarketTouch.R | Generates the Lead_Demography and Market_Touchdown tables; called from step0. Not needed unless you wish to regenerate data |
| campaign_deployment.R | Publishes the scoring function as an analytic web service |
| campaign_main.R | Runs all steps of the solution |
| campaign_scoring.R | Scores new data using a model developed from campaign_main.R |
| campaign_web_scoring.R | Uses the scoring function created by campaign_deployment.R |
| step0_data_generation.R | Simulates the 4 input datasets, not needed unless you wish to regenerate data |
| step1_data_processing.R | Uploads .csv files to SQL and performs data preprocessing steps such as inner joins and missing value treatment |
| step2_feature_engineering.R | Performs Feature Engineering and creates the Analytical Dataset |
| step3_training_evaluation.R | Builds the Random Forest & Gradient Boosting models, identifies the champion model |
| step4_campaign_recommendations.R | Builds final recommendations by scoring 63 combinations per lead and selecting the combination with the highest conversion probability |
| step5_create_hive_table.R | Stores recommendations in a Hive table for use in PowerBI |
- See For the Data Scientist for more details about these files.
- See Typical Workflow for more information about executing these scripts.
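The 63 combinations scored in step4 correspond to the cross-product of the contact variables (for example, 3 channels × 7 days × 3 times of day = 63; the actual levels are defined in the input data, not here). The selection logic can be sketched as follows, where `score_model` is a hypothetical stand-in for scoring with the champion model.

```r
# Hypothetical levels -- the real ones come from the input data
combos <- expand.grid(Channel     = c("Email", "Cold Call", "SMS"),
                      Day_Of_Week = 1:7,
                      Time_Of_Day = c("Morning", "Afternoon", "Evening"))
nrow(combos)  # 63

recommend <- function(lead_row, model) {
  # Score every channel/day/time combination for this lead ...
  scored <- cbind(lead_row, combos)
  scored$prob <- score_model(model, scored)  # stand-in for predicting with the champion model
  # ... and keep the combination with the highest conversion probability
  scored[which.max(scored$prob), ]
}
```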
Resources for the Solution Package
| File | Description |
|---|---|
| createuser.sql | Used during initial SQL Server setup to create the user and password and grant permissions |
| Campaign_Data_Dictionnary.xlsx | Schema and description of the 4 input tables and variables |
| Images | Directory of images used for the Readme.md in this package |