Campaign Optimization - Predicting How and When to Contact Leads

For the IT Administrator


As businesses are starting to acknowledge the power of data, leveraging machine learning techniques to grow has become a must. In particular, customer-oriented businesses can learn patterns from their data to intelligently design acquisition campaigns and convert the highest possible number of customers.

Among the key variables to learn from data are the best communication channel (e.g. SMS, Email, Call), the day of the week, and the time of day at which a given potential customer is targeted by a marketing campaign. This template provides a customer-oriented business with an analytics tool that helps determine the best combination of these three variables for each customer, based on financial and demographic data, among other inputs.
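To make the idea of a "best combination" concrete, the following is a minimal Python sketch, not the solution's actual R implementation. It assumes a hypothetical scoring function that returns a conversion score for one lead and one (channel, day, time) combination, and simply picks the highest-scoring combination. The time-of-day buckets, the `toy_score` rule, and the `age` field are illustrative assumptions; only the channel names come from the text above.

```python
from itertools import product
from typing import Callable, Dict, Tuple

# Channels come from the text above; the day and time buckets are assumptions.
CHANNELS = ["SMS", "Email", "Call"]
DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
TIMES = ["Morning", "Afternoon", "Evening"]

Combo = Tuple[str, str, str]


def best_contact_combo(
    lead: Dict[str, float],
    score: Callable[[Dict[str, float], Combo], float],
) -> Tuple[Combo, float]:
    """Return the (channel, day, time) combination with the highest score for one lead."""
    best = max(product(CHANNELS, DAYS, TIMES), key=lambda combo: score(lead, combo))
    return best, score(lead, best)


def toy_score(lead: Dict[str, float], combo: Combo) -> float:
    """Hypothetical stand-in for a trained model's predicted conversion score."""
    channel, day, time_of_day = combo
    value = 0.1
    if lead.get("age", 0) < 30 and channel == "SMS":
        value += 0.3
    if day == "Saturday":
        value += 0.2
    if time_of_day == "Evening":
        value += 0.1
    return value


if __name__ == "__main__":
    combo, value = best_contact_combo({"age": 25}, toy_score)
    print(combo, round(value, 2))
```

In a real deployment the scoring function would be replaced by the trained model's prediction for each candidate combination; the exhaustive search stays cheap because there are only 3 x 7 x 3 combinations per lead in this sketch.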

Although this solution develops the model on 100,000 leads, running on HDInsight Spark clusters makes it simple to scale to larger data sets for both training and scoring. Only the size of the data and the number of clusters change; the code remains exactly the same.

System Requirements


This solution uses:

Cluster Maintenance


HDInsight Spark cluster billing starts once a cluster is created and stops when the cluster is deleted. See these instructions for important information about deleting a cluster and re-using your files on a new cluster.

Workflow Automation


Access RStudio on the cluster edge node by using a URL of the form http://CLUSTERNAME.azurehdinsight.net/rstudio, where CLUSTERNAME is the name of your cluster. Then run the script campain_main.R to perform all the steps of the solution.
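As a small illustration of the URL pattern above, the Python sketch below fills in the cluster name. The helper name `rstudio_url` is hypothetical and the scheme is copied from the pattern as written; it only formats the string and does not contact the cluster.

```python
def rstudio_url(cluster_name: str) -> str:
    """Build the RStudio URL for a cluster edge node from the pattern in the text."""
    name = cluster_name.strip()
    if not name:
        raise ValueError("cluster_name must not be empty")
    return f"http://{name}.azurehdinsight.net/rstudio"


if __name__ == "__main__":
    # "mycluster" is a placeholder, not a real cluster name.
    print(rstudio_url("mycluster"))
```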

Data Files


The following data files are available in the Campaign/Data directory in the storage account associated with the cluster:

File                  Description
Campaign_Detail.csv   Raw data about each marketing campaign that occurred
Lead_Demography.csv   Raw demographic and financial data about each customer
Market_Touchdown.csv  Raw channel-day-time data used for every customer of Lead_Demography in every campaign in which they were targeted
Product.csv           Raw data about the product marketed in each campaign
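Before running the workflow, it can be useful to confirm that all four input files are present. The Python sketch below checks a directory for the file names listed above. It assumes the Campaign/Data directory has been copied or mounted to a local path, which is an assumption for illustration; the files themselves live in the cluster's storage account. The helper name `missing_data_files` is hypothetical.

```python
from pathlib import Path
from typing import List

# File names taken from the list above.
EXPECTED_FILES = [
    "Campaign_Detail.csv",
    "Lead_Demography.csv",
    "Market_Touchdown.csv",
    "Product.csv",
]


def missing_data_files(data_dir: str) -> List[str]:
    """Return the expected file names that are not present in data_dir."""
    root = Path(data_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    # "Campaign/Data" is assumed here to be a local copy of the storage directory.
    missing = missing_data_files("Campaign/Data")
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All data files found.")
```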