For the IT Administrator

This page describes the HDInsight Spark solution.

As businesses recognize the power of data, using machine learning techniques to drive growth has become essential. In particular, customer-oriented businesses can learn patterns from their data to design acquisition campaigns intelligently and convert the highest possible number of customers.

Among the key variables to learn from the data are the best communication channel (e.g. SMS, email, call), the day of the week, and the time of day at which to target a given potential customer with a marketing campaign. This template provides a customer-oriented business with an analytics tool that helps determine the best combination of these three variables for each customer, based on financial and demographic data, among other inputs.

While this solution demonstrates the code with 100,000 leads for developing the model, using HDInsight Spark clusters makes it simple to extend to larger data sets, both for training and scoring. The only thing that changes is the size of the data and the number of clusters; the code remains exactly the same.

System Requirements

This solution uses:

ML Server for HDInsight

Cluster Maintenance

HDInsight Spark cluster billing starts once a cluster is created and stops when the cluster is deleted. See these instructions for important information about deleting a cluster and re-using your files on a new cluster.

Workflow Automation

Access RStudio on the cluster edge node at a URL of the form http://CLUSTERNAME.azurehdinsight.net/rstudio. Then run the script campain_main.R to perform all the steps of the solution.
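
As a small illustration, the URL form above can be filled in programmatically. The following is a minimal sketch in Python, not part of the solution itself; the helper name rstudio_url and the cluster name "mycluster" are hypothetical placeholders.

```python
def rstudio_url(cluster_name: str) -> str:
    """Return the RStudio URL for an HDInsight cluster edge node.

    Follows the form http://CLUSTERNAME.azurehdinsight.net/rstudio
    given in the text. The function name is hypothetical.
    """
    name = cluster_name.strip()
    if not name:
        raise ValueError("cluster name must not be empty")
    return f"http://{name}.azurehdinsight.net/rstudio"


# "mycluster" is a placeholder; substitute your own cluster name.
print(rstudio_url("mycluster"))
```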

Data Files

The following data files are available in the Campaign/Data directory in the storage account associated with the cluster:

File	Description
Campaign_Detail.csv	Raw data about each marketing campaign that occurred
Lead_Demography.csv	Raw demographics and financial data about each customer
Market_Touchdown.csv	Raw channel-day-time data for every customer in Lead_Demography, for each campaign in which they were targeted
Product.csv	Raw data about the product marketed in each campaign
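
Before running the solution against a copy of the data, it can help to confirm that all four files listed above are present. The following is a minimal sketch in Python, assuming the Campaign/Data directory has been downloaded or mounted locally; the helper name missing_data_files is hypothetical, and the sketch does not access the cluster storage account directly.

```python
from pathlib import Path

# File names taken from the table above.
EXPECTED_FILES = (
    "Campaign_Detail.csv",
    "Lead_Demography.csv",
    "Market_Touchdown.csv",
    "Product.csv",
)


def missing_data_files(data_dir):
    """Return the expected file names that are not present in data_dir."""
    root = Path(data_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


# Assumption: "Campaign/Data" is a local copy of the storage directory.
missing = missing_data_files("Campaign/Data")
print("Missing files:", missing if missing else "none")
```

An empty result means every expected file was found; any names returned identify the files to restore before running the scripts.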

Campaign Optimization - Predicting How and When to Contact Leads