For the IT Administrator
Key variables to learn from the data include the best communication channel (e.g., SMS, Email, or Call), the day of the week, and the time of day to use when a given potential customer is targeted by a marketing campaign. This template gives a customer-oriented business an analytics tool that helps determine the best combination of these three variables for each customer, based on financial and demographic data, among other inputs.
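One way to frame "the best combination of these three variables" is to score every channel, day, and time combination for a customer with a trained model and keep the highest-scoring one. The sketch below illustrates only that selection step, under stated assumptions. The day and time-slot labels are hypothetical placeholders, and `toy_score` is a hypothetical stand-in for whatever model the solution actually trains. It is not the template's own code.

```python
from itertools import product
from typing import Callable, Dict, Optional, Tuple

# Channels come from the text; day and time-slot labels are hypothetical placeholders.
CHANNELS = ["SMS", "Email", "Call"]
DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
TIME_SLOTS = ["Morning", "Afternoon", "Evening"]

Combo = Tuple[str, str, str]
Scorer = Callable[[Dict[str, float], str, str, str], float]


def best_combination(customer: Dict[str, float], score: Scorer) -> Tuple[Combo, float]:
    """Score every (channel, day, time) combination for one customer and
    return the first combination with the highest score."""
    best_combo: Optional[Combo] = None
    best_score = float("-inf")
    for channel, day, slot in product(CHANNELS, DAYS, TIME_SLOTS):
        s = score(customer, channel, day, slot)
        if s > best_score:  # strict '>' keeps the first combination on ties
            best_combo, best_score = (channel, day, slot), s
    assert best_combo is not None
    return best_combo, best_score


def toy_score(customer: Dict[str, float], channel: str, day: str, slot: str) -> float:
    """Hypothetical stand-in for a trained model's predicted conversion probability."""
    s = 0.1
    if customer["age"] < 30 and channel == "SMS":
        s += 0.3
    if customer["age"] >= 30 and channel == "Email":
        s += 0.2
    if day in ("Saturday", "Sunday"):
        s += 0.05
    if slot == "Evening":
        s += 0.1
    return s


if __name__ == "__main__":
    print(best_combination({"age": 25}, toy_score))
    print(best_combination({"age": 40}, toy_score))
```

With three channels, seven days, and three time slots there are only 63 combinations per customer, so exhaustive scoring is cheap; the cost is dominated by the model itself.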
While this solution demonstrates the code with 100,000 leads for developing the model, using HDInsight Spark clusters makes it simple to scale to larger data, both for training and for scoring. The only things that change are the size of the data and the size of the cluster (its number of nodes); the code remains exactly the same.
System Requirements
This solution uses:
Cluster Maintenance
HDInsight Spark cluster billing starts once a cluster is created and stops when the cluster is deleted. See these instructions for important information about deleting a cluster and re-using your files on a new cluster.
Workflow Automation
Access RStudio on the cluster edge node by using a URL of the form http://CLUSTERNAME.azurehdinsight.net/rstudio
Run the script campain_main.R to perform all the steps of the solution.
Data Files
The following data files are available in the Campaign/Data directory in the storage account associated with the cluster:
File | Description |
---|---|
Campaign_Detail.csv | Raw data about each marketing campaign that occurred |
Lead_Demography.csv | Raw demographic and financial data about each customer |
Market_Touchdown.csv | Raw channel, day, and time data for every customer in Lead_Demography, for every campaign in which that customer was targeted |
Product.csv | Raw data about the product marketed in each campaign |
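The descriptions above suggest how the four files fit together: each Market_Touchdown row refers to a customer in Lead_Demography and to a campaign in Campaign_Detail, and each campaign refers to a product. The sketch below joins tiny in-memory excerpts with the standard `csv` module to show that relationship. All column names (`Lead_Id`, `Campaign_Id`, `Product_Id`, and so on) and all rows are hypothetical placeholders, and linking products through a `Product_Id` column in Campaign_Detail is an assumption; the real files may be keyed differently.

```python
import csv
import io
from typing import Dict, List

# Hypothetical excerpts; real column names and key relationships may differ.
LEAD_DEMOGRAPHY = """Lead_Id,Age,Annual_Income
L1,25,40000
L2,41,85000
"""
MARKET_TOUCHDOWN = """Lead_Id,Campaign_Id,Channel,Day_Of_Week,Time_Of_Day
L1,C1,SMS,Saturday,Evening
L2,C1,Email,Tuesday,Morning
L2,C2,Call,Friday,Afternoon
"""
CAMPAIGN_DETAIL = """Campaign_Id,Product_Id,Campaign_Name
C1,P1,Spring Promo
C2,P2,Summer Promo
"""
PRODUCT = """Product_Id,Product_Name
P1,Savings Account
P2,Credit Card
"""


def read_rows(text: str) -> List[Dict[str, str]]:
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def index_by(rows: List[Dict[str, str]], key: str) -> Dict[str, Dict[str, str]]:
    """Build a lookup table from a (hypothetical) unique key column to its row."""
    return {row[key]: row for row in rows}


def build_analytical_rows() -> List[Dict[str, str]]:
    """Produce one merged row per (customer, campaign) touch."""
    leads = index_by(read_rows(LEAD_DEMOGRAPHY), "Lead_Id")
    campaigns = index_by(read_rows(CAMPAIGN_DETAIL), "Campaign_Id")
    products = index_by(read_rows(PRODUCT), "Product_Id")
    merged = []
    for touch in read_rows(MARKET_TOUCHDOWN):
        row = dict(touch)
        row.update(leads[touch["Lead_Id"]])          # add demographics
        campaign = campaigns[touch["Campaign_Id"]]
        row.update(campaign)                          # add campaign details
        row.update(products[campaign["Product_Id"]])  # add the campaign's product
        merged.append(row)
    return merged


if __name__ == "__main__":
    for r in build_analytical_rows():
        print(r)
```

On the cluster, the same joins would be expressed over the full files in Spark rather than with in-memory dictionaries.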