Perform customer clustering using Python and SQL Server ML Services
In this tutorial, you will get familiar with clustering: organizing data into groups whose members are similar in some way. We will use the K-means algorithm to cluster customers, which can be used, for example, to target a specific group of customers in a marketing campaign. K-means is an unsupervised learning algorithm that groups data based on similarity. Unsupervised learning means that there is no outcome to predict; the algorithm only tries to find patterns in the data. You will learn how to perform clustering with K-means and analyze the results, and how to deploy a clustering solution using SQL Server. You can copy code as you follow the tutorial. All code is also available on GitHub.
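To make the idea concrete, here is a minimal, standard-library-only sketch of the two steps K-means repeats: assign each point to its nearest centroid, then move each centroid to the mean of its members. It is an illustration only, not the code used later in this tutorial. The function name `kmeans` and the sample `customers` data (orders per year, average spend) are made up for this example.

```python
import random


def kmeans(points, k, iterations=20, seed=0):
    """Naive K-means on a list of numeric tuples; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid
        # (squared Euclidean distance).
        labels = [
            min(
                range(k),
                key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centroids[c])),
            )
            for point in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [pt for pt, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, labels


# Hypothetical customers: (orders per year, average spend per order)
customers = [(1, 20.0), (2, 25.0), (1, 22.0), (12, 210.0), (11, 190.0), (13, 205.0)]
centroids, labels = kmeans(customers, k=2)
print(labels)     # one cluster number per customer
print(centroids)  # the "typical" customer of each cluster
```

Which cluster gets number 0 and which gets number 1 depends on the random starting points, so only the grouping itself is meaningful. In practice you would also scale the columns first so that no single column dominates the distance.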
Step 1.1 Install SQL Server with in-database R / Machine Learning Services
- If you don’t have SQL Server 2016 Developer (or above) installed:
*Click here to download the preview of SQL Server 2017
*Click here to download the SQL Server 2016 exe (this version supports only R for Machine Learning)
- Run it to start the SQL installer
- Click Accept after you have read the license terms
- On the Feature Selection page, select: R Services (In-Database) for SQL Server 2016 or Machine Learning Services (In-Database) for SQL Server 2017
- Don’t forget to choose R, Python, or both
- If you chose R: On the page Consent to Install Microsoft R Open, click Accept.
- If you chose Python: On the page Consent to Python, click Accept.
- Click Install to proceed with the installation
You now have SQL Server installed with in-database ML services, running locally on your Windows computer! Check out the next section to continue installing prerequisites.
Step 1.2 Install SQL Server Management Studio (SSMS)
Download and install SQL Server Management Studio: SSMS
Now you have installed a tool you can use to easily manage your database objects and scripts.
Step 1.3 Enable external script execution
Run SSMS and open a new query window. Then execute the script below to enable your instance to run external scripts (R and Python) in SQL Server.
EXEC sp_configure 'external scripts enabled', 1;
RECONFIGURE WITH OVERRIDE
You can read more about configuring Machine Learning Services here. Don’t forget to restart your SQL Server instance after the configuration! You can restart it in SSMS by right-clicking the instance name in the Object Explorer and choosing Restart.
Optional: If you want, you can also download the SSMS custom reports available on GitHub. The report “R Services - Configuration.rdl”, for example, provides an overview of the R runtime parameters and gives you an option to configure your instance with a button click. To import a report in SSMS, right-click Server Objects in the SSMS Object Explorer and choose Reports -> Custom reports. Upload the .rdl file.
Now you have enabled external script execution so that you can run Python code inside SQL Server!
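If you would like to confirm the setting from Python rather than from SSMS, the sketch below reads the configured value and then asks SQL Server to echo a one-row result through its Python runtime using `sp_execute_external_script`. It assumes the third-party `pyodbc` package and a SQL Server ODBC driver are installed; the function name `check_external_scripts` and the connection string in the comment are illustrative, not part of this tutorial.

```python
# Reads the effective setting. value_in_use is a sql_variant column,
# so it is cast to INT before being returned to the client.
CONFIG_SQL = (
    "SELECT CAST(value_in_use AS INT) FROM sys.configurations "
    "WHERE name = 'external scripts enabled';"
)

# Round-trips a single row through the in-database Python runtime.
VERIFY_SQL = """
EXEC sp_execute_external_script
    @language = N'Python',
    @script = N'OutputDataSet = InputDataSet',
    @input_data_1 = N'SELECT 1 AS hello'
WITH RESULT SETS ((hello INT NOT NULL));
"""


def check_external_scripts(connection_string):
    """Return True if external scripts are enabled and a Python script runs."""
    import pyodbc  # third-party; assumed installed (pip install pyodbc)

    conn = pyodbc.connect(connection_string, autocommit=True)
    try:
        cursor = conn.cursor()
        enabled = cursor.execute(CONFIG_SQL).fetchone()[0]
        if enabled != 1:
            return False  # not enabled, or the instance was not restarted
        echoed = cursor.execute(VERIFY_SQL).fetchone()[0]
        return echoed == 1
    finally:
        conn.close()


# Example call (hypothetical connection string; adjust driver and server):
# check_external_scripts(
#     "DRIVER={ODBC Driver 17 for SQL Server};SERVER=localhost;"
#     "DATABASE=master;Trusted_Connection=yes;"
# )
```

If the first query returns 0 even though you ran the configuration script, the most likely cause is that the instance has not been restarted yet.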
Step 1.4 Install and configure your Python development environment
1. You need to install a Python IDE. Here are some suggestions:
*Python Tools for Visual Studio (PTVS) Download
*VS Code (download) with the Python Extension and the mssql extension
*PyCharm Download
Step 1.5 Install remote Python client libraries
Note: To use some of the functions in this tutorial, you need the revoscalepy package.
Follow the instructions here to install the Python client libraries for remote execution against SQL Server ML Services:
How to install Python client libraries
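Once the client libraries are installed, a quick way to check that your IDE is pointing at the right interpreter is to ask Python whether the package can be found. This is a small sketch using only the standard library; the helper name `missing_packages` is made up for this example.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names that this interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


required = ["revoscalepy"]
missing = missing_packages(required)
if missing:
    print("Not installed in this interpreter:", ", ".join(missing))
else:
    print("All required packages are importable.")
```

If revoscalepy is reported as missing, check that your IDE is configured to use the Python interpreter that was installed with the client libraries, not another Python installation on the machine.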
Terrific, now your SQL Server instance is able to host and run Python code and you have the necessary development tools installed and configured! The next section will walk you through how to do clustering using Python.
Have Questions?
Happy to help! You can find us on GitHub, MSDN Forums, and StackOverflow. We also monitor the #SQLServerDev hashtag on Twitter.