Version: 0.9.5



To try out SynapseML on a Python (or Conda) installation, you can get Spark installed via pip with pip install pyspark. You can then use pyspark as in the above example, or from Python:

import pyspark
spark = pyspark.sql.SparkSession.builder.appName("MyApp") \
    .config("spark.jars.packages", "") \
    .config("spark.jars.repositories", "") \
    .getOrCreate()


If you're building a Spark application in Scala, add the following lines to your build.sbt:

resolvers += "SynapseML" at ""
libraryDependencies += "" %% "synapseml" % "0.9.5"

Spark package

SynapseML can be conveniently installed on existing Spark clusters via the --packages option. For example:

spark-shell --packages
pyspark --packages
spark-submit --packages MyApp.jar
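The argument to --packages is a Maven coordinate of the form group:artifact:version. As a minimal sketch of that format only — the group and artifact names below are placeholders for illustration, not the actual SynapseML coordinates:

```python
# Assemble a Maven coordinate of the form group:artifact:version,
# as passed to spark-shell / pyspark / spark-submit via --packages.
# "com.example" and "synapseml_2.12" are placeholder names, not real coordinates.
group = "com.example"
artifact = "synapseml_2.12"
version = "0.9.5"
coordinate = ":".join([group, artifact, version])
print(coordinate)  # com.example:synapseml_2.12:0.9.5
```

Multiple coordinates can be supplied to --packages as a comma-separated list in the same format.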

A similar technique can be used in other Spark contexts too. For example, you can use SynapseML in AZTK by adding it to the .aztk/spark-defaults.conf file.


Databricks

To install SynapseML on the Databricks cloud, create a new library from Maven coordinates in your workspace.

For the coordinates, use the SynapseML Maven coordinates together with the SynapseML resolver. Ensure this library is attached to your target cluster(s).

Finally, ensure that your Spark cluster has at least Spark 3.1 and Scala 2.12.

You can use SynapseML in both your Scala and PySpark notebooks. To get started with our example notebooks, import the following Databricks archive:

Apache Livy and HDInsight

To install SynapseML from within a Jupyter notebook served by Apache Livy, the following configure magic can be used. You'll need to start a new session after this configure cell is executed.

Excluding certain packages from the library may be necessary due to current issues with Livy 0.5.

%%configure -f
{
    "name": "synapseml",
    "conf": {
        "spark.jars.packages": "",
        "spark.jars.excludes": "org.scala-lang:scala-reflect,org.apache.spark:spark-tags_2.12,org.scalactic:scalactic_2.12,org.scalatest:scalatest_2.12"
    }
}
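The spark.jars.excludes value is a comma-separated list of group:artifact pairs. A small illustrative sketch (plain Python, not a Livy or Spark API) of how that list breaks down:

```python
# The spark.jars.excludes value from the configure cell above.
excludes = (
    "org.scala-lang:scala-reflect,"
    "org.apache.spark:spark-tags_2.12,"
    "org.scalactic:scalactic_2.12,"
    "org.scalatest:scalatest_2.12"
)

# Each comma-separated entry is a group:artifact pair; Spark excludes any
# transitive dependency matching one of these pairs when resolving packages.
pairs = [tuple(entry.split(":")) for entry in excludes.split(",")]
for group, artifact in pairs:
    print(group, artifact)
```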

In Azure Synapse, "spark.yarn.user.classpath.first" should be set to "true" to override the existing SynapseML packages.

%%configure -f
{
    "name": "synapseml",
    "conf": {
        "spark.jars.packages": "",
        "spark.jars.excludes": "org.scala-lang:scala-reflect,org.apache.spark:spark-tags_2.12,org.scalactic:scalactic_2.12,org.scalatest:scalatest_2.12",
        "spark.yarn.user.classpath.first": "true"
    }
}


Docker

The easiest way to evaluate SynapseML is via our pre-built Docker container. To do so, run the following command:

docker run -it -p 8888:8888 -e ACCEPT_EULA=yes

Navigate to http://localhost:8888/ in your web browser to run the sample notebooks. See the documentation for more on Docker use.

To read the EULA for using the Docker image, run:

docker run -it -p 8888:8888 eula

Building from source

SynapseML has recently transitioned to a new build infrastructure. For detailed developer docs, see the Developer Readme.

If you're an existing SynapseML developer, you'll need to reconfigure your development setup. We now support platform-independent development and better integration with IntelliJ and SBT. If you encounter issues, reach out to our support email!

R (Beta)

To try out SynapseML using the R autogenerated wrappers, see our instructions. Note: This feature is still under development and some necessary custom wrappers may be missing.