Version: 0.11.3

SynapseML Development Setup

  1. Install JDK 11
    • You may need an Oracle login to download.
  2. Install SBT
  3. Fork the repository on GitHub
  4. Clone your fork
    • git clone https://github.com/<your GitHub handle>/SynapseML.git
    • This command will automatically add your fork as the default remote, called origin
  5. Add another Git remote to track the original SynapseML repo. It's recommended to call it upstream.
  6. Go to the directory where you cloned the repo (for instance, SynapseML) with cd SynapseML
  7. Run sbt to compile and grab datasets
    • sbt setup
  8. Install IntelliJ
  9. Configure IntelliJ
    • Install Scala plugin during initialization
    • OPEN the SynapseML directory from IntelliJ
    • If the project doesn't automatically import, click on build.sbt and import the project
  10. Prepare your Python Environment
    • Install Miniconda
    • Note: if you want to run conda commands from IntelliJ, you may need to select the option to add conda to PATH during installation.
    • Create the synapseml conda environment by running conda env create -f environment.yml from the SynapseML directory, then activate it with conda activate synapseml.

      If you're using a Windows machine, remove the horovod requirement from the environment.yml file, because Horovod can be installed only on Linux or macOS. Horovod is used only for namespace

  11. On Windows, install WinUtils
    • Download WinUtils.exe
    • Place it in C:\Program Files\Hadoop\bin
    • Add an environment variable HADOOP_HOME with value C:\Program Files\Hadoop
    • Append C:\Program Files\Hadoop\bin to the PATH environment variable
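
Steps 4 through 7 can be sketched as a single shell helper. This is a minimal sketch, not the project's own script: clone_and_setup is a hypothetical name, the GitHub handle is a placeholder you pass in, and the upstream URL (https://github.com/microsoft/SynapseML.git) is an assumption about where the original repository lives. The cd happens before the remote is added because git remote add must run inside a repository.

```shell
# Hypothetical helper covering steps 4-7. Nothing runs until the function is called.
clone_and_setup() {
  # $1 is a placeholder for your GitHub handle.
  [ -n "${1:-}" ] || { echo "usage: clone_and_setup <github-handle>" >&2; return 2; }
  handle="$1"

  # Step 4: clone your fork; git registers it as the remote "origin".
  git clone "https://github.com/${handle}/SynapseML.git" || return 1

  # Step 6: enter the clone.
  cd SynapseML || return 1

  # Step 5: track the original repo as "upstream" (URL is an assumption).
  git remote add upstream "https://github.com/microsoft/SynapseML.git" || return 1

  # Step 7: compile and download the datasets.
  sbt setup
}
```

Calling clone_and_setup with your handle from the directory that should contain the clone runs the four commands in order and stops at the first failure.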


If you plan to contribute to the SynapseML repo regularly, keep your fork synced with the upstream repository. The GitHub documentation on syncing a fork describes how to do this.
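
One common way to sync a fork from the command line is to fetch from upstream, fast-forward the local branch, and push the result to origin. The sketch below assumes the remotes are named upstream and origin as in the setup steps. sync_fork is a hypothetical helper name, and the default branch name master is an assumption, so it can be overridden with an argument.

```shell
# Hypothetical helper: update a local branch from upstream, then update the fork.
sync_fork() {
  # Branch to sync; "master" is an assumed default, pass another name if needed.
  branch="${1:-master}"

  # Download the latest commits from the original repository.
  git fetch upstream || return 1

  # Switch to the local copy of the branch.
  git checkout "$branch" || return 1

  # Fast-forward only, so local commits are never silently merged over.
  git merge --ff-only "upstream/$branch" || return 1

  # Publish the updated branch to your fork.
  git push origin "$branch"
}
```

Using --ff-only makes the helper fail rather than create a merge commit when the local branch has diverged, which keeps the fork's default branch a clean mirror of upstream.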

Publishing and Using Build Secrets

To use secrets in the build, you must be part of the synapsemlkeyvault and Azure subscription. If you're MSFT internal and would like to be added, reach out to

SBT Command Guide

Scala build commands

compile, test:compile and it:compile

Compiles the main, test, and integration test classes respectively


Runs all synapseml tests


Runs scalastyle check on main


Runs scalastyle check on test


Generates documentation for Scala sources
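
As an illustration, the compile and style checks can be chained into one local check before opening a pull request. This is a sketch under assumptions: scala_checks is a hypothetical helper name, and scalastyle and test:scalastyle are the task names commonly provided by the scalastyle sbt plugin, assumed here to be the tasks that run the style check on main and on test.

```shell
# Hypothetical helper: compile everything, then run the style checks.
scala_checks() {
  # Compile the main, test, and integration test classes in one sbt session.
  sbt compile test:compile it:compile || return 1

  # Style checks on main and test sources (task names are an assumption).
  sbt scalastyle test:scalastyle
}
```

Passing several tasks to a single sbt invocation avoids paying the sbt startup cost once per task.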

Python Commands


Creates a conda environment named synapseml from environment.yml if it doesn't already exist. This environment is used for Python testing. Activate it before using the Python build commands.


Removes synapseml conda env


Compiles Scala, runs Python generation scripts, and creates a wheel


Generates documentation for generated Python code


Installs the generated Python wheel into the existing env


Generates and runs Python tests
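
The Python commands above expect the synapseml conda environment to exist and be active. A minimal sketch of the "create only if missing" behavior, using conda directly rather than sbt, might look like the following. ensure_synapseml_env is a hypothetical helper name, and the check assumes that conda env list prints each environment name at the start of a line.

```shell
# Hypothetical helper: create the synapseml env only when it is not listed yet.
ensure_synapseml_env() {
  # Look for a line of "conda env list" that starts with the env name.
  if conda env list | grep -q '^synapseml[[:space:]]'; then
    echo "synapseml env already exists"
  else
    # Same command as in the setup steps; run it from the repository root.
    conda env create -f environment.yml
  fi
}
```

After the helper returns, running conda activate synapseml in an interactive shell activates the environment for the Python build commands.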

Environment + Publishing Commands


getDatasets

Downloads all datasets used in tests to target folder


setup

Combination of compile, test:compile, it:compile, getDatasets


Packages the library into a jar


Publishes Jar to SynapseML's Azure blob-based Maven repo. (Requires Keys)


Publishes library to the local Maven repo


Publishes Scala and Python docs to SynapseML's Azure storage account. (Requires Keys)


Publishes the library to Sonatype staging repo


Promotes the published Sonatype artifact