R setup and example for SynapseML
Installation
Requirements: Ensure that R and devtools installed on your machine.
Also make sure you have Apache Spark installed. If you are using Sparklyr, you can use spark-install. Be sure to specify the correct version. As of this writing, that should be version="3.2". spark_install is a bit eccentric and may install a slightly different version. Be sure that the version you get is one that you want.
On Windows, download WinUtils.exe and copy it into the bin directory of your Spark installation, e.g. C:\Users\user\AppData\Local\Spark\spark-3.3.2-bin-hadoop3\bin
To install the current SynapseML package for R, first install synapseml-core:
...
devtools::install_url("https://mmlspark.azureedge.net/rrr/synapseml-core-0.11.0.zip")
...
and then install any or all of the following packages, depending on your intended usage:
synapseml-cognitive, synapseml-deep-learning, synapseml-lightgbm, synapseml-opencv, synapseml-vw
In other words:
...
devtools::install_url("https://mmlspark.azureedge.net/rrr/synapseml-cognitive-0.11.0.zip")
devtools::install_url("https://mmlspark.azureedge.net/rrr/synapseml-deep-learning-0.11.0.zip")
devtools::install_url("https://mmlspark.azureedge.net/rrr/synapseml-lightgbm-0.11.0.zip")
devtools::install_url("https://mmlspark.azureedge.net/rrr/synapseml-opencv-0.11.0.zip")
devtools::install_url("https://mmlspark.azureedge.net/rrr/synapseml-vw-0.11.0.zip")
...
Importing libraries and setting up spark context
Installing all dependencies may be time-consuming. When complete, run:
...
library(sparklyr)
library(dplyr)
config <- spark_config()
config$sparklyr.defaultPackages <- "com.microsoft.azure:synapseml_2.12:0.11.3"
sc <- spark_connect(master = "local", config = config)
...
This creates a spark context on your local machine.
We then need to import the R wrappers:
...
 library(synapseml.core)
 library(synapseml.cognitive)
 library(synapseml.deep.learning)
 library(synapseml.lightgbm)
 library(synapseml.opencv)
 library(synapseml.vw)
...
Example
We can use the faithful dataset in R:
...
faithful_df <- copy_to(sc, faithful)
cmd_model = ml_clean_missing_data(
              x=faithful_df,
              inputCols = c("eruptions", "waiting"),
              outputCols = c("eruptions_output", "waiting_output"),
              only.model=TRUE)
sdf_transform(cmd_model, faithful_df)
...
You should see the output:
...
# Source:   table<sparklyr_tmp_17d66a9d490c> [?? x 4]
# Database: spark_connection
   eruptions waiting eruptions_output waiting_output
          <dbl>   <dbl>            <dbl>          <dbl>
          1     3.600      79            3.600             79
          2     1.800      54            1.800             54
          3     3.333      74            3.333             74
          4     2.283      62            2.283             62
          5     4.533      85            4.533             85
          6     2.883      55            2.883             55
          7     4.700      88            4.700             88
          8     3.600      85            3.600             85
          9     1.950      51            1.950             51
          10     4.350      85            4.350             85
          # ... with more rows
...
Azure Databricks
In Azure Databricks, you can install devtools and the spark package from URL and then use spark_connect with method = "databricks":
install.packages("devtools")
devtools::install_url("https://mmlspark.azureedge.net/rrr/synapseml-0.11.3.zip")
library(sparklyr)
library(dplyr)
sc <- spark_connect(method = "databricks")
faithful_df <- copy_to(sc, faithful)
unfit_model = ml_light_gbmregressor(sc, maxDepth=20, featuresCol="waiting", labelCol="eruptions", numIterations=10, unfit.model=TRUE)
ml_train_regressor(faithful_df, labelCol="eruptions", unfit_model)
Building from Source
Our R bindings are built as part of the normal build process. To get a quick build, start at the root of the synapseml directory, and find the generated files. For instance, to find the R files for deep-learning, run
sbt packageR
ls ./deep-learning/target/scala-2.12/generated/src/R/synapseml/R
You can then run R in a terminal and install the above files directly:
...
devtools::install_local("./deep-learning/target/scala-2.12/generated/src/R/synapseml/R")
...