Version: 1.0.1

Build your first SynapseML models

This tutorial provides a brief introduction to SynapseML by building two pipelines for sentiment analysis. The first pipeline combines a text featurization stage with LightGBM regression to predict ratings from the review text in a dataset of Amazon book reviews. The second pipeline uses prebuilt models from Azure AI Services to solve the same problem without any training data.

Load a dataset

Load your dataset and split it into train and test sets.

train, test = (
    reviews  # Assumed: a Spark DataFrame of the book reviews ("text" and "rating" columns).
    .randomSplit([0.8, 0.2])
)
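
A weighted random split assigns each row independently, so the resulting sizes are only approximately 80/20. The plain-Python sketch below illustrates the idea; it is not Spark's implementation, and `random_split` is a hypothetical helper written for this example.

```python
import random
from itertools import accumulate


def random_split(rows, weights, seed=None):
    """Assign each row to one bucket with probability proportional to its weight."""
    total = sum(weights)
    # Cumulative, normalized bounds, e.g. [0.8, 1.0] for weights [0.8, 0.2].
    bounds = list(accumulate(w / total for w in weights))
    rng = random.Random(seed)
    buckets = [[] for _ in weights]
    for row in rows:
        draw = rng.random()
        # Pick the first bucket whose bound exceeds the draw;
        # fall back to the last bucket to guard against float rounding.
        index = next((i for i, b in enumerate(bounds) if draw < b), len(bounds) - 1)
        buckets[index].append(row)
    return buckets


train_rows, test_rows = random_split(range(1000), [0.8, 0.2], seed=42)
print(len(train_rows), len(test_rows))
```

Passing a fixed seed makes the split reproducible. Spark's `randomSplit` also accepts an optional `seed` argument for the same purpose.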


Create the training pipeline

Create a pipeline that featurizes the review text with TextFeaturizer and predicts a rating from those features with LightGBMRegressor, then fit it on the training set.

from pyspark.ml import Pipeline
from synapse.ml.featurize.text import TextFeaturizer
from synapse.ml.lightgbm import LightGBMRegressor

model = Pipeline(
    stages=[
        TextFeaturizer(inputCol="text", outputCol="features"),
        LightGBMRegressor(featuresCol="features", labelCol="rating"),
    ]
).fit(train)
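
A pipeline is fitted on the training data first, and the fitted result is what later transforms new data. The toy sketch below illustrates that estimator/model pattern in plain Python. Every class and function in it is a hypothetical stand-in, not a PySpark or SynapseML API, and the mean-rating "regressor" is a deliberate simplification.

```python
from collections import Counter


class ToyTextFeaturizer:
    """Estimator: learns a vocabulary from the training rows."""

    def fit(self, rows):
        vocabulary = sorted({w for row in rows for w in row["text"].lower().split()})
        return ToyTextFeaturizerModel(vocabulary)


class ToyTextFeaturizerModel:
    """Fitted stage: adds a word-count vector under "features"."""

    def __init__(self, vocabulary):
        self.vocabulary = vocabulary

    def transform(self, rows):
        out = []
        for row in rows:
            counts = Counter(row["text"].lower().split())
            out.append({**row, "features": [counts[w] for w in self.vocabulary]})
        return out


class ToyMeanRegressor:
    """Estimator: learns the mean rating (a stand-in for a real regressor)."""

    def fit(self, rows):
        return ToyMeanRegressorModel(sum(r["rating"] for r in rows) / len(rows))


class ToyMeanRegressorModel:
    """Fitted stage: adds the learned mean under "prediction"."""

    def __init__(self, mean):
        self.mean = mean

    def transform(self, rows):
        return [{**row, "prediction": self.mean} for row in rows]


def fit_pipeline(stages, rows):
    """Fit each stage on the output of the previously fitted stages."""
    fitted = []
    for stage in stages:
        model = stage.fit(rows)
        rows = model.transform(rows)
        fitted.append(model)
    return fitted


def transform_pipeline(fitted, rows):
    """Apply the fitted stages in order to new rows."""
    for model in fitted:
        rows = model.transform(rows)
    return rows


train_rows = [{"text": "great book", "rating": 5.0}, {"text": "dull book", "rating": 1.0}]
test_rows = [{"text": "great great read"}]

fitted = fit_pipeline([ToyTextFeaturizer(), ToyMeanRegressor()], train_rows)
scored = transform_pipeline(fitted, test_rows)
print(scored)
```

The point of the sketch is the ordering: the second stage is fitted on the features produced by the first, and only the fitted stages can score unseen rows.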

Predict the output of the test data

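Call the transform function on the fitted model to predict ratings for the test data, and display the output as a DataFrame.

display(model.transform(test))  # display() is a notebook built-in; use .show() elsewhere.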


Use Azure AI Services to transform data in one step

Alternatively, for these kinds of tasks that have a prebuilt solution, you can use SynapseML's integration with Azure AI Services to transform your data in one step.

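from synapse.ml.services.language import AnalyzeText
from synapse.ml.core.platform import find_secret

model = AnalyzeText(
    textCol="text",
    outputCol="sentiment",
    kind="SentimentAnalysis",
    subscriptionKey=find_secret(
        secret_name="ai-services-api-key", keyvault="mmlspark-build-keys"
    ),  # Replace the call to find_secret with your key as a Python string.
).setLocation("eastus")  # Assumed region; use the region of your own resource.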
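
When supplying your own key as a string, one option is to read it from an environment variable rather than hard-coding it in the notebook. The sketch below assumes a hypothetical variable name, `AI_SERVICES_API_KEY`, and a hypothetical helper, `load_api_key`; neither is part of SynapseML.

```python
import os


def load_api_key(env_var="AI_SERVICES_API_KEY"):
    """Return the key stored in the given environment variable, or fail loudly."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your Azure AI Services key."
        )
    return key
```

The returned string can then be passed wherever the key is expected in place of the find_secret call.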