Classification - Adult Census

In this example, we try to predict incomes from the Adult Census dataset.

First, we import the packages (use help(synapse) to view their contents).

Now let's read the data and split it into train and test sets:

data =
data = data.select(["education", "marital-status", "hours-per-week", "income"])
train, test = data.randomSplit([0.75, 0.25], seed=123)
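
To make the behaviour of a seeded, weighted split concrete, here is a minimal pure-Python sketch. It is not Spark's implementation: the helper name `random_split` and the toy `rows` list are hypothetical, and the sketch only assumes the general idea that each row is assigned independently according to the normalized weights, so the resulting sizes are approximate rather than exact.

```python
import random


def random_split(rows, weights, seed):
    """Assign each row to one split by drawing one uniform number per row.

    Hypothetical helper for illustration only; Spark's randomSplit is
    implemented differently and operates on distributed data.
    """
    total = sum(weights)
    # Cumulative, normalized boundaries, e.g. [0.75, 1.0] for [0.75, 0.25].
    bounds = []
    running = 0.0
    for weight in weights:
        running += weight / total
        bounds.append(running)

    rng = random.Random(seed)  # a fixed seed makes the split reproducible
    splits = [[] for _ in weights]
    for row in rows:
        draw = rng.random()
        for i, bound in enumerate(bounds):
            # The last split catches any draw not claimed by earlier ones.
            if draw < bound or i == len(bounds) - 1:
                splits[i].append(row)
                break
    return splits


rows = list(range(1000))  # toy stand-in for the rows of a DataFrame
train_rows, test_rows = random_split(rows, [0.75, 0.25], seed=123)
```

Running the helper twice with the same seed yields the same partition, which is the reason for passing `seed=123` in the tutorial code.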

TrainClassifier wraps SparkML classifiers and can be used to initialize and fit a model. You can use help(TrainClassifier) to view the different parameters.

Note that it implicitly converts the data into the format expected by the algorithm: it tokenizes and hashes strings, one-hot encodes categorical variables, assembles the features into a vector, and so on. The parameter numFeatures controls the number of hashed features.
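
The featurization steps described above can be sketched in plain Python. This is only an illustration of the general techniques (feature hashing and one-hot encoding), not what TrainClassifier does internally: the helpers `hash_features` and `one_hot` are hypothetical, MD5 is used as a stand-in hash (Spark uses its own hash function), and the category list is made up for the example.

```python
import hashlib


def hash_features(tokens, num_features):
    """Map tokens into a fixed-length count vector (the "hashing trick").

    Hypothetical helper: MD5 is a stand-in for the library's real hash.
    """
    vec = [0] * num_features
    for token in tokens:
        digest = hashlib.md5(token.encode("utf-8")).digest()
        # Reduce the hash to a bucket index in [0, num_features).
        index = int.from_bytes(digest[:8], "big") % num_features
        vec[index] += 1
    return vec


def one_hot(value, categories):
    """Return a 0/1 vector with a single 1 at the position of `value`."""
    return [1 if value == category else 0 for category in categories]


# Tokenize a string column value, then hash the tokens into 256 buckets.
tokens = "Married-civ-spouse".lower().split("-")
hashed = hash_features(tokens, num_features=256)

# One-hot encode a categorical value against an illustrative category list.
encoded = one_hot("Bachelors", ["Bachelors", "HS-grad", "Masters"])

# Assemble everything into a single feature vector.
features = encoded + hashed
```

A larger `num_features` lowers the chance that two different tokens land in the same bucket, at the cost of a longer feature vector.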

from synapse.ml.train import TrainClassifier
from pyspark.ml.classification import LogisticRegression

model = TrainClassifier(
    model=LogisticRegression(), labelCol="income", numFeatures=256
).fit(train)

Finally, we save the model so it can be used in a scoring program.

from synapse.ml.core.platform import *

if running_on_synapse() or running_on_synapse_internal():
elif running_on_databricks():
elif running_on_binder():
else:
    print(f"{current_platform()} platform not supported")