Autologging
Automatic Logging
MLflow automatic logging allows you to log metrics, parameters, and models without the need for explicit log statements. SynapseML supports autologging for every model in the library.
To enable autologging for SynapseML:
- Download this customized log_model_allowlist file and put it in a place that your code can access. For example:
  - In Synapse: `wasb://<containername>@<accountname>.blob.core.windows.net/PATH_TO_YOUR/log_model_allowlist.txt`
  - In Databricks: `/dbfs/FileStore/PATH_TO_YOUR/log_model_allowlist.txt`
- Set the Spark configuration `spark.mlflow.pysparkml.autolog.logModelAllowlistFile` to the path of your `log_model_allowlist.txt` file.
- Call `mlflow.pyspark.ml.autolog()` before your training code to enable autologging for all supported models (see the sketch below).
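A minimal sketch of these two steps in a notebook, assuming an existing `spark` session and a placeholder allowlist path; if setting the configuration at runtime doesn't take effect in your environment, set it at the cluster or pool level instead:

```python
import mlflow

# Point autologging at your customized allowlist file (placeholder path).
spark.conf.set(
    "spark.mlflow.pysparkml.autolog.logModelAllowlistFile",
    "/dbfs/FileStore/PATH_TO_YOUR/log_model_allowlist.txt",
)

# Enable autologging for all allowlisted models; run this before any fit() calls.
mlflow.pyspark.ml.autolog()
```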
Note:
- If you want to support autologging of PySpark models not present in the log_model_allowlist file, you can add such models to the file (see the excerpt after this note).
- If you've enabled autologging, don't write an explicit `with mlflow.start_run():` block, as it might cause multiple runs for one single model or one run for multiple models.
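The allowlist itself is a plain-text file with one fully qualified model class name per line. The excerpt below is illustrative only; these are standard PySpark entries, not the full contents of the customized SynapseML file:

```text
pyspark.ml.classification.LogisticRegressionModel
pyspark.ml.regression.LinearRegressionModel
pyspark.ml.pipeline.PipelineModel
```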
Configuration process in Databricks as an example
- Install the latest MLflow via `%pip install mlflow`.
- Upload your customized `log_model_allowlist.txt` file to DBFS by clicking the File/Upload Data button in the Databricks UI.
- Set the cluster Spark configuration following this documentation:
  `spark.mlflow.pysparkml.autolog.logModelAllowlistFile /dbfs/FileStore/PATH_TO_YOUR/log_model_allowlist.txt`
- Run `mlflow.pyspark.ml.autolog()` before your training code executes. You can customize how autologging works by supplying appropriate parameters (see the sketch after this list).
- Find your experiment's results via the Experiments tab of the MLflow UI.
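As an illustration of the customization mentioned above, the sketch below passes a few keyword arguments that exist in recent MLflow releases; check the `mlflow.pyspark.ml.autolog` documentation for your installed version:

```python
import mlflow

# Parameter availability depends on your MLflow version.
mlflow.pyspark.ml.autolog(
    log_models=True,                 # log fitted models as run artifacts
    log_post_training_metrics=True,  # record metrics computed after fit()
    silent=False,                    # show autologging warnings and info messages
)
```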
Example for ConditionalKNNModel
```python
from pyspark.ml.linalg import Vectors
from synapse.ml.nn import *

# Training data: feature vectors, a string "values" column, and integer labels.
df = spark.createDataFrame([
    (Vectors.dense(2.0, 2.0, 2.0), "foo", 1),
    (Vectors.dense(2.0, 2.0, 4.0), "foo", 3),
    (Vectors.dense(2.0, 2.0, 6.0), "foo", 4),
    (Vectors.dense(2.0, 2.0, 8.0), "foo", 3),
    (Vectors.dense(2.0, 2.0, 10.0), "foo", 1),
    (Vectors.dense(2.0, 2.0, 12.0), "foo", 2),
    (Vectors.dense(2.0, 2.0, 14.0), "foo", 0),
    (Vectors.dense(2.0, 2.0, 16.0), "foo", 1),
    (Vectors.dense(2.0, 2.0, 18.0), "foo", 3),
    (Vectors.dense(2.0, 2.0, 20.0), "foo", 0),
    (Vectors.dense(2.0, 4.0, 2.0), "foo", 2),
    (Vectors.dense(2.0, 4.0, 4.0), "foo", 4),
    (Vectors.dense(2.0, 4.0, 6.0), "foo", 2),
    (Vectors.dense(2.0, 4.0, 8.0), "foo", 2),
    (Vectors.dense(2.0, 4.0, 10.0), "foo", 4),
    (Vectors.dense(2.0, 4.0, 12.0), "foo", 3),
    (Vectors.dense(2.0, 4.0, 14.0), "foo", 2),
    (Vectors.dense(2.0, 4.0, 16.0), "foo", 1),
    (Vectors.dense(2.0, 4.0, 18.0), "foo", 4),
    (Vectors.dense(2.0, 4.0, 20.0), "foo", 4),
], ["features", "values", "labels"])

# Fit a ConditionalKNN model; with autologging enabled, this fit() call creates the run.
cnn = ConditionalKNN().setOutputCol("prediction")
cnnm = cnn.fit(df)

# Test data adds a "conditioner" column that constrains the neighbor search.
test_df = spark.createDataFrame([
    (Vectors.dense(2.0, 2.0, 2.0), "foo", 1, [0, 1]),
    (Vectors.dense(2.0, 2.0, 4.0), "foo", 4, [0, 1]),
    (Vectors.dense(2.0, 2.0, 6.0), "foo", 2, [0, 1]),
    (Vectors.dense(2.0, 2.0, 8.0), "foo", 4, [0, 1]),
    (Vectors.dense(2.0, 2.0, 10.0), "foo", 4, [0, 1]),
], ["features", "values", "labels", "conditioner"])

display(cnnm.transform(test_df))
```
This code should log one run with a ConditionalKNNModel artifact and its parameters.
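To verify what was captured, you can inspect the autologged run from the same notebook. A minimal sketch, assuming a recent MLflow version that provides `mlflow.last_active_run()`:

```python
import mlflow

# Fetch the run that autologging just created and inspect what it recorded.
run = mlflow.last_active_run()
print(run.info.run_id)
print(run.data.params)  # parameters of the fitted ConditionalKNN model
```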