Experiment Tracking

This page covers experiment tracking for Isaac Lab and LeRobot training workflows. Azure ML provides managed MLflow tracking on both platforms: Azure ML jobs use it directly, and OSMO jobs reach it through the Azure ML backend.

📊 MLflow Tracking

Azure ML provides managed MLflow as the default experiment tracking backend. Isaac Lab training with SKRL logs metrics automatically through monkey-patching.

Isaac Lab (Automatic)

SKRL training logs metrics to MLflow without additional configuration. Metrics include episode rewards, training losses, optimization stats, and timing data.

Configure logging frequency with `--mlflow_log_interval`:

Interval	Behavior	Use Case
`step`	Log every training step	Debugging
`balanced`	Log every 10 steps (default)	Standard training
`rollout`	Log once per rollout cycle	Long runs
Integer	Custom step interval	Tuned granularity
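
To make the interval values in the table concrete, here is a minimal sketch of how they could map to a step count. The `resolve_log_interval` helper and its `steps_per_rollout` argument are hypothetical and are not part of the training scripts; treating `rollout` as "one log per `steps_per_rollout` steps" is an assumption about how the rollout length is known.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map an --mlflow_log_interval value to "log every N steps".
# $1 = interval value (step | balanced | rollout | positive integer)
# $2 = steps per rollout cycle (assumed input, only used for "rollout")
resolve_log_interval() {
  local interval="${1:-balanced}"
  local steps_per_rollout="${2:-1}"
  case "$interval" in
    step)     echo 1 ;;                      # every training step
    balanced) echo 10 ;;                     # documented default
    rollout)  echo "$steps_per_rollout" ;;   # once per rollout cycle
    *[!0-9]*)                                # anything that is not all digits
      echo "invalid interval: $interval" >&2
      return 1 ;;
    *)        echo "$interval" ;;            # custom integer interval
  esac
}

resolve_log_interval balanced      # prints 10
resolve_log_interval rollout 24    # prints 24
resolve_log_interval 50            # prints 50
```

Smaller intervals give finer-grained curves at the cost of more tracking requests, which is why the table suggests `rollout` for long runs.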

See MLflow Integration for SKRL metric categories, filtering, and troubleshooting.

LeRobot

MLflow is enabled automatically for LeRobot training on both OSMO and Azure ML. Submit an OSMO training job:

training/il/scripts/submit-osmo-lerobot-training.sh \
  -d user/dataset

MLflow Configuration

Parameter	Default	Description	Source
`--mlflow-token-retries`	`3`	MLflow token refresh retry count	`MLFLOW_TRACKING_TOKEN_REFRESH_RETRIES`
`--mlflow-http-timeout`	`60`	MLflow HTTP request timeout (sec)	`MLFLOW_HTTP_REQUEST_TIMEOUT`
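
The sketch below shows one way the flags and environment variables in the table could relate. It assumes a precedence of CLI flag, then existing environment variable, then the documented default; the submission scripts may resolve these differently. The `load_mlflow_settings` function is hypothetical, and the variable names are taken from the Source column as-is.

```shell
#!/usr/bin/env bash
# Hypothetical resolver: flag value > environment variable > documented default.
load_mlflow_settings() {
  # Start from the environment, falling back to the defaults in the table.
  token_retries="${MLFLOW_TRACKING_TOKEN_REFRESH_RETRIES:-3}"
  http_timeout="${MLFLOW_HTTP_REQUEST_TIMEOUT:-60}"

  # Let explicit flags override; unrelated arguments are skipped.
  while [ "$#" -gt 0 ]; do
    case "$1" in
      --mlflow-token-retries)
        [ "$#" -ge 2 ] || { echo "missing value for $1" >&2; return 1; }
        token_retries="$2"; shift 2 ;;
      --mlflow-http-timeout)
        [ "$#" -ge 2 ] || { echo "missing value for $1" >&2; return 1; }
        http_timeout="$2"; shift 2 ;;
      *) shift ;;
    esac
  done

  # Export the resolved values so a child training process can read them.
  export MLFLOW_TRACKING_TOKEN_REFRESH_RETRIES="$token_retries"
  export MLFLOW_HTTP_REQUEST_TIMEOUT="$http_timeout"
}

load_mlflow_settings --mlflow-http-timeout 120
echo "retries=${token_retries} timeout=${http_timeout}s"
```

Raising the timeout or retry count can help with long-running jobs on slow or intermittent connections; the right values depend on the environment.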

Model Registration

Training scripts register model checkpoints to Azure ML automatically at completion.

Registration Parameters

Parameter	Default	Description
`--register-checkpoint`	Derived from task	Model name for registration
`--skip-register-checkpoint`	`false`	Skip automatic registration
`--register-model`	(none)	Model name (LeRobot inference)

Registration Examples

# Isaac Lab: custom model name
training/rl/scripts/submit-azureml-training.sh \
  --register-checkpoint my-anymal-model

# Isaac Lab: skip registration
training/rl/scripts/submit-osmo-training.sh \
  --skip-register-checkpoint

# LeRobot: register after evaluation
evaluation/sil/scripts/submit-osmo-lerobot-eval.sh \
  --policy-repo-id user/trained-policy \
  -r my-evaluated-model

Retrieve Registered Models

# Download from Azure ML
az ml model download \
  --name anymal-c-velocity --version 1 \
  --download-path ./checkpoint

# Download from Hugging Face Hub
huggingface-cli download user/trained-policy --local-dir ./checkpoint

🔄 Checkpoint Workflows

Training supports three checkpoint initialization modes:

Mode	Weights	Optimizer	Counters	Use Case
`from-scratch`	Random	Fresh	Reset	Initial training
`warm-start`	Loaded	Fresh	Reset	Transfer learning
`resume`	Loaded	Loaded	Loaded	Continue interrupted training
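
Because `warm-start` and `resume` both load weights, they need a checkpoint source, while `from-scratch` does not. A pre-submit check along these lines can catch a missing URI before a job is queued. This is a sketch only: `validate_checkpoint_args` is a hypothetical helper, and whether the submission scripts enforce the same rule is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical pre-submit check; not part of the repository's scripts.
# $1 = checkpoint mode, $2 = checkpoint URI (may be empty)
validate_checkpoint_args() {
  local mode="$1"
  local uri="${2:-}"
  case "$mode" in
    from-scratch)
      # Weights are randomly initialized, so no checkpoint is read.
      return 0 ;;
    warm-start|resume)
      # Both modes load weights, so a checkpoint source is required.
      if [ -z "$uri" ]; then
        echo "--checkpoint-mode $mode requires --checkpoint-uri" >&2
        return 1
      fi
      return 0 ;;
    *)
      echo "unknown checkpoint mode: $mode" >&2
      return 1 ;;
  esac
}

validate_checkpoint_args resume "runs:/abc123/checkpoint" && echo "ok"
```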

# Resume training from MLflow artifact
training/rl/scripts/submit-azureml-training.sh \
  --checkpoint-uri "runs:/abc123/checkpoint" \
  --checkpoint-mode resume

# Warm-start from registered model
training/rl/scripts/submit-osmo-training.sh \
  --checkpoint-uri "models:/anymal-c-velocity/1" \
  --checkpoint-mode warm-start
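
The two URIs above follow MLflow's `runs:/<run_id>/<artifact_path>` and `models:/<name>/<version>` forms. The following sketch splits a URI into those parts, which can be handy for logging or validation. `describe_checkpoint_uri` is a hypothetical helper; it handles only these two forms and does not cover stage or alias references.

```shell
#!/usr/bin/env bash
# Hypothetical helper: split an MLflow-style checkpoint URI into its parts.
describe_checkpoint_uri() {
  local uri="$1"
  local rest
  case "$uri" in
    runs:/*/*)
      rest="${uri#runs:/}"       # e.g. abc123/checkpoint
      echo "kind=run run_id=${rest%%/*} artifact_path=${rest#*/}" ;;
    models:/*/*)
      rest="${uri#models:/}"     # e.g. anymal-c-velocity/1
      echo "kind=model name=${rest%%/*} version=${rest#*/}" ;;
    *)
      echo "unsupported checkpoint URI: $uri" >&2
      return 1 ;;
  esac
}

describe_checkpoint_uri "runs:/abc123/checkpoint"
# prints: kind=run run_id=abc123 artifact_path=checkpoint
describe_checkpoint_uri "models:/anymal-c-velocity/1"
# prints: kind=model name=anymal-c-velocity version=1
```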

🔗 Related Documentation

MLflow Integration for SKRL metric logging internals
Isaac Lab Training for RL training workflows
LeRobot Training for behavioral cloning workflows
Scripts Reference for full CLI parameter tables

🤖 Crafted with precision by ✨Copilot following brilliant human instruction, then carefully refined by our team of discerning human reviewers.
