Your First LeRobot Training Job

Submit a LeRobot behavioral cloning training job to OSMO using a HuggingFace or Azure Blob dataset and verify that the trained policy appears in Azure ML. By the end of this recipe, you will have a trained ACT policy for the ALOHA sim insertion task.

> [!NOTE]
> This recipe requires deployed infrastructure with OSMO running. Complete the Quickstart first.

📋 Prerequisites

| Requirement | Details |
| --- | --- |
| Infrastructure | Azure resources deployed via Terraform |
| OSMO | Control plane and backend running |
| VPN | Connected to private cluster (if using private AKS) |
| Azure CLI | Authenticated (`az login`) |
| Blob access | OSMO workload identity has a Blob data role when using Azure Blob datasets |

🚀 Steps

Step 1: Preview the training configuration

Preview what will be submitted:

cd training/il/scripts
./submit-osmo-lerobot-training.sh \
  -d lerobot/aloha_sim_insertion_human \
  --config-preview

Review the dataset source, policy type, training hyperparameters, and Azure ML context. No job is submitted.

Step 2: Submit a training job

Submit a training run with the default ACT policy:

./submit-osmo-lerobot-training.sh \
  -d lerobot/aloha_sim_insertion_human

The script submits an OSMO workflow that pulls the dataset from HuggingFace, trains an ACT policy, and logs metrics to MLflow.

Customize training hyperparameters:

./submit-osmo-lerobot-training.sh \
  -d lerobot/aloha_sim_insertion_human \
  --policy-type act \
  --training-steps 50000 \
  --batch-size 32 \
  --save-freq 5000
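
As a quick sanity check on these values, the sketch below estimates how many periodic checkpoints a run produces. It assumes one checkpoint is written every `--save-freq` steps and does not count any extra final checkpoint the trainer may write; `expected_checkpoints` is a hypothetical helper, not part of the submission script.

```bash
#!/usr/bin/env bash
# Estimate how many periodic checkpoints a run writes.
# Assumption: one checkpoint every save_freq steps; an additional final
# checkpoint, if the trainer writes one, is not counted here.
expected_checkpoints() {
  local steps=$1 save_freq=$2
  if (( save_freq <= 0 )); then
    echo "save_freq must be positive" >&2
    return 1
  fi
  # Integer division: a trailing partial interval produces no checkpoint.
  echo $(( steps / save_freq ))
}

# Values from the customized example: 50000 steps, checkpoint every 5000.
expected_checkpoints 50000 5000   # prints 10
```

With the defaults from the Configuration Reference (100000 steps, save frequency 5000), the same estimate gives 20 checkpoints, which is worth keeping in mind when sizing storage.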

Step 3: Train with data from Azure Blob Storage

Use --blob-url when your dataset is in Azure Storage instead of HuggingFace:

./submit-osmo-lerobot-training.sh \
  --blob-url https://<your-storage-account>.blob.core.windows.net/datasets/my-dataset/v1

The workflow downloads the dataset with managed identity credentials before training starts. For private datasets, grant the OSMO workload identity a Blob data role on the storage account or container before you submit the job; otherwise the download step cannot read the data.
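
Before submitting, it can help to confirm that the URL has the expected shape and to pull out the storage account and container names, since those are the resources the role assignment must cover. The following is a minimal sketch that assumes the `https://<account>.blob.core.windows.net/<container>/<prefix>` layout shown above; `parse_blob_url` and the account name `mystorageaccount` are hypothetical and not part of the submission script.

```bash
#!/usr/bin/env bash
# Split a Blob dataset URL into storage account, container, and path prefix.
# Assumes the https://<account>.blob.core.windows.net/<container>/<prefix> shape.
parse_blob_url() {
  local url=$1 rest host path account container prefix
  case $url in
    https://*.blob.core.windows.net/?*) ;;
    *) echo "not a Blob URL: $url" >&2; return 1 ;;
  esac
  rest=${url#https://}          # drop the scheme
  host=${rest%%/*}              # <account>.blob.core.windows.net
  path=${rest#*/}               # <container>/<prefix>
  account=${host%%.*}           # text before the first dot
  container=${path%%/*}         # first path segment
  prefix=${path#"$container"}   # remainder after the container name
  prefix=${prefix#/}            # drop the leading slash, if any
  printf '%s %s %s\n' "$account" "$container" "$prefix"
}

# Hypothetical account name, for illustration only.
parse_blob_url "https://mystorageaccount.blob.core.windows.net/datasets/my-dataset/v1"
# prints: mystorageaccount datasets my-dataset/v1
```

The account and container names are what you would scope a read role to, for example the built-in Storage Blob Data Reader role assigned with `az role assignment create`. Which role your deployment actually uses is not specified here, so treat that as an assumption.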

Step 4: Monitor training progress

OSMO UI: Open the OSMO dashboard to view workflow status, pod logs, and real-time metrics. See Accessing OSMO for connection instructions (VPN or port-forward).

Azure ML Studio: Navigate to your workspace at ml.azure.com, open the Jobs section, and select the MLflow experiment. Training metrics (loss, gradient norm, learning rate) stream in real time as the job progresses.

To view results in detail:

1. Open ml.azure.com and select your workspace.
2. Navigate to Jobs in the left sidebar.
3. Find your experiment (named after the task, e.g., lerobot-act-training) and select it.
4. Click the latest run to open the run detail page.
5. Select the Metrics tab to view training curves (loss, gradient norm, learning rate). Use the chart controls to overlay multiple metrics or smooth noisy curves.
6. Select the Outputs + logs tab to view stdout/stderr logs from the training container.
7. If `--register-checkpoint` was used, navigate to Models in the left sidebar to confirm the registered model and its version.

CLI (optional): Check workflow status via the OSMO CLI:

osmo workflow list

Step 5: Register the trained model (optional)

./submit-osmo-lerobot-training.sh \
  -d lerobot/aloha_sim_insertion_human \
  --register-checkpoint my-act-policy

The `--register-checkpoint` flag takes a model name (my-act-policy in this example) and registers the trained model in Azure ML under that name automatically after training completes.

✅ Verify

The recipe succeeded when:

- OSMO training pod reached Completed status
- MLflow experiment shows training loss decreasing over steps
- Model artifacts exist in Azure ML (if `--register-checkpoint` was used)
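
The loss check can also be scripted once you have the metric values in a text file. The sketch below assumes a hypothetical two-column format (step, then loss, separated by whitespace) that you produce yourself from the run's metrics; neither the format nor the `loss_decreased` helper comes from the training scripts. It only compares the first and last values, so it is a coarse check rather than a convergence test.

```bash
#!/usr/bin/env bash
# Read "step loss" pairs on stdin and report whether the last loss is
# lower than the first.
# Exit status: 0 = loss decreased, 1 = it did not, 2 = no data rows.
# Assumption: whitespace-separated columns, one row per logged step.
loss_decreased() {
  awk '
    NF >= 2 {
      if (!seen) { first = $2; seen = 1 }   # remember the first loss
      last = $2                             # keep overwriting with the latest
    }
    END {
      if (!seen) exit 2
      exit (last + 0 < first + 0) ? 0 : 1
    }
  '
}

# Illustrative input only; these numbers are not real training results.
if printf '0 2.5\n1000 1.2\n2000 0.4\n' | loss_decreased; then
  echo "loss decreased"
fi
```

To apply it to your own export, pipe the file in, for example `loss_decreased < losses.txt`, where `losses.txt` is a placeholder name.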

⚙️ Configuration Reference

| Parameter | Default | Description |
| --- | --- | --- |
| `-d, --dataset-repo-id` | Required for HuggingFace; `dataset` for Blob sources | HuggingFace dataset repository or logical local dataset name |
| `--blob-url` | (none) | Direct Azure Blob dataset URL; repeatable |
| `--policy-type` | `act` | Policy architecture (`act` or `diffusion`) |
| `--training-steps` | `100000` | Total training iterations |
| `--batch-size` | `32` | Training batch size |
| `--learning-rate` | `1e-4` | Optimizer learning rate |
| `--save-freq` | `5000` | Checkpoint save frequency |
| `--register-checkpoint` | (none) | Model name for Azure ML registration |
| `--init-from-policy-model` | (none) | Warm-start from a registered AzureML model (`azureml:NAME:VERSION`); AzureML submission script only |

See Scripts Reference for the full parameter table.

🔗 Related Recipes

- End-to-End LeRobot Pipeline: automated train → evaluate → register
- Preparing Datasets for Training: dataset download and validation
- Your First RL Training Job: reinforcement learning alternative
- LeRobot Training Guide: detailed IL reference documentation

🤖 Crafted with precision by ✨Copilot following brilliant human instruction, then carefully refined by our team of discerning human reviewers.
