physical-ai-toolchain

This guide covers LeRobot behavioral cloning training for the ACT and Diffusion policy architectures. Training runs on either Azure ML or OSMO, uses HuggingFace Hub datasets, and tracks experiments with WANDB and MLflow.

📋 Prerequisites

Component	Requirement
Infrastructure	AKS cluster deployed via the Infrastructure Guide
Azure ML or OSMO	At least one platform configured (see the Platform Selection section)
HuggingFace token	Required for private datasets (`hf_token` credential)
WANDB API key	Required when WANDB logging is enabled, the default on OSMO (`wandb_api_key` credential)
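Before submitting, it can help to confirm that the CLI behind your chosen platform is installed. The sketch below assumes that Azure ML submissions go through `az` and OSMO submissions through `osmo`, based on the commands named in the Platform Selection table. The `missing_tools`, `platform_cli`, and `preflight` functions are hypothetical helpers, not part of the toolchain.

```shell
# Hypothetical preflight helpers; not part of physical-ai-toolchain.

# Print each argument that is not found on PATH, one per line.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Map a platform to the CLI it submits with (assumption based on the
# Platform Selection table: `az ml job create` vs `osmo workflow submit`).
platform_cli() {
  case "$1" in
    azureml) echo az ;;
    osmo) echo osmo ;;
    *) echo "unknown platform: $1" >&2; return 1 ;;
  esac
}

# Succeed only when the chosen platform's CLI is installed.
preflight() {
  local cli missing
  cli=$(platform_cli "$1") || return 1
  missing=$(missing_tools "$cli")
  if [ -n "$missing" ]; then
    echo "missing CLI for $1: $missing" >&2
    return 1
  fi
  echo "ok: found $cli"
}
```

For example, `preflight osmo || exit 1` at the top of a wrapper script stops early when `osmo` is not on `PATH`.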

🚀 Quick Start

Azure ML

./scripts/submit-azureml-lerobot-training.sh \
  -d lerobot/aloha_sim_insertion_human

OSMO

./scripts/submit-osmo-lerobot-training.sh \
  -d lerobot/aloha_sim_insertion_human

End-to-End Pipeline (OSMO)

Train, evaluate, and register in one command:

./scripts/run-lerobot-pipeline.sh \
  -d lerobot/aloha_sim_insertion_human \
  --policy-repo-id user/my-act-policy \
  -r my-act-model

🧠 Policy Architectures

Architecture	Full name	Strengths
ACT	Action Chunking with Transformers	Multi-step prediction, temporal coherence
Diffusion	Denoising Diffusion Policy	Multi-modal action distributions

Select the architecture with `--policy-type` (short form `-p`):

# ACT policy (default)
./scripts/submit-osmo-lerobot-training.sh -d user/dataset -p act

# Diffusion policy
./scripts/submit-osmo-lerobot-training.sh -d user/dataset -p diffusion
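The submit scripts accept `act` or `diffusion` as the policy type. A small wrapper can reject other values locally, before anything is scheduled. This is a sketch: `validate_policy` and `job_name_for` are hypothetical helpers, and the `lerobot-<policy>-training` naming pattern only mirrors the documented default `lerobot-act-training`; it is not confirmed to be how the scripts name jobs.

```shell
# Hypothetical helpers; not part of physical-ai-toolchain.

# Accept only the two documented policy types.
validate_policy() {
  case "$1" in
    act|diffusion) return 0 ;;
    *) echo "unsupported policy type: $1 (expected act or diffusion)" >&2; return 1 ;;
  esac
}

# Build a job name that mirrors the documented default (lerobot-act-training).
# The pattern is an assumption, not the scripts' own naming rule.
job_name_for() {
  validate_policy "$1" || return 1
  printf 'lerobot-%s-training\n' "$1"
}
```

For example, passing `--job-name "$(job_name_for diffusion)"` together with `-p diffusion` would name the job `lerobot-diffusion-training` rather than relying on the default.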

⚖️ Platform Selection

Aspect	Azure ML	OSMO
Submission	`az ml job create`	`osmo workflow submit`
Experiment tracking	MLflow (managed)	WANDB (default) + MLflow (optional)
Credential handling	Azure ML environment variables	`osmo credential set` injection
Dataset delivery	HuggingFace Hub download	Hub download or OSMO bucket mount
Pipeline support	Manual multi-step	`run-lerobot-pipeline.sh` orchestration
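Each platform has its own submit script, and both accept `-d` in the Quick Start examples. A thin dispatcher lets one command line target either platform. This is a minimal sketch: `submit_training` and the `DRY_RUN` variable are hypothetical. The dispatcher forwards all remaining arguments unchanged, so it assumes nothing about which other flags each script supports and leaves validation to the script itself.

```shell
# Hypothetical dispatcher; not part of physical-ai-toolchain.
# Usage: submit_training <azureml|osmo> [submit-script arguments...]
# Set DRY_RUN=1 to print the command instead of running it.
submit_training() {
  local platform=$1 script
  shift
  case "$platform" in
    azureml) script=./scripts/submit-azureml-lerobot-training.sh ;;
    osmo) script=./scripts/submit-osmo-lerobot-training.sh ;;
    *) echo "unknown platform: $platform (expected azureml or osmo)" >&2; return 1 ;;
  esac
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$script $*"
  else
    "$script" "$@"
  fi
}
```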

⚙️ Training Configuration

Parameter	Default	Description
`--dataset-repo-id` (`-d`)	(required)	HuggingFace dataset repository
`--policy-type` (`-p`)	`act`	Policy: `act` or `diffusion`
`--job-name`	`lerobot-act-training`	Job identifier
`--image`	`pytorch/pytorch:2.4.1-cuda12.4-cudnn9-runtime`	Container image
`--training-steps`	(LeRobot default)	Total training iterations
`--batch-size`	(LeRobot default)	Training batch size
`--save-freq`	`5000`	Checkpoint save interval, in training steps
`--policy-repo-id`	(none)	Pre-trained policy for fine-tuning

Fine-Tuning from an Existing Policy

Pass a pre-trained policy with `--policy-repo-id` to continue training on a new dataset:

./scripts/submit-osmo-lerobot-training.sh \
  -d user/my-dataset \
  --policy-repo-id user/pretrained-act \
  --training-steps 50000 \
  --batch-size 16
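You can size a fine-tuning run before submitting with simple arithmetic on the step count, batch size, and save interval. This sketch assumes one batch of `--batch-size` samples per training step and one checkpoint every `--save-freq` steps (default `5000`). The `run_budget` helper is hypothetical. LeRobot may also save a final checkpoint when `--training-steps` is not a multiple of `--save-freq`, which this count ignores.

```shell
# Hypothetical sizing helper; not part of physical-ai-toolchain.
# $1: --training-steps, $2: --batch-size, $3: --save-freq (defaults to 5000).
run_budget() {
  local steps=$1 batch=$2 save_freq=${3:-5000}
  # Integer division: checkpoints written at multiples of save_freq.
  echo "checkpoints=$(( steps / save_freq )) samples=$(( steps * batch ))"
}
```

For the example above, `run_budget 50000 16` prints `checkpoints=10 samples=800000`.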

🔑 Credential Setup

OSMO Credentials

OSMO injects credentials into the workflow at runtime. Store them once with `osmo credential set`:

# HuggingFace token (required for private datasets)
osmo credential set hf_token --generic --value "hf_..."

# WANDB API key (required when wandb_enable=true)
osmo credential set wandb_api_key --generic --value "..."
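Typing tokens directly into `osmo credential set` leaves them in shell history. The sketch below reads them from environment variables instead. The variable names `HF_TOKEN` and `WANDB_API_KEY` and the `set_osmo_credential` and `register_lerobot_credentials` helpers are assumptions. Only the `osmo credential set <name> --generic --value <value>` form comes from this page. With `DRY_RUN=1`, the helper prints each command with the value masked.

```shell
# Hypothetical helpers; not part of physical-ai-toolchain.

# Register one OSMO generic credential, skipping empty values.
set_osmo_credential() {
  local name=$1 value=$2
  if [ -z "$value" ]; then
    echo "skipping $name: no value provided" >&2
    return 0
  fi
  if [ "${DRY_RUN:-0}" = 1 ]; then
    # Never print the secret itself.
    echo "osmo credential set $name --generic --value ***"
  else
    osmo credential set "$name" --generic --value "$value"
  fi
}

# Register both credentials used by the LeRobot workflows
# (HF_TOKEN and WANDB_API_KEY are assumed variable names).
register_lerobot_credentials() {
  set_osmo_credential hf_token "${HF_TOKEN:-}"
  set_osmo_credential wandb_api_key "${WANDB_API_KEY:-}"
}
```

For example, export `HF_TOKEN` and `WANDB_API_KEY` from your secrets store, then run `register_lerobot_credentials`.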

Azure ML Credentials

Azure ML authenticates through the workspace-managed identity. For custom configurations, such as a different workspace or compute target, set these environment variables:

Variable	Description
`AZURE_SUBSCRIPTION_ID`	Azure subscription ID
`AZURE_RESOURCE_GROUP`	Resource group name
`AZUREML_WORKSPACE_NAME`	Azure ML workspace name
`AZUREML_COMPUTE`	Compute target name
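A quick report shows which of these variables the current shell actually provides before you submit. This is a minimal sketch: `azureml_env_report` is a hypothetical helper. Whether an unset variable falls back to a default depends on the submit script.

```shell
# Hypothetical helper; not part of physical-ai-toolchain.
# Print whether each Azure ML variable from the table above is set (non-empty).
# Values are never printed.
azureml_env_report() {
  local var value
  for var in AZURE_SUBSCRIPTION_ID AZURE_RESOURCE_GROUP AZUREML_WORKSPACE_NAME AZUREML_COMPUTE; do
    # Indirect lookup via eval, which also works in POSIX sh.
    eval "value=\${$var:-}"
    if [ -n "$value" ]; then
      echo "$var: set"
    else
      echo "$var: unset"
    fi
  done
}
```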

📊 Experiment Logging

WANDB (Default on OSMO)

WANDB logging is enabled by default on OSMO workflows and requires the `wandb_api_key` credential. To turn it off, pass `--wandb-disable`:

# Disable WANDB
./scripts/submit-osmo-lerobot-training.sh \
  -d user/dataset \
  --wandb-disable

MLflow (Azure ML Managed)

Azure ML training logs to MLflow automatically. On OSMO, enable MLflow with `--mlflow-enable`:

./scripts/submit-osmo-lerobot-training.sh \
  -d user/dataset \
  --mlflow-enable
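On OSMO, WANDB is on and MLflow is off unless you pass `--wandb-disable` or `--mlflow-enable`. If a wrapper script exposes both as simple on/off switches, it only needs to emit the flags that differ from those defaults. This is a sketch: `tracking_flags` is a hypothetical helper, and only the two flags come from this page.

```shell
# Hypothetical helper; not part of physical-ai-toolchain.
# $1: WANDB on|off, $2: MLflow on|off. Emits only non-default OSMO flags.
tracking_flags() {
  local wandb=$1 mlflow=$2 flags=""
  if [ "$wandb" = off ]; then
    flags="$flags --wandb-disable"
  fi
  if [ "$mlflow" = on ]; then
    flags="$flags --mlflow-enable"
  fi
  # Strip the leading space, if any.
  echo "${flags# }"
}
```

For example, `./scripts/submit-osmo-lerobot-training.sh -d user/dataset $(tracking_flags off on)` logs to MLflow only. The substitution is deliberately unquoted so that each flag becomes a separate argument and an empty result adds no argument.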

See Experiment Tracking for a platform comparison and configuration details.

💾 Dataset Workflows

HuggingFace Hub (Default)

LeRobot downloads datasets from the HuggingFace Hub at runtime. Specify the dataset with `--dataset-repo-id` (`-d`):

./scripts/submit-osmo-lerobot-training.sh \
  -d lerobot/aloha_sim_insertion_human

OSMO Dataset Mount

Mount a dataset from an OSMO bucket backed by Azure Blob Storage by selecting the dataset workflow template and passing the bucket and dataset names:

./scripts/submit-osmo-lerobot-training.sh \
  -w workflows/osmo/lerobot-train-dataset.yaml \
  -d user/fallback-dataset \
  --dataset-bucket my-bucket \
  --dataset-name my-lerobot-data

When no dataset mount is available, the workflow falls back to downloading the `--dataset-repo-id` dataset from the HuggingFace Hub.
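The mount-or-download rule can be read as a small decision function. This is an illustrative sketch, not the workflow's actual logic. `dataset_source` is hypothetical, and it assumes a mount counts as available when a bucket and dataset name are given and the mounted directory exists.

```shell
# Hypothetical sketch of the mount-or-download decision (not the workflow's code).
# $1: bucket, $2: dataset name, $3: expected mount directory, $4: Hub repo id.
dataset_source() {
  local bucket=$1 name=$2 mount_dir=$3 repo_id=$4
  if [ -n "$bucket" ] && [ -n "$name" ] && [ -d "$mount_dir" ]; then
    echo "mount:$mount_dir"
  else
    # No usable mount: download the --dataset-repo-id dataset from the Hub.
    echo "hub:$repo_id"
  fi
}
```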

🔄 End-to-End Pipeline

The `run-lerobot-pipeline.sh` script orchestrates the full lifecycle on OSMO:

Stage	Action
1	Submit training workflow
2	Poll workflow status until completion
3	Submit inference/evaluation workflow

# Full pipeline
./scripts/run-lerobot-pipeline.sh \
  -d lerobot/aloha_sim_insertion_human \
  --policy-repo-id user/my-policy \
  -r my-model

# Training only with polling (skip inference)
./scripts/run-lerobot-pipeline.sh \
  -d user/dataset \
  --skip-inference

# Async mode (submit and exit)
./scripts/run-lerobot-pipeline.sh \
  -d user/dataset \
  --skip-wait
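Stage 2 of the pipeline waits for training to finish before evaluation starts. The general poll-with-timeout pattern is sketched below. `poll_until_done` is hypothetical. The status command is whatever you pass in, not a specific OSMO command. The terminal states `COMPLETED` and `FAILED` are assumed names.

```shell
# Hypothetical helper; not the pipeline script's implementation.
# Usage: poll_until_done <interval_seconds> <max_polls> <status command...>
# Returns 0 on COMPLETED, 1 on FAILED, 2 if max_polls is reached first.
poll_until_done() {
  local interval=$1 max_polls=$2 i=0 status
  shift 2
  while [ "$i" -lt "$max_polls" ]; do
    # Treat a failed status query as transient and keep polling.
    status=$("$@") || status=UNKNOWN
    case "$status" in
      COMPLETED) return 0 ;;
      FAILED) return 1 ;;
    esac
    i=$((i + 1))
    sleep "$interval"
  done
  return 2
}
```

The upper bound on polls keeps a stuck workflow from blocking a CI job indefinitely.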

Related documentation:
LeRobot Inference for evaluating trained policies
Experiment Tracking for MLflow and WANDB configuration
AzureML Workflows for job template reference
OSMO Workflows for workflow template reference
Scripts Reference for full CLI parameter tables

🤖 Crafted with precision by ✨Copilot following brilliant human instruction, then carefully refined by our team of discerning human reviewers.
