Detailed submission examples for training, inference, and pipeline workflows on OSMO and Azure ML platforms.
[!NOTE] For CLI argument reference and script inventory, see Script Reference.
The submit-osmo-dataset-training.sh script uploads training/rl/ as a versioned OSMO dataset. This approach removes the ~1MB size limit of base64-encoded archives and enables dataset reuse across runs.
# Default dataset configuration
./submit-osmo-dataset-training.sh --task Isaac-Velocity-Rough-Anymal-C-v0
# Custom dataset bucket and name
./submit-osmo-dataset-training.sh \
--dataset-bucket custom-bucket \
--dataset-name my-training-v1 \
--task Isaac-Velocity-Rough-Anymal-C-v0
# With checkpoint resume
./submit-osmo-dataset-training.sh \
--task Isaac-Velocity-Rough-Anymal-C-v0 \
--checkpoint-uri "runs:/abc123/checkpoint" \
--checkpoint-mode resume
| Parameter | Default | Description |
|---|---|---|
--dataset-bucket |
training |
OSMO bucket for training code |
--dataset-name |
training-code |
Dataset name (auto-versioned) |
--training-path |
training/rl |
Local folder to upload |
The script stages files to exclude __pycache__ and build artifacts via .amlignore patterns before upload.
The submit-osmo-lerobot-training.sh script submits LeRobot training workflows supporting ACT and Diffusion policy architectures. Uses HuggingFace Hub datasets and installs runtime dependencies from training/il/lerobot/pyproject.toml.
# ACT policy with WANDB logging
./submit-osmo-lerobot-training.sh -d user/my-dataset
# Diffusion policy with Azure MLflow
./submit-osmo-lerobot-training.sh \
-d user/my-dataset \
-p diffusion \
--mlflow-enable \
-r my-model-name
# Fine-tune from pre-trained policy
./submit-osmo-lerobot-training.sh \
-d user/my-dataset \
--policy-repo-id user/pretrained-act \
--training-steps 50000 \
--batch-size 16
| Parameter | Default | Description |
|---|---|---|
--dataset-repo-id |
(required) | HuggingFace dataset repository ID |
--policy-type |
act |
Policy: act, diffusion |
--job-name |
lerobot-act-training |
Job identifier |
--wandb-enable |
enabled | WANDB logging (default) |
--mlflow-enable |
disabled | Azure ML MLflow logging |
--policy-repo-id |
(none) | Pre-trained policy for fine-tuning |
--training-steps |
(LeRobot default) | Total training iterations |
--save-freq |
5000 |
Checkpoint save frequency |
The submit-osmo-lerobot-inference.sh script evaluates trained LeRobot policies from HuggingFace Hub. Downloads the policy, runs evaluation, and optionally registers the model to Azure ML.
# Evaluate a trained policy
./submit-osmo-lerobot-inference.sh --policy-repo-id user/trained-act-policy
# Evaluate with model registration
./submit-osmo-lerobot-inference.sh \
--policy-repo-id user/trained-act-policy \
-r my-evaluated-model
# Diffusion policy evaluation
./submit-osmo-lerobot-inference.sh \
--policy-repo-id user/trained-diffusion \
-p diffusion \
--eval-episodes 50
| Parameter | Default | Description |
|---|---|---|
--policy-repo-id |
(required) | HuggingFace policy repository |
--policy-type |
act |
Policy: act, diffusion |
--eval-episodes |
10 |
Number of evaluation episodes |
--register-model |
(none) | Model name for Azure ML registration |
--dataset-repo-id |
(none) | Dataset for environment replay |
The submit-azureml-lerobot-training.sh script submits LeRobot training directly to Azure ML instead of OSMO. It registers an environment, compiles runtime dependencies from training/il/lerobot/pyproject.toml, and submits via az ml job create.
# ACT policy training
./submit-azureml-lerobot-training.sh -d user/my-dataset
# With model registration and log streaming
./submit-azureml-lerobot-training.sh \
-d user/my-dataset \
-r my-act-model \
--stream
# Custom environment and compute
./submit-azureml-lerobot-training.sh \
-d user/my-dataset \
--image custom-registry.io/lerobot:latest \
--compute my-gpu-cluster
The run-lerobot-pipeline.sh script orchestrates the full LeRobot lifecycle: training → polling → inference → model registration. It delegates to the individual submission scripts and polls OSMO workflow status between stages.
| Stage | Action | Script Used |
|---|---|---|
| 1 | Submit training workflow | submit-osmo-lerobot-training.sh |
| 2 | Poll workflow status until completion | osmo workflow status |
| 3 | Submit inference/evaluation workflow | submit-osmo-lerobot-inference.sh |
# Full pipeline: train → evaluate → register
./run-lerobot-pipeline.sh \
-d lerobot/aloha_sim_insertion_human \
--policy-repo-id user/my-act-policy \
-r my-act-model
# Async mode (submit training and exit)
./run-lerobot-pipeline.sh \
-d user/my-dataset \
--skip-wait
# Diffusion pipeline with MLflow
./run-lerobot-pipeline.sh \
-d user/my-dataset \
--policy-repo-id user/my-diffusion \
-p diffusion \
--mlflow-enable \
--training-steps 100000 \
-r my-diffusion-model
# Skip inference (training only with polling)
./run-lerobot-pipeline.sh \
-d user/my-dataset \
--skip-inference
| Parameter | Default | Description |
|---|---|---|
--dataset-repo-id |
(required) | HuggingFace dataset repository |
--policy-repo-id |
(required*) | HuggingFace policy target repo |
--policy-type |
act |
Policy: act, diffusion |
--register-model |
(none) | Azure ML model registration name |
--poll-interval |
60 |
Status check interval (seconds) |
--timeout |
720 |
Training timeout (minutes) |
--skip-wait |
disabled | Async mode: submit and exit |
--skip-inference |
disabled | Skip inference stage |
🤖 Crafted with precision by ✨Copilot following brilliant human instruction, then carefully refined by our team of discerning human reviewers.