Azure ML Training Workflows

Submit Isaac Lab reinforcement learning and LeRobot behavioral cloning training jobs to Azure Machine Learning using Kubernetes compute targets.

📋 Prerequisites

| Component | Requirement |
| --- | --- |
| AzureML extension | Deployed via 02-deploy-azureml-extension.sh |
| Kubernetes compute | GPU-capable compute target attached to the AzureML workspace |
| Azure subscription | Subscription ID, resource group, and workspace name configured |

📦 Available Templates

| Template | Purpose | Submission Script |
| --- | --- | --- |
| train.yaml | Isaac Lab SKRL training | scripts/submit-azureml-training.sh |
| validate.yaml | Isaac Lab validation | scripts/submit-azureml-validation.sh |
| lerobot-train.yaml | LeRobot behavioral cloning | scripts/submit-azureml-lerobot-training.sh |

⚙️ Isaac Lab Training Parameters

| Parameter | Description |
| --- | --- |
| mode | Train or retrain (default: train) |
| checkpoint_mode | Checkpoint strategy: from-scratch, from-trained |
| task | Isaac Lab task name (e.g., Isaac-Cartpole-v0) |
| num_envs | Number of parallel environments |
| headless | Run without rendering (default: true) |
| max_iterations | Maximum training iterations |

🤖 LeRobot Training Parameters

| Parameter | Default | Description |
| --- | --- | --- |
| dataset_repo_id | (required) | Hugging Face dataset repository |
| policy_type | act | Policy architecture: act, diffusion |
| job_name | lerobot-act-training | Unique job identifier |
| image | pytorch/pytorch:2.4.1-cuda12.4-cudnn9-runtime | Container image |
| wandb_enable | true | Enable Weights & Biases (wandb) logging |
| save_freq | 5000 | Checkpoint save frequency (in training steps) |

🔧 Environment Variables

| Variable | Description |
| --- | --- |
| AZURE_SUBSCRIPTION_ID | Azure subscription ID |
| AZURE_RESOURCE_GROUP | Resource group name |
| AZUREML_WORKSPACE_NAME | Azure ML workspace name |
| AZUREML_COMPUTE | Kubernetes compute target name |

The submission scripts auto-detect these values from Terraform outputs. To override a value, pass the corresponding CLI argument or set the environment variable.
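Before submitting, it can help to see which of these variables are already set in the current shell. The following is a minimal sketch, not part of the repository scripts: `missing_azure_env` is a hypothetical helper name, and an unset variable is not necessarily an error, since the scripts may still resolve it from Terraform outputs.

```shell
# Sketch: list which of the documented variables are unset or empty.
# missing_azure_env is a hypothetical helper, not part of the repository scripts.
# The function body runs in a subshell so it does not leak helper variables.
missing_azure_env() (
  missing=""
  for name in AZURE_SUBSCRIPTION_ID AZURE_RESOURCE_GROUP \
              AZUREML_WORKSPACE_NAME AZUREML_COMPUTE; do
    # Look up the value of the variable whose name is stored in $name
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      missing="${missing:+$missing }$name"
    fi
  done
  # Space-separated names; empty output means every variable is set
  printf '%s\n' "$missing"
)

missing="$(missing_azure_env)"
if [ -n "$missing" ]; then
  echo "Not set in this shell: $missing" >&2
fi
```

Any names it reports can be exported in the shell or supplied as CLI arguments to the submission script.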

🚀 Quick Start

Isaac Lab SKRL training:

# Default configuration from Terraform outputs
./scripts/submit-azureml-training.sh

# Custom task and environment count
./scripts/submit-azureml-training.sh \
  --task Isaac-Cartpole-v0 \
  --num-envs 512 \
  --max-iterations 1000

Isaac Lab validation:

./scripts/submit-azureml-validation.sh \
  --task Isaac-Cartpole-v0 \
  --checkpoint-mode from-trained

LeRobot training:

./scripts/submit-azureml-lerobot-training.sh \
  --dataset-repo-id lerobot/aloha_sim_insertion_human \
  --policy-type act
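To compare several environment counts, the Isaac Lab command above can be generated in a loop. This sketch only prints the commands so they can be reviewed first. `print_sweep` is a hypothetical helper, the counts 256 and 1024 are illustrative values, and only flags already shown in this section are used.

```shell
# Sketch: print one Isaac Lab submission command per environment count.
# print_sweep is a hypothetical helper; nothing is submitted by this snippet.
task="Isaac-Cartpole-v0"
max_iterations=1000

print_sweep() {
  for num_envs in "$@"; do
    echo "./scripts/submit-azureml-training.sh --task $task --num-envs $num_envs --max-iterations $max_iterations"
  done
}

# Illustrative environment counts
print_sweep 256 512 1024
```

Once the printed commands look right, they can be run one at a time from the repository root.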

💾 Checkpoint Management

ModeBehavior
from-scratchStart training from random initialization
from-trainedResume from an existing checkpoint

Specify the checkpoint mode with --checkpoint-mode:

./scripts/submit-azureml-training.sh \
  --checkpoint-mode from-trained \
  --task Isaac-Cartpole-v0
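A wrapper script can catch a mistyped mode before a job is submitted. The sketch below assumes that the two modes in the table above are the only accepted values, which may not hold for the actual script. `is_valid_checkpoint_mode` is a hypothetical helper name.

```shell
# Sketch: check a checkpoint mode against the two documented values.
# Assumption: from-scratch and from-trained are the only accepted modes.
is_valid_checkpoint_mode() {
  case "$1" in
    from-scratch|from-trained) return 0 ;;
    *) return 1 ;;
  esac
}

checkpoint_mode="from-trained"
if is_valid_checkpoint_mode "$checkpoint_mode"; then
  echo "checkpoint mode OK: $checkpoint_mode"
else
  echo "unsupported checkpoint mode: $checkpoint_mode" >&2
fi
```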
