Script Reference

Inventory of submission scripts for training, validation, and inference workflows on Azure ML and OSMO platforms. Each entry includes CLI arguments, environment variable overrides, and Terraform output resolution.

Note: For detailed submission examples, see Script Examples.

Submission Scripts

| Script | Purpose | Platform |
| --- | --- | --- |
| submit-azureml-training.sh | Package code and submit Azure ML training job | Azure ML |
| submit-azureml-validation.sh | Submit model validation job | Azure ML |
| submit-azureml-lerobot-training.sh | Submit LeRobot training to Azure ML | Azure ML |
| submit-osmo-training.sh | Package code and submit OSMO workflow (base64) | OSMO |
| submit-osmo-dataset-training.sh | Submit OSMO workflow using dataset folder injection | OSMO |
| submit-osmo-lerobot-training.sh | Submit LeRobot behavioral cloning training | OSMO |
| submit-osmo-lerobot-inference.sh | Submit LeRobot inference/evaluation | OSMO |
| run-lerobot-pipeline.sh | End-to-end train → evaluate → register pipeline | OSMO |

Quick Start

Scripts auto-detect Azure context from Terraform outputs in infrastructure/terraform/:

# Azure ML training
./submit-azureml-training.sh --task Isaac-Velocity-Rough-Anymal-C-v0

# OSMO training (base64 encoded)
./submit-osmo-training.sh --task Isaac-Velocity-Rough-Anymal-C-v0

# OSMO training (dataset folder upload)
./submit-osmo-dataset-training.sh --task Isaac-Velocity-Rough-Anymal-C-v0

# LeRobot behavioral cloning (OSMO)
./submit-osmo-lerobot-training.sh -d lerobot/aloha_sim_insertion_human

# LeRobot behavioral cloning (Azure ML)
./submit-azureml-lerobot-training.sh -d lerobot/aloha_sim_insertion_human

# LeRobot inference/evaluation
./submit-osmo-lerobot-inference.sh --policy-repo-id user/trained-policy

# End-to-end pipeline: train → evaluate → register
./run-lerobot-pipeline.sh \
  -d lerobot/aloha_sim_insertion_human \
  --policy-repo-id user/my-policy \
  -r my-model

# Validation (requires registered model)
./submit-azureml-validation.sh --model-name anymal-c-velocity --model-version 1

Prerequisites

Common requirements:

  • Bash 4+
  • Terraform outputs available in infrastructure/terraform/ (or provide the same values via CLI / environment variables)

Script-specific tools:

  • Azure ML scripts: az CLI with the ml extension (az extension add --name ml)
  • Validation: jq
  • OSMO scripts: osmo
  • Base64 payload submission: zip, base64
  • Dataset injection submission: rsync
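A quick pre-flight check can confirm that the tools listed above are on PATH before a submission is attempted. The following is a minimal sketch, not code taken from the scripts themselves; `missing_tools` is a hypothetical helper name, and the tool list passed to it is just the base64 payload case from the list above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): print the names of any
# commands from the argument list that cannot be found on PATH.
missing_tools() {
  local tool missing=""
  for tool in "$@"; do
    # command -v exits non-zero when the command cannot be resolved.
    command -v "$tool" >/dev/null 2>&1 || missing="${missing:+$missing }$tool"
  done
  # Empty output means every tool was found.
  printf '%s\n' "$missing"
}

# Example: the base64 payload submission path needs zip and base64.
not_found="$(missing_tools zip base64)"
if [ -n "$not_found" ]; then
  echo "Missing required tools: $not_found" >&2
fi
```

The same function can be called with `az`, `jq`, `osmo`, or `rsync`, depending on which script is about to run.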

CLI Arguments

Values resolve in order: CLI arguments → environment variables → Terraform outputs (when applicable).
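The resolution order above amounts to "take the first non-empty value". The sketch below illustrates that idea under stated assumptions: `first_non_empty`, `cli_resource_group`, and `tf_resource_group` are hypothetical names chosen for this example, and the actual scripts may implement the precedence differently.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first non-empty argument, or fail if all
# arguments are empty.
first_non_empty() {
  local value
  for value in "$@"; do
    if [ -n "$value" ]; then
      printf '%s\n' "$value"
      return 0
    fi
  done
  return 1
}

# Illustrative inputs: in a real script the first would come from argument
# parsing and the last from reading Terraform outputs.
cli_resource_group=""
tf_resource_group="rg-from-terraform"

# Precedence: CLI argument, then environment variable, then Terraform output.
resource_group="$(first_non_empty \
  "$cli_resource_group" \
  "${AZURE_RESOURCE_GROUP:-}" \
  "$tf_resource_group")"
echo "resource group: $resource_group"
```

Because the CLI value is listed first, passing a flag always overrides an exported environment variable, which in turn overrides whatever Terraform reports.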

submit-azureml-training.sh

| Option | Default | Description | Source |
| --- | --- | --- | --- |
| --environment-name | isaaclab-training-env | AzureML environment name | CLI |
| --environment-version | 2.3.2 | AzureML environment version | CLI |
| --image / -i | nvcr.io/nvidia/isaac-lab:2.3.2 | Container image | CLI |
| --assets-only | false | Register environment without submitting a job | CLI |
| --job-file / -w | workflows/azureml/train.yaml | Job YAML template | CLI |
| --task / -t | Isaac-Velocity-Rough-Anymal-C-v0 | IsaacLab task | TASK |
| --num-envs / -n | 2048 | Number of parallel environments | NUM_ENVS |
| --max-iterations / -m | unset | Max iterations (empty to unset) | MAX_ITERATIONS |
| --checkpoint-uri / -c | unset | MLflow checkpoint artifact URI | CHECKPOINT_URI |
| --checkpoint-mode / -M | from-scratch | from-scratch, warm-start, resume, fresh | CHECKPOINT_MODE |
| --register-checkpoint / -r | derived from task | Model name for checkpoint registration | REGISTER_CHECKPOINT |
| --skip-register-checkpoint | false | Skip automatic model registration | CLI |
| --headless | true | Force headless rendering | CLI |
| --gui / --no-headless | false | Disable headless mode | CLI |
| --run-smoke-test / -s | false | Run Azure connectivity smoke test before submit | RUN_AZURE_SMOKE_TEST |
| --mode | train | Execution mode | CLI |
| --subscription-id | from TF | Azure subscription ID | AZURE_SUBSCRIPTION_ID / TF |
| --resource-group | from TF | Azure resource group | AZURE_RESOURCE_GROUP / TF |
| --workspace-name | from TF | Azure ML workspace | AZUREML_WORKSPACE_NAME / TF |
| --compute | from TF | Compute target override | AZUREML_COMPUTE / TF |
| --instance-type | gpuspot | Instance type | CLI |
| --experiment-name | unset | Experiment name override | CLI |
| --job-name | unset | Job name override | CLI |
| --display-name | unset | Display name override | CLI |
| --stream | false | Stream logs after submission | CLI |
| --mlflow-token-retries | 3 | MLflow token refresh retries | MLFLOW_TRACKING_TOKEN_REFRESH_RETRIES |
| --mlflow-http-timeout | 60 | MLflow HTTP request timeout (seconds) | MLFLOW_HTTP_REQUEST_TIMEOUT |
| -- | n/a | Forward remaining args to az ml job create | CLI |

Example:

./submit-azureml-training.sh \
  --task Isaac-Velocity-Rough-Anymal-C-v0 \
  --num-envs 1024 \
  --stream

submit-azureml-validation.sh

| Option | Default | Description | Source |
| --- | --- | --- | --- |
| --model-name | derived from task | Azure ML model name | CLI |
| --model-version | latest | Azure ML model version | CLI |
| --environment-name | isaaclab-training-env | AzureML environment name | CLI |
| --environment-version | 2.3.2 | AzureML environment version | CLI |
| --image | nvcr.io/nvidia/isaac-lab:2.3.2 | Container image | CLI |
| --task | Isaac-Velocity-Rough-Anymal-C-v0 | Override task ID | TASK |
| --framework | unset | Override framework | CLI |
| --eval-episodes | 100 | Evaluation episodes | CLI |
| --num-envs | 64 | Parallel environments | CLI |
| --success-threshold | unset | Success threshold (defaults from model metadata) | CLI |
| --headless | true | Run headless | CLI |
| --gui | false | Disable headless mode | CLI |
| --job-file | workflows/azureml/validate.yaml | Job YAML template | CLI |
| --compute | from TF | Compute target override | AZUREML_COMPUTE / TF |
| --instance-type | gpuspot | Instance type | CLI |
| --experiment-name | unset | Experiment name override | CLI |
| --job-name | unset | Job name override | CLI |
| --stream | false | Stream logs after submission | CLI |
| --subscription-id | from TF | Azure subscription ID | AZURE_SUBSCRIPTION_ID / TF |
| --resource-group | from TF | Azure resource group | AZURE_RESOURCE_GROUP / TF |
| --workspace-name | from TF | Azure ML workspace | AZUREML_WORKSPACE_NAME / TF |

Example:

./submit-azureml-validation.sh \
  --model-name anymal-c-velocity \
  --model-version 1 \
  --stream

submit-osmo-training.sh (base64 payload)

| Option | Default | Description | Source |
| --- | --- | --- | --- |
| --workflow / -w | workflows/osmo/train.yaml | Workflow template | CLI |
| --task / -t | Isaac-Velocity-Rough-Anymal-C-v0 | IsaacLab task | TASK |
| --num-envs / -n | 2048 | Number of parallel environments | NUM_ENVS |
| --max-iterations / -m | unset | Max iterations (empty to unset) | MAX_ITERATIONS |
| --image / -i | nvcr.io/nvidia/isaac-lab:2.3.2 | Container image | IMAGE |
| --payload-root / -p | /workspace/isaac_payload | Runtime extraction root | PAYLOAD_ROOT |
| --backend / -b | skrl | Training backend: skrl (default), rsl_rl | TRAINING_BACKEND |
| --checkpoint-uri / -c | unset | MLflow checkpoint artifact URI | CHECKPOINT_URI |
| --checkpoint-mode / -M | from-scratch | from-scratch, warm-start, resume, fresh | CHECKPOINT_MODE |
| --register-checkpoint / -r | derived from task | Model name for checkpoint registration | REGISTER_CHECKPOINT |
| --skip-register-checkpoint | false | Skip automatic model registration | CLI |
| --sleep-after-unpack | unset | Sleep seconds post-unpack (debug) | SLEEP_AFTER_UNPACK |
| --run-smoke-test / -s | false | Enable Azure connectivity smoke test | RUN_AZURE_SMOKE_TEST |
| --azure-subscription-id | from TF | Azure subscription ID | AZURE_SUBSCRIPTION_ID / TF |
| --azure-resource-group | from TF | Azure resource group | AZURE_RESOURCE_GROUP / TF |
| --azure-workspace-name | from TF | Azure ML workspace | AZUREML_WORKSPACE_NAME / TF |
| -- | n/a | Forward remaining args to osmo workflow submit | CLI |

Example:

./submit-osmo-training.sh \
  --task Isaac-Velocity-Rough-Anymal-C-v0 \
  --backend skrl \
  -- --dry-run
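The `--` separator in the example above passes everything after it to the downstream submit command. One common way to implement that pattern in Bash is sketched below; `parse_args`, `task`, and `forward_args` are hypothetical names for illustration, and only the `--task` option is handled to keep the sketch short. The real scripts may parse arguments differently.

```shell
#!/usr/bin/env bash
# Sketch only: parse_args, task, and forward_args are illustrative names.
parse_args() {
  task=""            # value of --task / -t
  forward_args=()    # everything that follows "--"
  while [ "$#" -gt 0 ]; do
    case "$1" in
      -t|--task)
        [ "$#" -ge 2 ] || { echo "Missing value for $1" >&2; return 1; }
        task="$2"
        shift 2
        ;;
      --)
        shift
        forward_args=("$@")   # keep the remaining arguments untouched
        break
        ;;
      *)
        echo "Unknown option: $1" >&2
        return 1
        ;;
    esac
  done
}

parse_args --task Isaac-Velocity-Rough-Anymal-C-v0 -- --dry-run
echo "task=$task"
echo "forwarded=${forward_args[*]}"
```

Storing the forwarded arguments in an array and later expanding it as `"${forward_args[@]}"` preserves arguments that contain spaces, which a plain string would not.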

submit-osmo-dataset-training.sh (dataset injection)

| Option | Default | Description | Source |
| --- | --- | --- | --- |
| --workflow / -w | workflows/osmo/train-dataset.yaml | Workflow template | CLI |
| --task / -t | Isaac-Velocity-Rough-Anymal-C-v0 | IsaacLab task | TASK |
| --num-envs / -n | 2048 | Number of parallel environments | NUM_ENVS |
| --max-iterations / -m | unset | Max iterations (empty to unset) | MAX_ITERATIONS |
| --image / -i | nvcr.io/nvidia/isaac-lab:2.3.2 | Container image | IMAGE |
| --backend / -b | skrl | Training backend: skrl (default), rsl_rl | TRAINING_BACKEND |
| --dataset-bucket | training | OSMO bucket name | OSMO_DATASET_BUCKET |
| --dataset-name | training-code | Dataset name (auto-versioned) | OSMO_DATASET_NAME |
| --training-path | training/ | Local path to upload | TRAINING_PATH |
| --checkpoint-uri / -c | unset | MLflow checkpoint artifact URI | CHECKPOINT_URI |
| --checkpoint-mode / -M | from-scratch | from-scratch, warm-start, resume, fresh | CHECKPOINT_MODE |
| --register-checkpoint / -r | derived from task | Model name for checkpoint registration | REGISTER_CHECKPOINT |
| --skip-register-checkpoint | false | Skip automatic model registration | CLI |
| --run-smoke-test / -s | false | Enable Azure connectivity smoke test | RUN_AZURE_SMOKE_TEST |
| --azure-subscription-id | from TF | Azure subscription ID | AZURE_SUBSCRIPTION_ID / TF |
| --azure-resource-group | from TF | Azure resource group | AZURE_RESOURCE_GROUP / TF |
| --azure-workspace-name | from TF | Azure ML workspace | AZUREML_WORKSPACE_NAME / TF |
| -- | n/a | Forward remaining args to osmo workflow submit | CLI |

Example:

./submit-osmo-dataset-training.sh \
  --task Isaac-Velocity-Rough-Anymal-C-v0 \
  --dataset-name my-training-v1

Configuration

Scripts resolve values in order: CLI arguments → environment variables → Terraform outputs.

| Variable | Description |
| --- | --- |
| AZURE_SUBSCRIPTION_ID | Azure subscription |
| AZURE_RESOURCE_GROUP | Resource group name |
| AZUREML_WORKSPACE_NAME | ML workspace name |
| TASK | IsaacLab task name |
| NUM_ENVS | Number of parallel environments |
| OSMO_DATASET_BUCKET | Dataset bucket for OSMO training |
| OSMO_DATASET_NAME | Dataset name for OSMO training |
| DATASET_REPO_ID | HuggingFace dataset repo ID |
| POLICY_TYPE | LeRobot policy architecture |

Script Library

| File | Purpose |
| --- | --- |
| scripts/lib/terraform-outputs.sh | Shared functions for reading Terraform outputs |

Source the library to use helper functions:

source "$REPO_ROOT/scripts/lib/terraform-outputs.sh"
read_terraform_outputs "$REPO_ROOT/infrastructure/terraform"
get_aks_cluster_name # Returns AKS cluster name
get_azureml_workspace # Returns ML workspace name
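For readers curious how such a helper might work, the sketch below extracts a single value from the JSON that `terraform output -json` prints, using jq. This is an assumption-laden illustration and not the contents of terraform-outputs.sh: `tf_output_value` is a hypothetical function, and the `resource_group` key and sample JSON are made up for the example.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not the library's implementation): given the JSON text
# produced by `terraform output -json` and an output name, print that
# output's value, or nothing if the output does not exist.
tf_output_value() {
  local json="$1" key="$2"
  printf '%s' "$json" | jq -r --arg key "$key" '.[$key].value // empty'
}

# Made-up sample in the shape `terraform output -json` uses: each output is
# an object with "sensitive", "type", and "value" fields.
sample_json='{"resource_group":{"sensitive":false,"type":"string","value":"rg-demo"}}'

tf_output_value "$sample_json" resource_group
```

In practice the JSON would come from something like `terraform -chdir="$REPO_ROOT/infrastructure/terraform" output -json`, captured once and then queried per key.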
