Inventory of submission scripts for training, validation, and inference workflows on Azure ML and OSMO platforms. Each entry includes CLI arguments, environment variable overrides, and Terraform output resolution.
> [!NOTE]
> For detailed submission examples, see Script Examples.
| Script | Purpose | Platform |
|---|---|---|
| `submit-azureml-training.sh` | Package code and submit Azure ML training job | Azure ML |
| `submit-azureml-validation.sh` | Submit model validation job | Azure ML |
| `submit-azureml-lerobot-training.sh` | Submit LeRobot training to Azure ML | Azure ML |
| `submit-osmo-training.sh` | Package code and submit OSMO workflow (base64) | OSMO |
| `submit-osmo-dataset-training.sh` | Submit OSMO workflow using dataset folder injection | OSMO |
| `submit-osmo-lerobot-training.sh` | Submit LeRobot behavioral cloning training | OSMO |
| `submit-osmo-lerobot-inference.sh` | Submit LeRobot inference/evaluation | OSMO |
| `run-lerobot-pipeline.sh` | End-to-end train → evaluate → register pipeline | OSMO |

Scripts auto-detect Azure context from Terraform outputs in `infrastructure/terraform/`:
```bash
# Azure ML training
./submit-azureml-training.sh --task Isaac-Velocity-Rough-Anymal-C-v0

# OSMO training (base64 encoded)
./submit-osmo-training.sh --task Isaac-Velocity-Rough-Anymal-C-v0

# OSMO training (dataset folder upload)
./submit-osmo-dataset-training.sh --task Isaac-Velocity-Rough-Anymal-C-v0

# LeRobot behavioral cloning (OSMO)
./submit-osmo-lerobot-training.sh -d lerobot/aloha_sim_insertion_human

# LeRobot behavioral cloning (Azure ML)
./submit-azureml-lerobot-training.sh -d lerobot/aloha_sim_insertion_human

# LeRobot inference/evaluation
./submit-osmo-lerobot-inference.sh --policy-repo-id user/trained-policy

# End-to-end pipeline: train → evaluate → register
./run-lerobot-pipeline.sh \
  -d lerobot/aloha_sim_insertion_human \
  --policy-repo-id user/my-policy \
  -r my-model

# Validation (requires registered model)
./submit-azureml-validation.sh --model-name anymal-c-velocity --model-version 1
```
Common requirements:

- Terraform outputs in `infrastructure/terraform/` (or provide the same values via CLI / environment variables)

Script-specific tools:

- `az` CLI + `az extension add --name ml`
- `jq`
- `osmo`
- `zip`, `base64`
- `rsync`

Values resolve in order: CLI arguments → environment variables → Terraform outputs (when applicable).
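Before submitting, it can help to confirm that the tools listed above are on `PATH`. The following is a minimal sketch rather than part of the repository: `missing_tools` is a hypothetical helper, and the tool list should be trimmed to what the script you run actually needs.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not from the repository): print each tool that is
# not found on PATH, one per line. Returns 1 if anything is missing.
missing_tools() {
  local tool status=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      printf '%s\n' "$tool"
      status=1
    fi
  done
  return "$status"
}

# Check the tools named in the requirements above.
if missing="$(missing_tools az jq osmo zip base64 rsync)"; then
  echo "All required tools found."
else
  echo "Missing tools: ${missing//$'\n'/, }" >&2
fi
```

This only checks for executables. The `ml` extension for `az` would need a separate check, for example with `az extension show --name ml`.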
`submit-azureml-training.sh`

| Option | Default | Description | Source |
|---|---|---|---|
| `--environment-name` | `isaaclab-training-env` | AzureML environment name | CLI |
| `--environment-version` | `2.3.2` | AzureML environment version | CLI |
| `--image` / `-i` | `nvcr.io/nvidia/isaac-lab:2.3.2` | Container image | CLI |
| `--assets-only` | `false` | Register environment without submitting a job | CLI |
| `--job-file` / `-w` | `workflows/azureml/train.yaml` | Job YAML template | CLI |
| `--task` / `-t` | `Isaac-Velocity-Rough-Anymal-C-v0` | IsaacLab task | `TASK` |
| `--num-envs` / `-n` | `2048` | Number of parallel environments | `NUM_ENVS` |
| `--max-iterations` / `-m` | unset | Max iterations (empty to unset) | `MAX_ITERATIONS` |
| `--checkpoint-uri` / `-c` | unset | MLflow checkpoint artifact URI | `CHECKPOINT_URI` |
| `--checkpoint-mode` / `-M` | `from-scratch` | `from-scratch`, `warm-start`, `resume`, `fresh` | `CHECKPOINT_MODE` |
| `--register-checkpoint` / `-r` | derived from task | Model name for checkpoint registration | `REGISTER_CHECKPOINT` |
| `--skip-register-checkpoint` | `false` | Skip automatic model registration | CLI |
| `--headless` | `true` | Force headless rendering | CLI |
| `--gui` / `--no-headless` | `false` | Disable headless mode | CLI |
| `--run-smoke-test` / `-s` | `false` | Run Azure connectivity smoke test before submit | `RUN_AZURE_SMOKE_TEST` |
| `--mode` | `train` | Execution mode | CLI |
| `--subscription-id` | from TF | Azure subscription ID | `AZURE_SUBSCRIPTION_ID` / TF |
| `--resource-group` | from TF | Azure resource group | `AZURE_RESOURCE_GROUP` / TF |
| `--workspace-name` | from TF | Azure ML workspace | `AZUREML_WORKSPACE_NAME` / TF |
| `--compute` | from TF | Compute target override | `AZUREML_COMPUTE` / TF |
| `--instance-type` | `gpuspot` | Instance type | CLI |
| `--experiment-name` | unset | Experiment name override | CLI |
| `--job-name` | unset | Job name override | CLI |
| `--display-name` | unset | Display name override | CLI |
| `--stream` | `false` | Stream logs after submission | CLI |
| `--mlflow-token-retries` | `3` | MLflow token refresh retries | `MLFLOW_TRACKING_TOKEN_REFRESH_RETRIES` |
| `--mlflow-http-timeout` | `60` | MLflow HTTP request timeout (seconds) | `MLFLOW_HTTP_REQUEST_TIMEOUT` |
| `--` | n/a | Forward remaining args to `az ml job create` | CLI |

Example:
```bash
./submit-azureml-training.sh \
  --task Isaac-Velocity-Rough-Anymal-C-v0 \
  --num-envs 1024 \
  --stream
```
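The `--` option means that anything after the separator is handed to `az ml job create` unchanged. A minimal sketch of how a wrapper script can split its arguments this way is shown below. The function and array names are illustrative assumptions, not the script's actual code, and the forwarded `--set display_name=demo` is only a sample argument.

```bash
#!/usr/bin/env bash
# Illustrative sketch: split "$@" at the first "--".
# Arguments before it go to SCRIPT_ARGS, everything after to FORWARD_ARGS.
split_args() {
  SCRIPT_ARGS=()
  FORWARD_ARGS=()
  while [ "$#" -gt 0 ]; do
    if [ "$1" = "--" ]; then
      shift
      FORWARD_ARGS=("$@")
      return 0
    fi
    SCRIPT_ARGS+=("$1")
    shift
  done
}

# Sample invocation with one forwarded option after the separator.
split_args --task Isaac-Velocity-Rough-Anymal-C-v0 --stream -- --set display_name=demo
echo "script args:  ${SCRIPT_ARGS[*]}"
echo "forward args: ${FORWARD_ARGS[*]}"
```

Keeping the forwarded arguments in an array preserves quoting, so values containing spaces reach the underlying command intact.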
`submit-azureml-validation.sh`

| Option | Default | Description | Source |
|---|---|---|---|
| `--model-name` | derived from task | Azure ML model name | CLI |
| `--model-version` | `latest` | Azure ML model version | CLI |
| `--environment-name` | `isaaclab-training-env` | AzureML environment name | CLI |
| `--environment-version` | `2.3.2` | AzureML environment version | CLI |
| `--image` | `nvcr.io/nvidia/isaac-lab:2.3.2` | Container image | CLI |
| `--task` | `Isaac-Velocity-Rough-Anymal-C-v0` | Override task ID | `TASK` |
| `--framework` | unset | Override framework | CLI |
| `--eval-episodes` | `100` | Evaluation episodes | CLI |
| `--num-envs` | `64` | Parallel environments | CLI |
| `--success-threshold` | unset | Success threshold (defaults from model metadata) | CLI |
| `--headless` | `true` | Run headless | CLI |
| `--gui` | `false` | Disable headless mode | CLI |
| `--job-file` | `workflows/azureml/validate.yaml` | Job YAML template | CLI |
| `--compute` | from TF | Compute target override | `AZUREML_COMPUTE` / TF |
| `--instance-type` | `gpuspot` | Instance type | CLI |
| `--experiment-name` | unset | Experiment name override | CLI |
| `--job-name` | unset | Job name override | CLI |
| `--stream` | `false` | Stream logs after submission | CLI |
| `--subscription-id` | from TF | Azure subscription ID | `AZURE_SUBSCRIPTION_ID` / TF |
| `--resource-group` | from TF | Azure resource group | `AZURE_RESOURCE_GROUP` / TF |
| `--workspace-name` | from TF | Azure ML workspace | `AZUREML_WORKSPACE_NAME` / TF |

Example:
```bash
./submit-azureml-validation.sh \
  --model-name anymal-c-velocity \
  --model-version 1 \
  --stream
```
`submit-osmo-training.sh` (base64 payload)

| Option | Default | Description | Source |
|---|---|---|---|
| `--workflow` / `-w` | `workflows/osmo/train.yaml` | Workflow template | CLI |
| `--task` / `-t` | `Isaac-Velocity-Rough-Anymal-C-v0` | IsaacLab task | `TASK` |
| `--num-envs` / `-n` | `2048` | Number of parallel environments | `NUM_ENVS` |
| `--max-iterations` / `-m` | unset | Max iterations (empty to unset) | `MAX_ITERATIONS` |
| `--image` / `-i` | `nvcr.io/nvidia/isaac-lab:2.3.2` | Container image | `IMAGE` |
| `--payload-root` / `-p` | `/workspace/isaac_payload` | Runtime extraction root | `PAYLOAD_ROOT` |
| `--backend` / `-b` | `skrl` | Training backend: `skrl` (default), `rsl_rl` | `TRAINING_BACKEND` |
| `--checkpoint-uri` / `-c` | unset | MLflow checkpoint artifact URI | `CHECKPOINT_URI` |
| `--checkpoint-mode` / `-M` | `from-scratch` | `from-scratch`, `warm-start`, `resume`, `fresh` | `CHECKPOINT_MODE` |
| `--register-checkpoint` / `-r` | derived from task | Model name for checkpoint registration | `REGISTER_CHECKPOINT` |
| `--skip-register-checkpoint` | `false` | Skip automatic model registration | CLI |
| `--sleep-after-unpack` | unset | Sleep seconds post-unpack (debug) | `SLEEP_AFTER_UNPACK` |
| `--run-smoke-test` / `-s` | `false` | Enable Azure connectivity smoke test | `RUN_AZURE_SMOKE_TEST` |
| `--azure-subscription-id` | from TF | Azure subscription ID | `AZURE_SUBSCRIPTION_ID` / TF |
| `--azure-resource-group` | from TF | Azure resource group | `AZURE_RESOURCE_GROUP` / TF |
| `--azure-workspace-name` | from TF | Azure ML workspace | `AZUREML_WORKSPACE_NAME` / TF |
| `--` | n/a | Forward remaining args to `osmo workflow submit` | CLI |

Example:
```bash
./submit-osmo-training.sh \
  --task Isaac-Velocity-Rough-Anymal-C-v0 \
  --backend skrl \
  -- --dry-run
```
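The "base64 payload" label refers to shipping the packaged code inside the workflow as text. The round trip below is only a sketch of that idea under stated assumptions: the archive layout, the function names, and the use of `unzip` on the receiving side are illustrative, and the script's real packaging steps may differ.

```bash
#!/usr/bin/env bash
# Sketch (assumed approach): zip a directory and emit it as one base64 line.
encode_payload() {
  local src_dir="$1" tmp
  tmp="$(mktemp -d)"
  (cd "$src_dir" && zip -qr "$tmp/payload.zip" .) || return 1
  base64 < "$tmp/payload.zip" | tr -d '\n'
  rm -rf "$tmp"
}

# Sketch (assumed approach): decode the string and unpack it under a payload
# root (compare --payload-root, default /workspace/isaac_payload).
decode_payload() {
  local encoded="$1" dest="$2" tmp
  tmp="$(mktemp -d)"
  mkdir -p "$dest"
  printf '%s' "$encoded" | base64 --decode > "$tmp/payload.zip" || return 1
  unzip -qo "$tmp/payload.zip" -d "$dest"
  rm -rf "$tmp"
}

# Usage (hypothetical paths):
#   payload="$(encode_payload training/)"
#   decode_payload "$payload" /tmp/isaac_payload
```

Stripping newlines with `tr` avoids relying on a wrap-control flag, which differs between `base64` implementations.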
`submit-osmo-dataset-training.sh` (dataset injection)

| Option | Default | Description | Source |
|---|---|---|---|
| `--workflow` / `-w` | `workflows/osmo/train-dataset.yaml` | Workflow template | CLI |
| `--task` / `-t` | `Isaac-Velocity-Rough-Anymal-C-v0` | IsaacLab task | `TASK` |
| `--num-envs` / `-n` | `2048` | Number of parallel environments | `NUM_ENVS` |
| `--max-iterations` / `-m` | unset | Max iterations (empty to unset) | `MAX_ITERATIONS` |
| `--image` / `-i` | `nvcr.io/nvidia/isaac-lab:2.3.2` | Container image | `IMAGE` |
| `--backend` / `-b` | `skrl` | Training backend: `skrl` (default), `rsl_rl` | `TRAINING_BACKEND` |
| `--dataset-bucket` | `training` | OSMO bucket name | `OSMO_DATASET_BUCKET` |
| `--dataset-name` | `training-code` | Dataset name (auto-versioned) | `OSMO_DATASET_NAME` |
| `--training-path` | `training/` | Local path to upload | `TRAINING_PATH` |
| `--checkpoint-uri` / `-c` | unset | MLflow checkpoint artifact URI | `CHECKPOINT_URI` |
| `--checkpoint-mode` / `-M` | `from-scratch` | `from-scratch`, `warm-start`, `resume`, `fresh` | `CHECKPOINT_MODE` |
| `--register-checkpoint` / `-r` | derived from task | Model name for checkpoint registration | `REGISTER_CHECKPOINT` |
| `--skip-register-checkpoint` | `false` | Skip automatic model registration | CLI |
| `--run-smoke-test` / `-s` | `false` | Enable Azure connectivity smoke test | `RUN_AZURE_SMOKE_TEST` |
| `--azure-subscription-id` | from TF | Azure subscription ID | `AZURE_SUBSCRIPTION_ID` / TF |
| `--azure-resource-group` | from TF | Azure resource group | `AZURE_RESOURCE_GROUP` / TF |
| `--azure-workspace-name` | from TF | Azure ML workspace | `AZUREML_WORKSPACE_NAME` / TF |
| `--` | n/a | Forward remaining args to `osmo workflow submit` | CLI |

Example:
```bash
./submit-osmo-dataset-training.sh \
  --task Isaac-Velocity-Rough-Anymal-C-v0 \
  --dataset-name my-training-v1
```
Scripts resolve values in order: CLI arguments → environment variables → Terraform outputs.

| Variable | Description |
|---|---|
| `AZURE_SUBSCRIPTION_ID` | Azure subscription |
| `AZURE_RESOURCE_GROUP` | Resource group name |
| `AZUREML_WORKSPACE_NAME` | ML workspace name |
| `TASK` | IsaacLab task name |
| `NUM_ENVS` | Number of parallel environments |
| `OSMO_DATASET_BUCKET` | Dataset bucket for OSMO training |
| `OSMO_DATASET_NAME` | Dataset name for OSMO training |
| `DATASET_REPO_ID` | HuggingFace dataset repo ID |
| `POLICY_TYPE` | LeRobot policy architecture |
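The resolution order can be pictured as a chain of fallbacks. The sketch below is an assumption about the pattern, not the scripts' actual code: `resolve_value` and `tf_output_stub` are hypothetical names, and the stub stands in for a real Terraform output lookup.

```bash
#!/usr/bin/env bash
# Hypothetical stand-in for a Terraform output lookup.
tf_output_stub() {
  case "$1" in
    resource_group) echo "rg-from-terraform" ;;
    *) return 1 ;;
  esac
}

# Resolve a setting: CLI value first, then the named environment variable,
# then the Terraform output. Prints the first non-empty value found.
resolve_value() {
  local cli_value="$1" env_name="$2" tf_key="$3"
  if [ -n "$cli_value" ]; then
    echo "$cli_value"
  elif [ -n "${!env_name:-}" ]; then
    echo "${!env_name}"
  else
    tf_output_stub "$tf_key"
  fi
}

unset AZURE_RESOURCE_GROUP
resolve_value "" AZURE_RESOURCE_GROUP resource_group             # falls back to the stub
AZURE_RESOURCE_GROUP="rg-from-env"
resolve_value "" AZURE_RESOURCE_GROUP resource_group             # environment beats Terraform
resolve_value "rg-from-cli" AZURE_RESOURCE_GROUP resource_group  # CLI beats both
```

In practice this means an exported variable such as `AZURE_RESOURCE_GROUP` applies to every submission in the shell session until a flag overrides it for a single run.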

| File | Purpose |
|---|---|
| `lib/terraform-outputs.sh` | Shared functions for reading Terraform outputs |

Source the library to use helper functions:
```bash
source lib/terraform-outputs.sh
read_terraform_outputs ../infrastructure/terraform
get_aks_cluster_name     # Returns AKS cluster name
get_azureml_workspace    # Returns ML workspace name
```
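For context, helpers of this kind commonly wrap `terraform output -json`. The sketch below shows one way to pull a single value out of that JSON with `jq`. It is not the contents of `lib/terraform-outputs.sh`, and both the function name `tf_json_value` and the `azureml_workspace_name` key are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read `terraform output -json` text on stdin and print
# the value of one output. Exits non-zero when the output is absent or null.
tf_json_value() {
  local key="$1"
  jq -er --arg k "$key" '.[$k].value // empty'
}

# Usage (output name is an assumption):
#   terraform -chdir=../infrastructure/terraform output -json \
#     | tf_json_value azureml_workspace_name
```

The `// empty` fallback keeps a missing output from printing the literal string `null`, so callers can test the exit status instead of comparing strings.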