
OSMO Inference Workflows

Run trained policy inference through NVIDIA OSMO workflows on GPU-accelerated Kubernetes clusters. OSMO supports Isaac Lab and LeRobot frameworks with configurable checkpoint sources and automated resource scheduling.

📋 Prerequisites

| Tool | Version | Purpose |
|------|---------|---------|
| OSMO CLI | Latest | Workflow submission and monitoring |
| Azure CLI | 2.65+ | Azure authentication |
| kubectl | 1.28+ | Cluster access |
| Helm | 3.14+ | Chart management |
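
A preflight check can confirm that the tools in the table are installed and meet the minimum versions. The helpers below are a minimal sketch, not part of the OSMO CLI: `version_ge` and `require_tool` are hypothetical names, and the comparison assumes `sort -V` (version sort) from GNU coreutils is available.

```shell
# Hypothetical preflight helpers; not part of the OSMO CLI.

# version_ge ACTUAL MINIMUM: succeed when ACTUAL >= MINIMUM.
# Assumes `sort -V` (version sort) from GNU coreutils.
version_ge() {
  local actual="$1" minimum="$2"
  # The smaller version sorts first; ACTUAL passes if MINIMUM is that line.
  [ "$(printf '%s\n%s\n' "$minimum" "$actual" | sort -V | head -n 1)" = "$minimum" ]
}

# require_tool NAME: succeed when NAME is found on PATH.
require_tool() {
  command -v "$1" >/dev/null 2>&1
}
```

For example, `require_tool kubectl && version_ge "$kubectl_version" 1.28` would pass only when kubectl is installed and `kubectl_version` (a variable you fill from the tool's own version output) is at least 1.28.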

🚀 Quick Start

Submit an Isaac Lab inference workflow:

osmo workflow submit \
--file workflows/osmo/infer.yaml \
--set checkpoint_uri=runs:/<run-id>/model \
--set task=Isaac-Cartpole-v0

⚖️ Workflow Comparison

| Feature | Isaac Lab | LeRobot |
|---------|-----------|---------|
| Config file | `infer.yaml` | `lerobot-infer.yaml` |
| Checkpoint format | ONNX, TorchScript | PyTorch (`.pt`) |
| Task specification | `--task` (Isaac Gym env) | `--policy-type` (model arch) |
| Video recording | `--video-length` | `--record-video` |
| Evaluation | `--num-envs`, `--max-steps` | `--eval-episodes`, `--eval-batch-size` |

🔬 Isaac Lab Inference

Checkpoint URI Formats

| Format | Example | Use Case |
|--------|---------|----------|
| MLflow run | `runs:/<run-id>/model` | Direct from training |
| MLflow model | `models:/<name>/<version>` | Model registry |
| Azure Blob | `https://<account>.blob.core.windows.net/...` | External storage |
| HTTP(S) | `https://<url>/model.onnx` | Public endpoints |
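
The four URI formats can be told apart by their prefix alone. The function below is an illustrative sketch of that classification. `classify_checkpoint_uri` and its output labels are hypothetical and do not reflect how OSMO resolves checkpoints internally.

```shell
# classify_checkpoint_uri URI: print a label for the checkpoint URI format.
# Hypothetical helper; the labels are illustrative, not OSMO identifiers.
classify_checkpoint_uri() {
  case "$1" in
    runs:/*)                           echo "mlflow-run" ;;
    models:/*)                         echo "mlflow-model" ;;
    # Match Azure Blob before the generic HTTP(S) pattern, which also fits it.
    https://*.blob.core.windows.net/*) echo "azure-blob" ;;
    http://*|https://*)                echo "http" ;;
    *)                                 echo "unknown"; return 1 ;;
  esac
}
```

Pattern order matters here: an Azure Blob URL is also a valid HTTPS URL, so the more specific pattern has to come first.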

Supported Model Formats

| Format | Extension | Frameworks |
|--------|-----------|------------|
| ONNX | `.onnx` | Isaac Lab |
| TorchScript | `.pt` | Isaac Lab, LeRobot |
| Both | `.onnx` + `.pt` | Full compatibility |

Isaac Lab CLI Parameters

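| Parameter | Required | Default | Description |
|-----------|----------|---------|-------------|
| `-c`, `--checkpoint-uri` | Yes | | Checkpoint location URI |
| `--task` | No | from workflow | Isaac Gym environment name |
| `--format` | No | `onnx` | Model format |
| `--num-envs` | No | 1 | Parallel environments |
| `--max-steps` | No | 1000 | Maximum simulation steps |
| `--video-length` | No | 200 | Video recording frames |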

Locating Checkpoints

osmo workflow list
osmo workflow logs <workflow-id> | grep "checkpoint"

Configuration Resolution Order

| Priority | Source | Example |
|----------|--------|---------|
| 1 (highest) | CLI arguments | `--set checkpoint_uri=...` |
| 2 | Environment variables | `CHECKPOINT_URI=...` |
| 3 (lowest) | Terraform outputs | Auto-detected from state |
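
The resolution order amounts to "first non-empty value wins". The sketch below shows that rule in isolation. `resolve_setting` is a hypothetical helper, and how the CLI and Terraform values are actually obtained is left to the caller.

```shell
# resolve_setting CLI_VALUE ENV_VALUE TERRAFORM_VALUE
# Print the first non-empty argument, checked in priority order.
# Hypothetical helper illustrating the precedence table; not OSMO code.
resolve_setting() {
  local candidate
  for candidate in "$@"; do
    if [ -n "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  # Nothing was set at any level.
  return 1
}
```

A call such as `resolve_setting "$cli_value" "${CHECKPOINT_URI:-}" "$terraform_value"` would return the CLI value when present, fall back to the environment variable, and use the Terraform output only when both are empty. `cli_value` and `terraform_value` are placeholder variable names.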

Workflow Outputs

| Artifact | Path | Description |
|----------|------|-------------|
| `policy.onnx` | `outputs/` | Exported ONNX policy |
| `policy.pt` | `outputs/` | TorchScript policy |
| Metrics JSON | `outputs/metrics/` | Evaluation results |
| Videos | `outputs/videos/` | Recorded episodes |
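
Once the outputs are available on a local filesystem, a short check can report which of the listed artifacts are missing. This is a sketch under two assumptions: the `outputs/` tree has been copied locally with the layout shown in the table, and the run was expected to produce every artifact. `check_outputs` is a hypothetical name.

```shell
# check_outputs DIR: report expected artifacts that are missing under DIR.
# Hypothetical helper; assumes DIR mirrors the outputs/ layout in the table.
check_outputs() {
  local root="$1" missing=0 path
  for path in policy.onnx policy.pt metrics videos; do
    if [ ! -e "$root/$path" ]; then
      echo "missing: $root/$path" >&2
      missing=1
    fi
  done
  # Exit status 0 when everything is present, 1 otherwise.
  return "$missing"
}
```

Running `check_outputs outputs` prints one line per missing artifact to standard error. Trim the list in the loop if a run exports only one model format or does not record video.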

🤖 LeRobot Inference

LeRobot CLI Parameters

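| Parameter | Required | Default | Description |
|-----------|----------|---------|-------------|
| `--policy-repo-id` | Yes | | HuggingFace model repo |
| `--policy-type` | No | `act` | Policy architecture |
| `--dataset-repo-id` | No | | Evaluation dataset |
| `--eval-episodes` | No | 10 | Number of evaluation episodes |
| `--eval-batch-size` | No | 1 | Batch size for evaluation |
| `--record-video` | No | `false` | Enable video recording |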

Usage Examples

Basic evaluation:

osmo workflow submit \
--file workflows/osmo/lerobot-infer.yaml \
--set policy_repo_id=<hf-repo-id>

With video recording:

osmo workflow submit \
--file workflows/osmo/lerobot-infer.yaml \
--set policy_repo_id=<hf-repo-id> \
--set record_video=true \
--set eval_episodes=20

Model registration:

osmo workflow submit \
--file workflows/osmo/lerobot-infer.yaml \
--set policy_repo_id=<hf-repo-id> \
--set register_model=true
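
In the examples above, each `--set` key appears to be the snake_case form of the matching CLI parameter (`--eval-episodes` becomes `eval_episodes`). The helpers below sketch that naming convention. The mapping is inferred from the examples on this page rather than confirmed against the workflow files, and `flag_to_set_key` and `set_arg` are hypothetical names.

```shell
# flag_to_set_key FLAG: convert "--eval-episodes" to "eval_episodes".
# Hypothetical helper; the naming rule is inferred from the usage examples.
flag_to_set_key() {
  local name="${1#--}"                 # drop the leading "--"
  printf '%s\n' "$name" | tr '-' '_'   # hyphens become underscores
}

# set_arg FLAG VALUE: print the corresponding "--set key=value" argument.
set_arg() {
  printf -- '--set %s=%s\n' "$(flag_to_set_key "$1")" "$2"
}
```

For instance, `set_arg --record-video true` prints `--set record_video=true`, matching the video recording example. The keys a workflow actually accepts are defined by its YAML file, so check `lerobot-infer.yaml` before relying on this convention.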

🔑 Credential Configuration

osmo credential set hf_token <token>
osmo credential set wandb_api_key <key>

📺 Monitoring

osmo workflow logs <workflow-id>
osmo workflow logs <workflow-id> --follow
osmo workflow status <workflow-id>
