Technical documentation for deploying, training, and operating robotics workloads on Azure with NVIDIA Isaac and OSMO. This index organizes every guide, reference, and walkthrough in the repository by topic so you can find what you need based on where you are in the workflow.
Documentation spans the full lifecycle — from provisioning Azure infrastructure with Terraform, through training reinforcement-learning policies with Isaac Lab and AzureML, to running inference on edge devices. Each section targets a specific audience and phase of the project.
| Role | Start here |
|---|---|
| First-time deployer | Getting Started (coming soon), then Deploy |
| ML / Robotics engineer | Training (coming soon) and Inference (coming soon) |
| Platform operator | Operations and Security Guide |
| Contributor | Contributing |
| Section | Description | Status |
|---|---|---|
| Getting Started | Environment setup, prerequisites, and first deployment walkthrough | Coming soon |
| Deploy | Infrastructure provisioning with Terraform, AKS cluster setup, and networking | Available |
| Training | Model training pipelines with Isaac Lab, AzureML jobs, and OSMO orchestration | Coming soon |
| Inference | Serving trained policies for real-time control on edge and cloud | Coming soon |
| Workflows | AzureML and OSMO job templates, pipeline configuration, and submission scripts | Coming soon |
| Operations | Monitoring, scaling, troubleshooting, and cost management for running clusters | Available |
| Security | Identity, networking, compliance, and hardening for production deployments | Coming soon |
| Reference | CLI parameter tables, script usage, workflow templates, and configuration reference | Available |
| Contributing | Contribution guidelines, PR process, deployment validation, and coding conventions | Available |
The standalone guides below are available now. They cover common tasks and will move into their respective topic sections as the documentation structure expands.
| Guide | Description |
|---|---|
| AzureML Validation Job Debugging | Diagnosing and resolving AzureML validation job failures on AKS, including pod scheduling and resource quota issues |
| LeRobot Inference | Running LeRobot inference workloads with pre-trained policies on Azure infrastructure |
| MLflow Integration | Configuring MLflow experiment tracking for SKRL training agents with automatic metric logging to AzureML |
| Security Guide | Security configuration inventory, deployment responsibilities, and hardening checklist for robotics workloads |