AKS cluster configuration for robotics workloads with AzureML and NVIDIA OSMO.
> [!NOTE]
> This page is part of the deployment guide. Return there for the full deployment sequence.

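Prerequisites:

- Terraform infrastructure deployed (`cd infrastructure/terraform && terraform apply`)
- Azure CLI authenticated (`az login`)
- OSMO CLI (`osmo`) for backend deployment

> [!NOTE]
> Scripts automatically install the required Azure CLI extensions (`k8s-extension`, `ml`) if they are missing.
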
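
Before running the scripts, it can help to confirm that the command-line tools used on this page are on `PATH`. The sketch below is a hypothetical helper (`missing_tools` is not part of the repository). It only checks that each command exists, not which version is installed.

```bash
#!/usr/bin/env bash
# Hypothetical preflight helper (not one of the deployment scripts).
# Prints every command from the argument list that is not on PATH and
# returns non-zero if at least one is missing.
missing_tools() {
  local tool missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `missing_tools terraform az kubectl helm jq osmo` prints nothing and returns 0 when every tool is installed.
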
> [!IMPORTANT]
> The default infrastructure deploys a private AKS cluster. You must deploy the VPN Gateway and connect to it before running these scripts. See VPN Gateway for setup instructions. Without VPN, `kubectl` commands fail with `no such host` errors.
>
> To skip VPN, set `should_enable_private_aks_cluster = false` in your Terraform configuration. See Network Configuration Modes.

Required Azure RBAC roles for the identity running the scripts:

| Role | Scope | Purpose |
|---|---|---|
| Azure Kubernetes Service Cluster User Role | AKS cluster | Get cluster credentials |
| Contributor | Resource group | Create the extension and federated identity credentials (FIC) |
| Key Vault Secrets User | Key Vault | Read PostgreSQL/Redis credentials |
| Storage Blob Data Contributor | Storage account | Create workflow containers |

Quick start:

```bash
# Connect to the cluster (values from terraform output)
az aks get-credentials --resource-group <rg> --name <aks>

# Verify connectivity (requires VPN for private clusters)
kubectl cluster-info
# Expected: Kubernetes control plane is running at https://...
# If you see "no such host" errors, connect to the VPN first

# Deploy GPU infrastructure (required for all paths)
./01-deploy-robotics-charts.sh

# Choose your path:
#   - AzureML: ./02-deploy-azureml-extension.sh
#   - OSMO:    ./03-deploy-osmo-control-plane.sh && ./04-deploy-osmo-backend.sh
```

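
When the connectivity check fails, the error text usually points at the cause. The following sketch is a hypothetical `classify_cluster_info` helper that pattern-matches the two outcomes described on this page. The match strings are assumptions based on the expected output and the `no such host` error quoted above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: classifies the combined stdout/stderr of
# `kubectl cluster-info`, read from stdin.
#   ok           -> control plane reachable
#   vpn-required -> DNS lookup failed ("no such host"), typical of a
#                   private AKS cluster reached without the VPN
#   unknown      -> anything else; inspect the output manually
classify_cluster_info() {
  local output
  output="$(cat)"
  if grep -q "no such host" <<<"$output"; then
    echo "vpn-required"
  elif grep -q "is running at" <<<"$output"; then
    echo "ok"
  else
    echo "unknown"
  fi
}
```

Run it as `kubectl cluster-info 2>&1 | classify_cluster_info`. Matching on `is running at` rather than on the full line avoids depending on any color codes kubectl may add around the component name.
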
Three authentication and registry configurations are supported. Choose based on your security requirements.
Access Keys: the simplest setup, using storage account keys and the public NVIDIA registry (nvcr.io).

```hcl
# terraform.tfvars
osmo_config = {
  should_enable_identity   = false
  should_federate_identity = false
  control_plane_namespace  = "osmo-control-plane"
  operator_namespace       = "osmo-operator"
  workflows_namespace      = "osmo-workflows"
}
```

```bash
./01-deploy-robotics-charts.sh
./02-deploy-azureml-extension.sh
./03-deploy-osmo-control-plane.sh
./04-deploy-osmo-backend.sh --use-access-keys
```

Workload Identity: secure, key-less authentication via Azure Workload Identity.

```hcl
# terraform.tfvars
osmo_config = {
  should_enable_identity   = true
  should_federate_identity = true
  control_plane_namespace  = "osmo-control-plane"
  operator_namespace       = "osmo-operator"
  workflows_namespace      = "osmo-workflows"
}
```

```bash
./01-deploy-robotics-charts.sh
./02-deploy-azureml-extension.sh
./03-deploy-osmo-control-plane.sh
./04-deploy-osmo-backend.sh
```

The scripts auto-detect the OSMO managed identity from Terraform outputs and configure the ServiceAccount annotations.

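
To confirm what the scripts configured, you can read the client ID annotation back from the ServiceAccount YAML. This sketch assumes the standard AKS workload identity annotation key `azure.workload.identity/client-id`. `wi_client_id` is a hypothetical helper that uses line matching rather than a YAML parser, so it only handles the single-line `key: value` form that `kubectl -o yaml` normally emits.

```bash
#!/usr/bin/env bash
# Hypothetical helper: reads ServiceAccount YAML on stdin and prints the
# value of the azure.workload.identity/client-id annotation (surrounding
# double quotes removed). Prints nothing if the annotation is absent.
# Plain sed line matching, not a real YAML parser.
wi_client_id() {
  sed -n 's/^[[:space:]]*azure\.workload\.identity\/client-id:[[:space:]]*"\{0,1\}\([^"]*\)"\{0,1\}[[:space:]]*$/\1/p' \
    | head -n 1
}
```

For example, `kubectl get sa -n osmo-control-plane osmo-control-plane -o yaml | wi_client_id` should print the client ID of the OSMO managed identity. Empty output means the annotation was not applied.
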
Workload Identity + ACR: an enterprise deployment that pulls images and charts from a private Azure Container Registry.

Prerequisite: import the OSMO images and Helm charts into ACR before deployment.

```bash
# Get the ACR name from Terraform outputs and log in
cd ../001-iac
ACR_NAME=$(terraform output -json | jq -r '.container_registry.value.name')
az acr login --name "$ACR_NAME"

# Image and chart versions (override via environment)
OSMO_VERSION="${OSMO_VERSION:-6.0.0}"
CHART_VERSION="${CHART_VERSION:-1.0.0}"

# Copy the OSMO container images from nvcr.io into ACR
OSMO_IMAGES=(
  service router web-ui worker logger agent
  backend-listener backend-worker client
  delayed-job-monitor init-container
)
for img in "${OSMO_IMAGES[@]}"; do
  az acr import --name "$ACR_NAME" \
    --source "nvcr.io/nvidia/osmo/${img}:${OSMO_VERSION}" \
    --image "osmo/${img}:${OSMO_VERSION}"
done

# Mirror the OSMO Helm charts into the ACR OCI registry
for chart in osmo router ui backend-operator; do
  helm pull "oci://nvcr.io/nvidia/osmo/${chart}" --version "$CHART_VERSION"
  helm push "${chart}-${CHART_VERSION}.tgz" "oci://${ACR_NAME}.azurecr.io/helm"
  rm "${chart}-${CHART_VERSION}.tgz"
done
```

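
To review the exact source and target references before anything is copied into ACR, you can wrap the import in a dry-run switch. This is a sketch. `run`, `import_osmo_images`, and the `DRY_RUN` variable are hypothetical names, and the image mapping mirrors the import loop above.

```bash
#!/usr/bin/env bash
# Hypothetical dry-run wrapper: with DRY_RUN=1 it prints the command
# (prefixed with "+ ") instead of executing it.
run() {
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Same mapping as the import loop above:
#   nvcr.io/nvidia/osmo/<img>:<version>  ->  <acr>/osmo/<img>:<version>
import_osmo_images() {
  local acr_name="$1" version="$2" img
  shift 2
  for img in "$@"; do
    run az acr import --name "$acr_name" \
      --source "nvcr.io/nvidia/osmo/${img}:${version}" \
      --image "osmo/${img}:${version}"
  done
}
```

`DRY_RUN=1 import_osmo_images "$ACR_NAME" "$OSMO_VERSION" "${OSMO_IMAGES[@]}"` prints one `az acr import` command per image. Drop `DRY_RUN=1` to run them.
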
Then run the deployment scripts with `--use-acr`:

```bash
cd ../002-setup
./01-deploy-robotics-charts.sh
./02-deploy-azureml-extension.sh
./03-deploy-osmo-control-plane.sh --use-acr
./04-deploy-osmo-backend.sh --use-acr
```

| | Access Keys | Workload Identity | Workload Identity + ACR |
|---|---|---|---|
| Storage auth | Access keys | Workload Identity | Workload Identity |
| Registry | nvcr.io | nvcr.io | Private ACR |
| Air-gap support | ✗ | ✗ | ✓ |

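
On the script side, the three configurations differ only in the flags passed to `03-deploy-osmo-control-plane.sh` and `04-deploy-osmo-backend.sh`. The hypothetical `scenario_flags` helper below encodes that mapping from the commands listed for each configuration. The scenario keys (`access-keys`, `workload-identity`, `workload-identity-acr`) are made-up labels for the three table columns.

```bash
#!/usr/bin/env bash
# Hypothetical helper: prints the extra flag(s) a deployment script needs
# for a given configuration, or nothing if it needs none.
# Returns non-zero for an unknown configuration key.
scenario_flags() {
  local scenario="$1" script="$2"
  case "${scenario}:${script}" in
    access-keys:04-deploy-osmo-backend.sh)
      printf '%s\n' "--use-access-keys" ;;
    workload-identity-acr:03-deploy-osmo-control-plane.sh | workload-identity-acr:04-deploy-osmo-backend.sh)
      printf '%s\n' "--use-acr" ;;
    access-keys:* | workload-identity:* | workload-identity-acr:*)
      ;;  # known configuration, no extra flags for this script
    *)
      echo "unknown scenario: ${scenario}" >&2
      return 1 ;;
  esac
}
```

For example, `./04-deploy-osmo-backend.sh $(scenario_flags access-keys 04-deploy-osmo-backend.sh)` reproduces the Access Keys command. The substitution is left unquoted on purpose, so that an empty result adds no argument.
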
When deploying with `should_enable_private_endpoint = false`, cluster endpoints are publicly accessible. Secure the following components:

The AzureML inference router (`azureml-fe`) handles incoming inference requests. For public deployments:

- Enable HTTPS only (`allowInsecureConnections=False`)
- Configure TLS by setting `sslSecret` or providing certificate files
- Set `internalLoadBalancerProvider=azure` for internal-only access

See Secure Kubernetes online endpoints and Inference routing configuration.

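
The recommendations above correspond to AzureML extension configuration settings. This sketch collects them into a list that can be passed to `az k8s-extension create --config`. `azureml_fe_settings` and its arguments are hypothetical, and the TLS secret must already exist. The optional `sslCname` setting is included on the assumption that your HTTPS endpoint uses a CNAME. Check the Inference routing configuration page for which settings your extension version requires.

```bash
#!/usr/bin/env bash
# Hypothetical helper: prints one AzureML extension config setting per line
# for a publicly reachable cluster.
#   $1 SSL_SECRET     name of an existing TLS secret (required)
#   $2 INTERNAL_ONLY  "true" to request an internal Azure load balancer
#   $3 CNAME          optional CNAME for the HTTPS endpoint
azureml_fe_settings() {
  local ssl_secret="${1:-}" internal_only="${2:-false}" cname="${3:-}"
  if [[ -z "$ssl_secret" ]]; then
    echo "usage: azureml_fe_settings SSL_SECRET [INTERNAL_ONLY] [CNAME]" >&2
    return 1
  fi
  # HTTPS only, with TLS material taken from the named secret
  local settings=("allowInsecureConnections=False" "sslSecret=${ssl_secret}")
  # Assumed setting name for the HTTPS endpoint CNAME
  if [[ -n "$cname" ]]; then
    settings+=("sslCname=${cname}")
  fi
  # Keep the inference router off the public internet
  if [[ "$internal_only" == "true" ]]; then
    settings+=("internalLoadBalancerProvider=azure")
  fi
  printf '%s\n' "${settings[@]}"
}
```

For example, `mapfile -t cfg < <(azureml_fe_settings my-tls-secret true)` followed by `--config "${cfg[@]}"` alongside the other extension arguments. `my-tls-secret` is a placeholder.
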
The OSMO web interface must require authentication when it is publicly accessible. See OSMO Keycloak configuration.

| Script | Purpose |
|---|---|
| `01-deploy-robotics-charts.sh` | GPU Operator, KAI Scheduler |
| `02-deploy-azureml-extension.sh` | AzureML K8s extension, compute attach |
| `03-deploy-osmo-control-plane.sh` | OSMO service, router, web-ui |
| `04-deploy-osmo-backend.sh` | Backend operator, workflow storage |

| Flag | Scripts | Description |
|---|---|---|
| `--use-access-keys` | `04-deploy-osmo-backend.sh` | Use storage account keys instead of workload identity |
| `--use-acr` | `03-deploy-osmo-control-plane.sh`, `04-deploy-osmo-backend.sh` | Pull from the Terraform-deployed ACR |
| `--acr-name NAME` | `03-deploy-osmo-control-plane.sh`, `04-deploy-osmo-backend.sh` | Specify an alternate ACR |
| `--config-preview` | All | Print the configuration and exit |

Scripts read configuration from Terraform outputs in `infrastructure/terraform/`. Override individual values with these environment variables:

| Variable | Description |
|---|---|
| `AZURE_SUBSCRIPTION_ID` | Azure subscription |
| `AZURE_RESOURCE_GROUP` | Resource group |
| `AKS_CLUSTER_NAME` | AKS cluster name |

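
The override works like a default: a variable that is set and non-empty wins, and otherwise the value comes from Terraform. Below is a sketch of that precedence rule. It assumes nothing about how the scripts implement it internally. `resolve_setting` is a hypothetical helper, and the Terraform output name in the usage example is a placeholder.

```bash
#!/usr/bin/env bash
# Hypothetical helper: prints the value of the environment variable named
# in $1 if it is set and non-empty; otherwise runs the remaining arguments
# as a fallback command (for example a `terraform output -raw ...` call).
resolve_setting() {
  local var_name="$1"
  shift
  if [[ -n "${!var_name:-}" ]]; then
    printf '%s\n' "${!var_name}"
  else
    "$@"
  fi
}
```

For example: `AKS_CLUSTER_NAME="$(resolve_setting AKS_CLUSTER_NAME terraform -chdir=infrastructure/terraform output -raw <output-name>)"`. To override a value for a single run, set it inline, as in `AKS_CLUSTER_NAME=my-aks ./01-deploy-robotics-charts.sh`.
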
Verification:

```bash
# Check pods
kubectl get pods -n gpu-operator
kubectl get pods -n azureml
kubectl get pods -n osmo-control-plane
kubectl get pods -n osmo-operator

# Workload identity (if enabled)
kubectl get sa -n osmo-control-plane osmo-control-plane -o yaml | grep azure.workload.identity
```

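
Reading four pod listings by eye is error-prone. The hypothetical `unhealthy_pods` filter below flags any pod whose STATUS is not `Running` or `Completed`. It assumes the default `kubectl get pods --no-headers` column order (NAME, READY, STATUS, ...) and does not check the READY column, so a pod that is Running but not yet ready still passes.

```bash
#!/usr/bin/env bash
# Hypothetical helper: reads `kubectl get pods --no-headers` output on stdin,
# prints the name of every pod whose STATUS (3rd column) is neither Running
# nor Completed, and exits non-zero if any such pod was found.
unhealthy_pods() {
  awk 'NF >= 3 && $3 != "Running" && $3 != "Completed" { print $1; bad = 1 }
       END { exit bad }'
}
```

For example: `for ns in gpu-operator azureml osmo-control-plane osmo-operator; do kubectl get pods -n "$ns" --no-headers | unhealthy_pods || echo "check namespace $ns"; done`.
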
🤖 Crafted with precision by ✨Copilot following brilliant human instruction, then carefully refined by our team of discerning human reviewers.