physical-ai-toolchain

End-to-end deployment of Azure infrastructure and Kubernetes services for the robotics reference architecture. This guide covers prerequisites, Terraform provisioning, VPN access, cluster configuration, and teardown.

📖 Deployment Guides

Guide Description
Prerequisites Azure subscription initialization and resource provider registration
Infrastructure Deployment Terraform configuration for AKS, Azure ML, storage, and backend services
Infrastructure Reference Architecture, module structure, outputs, and troubleshooting
VPN Gateway Point-to-site and site-to-site VPN for private cluster access
Private DNS DNS zone setup for OSMO UI access
Cluster Automation Scheduled start/stop automation for cost management
Cluster Setup Kubernetes service deployment and OSMO configuration
Cluster Operations Accessing OSMO, troubleshooting, and optional scripts
Cleanup and Destroy Remove cluster components and destroy Azure infrastructure

📋 Deployment Order

  1. Prerequisites — Azure CLI login, subscription setup (2 min)
  2. Infrastructure — Terraform: AKS, ML workspace, storage (30-40 min)
  3. VPN Gateway — VPN Gateway for private cluster access (20-30 min)
  4. Cluster Setup — GPU Operator, OSMO, AzureML extension (30 min)

[!IMPORTANT] The default configuration deploys a private AKS cluster. The cluster API endpoint is not publicly accessible. You must deploy the VPN Gateway (step 3) and connect before running cluster setup scripts (step 4).

Skip step 3 if you set should_enable_private_aks_cluster = false in your Terraform configuration. See Infrastructure Reference for network configuration options.

📖 Quick Reference

Task Guide
Deploy Azure resources Infrastructure Deployment
Configure VPN access VPN Gateway
Set up cluster services Cluster Setup
Access OSMO UI Cluster Operations
Remove components Cleanup and Destroy
Destroy infrastructure Cleanup and Destroy

📖 Terminology

Term Definition
Deploy Provision Azure infrastructure or install cluster components using Terraform or deployment scripts.
Setup Post-deploy configuration and access steps for the cluster and workloads.
Cleanup Remove cluster components while keeping Azure infrastructure intact.
Destroy Delete Azure infrastructure (Terraform destroy or resource group deletion).

🤖 Crafted with precision by ✨Copilot following brilliant human instruction, then carefully refined by our team of discerning human reviewers.