Terraform configuration for the robotics reference architecture. Deploys Azure resources including AKS with GPU node pools, Azure ML workspace, storage, and OSMO backend services.
[!NOTE] This page is part of the deployment guide. Return there for the full deployment sequence.
| Tool | Version | Setup or Check |
|---|---|---|
| Azure CLI | Latest | az login |
| Terraform | 1.5+ | terraform version |
| GPU VM quota | Region-specific | e.g., Standard_NV36ads_A10_v5 |
| Role | Scope |
|---|---|
| Contributor | Subscription (new RG) or Resource Group (existing RG) |
| Role Based Access Control Administrator | Subscription (new RG) or Resource Group (existing RG) |
Terraform creates role assignments for managed identities, requiring Microsoft.Authorization/roleAssignments/write permission. The Contributor role explicitly blocks this action; the RBAC Administrator role provides it.
[!NOTE] Use subscription scope if creating a new resource group (
should_create_resource_group = true). Use resource group scope if the resource group already exists.
Alternative: Owner role (grants more permissions than required).
cd infrastructure/terraform
source prerequisites/az-sub-init.sh
cp terraform.tfvars.example terraform.tfvars
terraform init && terraform apply -var-file=terraform.tfvars
[!IMPORTANT] The default configuration creates a private AKS cluster (
should_enable_private_aks_cluster = true). After deploying infrastructure, you must deploy the VPN Gateway and connect before runningkubectlcommands or cluster setup scripts.
| Variable | Description | Required |
|---|---|---|
environment |
Deployment environment (dev, test, prod) | Yes |
resource_prefix |
Resource naming prefix | Yes |
location |
Azure region | Yes |
instance |
Instance identifier | No (default: “001”) |
tags |
Resource group tags | No (default: {}) |
| Variable | Description | Default |
|---|---|---|
system_node_pool_vm_size |
VM size for AKS system node pool | Standard_D8ds_v5 |
system_node_pool_node_count |
Number of nodes for AKS system node pool | 1 |
system_node_pool_zones |
Availability zones for system node pool | null |
should_enable_system_node_pool_auto_scaling |
Enable auto-scaling for system node pool | false |
system_node_pool_min_count |
Minimum nodes when auto-scaling enabled | null |
system_node_pool_max_count |
Maximum nodes when auto-scaling enabled | null |
| Variable | Description | Default |
|---|---|---|
should_enable_nat_gateway |
Deploy NAT Gateway for outbound connectivity | true |
should_enable_private_endpoint |
Deploy private endpoints and DNS zones for Azure services | true |
should_enable_private_aks_cluster |
Make AKS API endpoint private (requires VPN for kubectl) | true |
should_enable_public_network_access |
Allow public access to resources | true |
should_deploy_postgresql |
Deploy PostgreSQL Flexible Server for OSMO | true |
should_deploy_redis |
Deploy Azure Managed Redis for OSMO | true |
should_deploy_grafana |
Deploy Azure Managed Grafana dashboard | true |
should_deploy_monitor_workspace |
Deploy Azure Monitor Workspace for Prometheus metrics | true |
should_deploy_ampls |
Deploy Azure Monitor Private Link Scope and endpoint | true |
should_deploy_dce |
Deploy Data Collection Endpoint for observability | true |
should_deploy_aml_compute |
Deploy AzureML managed GPU compute cluster | false |
should_include_aks_dns_zone |
Include AKS private DNS zone in core DNS zones | true |
Three deployment modes are supported based on security requirements:
All Azure services use private endpoints and AKS has a private control plane. Requires VPN for all access.
# terraform.tfvars (default values)
should_enable_private_endpoint = true
should_enable_private_aks_cluster = true
Deploy VPN Gateway after infrastructure: cd vpn && terraform apply
Azure services (Storage, Key Vault, ACR, PostgreSQL, Redis) use private endpoints, but AKS control plane is publicly accessible. No VPN required for kubectl access.
# terraform.tfvars
should_enable_private_endpoint = true
should_enable_private_aks_cluster = false
This mode provides security for Azure resources while allowing cluster management without VPN.
All endpoints are publicly accessible. Not recommended for production without additional hardening.
# terraform.tfvars
should_enable_private_endpoint = false
should_enable_private_aks_cluster = false
[!WARNING] Public endpoints expose services to the internet. When using this configuration, you must secure cluster workloads:
AzureML Extension: Configure HTTPS and restrict access via inference router settings. See Secure online endpoints and Inference routing.
OSMO UI: Enable Keycloak authentication to protect the web interface. See OSMO Keycloak configuration.
Enable managed identity for OSMO services (recommended for production):
osmo_config = {
should_enable_identity = true
should_federate_identity = true
control_plane_namespace = "osmo-control-plane"
operator_namespace = "osmo-operator"
workflows_namespace = "osmo-workflows"
}
See variables.tf for all configuration options.
🤖 Crafted with precision by ✨Copilot following brilliant human instruction, then carefully refined by our team of discerning human reviewers.