> [!NOTE]
> This guide expands on the Infrastructure as Code Style section of the main contributing guide.
Infrastructure code follows strict conventions for consistency, security, and maintainability.
# Format all Terraform files before committing
terraform fmt -recursive infrastructure/terraform/
# Validate formatting and syntax across all deployment directories
npm run lint:tf:validate
- Use descriptive names: gpu_node_pool_vm_size, not vm_sku
- Prefix booleans with should_: should_enable_private_endpoints, should_deploy_vpn
- Give related variables a shared prefix: aks_cluster_name, aks_node_count, aks_version

Each Terraform module must include:
modules/
module-name/
main.tf # Resource definitions
variables.tf # Input variables with descriptions and types
variables.core.tf # Core variables (environment, resource_prefix, instance, resource_group)
outputs.tf # Output values
versions.tf # Provider version constraints
tests/
setup/
main.tf # Mock data generator with random prefix and typed outputs
naming.tftest.hcl # Resource naming convention assertions
conditionals.tftest.hcl # should_* boolean conditional tests
outputs.tftest.hcl # Output structure and nullability tests
The root deployment directory (infrastructure/terraform/) also has integration tests:
infrastructure/terraform/
tests/
setup/
main.tf # Core variables (no resource_group — root creates its own)
integration.tftest.hcl # Resource group conditionals, module instantiation
outputs.tftest.hcl # Output presence and nullability
All modules use native terraform test with mock_provider for plan-time validation. Tests require no Azure credentials.
# Run all module tests via CI script
npm run test:tf
# Run tests for a specific module
cd infrastructure/terraform/modules/platform
terraform init -backend=false
terraform test
# Run a single test file
terraform test -filter=tests/naming.tftest.hcl
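As a rough sketch of what a wrapper such as npm run test:tf might do (the actual CI script is not shown in this guide, so the function names find_testable_modules and run_module_tests and the directory walk are assumptions), the following finds every module that ships at least one .tftest.hcl file and runs the documented commands in it:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print each module directory under $1 that contains
# at least one *.tftest.hcl file in its tests/ folder, one path per line.
find_testable_modules() {
  local modules_root="$1"
  local dir
  for dir in "${modules_root}"/*/; do
    dir="${dir%/}"
    if compgen -G "${dir}/tests/*.tftest.hcl" > /dev/null; then
      printf '%s\n' "${dir}"
    fi
  done
}

# Hypothetical helper: run the documented init and test commands inside one
# module. The subshell keeps the caller's working directory unchanged.
run_module_tests() {
  local module_dir="$1"
  (
    cd "${module_dir}" &&
      terraform init -backend=false &&
      terraform test
  )
}

# Usage (assumes terraform is on PATH):
#   while IFS= read -r module; do
#     run_module_tests "${module}"
#   done < <(find_testable_modules infrastructure/terraform/modules)
```

Keeping discovery separate from execution makes it possible to check the module list without invoking Terraform.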
Each module’s tests/setup/main.tf generates mock input values with internally consistent IDs derived from a random prefix:
locals {
subscription_id_part = "/subscriptions/00000000-0000-0000-0000-000000000000"
resource_prefix = "t${random_string.prefix.id}"
environment = "dev"
instance = "001"
resource_group_name = "rg-${local.resource_prefix}-${local.environment}-${local.instance}"
resource_group_id = "${local.subscription_id_part}/resourceGroups/${local.resource_group_name}"
}
output "resource_group" {
value = {
id = local.resource_group_id
name = local.resource_group_name
location = "westus3"
}
}
Derive all Azure resource IDs from the random prefix using locals. Do not hardcode synthetic IDs.
Test files use mock_provider to intercept all provider calls and command = plan for assertions:
mock_provider "azurerm" {}
mock_provider "azapi" {}
# Override data sources that generate invalid mock values
override_data {
target = data.azurerm_client_config.current
values = {
tenant_id = "00000000-0000-0000-0000-000000000000"
}
}
run "setup" {
module {
source = "./tests/setup"
}
}
run "verify_naming" {
command = plan
variables {
resource_prefix = run.setup.resource_prefix
environment = run.setup.environment
instance = run.setup.instance
resource_group = run.setup.resource_group
}
assert {
condition = azurerm_key_vault.main.name == "kv${run.setup.resource_prefix}${run.setup.environment}${run.setup.instance}"
error_message = "Key Vault name must follow kv{prefix}{env}{instance}"
}
}
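The two name shapes used so far, the hyphenated rg-{prefix}-{env}-{instance} form from the setup module and the compact kv{prefix}{env}{instance} form asserted above, can be reproduced outside Terraform when reasoning about expected values. The helpers below are purely illustrative (hyphenated_name and compact_name are hypothetical names, and the prefix tabc is a made-up sample):

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: hyphenated form, e.g. rg-<prefix>-<env>-<instance>.
hyphenated_name() {
  local abbreviation="$1" prefix="$2" environment="$3" instance="$4"
  printf '%s-%s-%s-%s\n' "${abbreviation}" "${prefix}" "${environment}" "${instance}"
}

# Hypothetical helper: compact form without separators,
# e.g. kv<prefix><env><instance>.
compact_name() {
  local abbreviation="$1" prefix="$2" environment="$3" instance="$4"
  printf '%s%s%s%s\n' "${abbreviation}" "${prefix}" "${environment}" "${instance}"
}

hyphenated_name rg tabc dev 001   # prints rg-tabc-dev-001
compact_name kv tabc dev 001      # prints kvtabcdev001
```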
| Constraint | Resolution |
|---|---|
| data.azurerm_client_config.current generates random strings that fail UUID validation | Add override_data block with valid tenant_id |
| command = apply generates invalid Azure resource IDs for role assignments | Use command = plan and assert only on input-derived attributes |
| Computed attributes (.id, .fqdn) are unknown at plan time | Assert on resource count, name, and configuration values instead |
| file() built-in is not intercepted by mock providers | Provide a real stub file (see automation module tests/setup/scripts/stub.ps1) |
| File | Purpose |
|---|---|
| naming.tftest.hcl | Resource names follow {abbreviation}-{prefix}-{env}-{instance} |
| conditionals.tftest.hcl | should_* booleans control resource creation via count |
| defaults.tftest.hcl | Default variable values produce expected configuration |
| security.tftest.hcl | Security settings (RBAC, TLS, network ACLs) |
| outputs.tftest.hcl | Output nullability when features are disabled |
| validation.tftest.hcl | Variable validation rules via expect_failures |
All Azure resources must include standard tags:
tags = merge(
var.common_tags,
{
environment = var.environment
workload = "robotics-ml"
managed_by = "terraform"
cost_center = var.cost_center
}
)
- […] .tfvars files
- […] Owner unless required)

resource "azurerm_kubernetes_cluster" "aks" {
name = "aks-${var.environment}-${var.location}"
location = var.location
resource_group_name = var.resource_group_name
default_node_pool {
name = "system"
node_count = var.system_node_count
vm_size = "Standard_D4s_v5"
}
identity {
type = "SystemAssigned"
}
private_cluster_enabled = var.network_mode == "private"
tags = merge(
var.common_tags,
{
component = "aks-cluster"
}
)
}
Every shell script must begin with:
#!/usr/bin/env bash
set -euo pipefail
IFS=$'\n\t'
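To illustrate what the IFS line changes (a small demonstration only; count_fields and the sample string are made up for this example), the sketch below counts how many words an unquoted expansion produces under the default IFS and under the stricter value:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical demo: print the number of words produced by an unquoted
# expansion of $1 when IFS is set to $2.
count_fields() {
  local value="$1"
  local IFS="$2"
  # Intentionally unquoted so that word splitting applies.
  # shellcheck disable=SC2086
  set -- ${value}
  printf '%s\n' "$#"
}

sample='my file.txt'
count_fields "${sample}" $' \t\n'   # default IFS: prints 2
count_fields "${sample}" $'\n\t'    # strict IFS: prints 1
```

With spaces removed from IFS, a forgotten quote no longer splits a path that contains a space, although quoting expansions remains the safer habit.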
Include header documentation:
#!/usr/bin/env bash
# Deploy OSMO backend operator to AKS cluster
#
# Prerequisites:
# - AKS cluster with GPU node pool deployed
# - OSMO control plane installed (03-deploy-osmo-control-plane.sh)
# - kubectl configured with AKS credentials
#
# Environment Variables:
# RESOURCE_GROUP_NAME: Azure resource group name (required)
# AKS_CLUSTER_NAME: AKS cluster name (required)
# OSMO_VERSION: OSMO version to deploy (default: 6.0.0)
#
# Usage:
# export RESOURCE_GROUP_NAME="rg-robotics-prod"
# export AKS_CLUSTER_NAME="aks-robotics-prod"
# ./04-deploy-osmo-backend.sh
# Lint all shell scripts before committing
# (in bash, ** only recurses into nested directories after: shopt -s globstar)
shellcheck deploy/**/*.sh scripts/**/*.sh
# Check specific script
shellcheck -x infrastructure/setup/01-deploy-robotics-charts.sh
- […] (.conf, .env) for environment-specific values

: "${RESOURCE_GROUP_NAME:?Environment variable RESOURCE_GROUP_NAME must be set}"
: "${AKS_CLUSTER_NAME:?Environment variable AKS_CLUSTER_NAME must be set}"
OSMO_VERSION="${OSMO_VERSION:-6.0.0}"
LOG_LEVEL="${LOG_LEVEL:-info}"
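The :? form stops at the first missing variable. As an alternative sketch (require_env is a hypothetical helper, not something this guide or the project's scripts are known to define), a script could collect every missing name and report them in a single message:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: return non-zero and list every named variable that is
# unset or empty, instead of stopping at the first one.
require_env() {
  local name
  local -a missing=()
  for name in "$@"; do
    # ${!name:-} is indirect expansion; the :- default keeps set -u satisfied.
    if [[ -z "${!name:-}" ]]; then
      missing+=("${name}")
    fi
  done
  if (( ${#missing[@]} > 0 )); then
    printf 'Missing required environment variables: %s\n' "${missing[*]}" >&2
    return 1
  fi
}

# Usage near the top of a deployment script:
#   require_env RESOURCE_GROUP_NAME AKS_CLUSTER_NAME || exit 1
```

Reporting all missing names at once saves a rerun per variable, at the cost of a few more lines than the one-line :? checks.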
For complete shell script guidance, see shell-scripts.instructions.md.
All new source files must include the Microsoft copyright header.
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT License.
Python (.py):
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT License.
"""Module docstring."""
import os
Terraform (.tf):
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT License.
terraform {
required_version = ">= 1.9.8"
}
Shell Script (.sh):
#!/usr/bin/env bash
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT License.
set -euo pipefail
YAML (.yaml, .yml):
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT License.
apiVersion: v1
kind: ConfigMap
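One way to check the header automatically (a sketch only; has_copyright_header and the five-line window are assumptions, not an existing project tool) is to look for both header lines near the top of a file, which also tolerates a shebang on the first line:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: succeed if both header lines appear within the first
# five lines of the file, leaving room for a shebang.
has_copyright_header() {
  local file="$1"
  local top
  top="$(head -n 5 "${file}")"
  grep -qF '# Copyright (c) Microsoft Corporation.' <<< "${top}" &&
    grep -qF '# Licensed under the MIT License.' <<< "${top}"
}

# Usage: report files that lack the header.
#   for f in scripts/*.sh; do
#     has_copyright_header "${f}" || echo "missing header: ${f}"
#   done
```

The same check covers Python, Terraform, shell, and YAML files because all four use # comments for the header.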
terraform test documentation