physical-ai-toolchain

[!NOTE] This guide expands on the Infrastructure as Code Style section of the main contributing guide.

Infrastructure code follows strict conventions for consistency, security, and maintainability.

Terraform Conventions

Formatting

# Format all Terraform files before committing
terraform fmt -recursive infrastructure/terraform/

# Validate formatting and syntax across all deployment directories
npm run lint:tf:validate

Variable Naming

Module Structure

Each Terraform module must include:

modules/
  module-name/
    main.tf              # Resource definitions
    variables.tf         # Input variables with descriptions and types
    variables.core.tf    # Core variables (environment, resource_prefix, instance, resource_group)
    outputs.tf           # Output values
    versions.tf          # Provider version constraints
    tests/
      setup/
        main.tf          # Mock data generator with random prefix and typed outputs
      naming.tftest.hcl  # Resource naming convention assertions
      conditionals.tftest.hcl  # should_* boolean conditional tests
      outputs.tftest.hcl # Output structure and nullability tests

The root deployment directory (infrastructure/terraform/) also has integration tests:

infrastructure/terraform/
  tests/
    setup/
      main.tf                # Core variables (no resource_group — root creates its own)
    integration.tftest.hcl   # Resource group conditionals, module instantiation
    outputs.tftest.hcl       # Output presence and nullability

Terraform Testing

All modules use the native terraform test command with mock_provider blocks for plan-time validation. Because the mock providers stand in for the real ones, the tests require no Azure credentials.

Running Tests

# Run all module tests via CI script
npm run test:tf

# Run tests for a specific module
cd infrastructure/terraform/modules/platform
terraform init -backend=false
terraform test

# Run a single test file
terraform test -filter=tests/naming.tftest.hcl
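
The contents of the test:tf script are not shown in this guide. As a rough sketch of what a per-module loop could look like, assuming modules share a common root and keep their *.tftest.hcl files in a tests/ directory (the helper names list_testable_modules and run_module_tests are hypothetical):

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print each directory under the given root that owns a tests/ directory
# containing at least one *.tftest.hcl file (one path per line, de-duplicated).
list_testable_modules() {
  local root="$1"
  find "$root" -type f -path '*/tests/*' -name '*.tftest.hcl' \
    | sed 's|/tests/.*$||' \
    | sort -u
}

# Run `terraform test` in every testable module; requires terraform on PATH.
# Keeps going after a failure and returns non-zero if any module failed.
run_module_tests() {
  local root="$1" module_dir failed=0
  while IFS= read -r module_dir; do
    echo "==> ${module_dir}"
    if ! (cd "$module_dir" \
          && terraform init -backend=false -input=false >/dev/null \
          && terraform test) </dev/null; then
      failed=1
    fi
  done < <(list_testable_modules "$root")
  return "$failed"
}
```

Calling run_module_tests infrastructure/terraform/modules would then test each module in isolation and report an overall failure at the end rather than stopping at the first broken module.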

Setup Module Pattern

Each module’s tests/setup/main.tf generates mock input values with internally consistent IDs derived from a random prefix:

locals {
  subscription_id_part = "/subscriptions/00000000-0000-0000-0000-000000000000"
  resource_prefix      = "t${random_string.prefix.id}"
  environment          = "dev"
  instance             = "001"
  resource_group_name  = "rg-${local.resource_prefix}-${local.environment}-${local.instance}"
  resource_group_id    = "${local.subscription_id_part}/resourceGroups/${local.resource_group_name}"
}

output "resource_group" {
  value = {
    id       = local.resource_group_id
    name     = local.resource_group_name
    location = "westus3"
  }
}

Derive all Azure resource IDs from the random prefix using locals. Do not hardcode synthetic IDs.

Test File Conventions

Test files use mock_provider to intercept all provider calls and command = plan for assertions:

mock_provider "azurerm" {}
mock_provider "azapi" {}

# Override data sources that generate invalid mock values
override_data {
  target = data.azurerm_client_config.current
  values = {
    tenant_id = "00000000-0000-0000-0000-000000000000"
  }
}

run "setup" {
  module {
    source = "./tests/setup"
  }
}

run "verify_naming" {
  command = plan

  variables {
    resource_prefix = run.setup.resource_prefix
    environment     = run.setup.environment
    instance        = run.setup.instance
    resource_group  = run.setup.resource_group
  }

  assert {
    condition     = azurerm_key_vault.main.name == "kv${run.setup.resource_prefix}${run.setup.environment}${run.setup.instance}"
    error_message = "Key Vault name must follow kv{prefix}{env}{instance}"
  }
}

Mock Provider Constraints

| Constraint | Resolution |
| --- | --- |
| data.azurerm_client_config.current generates random strings that fail UUID validation | Add override_data block with valid tenant_id |
| command = apply generates invalid Azure resource IDs for role assignments | Use command = plan and assert only on input-derived attributes |
| Computed attributes (.id, .fqdn) are unknown at plan time | Assert on resource count, name, and configuration values instead |
| file() built-in is not intercepted by mock providers | Provide a real stub file (see automation module tests/setup/scripts/stub.ps1) |

Test Categories

| File | Purpose |
| --- | --- |
| naming.tftest.hcl | Resource names follow {abbreviation}-{prefix}-{env}-{instance} |
| conditionals.tftest.hcl | should_* booleans control resource creation via count |
| defaults.tftest.hcl | Default variable values produce expected configuration |
| security.tftest.hcl | Security settings (RBAC, TLS, network ACLs) |
| outputs.tftest.hcl | Output nullability when features are disabled |
| validation.tftest.hcl | Variable validation rules via expect_failures |

Resource Tagging

All Azure resources must include standard tags:

tags = merge(
  var.common_tags,
  {
    environment = var.environment
    workload    = "robotics-ml"
    managed_by  = "terraform"
    cost_center = var.cost_center
  }
)

Security Patterns

Example

resource "azurerm_kubernetes_cluster" "aks" {
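  name                = "aks-${var.environment}-${var.location}"
  location            = var.location
  resource_group_name = var.resource_group_name
  dns_prefix          = "aks-${var.environment}" # illustrative value; the provider requires dns_prefix or dns_prefix_private_cluster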

  default_node_pool {
    name       = "system"
    node_count = var.system_node_count
    vm_size    = "Standard_D4s_v5"
  }

  identity {
    type = "SystemAssigned"
  }

  private_cluster_enabled = var.network_mode == "private"

  tags = merge(
    var.common_tags,
    {
      component = "aks-cluster"
    }
  )
}

Shell Script Conventions

Shebang and Error Handling

Every shell script must begin with:

#!/usr/bin/env bash
set -euo pipefail
IFS=$'\n\t'
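
To illustrate what these settings change, the sketch below uses two hypothetical helpers (count_fields and pipeline_status, neither is part of the repository). It shows that dropping the space from IFS stops unquoted expansions from splitting on spaces, and that pipefail surfaces a failure from an earlier pipeline stage.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Count how many fields an unquoted expansion of $2 yields under the IFS in $1.
count_fields() {
  local IFS="$1"
  # Unquoted on purpose: word splitting is what is being demonstrated.
  # shellcheck disable=SC2086
  set -- $2
  echo "$#"
}

# Print the exit status of `false | true` with pipefail off (+o) or on (-o).
pipeline_status() {
  local mode="$1" status=0
  ( set "$mode" pipefail; false | true ) || status=$?
  echo "$status"
}

count_fields $' \t\n' "my file.txt"  # default IFS: prints 2
count_fields $'\n\t' "my file.txt"   # strict IFS:  prints 1
pipeline_status +o                   # prints 0, the failure of `false` is hidden
pipeline_status -o                   # prints 1, the failure is reported
```

The remaining two flags need no helper: -e aborts the script when a command fails outside a conditional, and -u turns a reference to an unset variable into an error instead of an empty string.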

Script Documentation

Include header documentation:

#!/usr/bin/env bash
# Deploy OSMO backend operator to AKS cluster
#
# Prerequisites:
#   - AKS cluster with GPU node pool deployed
#   - OSMO control plane installed (03-deploy-osmo-control-plane.sh)
#   - kubectl configured with AKS credentials
#
# Environment Variables:
#   RESOURCE_GROUP_NAME: Azure resource group name (required)
#   AKS_CLUSTER_NAME: AKS cluster name (required)
#   OSMO_VERSION: OSMO version to deploy (default: 6.0.0)
#
# Usage:
#   export RESOURCE_GROUP_NAME="rg-robotics-prod"
#   export AKS_CLUSTER_NAME="aks-robotics-prod"
#   ./04-deploy-osmo-backend.sh
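
One benefit of a consistent header is that it can double as help text. A minimal sketch, assuming the header is the contiguous run of comment lines directly after the shebang; print_header_doc is a hypothetical helper, not something this guide says the repository provides:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print the leading comment block of a script as plain text.
# Skips a shebang on line 1, strips the leading "# " (or bare "#") from each
# header line, and stops at the first line that is not a comment.
print_header_doc() {
  local script="$1"
  awk '
    NR == 1 && /^#!/ { next }
    /^#/             { sub(/^# ?/, ""); print; next }
                     { exit }
  ' "$script"
}
```

A script could wire this to a help flag, for example by calling print_header_doc "$0" when its first argument is --help, so the usage text and the header never drift apart.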

Validation

# Lint all shell scripts before committing
# (bash expands ** recursively only when globstar is enabled)
shopt -s globstar
shellcheck deploy/**/*.sh scripts/**/*.sh

# Check specific script
shellcheck -x infrastructure/setup/01-deploy-robotics-charts.sh

Configuration Management

: "${RESOURCE_GROUP_NAME:?Environment variable RESOURCE_GROUP_NAME must be set}"
: "${AKS_CLUSTER_NAME:?Environment variable AKS_CLUSTER_NAME must be set}"
OSMO_VERSION="${OSMO_VERSION:-6.0.0}"
LOG_LEVEL="${LOG_LEVEL:-info}"
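
In the lines above, : is the shell's no-op builtin, ${VAR:?message} aborts with the message when VAR is unset or empty, and ${VAR:-default} substitutes a default. The :? form stops at the first missing variable. Where it helps to report every missing variable in one pass, a helper along these lines is one option (require_env is a hypothetical name, not part of the repository):

```shell
#!/usr/bin/env bash
set -euo pipefail

# Check that every named variable is set and non-empty.
# Prints one message per missing variable to stderr and returns 1 if any is missing.
require_env() {
  local name missing=0
  for name in "$@"; do
    # ${!name:-} is indirect expansion: the value of the variable whose name is
    # stored in $name, or an empty string if it is unset (safe under set -u).
    if [[ -z "${!name:-}" ]]; then
      echo "Environment variable ${name} must be set" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

A script would call require_env RESOURCE_GROUP_NAME AKS_CLUSTER_NAME near the top, before applying defaults for the optional variables; under set -e a missing variable then ends the script after all messages have been printed.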

For complete shell script guidance, see shell-scripts.instructions.md.

All new source files must include the Microsoft copyright header.

Format

# Copyright (c) Microsoft Corporation.
# Licensed under the MIT License.

Language-Specific Examples

Python (.py):

# Copyright (c) Microsoft Corporation.
# Licensed under the MIT License.

"""Module docstring."""

import os

Terraform (.tf):

# Copyright (c) Microsoft Corporation.
# Licensed under the MIT License.

terraform {
  required_version = ">= 1.9.8"
}

Shell Script (.sh):

#!/usr/bin/env bash
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT License.

set -euo pipefail

YAML (.yaml, .yml):

# Copyright (c) Microsoft Corporation.
# Licensed under the MIT License.

apiVersion: v1
kind: ConfigMap
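
The examples above all use the same two comment lines, placed either at the very top of the file or directly after a shebang. Assuming that placement, a check for the header could be sketched as follows; has_copyright_header is a hypothetical helper and only covers file types that use # comments:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Return 0 if the file starts with the two-line copyright header,
# allowing a single shebang line before it; return 1 otherwise.
has_copyright_header() {
  local file="$1" first="" start=1 expected actual
  expected=$'# Copyright (c) Microsoft Corporation.\n# Licensed under the MIT License.'

  # Read the first line; an empty file leaves $first empty.
  IFS= read -r first < "$file" || true
  if [[ "$first" == '#!'* ]]; then
    start=2
  fi

  # Compare the two lines at the expected position with the header text.
  actual="$(sed -n "${start},$((start + 1))p" "$file")"
  [[ "$actual" == "$expected" ]]
}
```

Looping over changed files and calling has_copyright_header on each would give a simple pre-commit check; files with Windows line endings would need normalizing first, since the comparison is exact.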

Placement