
Cleanup and Destroy

Remove deployed cluster components, destroy Azure infrastructure, and clean up resources. Component cleanup preserves Azure infrastructure by default; destroy operations remove Terraform-managed resources.

[!NOTE] This guide is part of the Deploy Hub. Return there for the full deployment lifecycle.

📋 Cleanup Order

Run component cleanup before destroying infrastructure. Follow this order to avoid dependency issues.

| Step | Action | Detail |
| --- | --- | --- |
| 1 | Uninstall OSMO Backend | Backend operator, workflow namespaces |
| 2 | Uninstall OSMO Control Plane | OSMO service, router, web-ui |
| 3 | Uninstall AzureML Extension | ML extension, compute target, FICs |
| 4 | Uninstall GPU Infrastructure | GPU Operator, KAI Scheduler |
| 5 | Destroy VPN (if deployed) | VPN Gateway, connections |
| 6 | Destroy Main Infrastructure | All Terraform-managed Azure resources |

🧹 Component Cleanup

Cleanup scripts remove Kubernetes resources from the AKS cluster without affecting Azure infrastructure.

| Script | Removes |
| --- | --- |
| cleanup/uninstall-osmo-backend.sh | Backend operator, workflow namespaces |
| cleanup/uninstall-osmo-control-plane.sh | OSMO service, router, web-ui |
| cleanup/uninstall-azureml-extension.sh | ML extension, compute target, FICs |
| cleanup/uninstall-robotics-charts.sh | GPU Operator, KAI Scheduler |

Run scripts from the infrastructure/setup/cleanup/ directory:

cd infrastructure/setup/cleanup

./uninstall-osmo-backend.sh
./uninstall-osmo-control-plane.sh
./uninstall-azureml-extension.sh
./uninstall-robotics-charts.sh
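The ordered run above can be wrapped so that a failure stops the sequence before later components are touched. This is a minimal sketch, not a script shipped with the repository: `run_cleanup` and `CLEANUP_DIR` are illustrative names, and it assumes the four scripts are executable and exit non-zero on failure.

```shell
# Sketch only: run_cleanup and CLEANUP_DIR are illustrative names.
CLEANUP_DIR="${CLEANUP_DIR:-infrastructure/setup/cleanup}"

run_cleanup() {
  local script
  for script in \
    uninstall-osmo-backend.sh \
    uninstall-osmo-control-plane.sh \
    uninstall-azureml-extension.sh \
    uninstall-robotics-charts.sh
  do
    echo "==> ${script}"
    # Run from the cleanup directory, as the manual steps do.
    if ! (cd "$CLEANUP_DIR" && "./${script}"); then
      echo "stopped: ${script} failed" >&2
      return 1
    fi
  done
}
```

Calling `run_cleanup` from the repository root runs the scripts in the documented order and returns 1 at the first failure.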

📊 Data Preservation

Uninstall scripts preserve data by default. Pass the flags below to remove that data as well.

| Script | Flag | Description |
| --- | --- | --- |
| uninstall-osmo-backend.sh | --delete-container | Deletes blob container with workflow artifacts |
| uninstall-osmo-control-plane.sh | --delete-mek | Removes encryption key ConfigMap |
| uninstall-osmo-control-plane.sh | --purge-postgres | Drops OSMO tables from PostgreSQL |
| uninstall-osmo-control-plane.sh | --purge-redis | Flushes OSMO keys from Redis |
| uninstall-robotics-charts.sh | --delete-namespaces | Removes gpu-operator, kai-scheduler namespaces |
| uninstall-robotics-charts.sh | --delete-crds | Removes GPU Operator CRDs |

Full cleanup including all data:

cd infrastructure/setup/cleanup

./uninstall-osmo-backend.sh --delete-container
./uninstall-osmo-control-plane.sh --purge-postgres --purge-redis --delete-mek
./uninstall-azureml-extension.sh --force
./uninstall-robotics-charts.sh --delete-namespaces --delete-crds
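After a full cleanup, it can be useful to confirm that the namespaces named in the flag table are actually gone. The following is a minimal sketch that assumes `kubectl` is pointed at the AKS cluster; `remaining_namespaces` is a hypothetical helper name, not part of the cleanup scripts.

```shell
# Sketch only: remaining_namespaces is an illustrative helper.
# Prints each given namespace that still exists in the current cluster.
remaining_namespaces() {
  local ns
  for ns in "$@"; do
    if kubectl get namespace "$ns" >/dev/null 2>&1; then
      echo "$ns"
    fi
  done
}
```

Running `remaining_namespaces gpu-operator kai-scheduler` prints nothing once both namespaces have been removed. Namespace deletion is asynchronous, so a namespace may still be listed for a short time while it terminates.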

Selective cleanup for specific components:

# OSMO only (preserve AzureML and GPU infrastructure)
./uninstall-osmo-backend.sh
./uninstall-osmo-control-plane.sh

# AzureML only (preserve OSMO)
./uninstall-azureml-extension.sh

๐Ÿ—‘๏ธ Destroy Infrastructureโ€‹

After removing cluster components, destroy Azure infrastructure using one of two approaches.

Terraform Destroy

Recommended approach. Preserves state files and allows clean redeployment.

cd infrastructure/terraform

# Destroy VPN first (if deployed)
# The subshell returns to infrastructure/terraform even if the destroy fails
(cd vpn && terraform destroy -var-file=terraform.tfvars)

# Preview changes
terraform plan -destroy -var-file=terraform.tfvars

# Destroy main infrastructure
terraform destroy -var-file=terraform.tfvars
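The preview and the destroy above are computed separately, so the destroy may act on a slightly different plan than the one reviewed. One way to tie them together is to save the destroy plan and apply exactly that file. This is a sketch under stated assumptions: `destroy_with_saved_plan` and the default file name `destroy.tfplan` are illustrative, and the confirmation prompt is added here because `terraform apply` does not ask for approval when given a saved plan.

```shell
# Sketch only: destroy_with_saved_plan and destroy.tfplan are illustrative names.
destroy_with_saved_plan() {
  local plan_file="${1:-destroy.tfplan}"
  local answer

  # Save the destroy plan so the reviewed plan is the one that gets applied.
  terraform plan -destroy -var-file=terraform.tfvars -out="$plan_file" || return 1

  printf 'Type "destroy" to apply %s: ' "$plan_file"
  read -r answer || answer=""
  if [ "$answer" != "destroy" ]; then
    echo "aborted; nothing was destroyed" >&2
    return 1
  fi

  # Applying a saved plan runs without a further approval prompt.
  terraform apply "$plan_file"
}
```

Run `destroy_with_saved_plan` from infrastructure/terraform after reviewing the plan output.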

Delete Resource Group

Fastest cleanup method. Removes all resources regardless of how they were created.

# Get resource group name from Terraform outputs
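# resource_group is an object, so read it as JSON (-raw only supports strings, numbers, and booleans)
terraform output -json resource_group | jq -r '.name'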

# Delete resource group
az group delete --name <resource-group-name> --yes --no-wait

[!WARNING] Resource group deletion removes everything in the group, including resources not managed by Terraform. Terraform state becomes orphaned after this operation.
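Because `--no-wait` returns immediately, the deletion continues in the background. The sketch below polls `az group exists` until the group is gone; `wait_for_group_deletion` and the 30-second default interval are assumptions made for illustration.

```shell
# Sketch only: wait_for_group_deletion is an illustrative helper.
# Usage: wait_for_group_deletion <resource-group-name> [poll-interval-seconds]
wait_for_group_deletion() {
  local rg="$1"
  local interval="${2:-30}"
  local exists

  while :; do
    # az group exists prints "true" or "false"; stop if the CLI call itself fails.
    exists="$(az group exists --name "$rg")" || return 1
    [ "$exists" = "true" ] || break
    echo "waiting for ${rg} to be deleted..." >&2
    sleep "$interval"
  done
  echo "${rg} no longer exists"
}
```

`az group wait --name <resource-group-name> --deleted` is a built-in alternative that blocks until the same condition holds.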

๐Ÿ” Troubleshootingโ€‹

Destroy Takes a Long Time

Terraform removes resources in dependency order. Private Endpoints, AKS clusters, and PostgreSQL servers take 5-10 minutes each. Full destruction typically takes 20-30 minutes.

Monitor remaining resources during destruction:

az resource list --resource-group <resource-group> \
--query "[].{name:name, type:type}" -o table

Soft-Deleted Resources Block Redeployment

Azure retains certain deleted resources in a soft-deleted state. Redeployment fails when Terraform tries to create a resource whose name matches a soft-deleted one.

| Resource | Soft Delete | Retention Period | Blocks Redeployment |
| --- | --- | --- | --- |
| Key Vault | Mandatory | 7-90 days (configurable) | Yes |
| Azure ML Workspace | Mandatory | 14 days (fixed) | Yes |
| Container Registry | Opt-in (preview) | 1-90 days (configurable) | No (disabled by default) |
| Storage Account | Recovery only | 14 days | No (same-name creation allowed) |

Purge soft-deleted Key Vault:

az keyvault list-deleted --subscription <subscription-id> \
--resource-type vault -o table

az keyvault purge --subscription <subscription-id> \
--name <key-vault-name>

[!NOTE] Key Vaults with purge_protection_enabled = true cannot be purged; the name stays reserved until the retention period expires. In this configuration, should_enable_purge_protection defaults to false.
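The list and purge steps for Key Vault can be combined so that a purge is attempted only when the vault really is soft-deleted. This is a minimal sketch that assumes the active subscription is already selected (for example with `az account set`); `purge_vault_if_soft_deleted` is a hypothetical helper name.

```shell
# Sketch only: purge_vault_if_soft_deleted is an illustrative helper.
# Usage: purge_vault_if_soft_deleted <key-vault-name>
purge_vault_if_soft_deleted() {
  local vault="$1"
  local found

  # Look for a soft-deleted vault with exactly this name.
  found="$(az keyvault list-deleted --resource-type vault \
    --query "[?name=='${vault}'].name" -o tsv)" || return 1

  if [ -z "$found" ]; then
    echo "${vault}: not soft-deleted, nothing to purge"
    return 0
  fi

  az keyvault purge --name "$vault" && echo "${vault}: purged"
}
```

The purge still fails for vaults that were created with purge protection enabled.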

Purge soft-deleted Azure ML Workspace:

az ml workspace delete \
--name <workspace-name> \
--resource-group <resource-group> \
--permanently-delete

Terraform State Mismatch

Resources manually deleted or created outside Terraform cause state mismatches.

Refresh state for resources deleted outside Terraform:

cd infrastructure/terraform
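# terraform refresh is deprecated; -refresh-only shows the detected changes and asks for approval
terraform apply -refresh-only -var-file=terraform.tfvars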
terraform plan -var-file=terraform.tfvars
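To check for a mismatch from a script, `terraform plan` accepts `-detailed-exitcode`, which returns 0 when there is nothing to change, 2 when changes are pending, and 1 on error. The wrapper below is a sketch; `check_drift` and its messages are illustrative, not part of the repository.

```shell
# Sketch only: check_drift is an illustrative helper.
# Run from infrastructure/terraform.
check_drift() {
  local rc=0
  terraform plan -detailed-exitcode -var-file=terraform.tfvars >/dev/null || rc=$?
  case "$rc" in
    0) echo "in sync" ;;
    2) echo "changes pending"; return 2 ;;
    *) echo "plan failed" >&2; return 1 ;;
  esac
}
```

A return code of 2 means state and configuration differ, so review the full `terraform plan` output before applying anything.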

Import resources created outside Terraform into state:

terraform plan -var-file=terraform.tfvars

terraform import -var-file=terraform.tfvars \
'<resource_address>' '<azure_resource_id>'

After import, run terraform plan to verify the imported resource matches configuration.

Resource Locks Prevent Deletion

Management locks block deletion operations:

az lock list --resource-group <resource-group> -o table

az lock delete --name <lock-name> --resource-group <resource-group>
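When a resource group carries several locks, the two commands above can be looped. The sketch below assumes every lock is scoped to the resource group itself; locks on individual resources need additional arguments to `az lock delete` and are not handled here. `remove_group_locks` is a hypothetical helper name.

```shell
# Sketch only: remove_group_locks is an illustrative helper.
# Usage: remove_group_locks <resource-group>
remove_group_locks() {
  local rg="$1"
  local lock

  # Read one lock name per line so names containing spaces stay intact.
  az lock list --resource-group "$rg" --query "[].name" -o tsv |
    while IFS= read -r lock; do
      [ -n "$lock" ] || continue
      echo "deleting lock: ${lock}"
      az lock delete --name "$lock" --resource-group "$rg" </dev/null
    done
}
```

Locks are usually placed deliberately, so confirm with the resource owner before removing them.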

🤖 Crafted with precision by ✨Copilot following brilliant human instruction, then carefully refined by our team of discerning human reviewers.