Cleanup and Destroy
Remove deployed cluster components, destroy Azure infrastructure, and clean up resources. Component cleanup preserves Azure infrastructure by default; destroy operations remove Terraform-managed resources.
[!NOTE] This guide is part of the Deploy Hub. Return there for the full deployment lifecycle.
Cleanup Order
Run component cleanup before destroying infrastructure. Follow this order to avoid dependency issues.
| Step | Action | Detail |
|---|---|---|
| 1 | Uninstall OSMO Backend | Backend operator, workflow namespaces |
| 2 | Uninstall OSMO Control Plane | OSMO service, router, web-ui |
| 3 | Uninstall AzureML Extension | ML extension, compute target, federated identity credentials (FICs) |
| 4 | Uninstall GPU Infrastructure | GPU Operator, KAI Scheduler |
| 5 | Destroy VPN (if deployed) | VPN Gateway, connections |
| 6 | Destroy Main Infrastructure | All Terraform-managed Azure resources |
Component Cleanup
Cleanup scripts remove Kubernetes resources from the AKS cluster without affecting Azure infrastructure.
| Script | Removes |
|---|---|
| cleanup/uninstall-osmo-backend.sh | Backend operator, workflow namespaces |
| cleanup/uninstall-osmo-control-plane.sh | OSMO service, router, web-ui |
| cleanup/uninstall-azureml-extension.sh | ML extension, compute target, FICs |
| cleanup/uninstall-robotics-charts.sh | GPU Operator, KAI Scheduler |
Run scripts from the infrastructure/setup/cleanup/ directory:
cd infrastructure/setup/cleanup
./uninstall-osmo-backend.sh
./uninstall-osmo-control-plane.sh
./uninstall-azureml-extension.sh
./uninstall-robotics-charts.sh
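The ordered run above can be wrapped so that a failing step stops the sequence instead of letting later scripts run against a partially cleaned cluster. The following is a minimal sketch, not part of the repository: the `run_cleanup` function and the `CLEANUP_DIR` variable are hypothetical names, and the default path assumes the function is called from the repository root.

```shell
# Sketch only: run_cleanup and CLEANUP_DIR are illustrative names.
# Assumes the four uninstall scripts exist and are executable.
CLEANUP_DIR="${CLEANUP_DIR:-infrastructure/setup/cleanup}"

run_cleanup() {
  local script
  # Same order as the Cleanup Order table: OSMO backend first, GPU stack last.
  for script in \
    uninstall-osmo-backend.sh \
    uninstall-osmo-control-plane.sh \
    uninstall-azureml-extension.sh \
    uninstall-robotics-charts.sh
  do
    echo "Running ${script}"
    # Stop at the first failure so later steps do not run out of order.
    if ! "${CLEANUP_DIR}/${script}"; then
      echo "Failed: ${script}" >&2
      return 1
    fi
  done
}
```

Calling `run_cleanup` with no arguments runs all four scripts with their default, data-preserving behavior.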
Data Preservation
Uninstall scripts preserve data by default. Use flags for complete removal.
| Script | Flag | Description |
|---|---|---|
| uninstall-osmo-backend.sh | --delete-container | Deletes blob container with workflow artifacts |
| uninstall-osmo-control-plane.sh | --delete-mek | Removes encryption key ConfigMap |
| uninstall-osmo-control-plane.sh | --purge-postgres | Drops OSMO tables from PostgreSQL |
| uninstall-osmo-control-plane.sh | --purge-redis | Flushes OSMO keys from Redis |
| uninstall-robotics-charts.sh | --delete-namespaces | Removes gpu-operator, kai-scheduler namespaces |
| uninstall-robotics-charts.sh | --delete-crds | Removes GPU Operator CRDs |
Full cleanup including all data:
cd infrastructure/setup/cleanup
./uninstall-osmo-backend.sh --delete-container
./uninstall-osmo-control-plane.sh --purge-postgres --purge-redis --delete-mek
./uninstall-azureml-extension.sh --force
./uninstall-robotics-charts.sh --delete-namespaces --delete-crds
Selective cleanup for specific components:
# OSMO only (preserve AzureML and GPU infrastructure)
./uninstall-osmo-backend.sh
./uninstall-osmo-control-plane.sh
# AzureML only (preserve OSMO)
./uninstall-azureml-extension.sh
Destroy Infrastructure
After removing cluster components, destroy Azure infrastructure using one of two approaches.
Terraform Destroy
Recommended approach. Preserves state files and allows clean redeployment.
cd infrastructure/terraform
# Destroy VPN first (if deployed)
# The subshell keeps the working directory unchanged even if destroy fails
(cd vpn && terraform destroy -var-file=terraform.tfvars)
# Preview changes
terraform plan -destroy -var-file=terraform.tfvars
# Destroy main infrastructure
terraform destroy -var-file=terraform.tfvars
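`terraform plan -destroy` and `terraform destroy` each compute their own plan, so the previewed changes are not guaranteed to be the ones applied. One way to close that gap is to save the destroy plan and apply that exact file. The sketch below assumes it runs from infrastructure/terraform; the function name `destroy_with_saved_plan` and the default file name `destroy.tfplan` are illustrative.

```shell
# Sketch only: destroy_with_saved_plan and destroy.tfplan are illustrative names.
destroy_with_saved_plan() {
  local plan_file="${1:-destroy.tfplan}"
  local answer

  # Save the destroy plan so the reviewed plan is the one that gets applied.
  terraform plan -destroy -var-file=terraform.tfvars -out="${plan_file}" || return 1

  # Print the saved plan for review.
  terraform show "${plan_file}" || return 1

  printf 'Apply this destroy plan? [y/N] '
  read -r answer
  if [ "${answer}" != "y" ]; then
    echo "Aborted"
    return 1
  fi

  # Applying a saved plan file needs no -var-file; variables are stored in the plan.
  terraform apply "${plan_file}"
}
```

Answering anything other than `y` leaves the infrastructure untouched.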
Delete Resource Group
Fastest cleanup method. Removes all resources regardless of how they were created.
# Get resource group name from Terraform outputs
# -json is required here: -raw only supports string, number, and boolean outputs
terraform output -json resource_group | jq -r '.name'
# Delete resource group
az group delete --name <resource-group-name> --yes --no-wait
[!WARNING] Resource group deletion removes everything in the group, including resources not managed by Terraform. Terraform state becomes orphaned after this operation.
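Because this deletion is irreversible and covers unmanaged resources too, it can help to confirm the group exists and list its contents before deleting. The sketch below uses `az group exists`, `az resource list`, and `az group wait --deleted`; the function name `delete_resource_group` is hypothetical and not part of this repository.

```shell
# Sketch only: delete_resource_group is an illustrative helper.
delete_resource_group() {
  local rg="$1"
  if [ -z "${rg}" ]; then
    echo "usage: delete_resource_group <resource-group-name>" >&2
    return 2
  fi

  # az group exists prints "true" or "false".
  if [ "$(az group exists --name "${rg}")" != "true" ]; then
    echo "Resource group ${rg} not found" >&2
    return 1
  fi

  # Show everything that is about to be removed, managed by Terraform or not.
  az resource list --resource-group "${rg}" \
    --query "[].{name:name, type:type}" -o table

  az group delete --name "${rg}" --yes --no-wait || return 1

  # Block until Azure reports the group as deleted.
  az group wait --name "${rg}" --deleted
}
```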
Troubleshooting
Destroy Takes a Long Time
Terraform removes resources in dependency order. Private Endpoints, AKS clusters, and PostgreSQL servers take 5-10 minutes each. Full destruction typically takes 20-30 minutes.
Monitor remaining resources during destruction:
az resource list --resource-group <resource-group> \
--query "[].{name:name, type:type}" -o table
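To watch progress without rerunning the command by hand, the same query can be polled in a loop. This is a rough sketch: the function name `wait_for_empty_group` and the 30-second default interval are assumptions, and `length(@)` is a JMESPath expression that returns the number of resources in the result.

```shell
# Sketch only: wait_for_empty_group is an illustrative helper.
wait_for_empty_group() {
  local rg="$1" interval="${2:-30}" remaining

  while :; do
    # A non-zero exit means the query failed, which also happens once the
    # resource group itself has been deleted.
    remaining="$(az resource list --resource-group "${rg}" \
      --query "length(@)" -o tsv)" || return 1

    echo "${remaining} resources remaining in ${rg}"
    if [ "${remaining}" -eq 0 ]; then
      return 0
    fi
    sleep "${interval}"
  done
}
```

For example, `wait_for_empty_group <resource-group> 60` checks once a minute from a second terminal while Terraform runs.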
Soft-Deleted Resources Block Redeployment
Azure retains certain deleted resources in a soft-deleted state. Redeployment fails when Terraform creates a resource with the same name as a soft-deleted one.
| Resource | Soft Delete | Retention Period | Blocks Redeployment |
|---|---|---|---|
| Key Vault | Mandatory | 7-90 days (configurable) | Yes |
| Azure ML Workspace | Mandatory | 14 days (fixed) | Yes |
| Container Registry | Opt-in (preview) | 1-90 days (configurable) | No (disabled by default) |
| Storage Account | Recovery only | 14 days | No (same-name creation allowed) |
Purge soft-deleted Key Vault:
az keyvault list-deleted --subscription <subscription-id> \
--resource-type vault -o table
az keyvault purge --subscription <subscription-id> \
--name <key-vault-name>
[!NOTE] Key Vaults with purge_protection_enabled = true cannot be purged and must wait for retention expiry. This configuration defaults to should_enable_purge_protection = false.
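Before redeploying, the check and the purge can be combined so that a purge is attempted only when a soft-deleted vault with the target name actually exists. This is a sketch under assumptions: the function name `purge_if_soft_deleted` is hypothetical, and it queries the currently selected subscription instead of passing `--subscription`.

```shell
# Sketch only: purge_if_soft_deleted is an illustrative helper.
purge_if_soft_deleted() {
  local name="$1" found

  # Filter the soft-deleted vault list down to an exact name match.
  found="$(az keyvault list-deleted --resource-type vault \
    --query "[?name=='${name}'].name" -o tsv)" || return 1

  if [ -z "${found}" ]; then
    echo "No soft-deleted vault named ${name}"
    return 0
  fi

  echo "Purging soft-deleted vault ${name}"
  # Fails if purge protection is enabled on the vault.
  az keyvault purge --name "${name}"
}
```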
Purge soft-deleted Azure ML Workspace:
az ml workspace delete \
--name <workspace-name> \
--resource-group <resource-group> \
--permanently-delete
Terraform State Mismatch
Resources manually deleted or created outside Terraform cause state mismatches.
Refresh state for resources deleted outside Terraform:
cd infrastructure/terraform
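# terraform refresh is deprecated; -refresh-only shows the state changes for review before writing them
terraform apply -refresh-only -var-file=terraform.tfvars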
terraform plan -var-file=terraform.tfvars
Import resources created outside Terraform into state:
terraform plan -var-file=terraform.tfvars
terraform import -var-file=terraform.tfvars \
'<resource_address>' '<azure_resource_id>'
After import, run terraform plan to verify the imported resource matches configuration.
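That verification can be scripted with `terraform plan -detailed-exitcode`, which exits with 0 when there are no changes, 1 on error, and 2 when the plan contains changes. The sketch below assumes it runs from infrastructure/terraform; the function name `check_drift` is illustrative.

```shell
# Sketch only: check_drift is an illustrative helper.
check_drift() {
  local rc=0

  # -detailed-exitcode: 0 = no changes, 1 = error, 2 = changes present.
  terraform plan -detailed-exitcode -var-file=terraform.tfvars >/dev/null || rc=$?

  case "${rc}" in
    0)
      echo "State matches configuration"
      ;;
    2)
      echo "Drift detected: run terraform plan to review the differences"
      return 2
      ;;
    *)
      echo "terraform plan failed" >&2
      return 1
      ;;
  esac
}
```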
Resource Locks Prevent Deletion
Management locks block deletion operations:
az lock list --resource-group <resource-group> -o table
az lock delete --name <lock-name> --resource-group <resource-group>
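When a group carries several locks, listing their IDs and deleting by ID avoids retyping each name. The sketch below defaults to a dry run, since locks are usually there for a reason; the function name `remove_group_locks` and the `--apply` switch are hypothetical and belong to this helper, not to the Azure CLI.

```shell
# Sketch only: remove_group_locks and its --apply switch are illustrative.
remove_group_locks() {
  local rg="$1" mode="${2:-dry-run}" lock_id

  az lock list --resource-group "${rg}" --query "[].id" -o tsv |
  while IFS= read -r lock_id; do
    [ -n "${lock_id}" ] || continue
    if [ "${mode}" = "--apply" ]; then
      echo "Deleting ${lock_id}"
      # Redirect stdin so the command cannot consume the remaining lock IDs.
      az lock delete --ids "${lock_id}" </dev/null
    else
      echo "Would delete ${lock_id}"
    fi
  done
}
```

Run `remove_group_locks <resource-group>` first to review the list, then repeat with `--apply` to delete.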