Troubleshooting Guide
This guide covers common issues encountered during development, testing, and deployment of the AI on Edge Flagship Accelerator. Use this as a reference for quick resolution of frequent problems.
Development Environment Issues
Dev Container Problems
Container Won't Start
Symptoms: Dev Container fails to build or start
Solutions:
- Check Docker Desktop:
    # Verify Docker is running
    docker info
    # Check available space (containers need significant disk space)
    docker system df
- Clean Docker system:
    # Remove unused containers and images
    docker system prune -a
    # Remove Dev Container specifically
    docker container rm $(docker container ls -aq --filter name=edge-ai)
- Rebuild container:
    # Use VS Code Command Palette
    # Dev Containers: Rebuild Container (Remote-Containers: Rebuild Container in older extension versions)
Container Builds but Tools Missing
Symptoms: Container starts but required tools are not available
Solutions:
- Verify tool availability:
    # Check essential tools
    terraform version
    az version
    kubectl version --client
    npm --version
- Update container configuration:
    # Check .devcontainer/devcontainer.json for tool versions
    # Rebuild with latest base image
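The individual version checks above can be folded into a single loop that reports only what is missing. This is a minimal sketch, not part of the project's tooling: missing_tools is a hypothetical helper name, and the tool list in the usage comment mirrors the commands listed above.

```shell
# Print the name of every tool from the argument list that is not on PATH.
# missing_tools is a hypothetical helper, not something the project ships.
missing_tools() {
    for tool in "$@"; do
        # command -v exits non-zero when the tool cannot be found
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Example usage inside the Dev Container:
#   missing_tools terraform az kubectl npm
```

An empty result means every listed tool resolved. Any names printed point to tools that should be checked against .devcontainer/devcontainer.json before rebuilding.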
Performance Issues
Symptoms: Slow performance inside Dev Container
Solutions:
- Allocate more resources to Docker:
  - Increase memory allocation in Docker Desktop settings
  - Allocate more CPU cores if available
- Use bind mounts efficiently:
  - Avoid unnecessary file watchers
  - Use a .dockerignore file to keep large directories out of the build context
Tool Configuration Issues
Azure CLI Authentication
Symptoms: Azure CLI commands fail with authentication errors
Solutions:
- Interactive login:
    # Log in and set subscription
    az login
    az account set --subscription "your-subscription-id"
    # Verify authentication
    az account show
- Service Principal authentication (for CI/CD):
    az login --service-principal \
      -u "$AZURE_CLIENT_ID" \
      -p "$AZURE_CLIENT_SECRET" \
      --tenant "$AZURE_TENANT_ID"
Kubectl Configuration
Symptoms: kubectl commands fail to connect to cluster
Solutions:
- Get cluster credentials:
    # For AKS cluster
    az aks get-credentials --resource-group myResourceGroup --name myAKSCluster
    # Verify connection
    kubectl cluster-info
- Check kubeconfig:
    # View current configuration
    kubectl config view
    # List available contexts
    kubectl config get-contexts
    # Switch context
    kubectl config use-context mycontext
Infrastructure Deployment Issues
Terraform Issues
State Lock Issues
Symptoms: Terraform operations fail with state lock errors
Solutions:
- Wait for the lock to release (if another operation is running)
- Force unlock (use carefully):
    # Get lock ID from error message
    terraform force-unlock <LOCK_ID>
- Use workspace isolation:
    # Create an isolated workspace for testing.
    # Store the name once so both commands use the same timestamp.
    WORKSPACE="test-$(date +%s)"
    terraform workspace new "$WORKSPACE"
    terraform workspace select "$WORKSPACE"
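For the force-unlock step, the lock ID can be pulled out of the saved error output rather than copied by hand. The sketch below assumes the error text contains a "Lock Info:" block with a line of the form "ID: <value>", which is how Terraform normally prints it. The helper name extract_lock_id and the log file name in the usage comment are hypothetical.

```shell
# Read Terraform error output on stdin and print the first lock ID found.
# Assumption: the lock details include a line such as
#   ID:        9db590f1-b6fe-c5f2-2678-8804f089deba
# possibly prefixed by a box-drawing character in newer Terraform versions.
extract_lock_id() {
    # Scan each line for the field "ID:" and print the field that follows it
    awk '{ for (i = 1; i < NF; i++) if ($i == "ID:") { print $(i + 1); exit } }'
}

# Example usage (plan-error.log is a hypothetical file name):
#   terraform plan 2> plan-error.log
#   terraform force-unlock "$(extract_lock_id < plan-error.log)"
```

Confirm that no other operation is still running before using the extracted ID, since force-unlocking an active lock can corrupt state.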
Provider Version Conflicts
Symptoms: Terraform init fails with provider version errors
Solutions:
- Update provider constraints:
    # In versions.tf
    terraform {
      required_providers {
        azurerm = {
          source  = "hashicorp/azurerm"
          version = "~> 3.0"
        }
      }
    }
- Upgrade providers:
    # Upgrade to latest compatible versions
    terraform init -upgrade
    # Lock specific versions
    terraform providers lock
Resource Naming Conflicts
Symptoms: Deployment fails due to resource name collisions
Solutions:
- Use unique naming:
    # Add random suffix
    resource "random_string" "suffix" {
      length  = 8
      special = false
      upper   = false
    }

    locals {
      unique_name = "${var.prefix}-${random_string.suffix.result}"
    }
- Check existing resources:
    # List resources in resource group
    az resource list --resource-group myResourceGroup
Bicep Issues
Template Compilation Errors
Symptoms: Bicep build fails with syntax errors
Solutions:
- Check syntax with detailed output:
    # Build with verbose output
    az bicep build --file main.bicep --verbose
    # Lint for issues
    az bicep lint --file main.bicep
- Validate parameter types:
    // Ensure parameter decorators are correct
    @description('Resource location')
    @allowed(['eastus', 'westus'])
    param location string
Deployment Validation Failures
Symptoms: Template deploys but resources are not configured correctly
Solutions:
- Use what-if deployment:
    # Preview changes before deployment
    az deployment group what-if \
      --resource-group myResourceGroup \
      --template-file main.bicep \
      --parameters @parameters.json
- Validate incrementally:
    # Deploy smaller components first
    # Add resources incrementally
Azure Resource Issues
Permission Denied Errors
Symptoms: Deployment fails with insufficient permissions
Solutions:
- Check role assignments:
    # List role assignments for the signed-in user
    az role assignment list --assignee "$(az account show --query user.name -o tsv)"
    # Check specific resource group
    az role assignment list --resource-group myResourceGroup
- Required permissions for components:
  - Key Vault: Key Vault Administrator or Contributor
  - Networking: Network Contributor
  - Kubernetes: Azure Kubernetes Service Contributor
  - Storage: Storage Account Contributor
Resource Provider Registration
Symptoms: Deployment fails with provider not registered errors
Solutions:
# Register required providers
az provider register --namespace Microsoft.KeyVault
az provider register --namespace Microsoft.Network
az provider register --namespace Microsoft.ContainerService
# Check registration status
az provider show --namespace Microsoft.KeyVault --query registrationState
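Registration is asynchronous, so a deployment started right after the register commands can still fail. The sketch below registers each namespace and then polls the same registrationState query shown above until it reports Registered. The function name register_providers and the POLL_SECONDS and MAX_POLLS defaults are illustrative assumptions, not project settings.

```shell
# Illustrative defaults (assumptions): poll every 10 seconds, give up after 30 polls.
POLL_SECONDS="${POLL_SECONDS:-10}"
MAX_POLLS="${MAX_POLLS:-30}"

# Register each namespace passed as an argument, then wait until Azure
# reports its registrationState as "Registered".
register_providers() {
    for ns in "$@"; do
        az provider register --namespace "$ns" || return 1
        polls=0
        while :; do
            state=$(az provider show --namespace "$ns" --query registrationState -o tsv)
            [ "$state" = "Registered" ] && break
            polls=$((polls + 1))
            if [ "$polls" -ge "$MAX_POLLS" ]; then
                echo "Timed out waiting for $ns (last state: $state)" >&2
                return 1
            fi
            sleep "$POLL_SECONDS"
        done
        echo "$ns: $state"
    done
}

# Example usage:
#   register_providers Microsoft.KeyVault Microsoft.Network Microsoft.ContainerService
```

Registration can take several minutes, so the defaults above may need adjusting for a given subscription.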
Git and Version Control Issues
SSH Authentication Issues
Symptoms: Git operations fail with SSH authentication errors
Solutions:
- Generate SSH key (if one does not exist):
    ssh-keygen -t ed25519 -C "your.email@example.com"
- Add key to SSH agent:
    eval "$(ssh-agent -s)"
    ssh-add ~/.ssh/id_ed25519
- Add public key to GitHub:
  - Copy ~/.ssh/id_ed25519.pub to GitHub SSH keys
- Test SSH connection:
    ssh -T git@github.com
Branch and Merge Issues
Merge Conflicts
Symptoms: Git merge fails with conflicts
Solutions:
- Resolve conflicts manually:
    # Start merge
    git merge main
    # Edit conflicted files
    # Look for <<<<<<< HEAD markers
    # Stage resolved files
    git add resolved-file.tf
    # Complete merge
    git commit
- Use merge tools:
    # Configure merge tool
    git config --global merge.tool vimdiff
    # Launch merge tool
    git mergetool
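After resolving conflicts by hand, it is easy to stage a file that still contains a leftover marker. The following sketch lists any files that still have lines starting with the opening or closing conflict markers. The function name files_with_conflict_markers is hypothetical, and the usage comment assumes file names without spaces.

```shell
# Print each file from the argument list that still contains a line
# starting with "<<<<<<< " or ">>>>>>> " (Git's conflict markers).
files_with_conflict_markers() {
    for f in "$@"; do
        if grep -Eq '^(<<<<<<<|>>>>>>>) ' "$f"; then
            printf '%s\n' "$f"
        fi
    done
}

# Example usage before completing the merge (assumes no spaces in file names):
#   files_with_conflict_markers $(git diff --cached --name-only --diff-filter=ACM)
```

The check deliberately ignores the "=======" separator line, since the same text can appear legitimately, for example as a heading underline in Markdown.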
Detached HEAD State
Symptoms: Git shows detached HEAD warnings
Solutions:
# Create branch from current state
git checkout -b new-branch-name
# Or return to main, abandoning any commits made while detached
git checkout main
Conventional Commit Issues
Symptoms: Commits don't follow conventional format
Solutions:
- Amend last commit:
    # Fix most recent commit message
    git commit --amend -m "feat(terraform): add monitoring component"
- Interactive rebase for multiple commits:
    # Rewrite last 3 commits
    git rebase -i HEAD~3
- Use conventional commit format:
    feat(scope): description
    fix(scope): description
    docs(scope): description
    chore(scope): description
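A commit subject can be checked against this format before pushing. The sketch below is a simplified check, not the project's actual commit validation. It accepts only the four types listed above (override COMMIT_TYPES to extend the list) and treats the scope as optional, which is an assumption. The function name is_conventional_commit is hypothetical.

```shell
# Types accepted by the check; defaults to the four shown in this guide.
COMMIT_TYPES="${COMMIT_TYPES:-feat|fix|docs|chore}"

# Return 0 when the first line of the message looks like
# "type(scope): description" or "type: description".
is_conventional_commit() {
    printf '%s\n' "$1" | head -n 1 |
        grep -Eq "^(${COMMIT_TYPES})(\([^()]+\))?: .+"
}

# Example usage against the most recent commit:
#   is_conventional_commit "$(git log -1 --format=%s)" || echo "Subject needs fixing"
```

If the check fails for the latest commit, the amend command shown above is the quickest fix.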
Linting and Code Quality Issues
Lint Job Issues
Linter Failures
Symptoms: CI lint jobs report errors
Solutions:
- Run specific linters locally:
    # Run Terraform linting
    npm run tflint-fix-all
    # Run markdown linting
    npm run mdlint-fix
- Review pipeline logs:
  Check the individual lint job output in the Azure Pipelines run to identify which linter and file failed.
Markdown Linting Issues
Common Markdown Errors
Symptoms: Markdown linting fails with formatting errors
Solutions:
- MD025 (multiple H1 headings):
    <!-- Remove duplicate H1 headings -->
    <!-- Use only one # heading per file -->
- MD032 (lists need blank lines):
    Text before list

    - List item 1
    - List item 2

    Text after list
- MD040 (code blocks need language):
    ```bash
    echo "Specify language for code blocks"
    ```
  Add an empty line before and after the code block.
Spell Checking Issues
Spell Checking False Positives
Symptoms: cspell reports errors for technical terms
Solutions:
- Add to project dictionary:
    # Add technical terms to .cspell-dictionary.txt
    echo "terraform" >> .cspell-dictionary.txt
    echo "kubernetes" >> .cspell-dictionary.txt
- Inline ignores:
    <!-- cspell:ignore terratest bicep -->
    This document discusses terratest and bicep.
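Appending with echo adds a duplicate line each time the same term is added. A small guard avoids that. This is a sketch only: add_dictionary_term is a hypothetical helper, and the default file name is taken from the example above.

```shell
# Append a term to the dictionary file only if it is not already listed.
# Usage: add_dictionary_term TERM [DICTIONARY_FILE]
add_dictionary_term() {
    term=$1
    dict=${2:-.cspell-dictionary.txt}
    # Create the file if it does not exist yet
    touch "$dict"
    # -x matches whole lines only, -F treats the term as a literal string
    grep -qxF -- "$term" "$dict" || printf '%s\n' "$term" >> "$dict"
}

# Example usage:
#   add_dictionary_term terraform
#   add_dictionary_term kubernetes
```

The helper assumes the dictionary file already ends with a newline. If it does not, the first appended term will be joined to the last existing line.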
Security Scanning Issues
Checkov Issues
Checkov False Positives
Symptoms: Checkov reports security issues for acceptable configurations
Solutions:
- Skip specific checks:
    # Terraform example
    resource "azurerm_storage_account" "example" {
      # checkov:skip=CKV_AZURE_33: Public access required for this use case
      public_network_access_enabled = true
    }
- Configure skip rules:
    # .checkov.yml
    skip-check:
      - CKV_AZURE_33
      - CKV2_AZURE_1
Checkov Performance Issues
Symptoms: Checkov scans take too long
Solutions:
# Scan only changed folders
npm run checkov-changes
# Scan specific directories
checkov -d src/000-cloud/010-security-identity
# Use parallel processing
checkov --external-checks-dir ./custom-checks --parallel
Testing and Validation Issues
Terratest Issues
Test Timeouts
Symptoms: Go tests fail with timeout errors
Solutions:
# Increase timeout
go test -v -timeout 60m ./tests/...
# Run specific test
go test -v -run TestSpecificFunction -timeout 30m
Azure Authentication in Tests
Symptoms: Tests fail with Azure authentication errors
Solutions:
// Use environment variables for authentication
// Set in CI/CD pipeline or locally
// AZURE_CLIENT_ID, AZURE_CLIENT_SECRET, AZURE_TENANT_ID
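Authentication failures in tests are often caused by a variable that was never exported. The sketch below checks for the three variables named above before the tests start. The function name missing_env_vars is hypothetical, and whether the test suite needs further variables (for example a subscription ID) is not covered here.

```shell
# Print the names of the listed environment variables that are unset or empty.
# Returns non-zero if at least one is missing.
missing_env_vars() {
    missing=0
    for name in "$@"; do
        # eval reads the variable whose name is stored in $name
        # (POSIX sh has no ${!name} indirection); pass trusted names only.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            printf '%s\n' "$name"
            missing=1
        fi
    done
    return "$missing"
}

# Example usage before running the tests:
#   missing_env_vars AZURE_CLIENT_ID AZURE_CLIENT_SECRET AZURE_TENANT_ID \
#     && go test -v -timeout 60m ./tests/...
```

The function prints only variable names, never values, so its output is safe to keep in CI logs.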
Resource Cleanup Issues
Symptoms: Test resources are not cleaned up
Solutions:
- Manual cleanup:
    # List test resource groups
    az group list --query "[?contains(name, 'test')].name" -o tsv
    # Delete test resources
    az group delete --name test-resource-group --yes --no-wait
- Automated cleanup script:
    #!/bin/bash
    # cleanup-test-resources.sh
    # Delete resource groups with "test" in the name that have finished provisioning.
    # Note: this script does not filter by age; add an age check before relying on a 24-hour cutoff.
    az group list --query "[?contains(name, 'test')]" -o json | \
      jq -r '.[] | select(.properties.provisioningState == "Succeeded") | .name' | \
      xargs -I {} az group delete --name {} --yes --no-wait
Getting Help
Self-Help Resources
- Documentation:
  - Check component README files
  - Review existing GitHub issues
  - Consult Azure documentation
- Debugging:
  - Enable verbose logging
  - Use step-by-step troubleshooting
  - Isolate the problem
- Testing:
  - Use minimal reproduction cases
  - Test in isolated environments
  - Verify assumptions
GitHub Copilot Assistance
Use GitHub Copilot for troubleshooting:
# In VS Code chat, describe your issue:
"I'm getting a Terraform state lock error when deploying. How can I resolve this?"
"Checkov is reporting CKV_AZURE_33 for my storage account. Is this a false positive?"
"My Dev Container won't start and Docker Desktop shows an error. What should I check?"
Community Support
- GitHub Issues:
  - Search existing issues first
  - Create new issue with detailed information
  - Include error messages and environment details
- Discussion Forums:
  - Use GitHub Discussions for general questions
  - Share solutions that worked for you
  - Help others with similar issues
Creating Effective Bug Reports
Include this information when reporting issues:
**Environment:**
- OS: [Windows 11/macOS 13/Ubuntu 22.04]
- Docker Desktop version:
- VS Code version:
- Dev Container: [Yes/No]
**Tools:**
- Terraform version:
- Azure CLI version:
- kubectl version:
**Problem Description:**
Clear description of the issue
**Steps to Reproduce:**
1. Step one
2. Step two
3. See error
**Expected Behavior:**
What should happen
**Actual Behavior:**
What actually happens
**Error Messages:**
[Include full error messages]
**Additional Context:**
Any other relevant information
Escalation Process
For critical issues:
- Immediate help: Use GitHub Copilot for quick guidance
- Team support: Reach out to project maintainers
- Security issues: Use private reporting for vulnerabilities
- Blocking issues: Create high-priority GitHub issues
Remember: Most issues have been encountered before. Search existing documentation and issues first, then ask for help with specific details about your situation.
For more information about development workflows, see the Development Environment and Contributing Guidelines.