SRE / Operations Guide
This guide is for you if you manage infrastructure, handle incidents, deploy systems, maintain CI/CD pipelines, or ensure production reliability. HVE Core offers SRE and operations engineers 13+ addressable assets spanning infrastructure as code, incident response, security operations, and deployment automation.
CAUTION
The security agents and prompts referenced in this guide are assistive tools only. They do not replace professional security tooling (SAST, DAST, SCA, penetration testing, compliance scanners) or qualified human review. All AI-generated security plans, threat models, risk registers, and incident response runbooks must be reviewed and validated by qualified security professionals before use. AI outputs may contain inaccuracies, miss critical threats, or produce recommendations that are incomplete or inappropriate for your environment. Never treat AI-generated security artifacts as authoritative without independent verification.
Recommended Collections
TIP
Install the HVE Core extension from the VS Code Marketplace to get all stable artifacts with zero configuration.
Your primary collections are coding-standards (IaC-specific instructions for Terraform, Bicep, Bash, and GitHub Actions), security-planning (incident response tooling), and hve-core (structured investigation and remediation workflows). For clone-based setups, select the hve-core-installer agent and give it the instruction: install coding-standards security-planning hve-core.
What HVE Core Does for You
- Activates infrastructure-as-code standards for Terraform, Bicep, Bash scripts, and GitHub Actions workflows automatically
- Generates incident response runbooks and playbooks for operational scenarios
- Supports structured investigation of production issues through research workflows
- Validates dependency pinning and SHA integrity for supply chain security
- Reviews infrastructure changes against operational best practices
- Manages Git workflows for infrastructure repositories including merge and rebase operations
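The dependency pinning check mentioned above can be illustrated with a small script. This is a minimal sketch of the general idea, not HVE Core's actual validation: the function name `unpinned_actions` and the regular expressions are hypothetical, and the sketch only inspects `uses:` lines in a GitHub Actions workflow, treating a reference as pinned when it ends in a full 40-character commit SHA.

```python
import re

# A reference such as owner/repo@<sha> counts as pinned only when the
# part after "@" is a full 40-character lowercase hex commit SHA.
PINNED = re.compile(r"^[\w.-]+/[\w./-]+@[0-9a-f]{40}$")

# Matches "uses: <ref>" with an optional leading list dash; a trailing
# "# comment" is not captured as part of the reference.
USES_LINE = re.compile(r"^\s*-?\s*uses:\s*([^\s#]+)")


def unpinned_actions(workflow_text: str) -> list[str]:
    """Return action references that are not pinned to a full commit SHA."""
    findings = []
    for line in workflow_text.splitlines():
        match = USES_LINE.match(line)
        if not match:
            continue
        ref = match.group(1).strip("'\"")
        # Local actions and Docker images are out of scope for this sketch.
        if ref.startswith("./") or ref.startswith("docker://"):
            continue
        if not PINNED.match(ref):
            findings.append(ref)
    return findings


if __name__ == "__main__":
    sample = """
steps:
  - uses: actions/checkout@v4
  - uses: actions/setup-node@1234567890abcdef1234567890abcdef12345678 # v4
"""
    for ref in unpinned_actions(sample):
        print(f"not pinned to a commit SHA: {ref}")
```

Tag references such as `@v4` can be moved to a different commit by the action's owner, which is why a check of this kind only accepts full SHAs. A line-based scan is an approximation; a stricter check would parse the YAML.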
Your Lifecycle Stages
NOTE
SRE / Operations engineers primarily operate in these lifecycle stages:
- Stage 1: Setup: Configure environments, install tooling, set up infrastructure
- Stage 3: Product Definition: Define infrastructure requirements and operational specifications
- Stage 6: Implementation: Build infrastructure, write IaC, configure pipelines
- Stage 8: Delivery: Deploy infrastructure, validate environments, release changes
- Stage 9: Operations: Monitor systems, handle incidents, maintain production
Stage Walkthrough
- Stage 1: Setup. Configure your development environment and install HVE Core tooling using the Getting Started guide. Set up IaC project structure for your infrastructure repository.
- Stage 3: Product Definition. Define infrastructure requirements, SLOs, and operational contracts. Use the security-plan-creator agent for infrastructure security planning.
- Stage 6: Implementation. Write infrastructure code with auto-activated standards for Terraform (*.tf), Bicep (bicep/**), Bash (*.sh), and GitHub Actions (*.yml). Use the task-implementor agent for complex multi-file changes.
- Stage 8: Delivery. Deploy infrastructure changes through CI/CD pipelines. Use /git-commit for conventional commits and /pull-request for infrastructure PRs with proper review.
- Stage 9: Operations. Handle incidents with /incident-response runbooks. Investigate production issues with the task-researcher agent for structured root cause analysis.
Starter Prompts
/incident-response Create an incident response runbook for a data breach
involving customer PII exposure through a misconfigured storage bucket.
Include containment steps, GDPR notification timelines, forensic evidence
preservation, and post-incident review process.
Select task-researcher agent:
Investigate elevated 503 errors on the /api/orders endpoint. Error rate
increased from 0.1% to 12% starting at 14:30 UTC. The service runs on
3 Kubernetes pods in the production-east cluster. Check pod logs, recent
deployments, and upstream dependency health.
Select security-plan-creator agent:
Create a security plan for the Kubernetes ingress controller cluster.
Cover TLS termination and certificate rotation automation, network policy
rules for namespace isolation, WAF configuration for OWASP Top 10
protection, and audit logging for ingress configuration changes.
/pull-request Create a PR for infrastructure changes
Select task-implementor agent:
Implement Terraform infrastructure for the Redis cache cluster in the
staging environment. Use existing module patterns in infra/modules/.
Configure a 3-node cluster with 6GB memory, automatic failover, and
encryption at rest. Output the connection string to the Vault KV store.
Key Agents and Workflows
| Agent | Purpose | Docs |
|---|---|---|
| task-researcher | Structured production issue investigation | Task Researcher |
| task-implementor | Infrastructure code implementation | Task Implementor |
| task-reviewer | Infrastructure code review | Task Reviewer |
| security-plan-creator | Infrastructure security planning | Agent file |
| pr-review | Pull request review for infrastructure changes | Agent file |
| memory | Session context and preference persistence | Agent file |
Prompts complement the agents for operational workflows:
| Prompt | Purpose | Invoke |
|---|---|---|
| incident-response | Incident response runbook creation | /incident-response |
| git-commit | Conventional commit message generation | /git-commit |
| pull-request | Pull request creation | /pull-request |
| git-merge | Git merge and rebase workflow management | /git-merge |
Auto-activated instructions apply IaC standards based on file type: Terraform (*.tf, *.tfvars), Bicep (bicep/**), Bash (*.sh), and GitHub Actions workflows (.github/workflows/*.yml).
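To make the file-type mapping concrete, the sketch below models it as a lookup from glob patterns to standards. It is an illustration only: the `STANDARDS` table and `standards_for` function are hypothetical names, the patterns are copied from the sentence above, and Python's `fnmatch` is used as an approximation of the glob matching the editor performs (in `fnmatch`, `*` also matches path separators, which real glob engines may treat differently).

```python
from fnmatch import fnmatchcase

# Patterns copied from the guide; the mapping structure itself is an
# assumption made for illustration, not HVE Core's internal format.
STANDARDS = {
    "Terraform": ["*.tf", "*.tfvars"],
    "Bicep": ["bicep/**"],
    "Bash": ["*.sh"],
    "GitHub Actions": [".github/workflows/*.yml"],
}


def standards_for(path: str) -> list[str]:
    """Return the names of the standards whose patterns match the path."""
    return [
        name
        for name, patterns in STANDARDS.items()
        if any(fnmatchcase(path, pattern) for pattern in patterns)
    ]


if __name__ == "__main__":
    for path in ("infra/main.tf", "bicep/deploy.sh", "README.md"):
        print(path, "->", standards_for(path))
```

In this sketch a file can match more than one entry: `bicep/deploy.sh` falls under both the Bicep directory pattern and the Bash extension pattern.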
Tips
| Do | Don't |
|---|---|
| Let IaC-specific instructions auto-activate by file type | Manually enforce Terraform or Bicep standards |
| Create incident response runbooks before incidents occur | Write runbooks reactively during active incidents |
| Use the task-researcher agent for structured root cause analysis | Debug production issues without systematic investigation |
| Review infrastructure PRs with the pr-review agent | Merge infrastructure changes without code review |
| Use /git-commit for consistent, conventional commit history | Write ad-hoc commit messages for infrastructure changes |
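For readers unfamiliar with the conventional commit format that /git-commit targets, the following sketch checks a commit header against the `type(scope)!: description` shape from the Conventional Commits specification. It is not part of HVE Core: the function name `parse_commit_header` is hypothetical, and the list of allowed types is an assumption based on commonly used conventions rather than anything this guide defines.

```python
import re

# Commonly used types; the specification itself only mandates feat and fix,
# so this list is an assumption that a team would adjust.
TYPES = ("feat", "fix", "docs", "style", "refactor", "perf",
         "test", "build", "ci", "chore", "revert")

HEADER = re.compile(
    r"^(?P<type>" + "|".join(TYPES) + r")"
    r"(\((?P<scope>[\w./-]+)\))?"   # optional scope in parentheses
    r"(?P<breaking>!)?"             # optional breaking-change marker
    r": (?P<description>\S.*)$"     # colon, space, non-empty description
)


def parse_commit_header(message: str):
    """Return the parsed header fields, or None if the header is not conventional."""
    first_line = message.splitlines()[0] if message else ""
    match = HEADER.match(first_line)
    return match.groupdict() if match else None


if __name__ == "__main__":
    print(parse_commit_header("fix(terraform): pin provider version"))
    print(parse_commit_header("updated stuff"))
```

A check like this could run in a commit hook or CI job so that infrastructure history stays searchable by type and scope.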
Related Roles
- SRE + Security Architect: Operational security, incident response, and monitoring connect security planning with production operations. Threat models inform operational controls. See the Security Architect Guide.
- SRE + Engineer: Production reliability requires collaboration between infrastructure operations and feature development. Deployment pipelines serve both roles. See the Engineer Guide.
- SRE + Tech Lead: Infrastructure architecture decisions shape operational practices. IaC standards maintain consistency across environments. See the Tech Lead Guide.
Next Steps
TIP
- Explore IaC coding standards: Coding Standards Collection
- Set up incident response tools: Security Planning Collection
- See how operations fits the project lifecycle: AI-Assisted Project Lifecycle
🤖 Crafted with precision by ✨Copilot following brilliant human instruction, then carefully refined by our team of discerning human reviewers.