Architecture Decision Record: Policy as Code with OPA Gatekeeper

Status

[Use the appropriate status to represent this decision record]

  • Draft
  • Proposed
  • Accepted
  • Deprecated

Context

Which Kubernetes policy-as-code tooling should we use to implement policy management that controls inbound traffic on the cluster? Our main security constraint in PSNet is to prevent the Azure Arc agent pods from committing changes to the cluster's state. To meet that constraint, we need strict pod isolation controls in our Kubernetes clusters that block unauthorized modifications originating from specific pods.

Decision

We will implement pod isolation controls using OPA Gatekeeper instead of developing custom admission webhook controllers in Golang. The main reason is the significant effort required to develop and maintain a Golang-based solution. Gatekeeper's declarative approach aligns with our existing infrastructure-as-code artifacts and provides a more sustainable path forward.

Decision drivers

  • Kubernetes policy requires:

    • Implementation of validating and mutating admission controls at the cluster level to enforce security policies
  • The solution must provide:

    • Maintainability and auditability
    • Scalability across multiple clusters
    • Reduced operational overhead

Considered options

  1. Custom Admission Webhooks in Golang

    • Requires developing dedicated admission webhook controllers in Golang
    • Requires upskilling in Golang development
    • Requires setting up new development processes and dedicated CI/CD pipelines
    • Requires implementation of logging, monitoring, and maintenance procedures
    • Creates a dependency on Golang expertise for long-term maintenance
  2. OPA Gatekeeper with Declarative Policies

    • Implementation through declarative YAML-based policy definitions, aligning with our existing infrastructure-as-code artifacts
    • Policy logic written in Rego, a declarative query language purpose-built for expressing policy
    • Reuse of existing constraint templates from the OPA community
    • Built-in audit logging and monitoring capabilities
    • Native integration with Kubernetes admission controller framework
    • Automated certificate management and webhook lifecycle handling
    • Regular updates and security patches managed by the OPA community
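To make option 2 concrete, the following is a minimal sketch of what a declarative Gatekeeper policy could look like for the constraint described in the Context section. It is an illustration under stated assumptions, not the policy adopted by this record: the template kind `K8sDenyServiceAccountWrites`, the parameter name `deniedUsernamePrefixes`, and the assumption that the Azure Arc agents run under service accounts in the `azure-arc` namespace are all hypothetical and would need to be verified against the actual cluster. A blanket rule like this may also block writes the agents legitimately need, so a real policy would likely be scoped more narrowly.

```yaml
# ConstraintTemplate: defines a reusable policy kind and its Rego logic.
# The kind name and parameter schema are hypothetical, chosen for illustration.
apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata:
  name: k8sdenyserviceaccountwrites   # must be the lowercase form of the kind below
spec:
  crd:
    spec:
      names:
        kind: K8sDenyServiceAccountWrites
      validation:
        openAPIV3Schema:
          type: object
          properties:
            deniedUsernamePrefixes:
              type: array
              items:
                type: string
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8sdenyserviceaccountwrites

        # Report a violation when the requesting user matches a denied prefix.
        violation[{"msg": msg}] {
          username := input.review.userInfo.username
          prefix := input.parameters.deniedUsernamePrefixes[_]
          startswith(username, prefix)
          msg := sprintf("user %v is not allowed to modify cluster state", [username])
        }
---
# Constraint: an instance of the template above with concrete parameters.
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sDenyServiceAccountWrites
metadata:
  name: deny-azure-arc-agent-writes
spec:
  enforcementAction: deny
  match:
    kinds:
      - apiGroups: ["*"]
        kinds: ["*"]
  parameters:
    deniedUsernamePrefixes:
      # Assumption: the agents authenticate as service accounts in this namespace.
      - "system:serviceaccount:azure-arc:"
```

The rule relies on the requesting user's identity, which is only present on live admission requests, so it would not flag anything during Gatekeeper's periodic audit of existing objects. Which operations are evaluated (for example, whether deletions are included) depends on how the Gatekeeper validating webhook is configured.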

Decision Conclusion

Positive

The adoption of OPA Gatekeeper will provide several advantages:

  • Aligns with our existing infrastructure-as-code practices
  • Eliminates the need for extensive Golang training and hiring
  • Provides declarative policy management with version-controlled YAML manifests
  • Reduces development and maintenance complexity
  • Offers built-in audit logging and violation reporting
  • Enables access to community-maintained policy libraries
  • Ensures consistent policy enforcement across clusters

Negative

We must address several challenges:

  • Teams will need to learn the Rego policy language
  • Complex rules may require adapting to Rego's declarative, query-based way of expressing policy
  • Adding another component to our Kubernetes infrastructure increases complexity

Mitigation

To address these challenges, we will:

  • Upskill teams on Rego and Gatekeeper
  • Start with simple policies and gradually increase complexity
  • Leverage existing constraint templates where possible
  • Create documentation for each policy implemented to facilitate maintenance
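One way to start simple and increase complexity gradually is to roll out each constraint in dry-run mode first, scoped to a small set of resource kinds, and switch it to `deny` only once the reported violations look as expected. The sketch below assumes that a ConstraintTemplate defining a hypothetical kind `K8sDenyServiceAccountWrites` with a `deniedUsernamePrefixes` parameter is already installed; the constraint name, the matched kinds, and the `azure-arc` namespace are likewise illustrative assumptions.

```yaml
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sDenyServiceAccountWrites     # hypothetical kind, assumed to be defined by an installed template
metadata:
  name: deny-azure-arc-agent-writes-dryrun
spec:
  # dryrun evaluates the policy and records violations without rejecting requests.
  enforcementAction: dryrun
  match:
    kinds:
      # Start with a narrow scope; widen it once the policy behaves as intended.
      - apiGroups: [""]
        kinds: ["ConfigMap", "Secret"]
  parameters:
    deniedUsernamePrefixes:
      # Assumption: the agents authenticate as service accounts in this namespace.
      - "system:serviceaccount:azure-arc:"
```

Changing `enforcementAction` from `dryrun` to `deny` is then a one-line, version-controlled change that can go through the usual review process.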
