NIST AI Risk Management Framework (AI RMF 1.0): Alignment Assessment

Disclaimer: This document is an internal self-assessment mapping, NOT a validated certification or third-party audit. It documents how the toolkit's capabilities align with the referenced standard. Organizations must perform their own compliance assessments with qualified auditors.

Agent Governance Toolkit (AGT)
Document Version: 1.0
Date: 2026-07-14
Classification: Public
Framework Reference: NIST AI 100-1, Artificial Intelligence Risk Management Framework


Table of Contents

  1. Executive Summary
  2. Methodology
  3. GOVERN: Policies, Processes, and Procedures
  4. MAP: Context and Risk Identification
  5. MEASURE: Assessment, Analysis, and Tracking
  6. MANAGE: Risk Response and Monitoring
  7. Coverage Summary Matrix
  8. Gap Analysis and Recommended Actions
  9. Cross-References to Other Compliance Frameworks

1. Executive Summary

The Agent Governance Toolkit (AGT) is an open-source, multi-language governance framework for AI agent systems. This document provides a systematic alignment assessment of AGT against all 19 subcategories of the NIST AI Risk Management Framework (AI RMF 1.0), covering the four core functions: GOVERN, MAP, MEASURE, and MANAGE.

Scorecard

| Metric | Value |
| --- | --- |
| Total subcategories assessed | 19 |
| Fully Addressed | 12 (63%) |
| Partially Addressed | 7 (37%) |
| Gaps (Not Addressed) | 0 (0%) |
| Strongest areas | GOVERN 1 (Policy), MANAGE 1 (Risk Response), MANAGE 4 (Monitoring) |
| Areas for improvement | MAP 5 (Individual Impacts), MEASURE 4 (Measurement Feedback), MANAGE 2 (Benefit Maximization) |

AGT addresses every assessed subcategory at least partially across all four RMF functions. The toolkit's strongest capabilities lie in policy infrastructure (10+ PolicyEngine implementations across Python, .NET, and TypeScript), risk response mechanisms (circuit breakers, kill switches, saga compensation), and deep observability (OpenTelemetry, fleet monitoring, rogue agent detection). The primary improvement opportunities are in bias/fairness evaluation, compliance trend analysis, and formal benefit-maximization framing.


2. Methodology

This assessment maps AGT capabilities to each of the 19 NIST AI RMF subcategories using the following evidence types:

  • Code artifacts: Source files, classes, functions, and configuration schemas
  • Documentation: Architecture docs, threat models, and compliance mappings
  • Benchmarks: Performance measurements quantifying governance overhead
  • Templates: Policy-as-code YAML templates for common regulatory patterns

Coverage levels are assigned as:

| Level | Criteria |
| --- | --- |
| ✅ Fully Addressed | Subcategory requirements are met by production-ready code with tests and documentation |
| ⚠️ Partially Addressed | Core capabilities exist but with documented gaps or limitations |
| ❌ Gap | No code or documentation addresses this subcategory |

3. GOVERN: Policies, Processes, and Procedures

GOVERN 1: Policies Reflecting Risk Management Are in Place

Coverage: ✅ FULLY ADDRESSED

AGT implements a multi-layered, declarative policy system with schema validation, versioning, conflict resolution, and multiple backend support.

| Component | File | Key Class/Function |
| --- | --- | --- |
| Core policy evaluator | agent-governance-python/agent-os/src/agent_os/policies/evaluator.py | PolicyEvaluator |
| Async policy evaluator | agent-governance-python/agent-os/src/agent_os/policies/async_evaluator.py | AsyncPolicyEvaluator |
| Shared/cross-project policies | agent-governance-python/agent-os/src/agent_os/policies/shared.py | SharedPolicyEvaluator |
| AgentMesh policy engine | agent-governance-python/agent-mesh/src/agentmesh/governance/policy.py:317 | PolicyEngine |
| AgentMesh policy evaluator | agent-governance-python/agent-mesh/src/agentmesh/governance/policy_evaluator.py:33 | PolicyEvaluator |
| .NET policy engine | agent-governance-dotnet/src/AgentGovernance/Policy/PolicyEngine.cs:16 | PolicyEngine |
| TypeScript MCP policy engine | agent-governance-python/agent-os/extensions/mcp-server/src/services/policy-engine.ts:208 | PolicyEngine |
| VS Code policy engine | agent-governance-typescript/agent-os-vscode/src/policyEngine.ts:51 | PolicyEngine |
| Contextual policy engine | agent-governance-python/agent-os/src/agent_os/execution_context_policy.py:62 | ContextualPolicyEngine |
| Semantic policy engine | agent-governance-python/agent-os/src/agent_os/semantic_policy.py:248 | SemanticPolicyEngine |
| IATP policy engine | agent-governance-python/agent-os/modules/iatp/iatp/policy_engine.py:78 | IATPPolicyEngine |
| Control-plane policy engine | agent-governance-python/agent-os/modules/control-plane/src/agent_control_plane/policy_engine.py:178 | PolicyEngine |
| Conflict resolution | agent-governance-python/agent-os/src/agent_os/policies/conflict_resolution.py | ResolutionResult |
| Policy schema (JSON) | agent-governance-python/agent-os/src/agent_os/policies/policy_schema.json | JSON Schema |
| OPA integration | agent-governance-python/agent-mesh/src/agentmesh/governance/opa.py | OPA/Rego backend |
| Cedar integration | agent-governance-python/agent-mesh/src/agentmesh/governance/cedar.py | Cedar backend |
| Policy templates | agent-governance-python/agent-os/templates/policies/*.yaml | GDPR, production, enterprise, data-protection, content-safety |

How AGT addresses this subcategory: Policy-as-code with YAML templates supports declarative governance across environments. Multiple backend engines (native, OPA Rego, Cedar) allow organizations to use existing policy infrastructure. Schema validation, versioning (PolicyVersion), diff tracking, and conflict detection provide lifecycle management. Three enforcement modes (strict, permissive, audit) enable progressive policy rollout.

Gaps: None identified.


GOVERN 2: Accountability Structures Are in Place

Coverage: ✅ FULLY ADDRESSED

AGT provides cryptographic audit trails, Merkle hash chains, Shapley-value fault attribution, and joint liability tracking.

| Component | File | Key Class/Function |
| --- | --- | --- |
| Merkle audit chain | agent-governance-python/agent-mesh/src/agentmesh/governance/audit.py:153 | MerkleAuditChain |
| Flight recorder (control-plane) | agent-governance-python/agent-os/modules/control-plane/src/agent_control_plane/flight_recorder.py:33 | FlightRecorder |
| Flight recorder (IATP) | agent-governance-python/agent-os/modules/iatp/iatp/telemetry/__init__.py:21 | FlightRecorder |
| Flight recorder (Lightning) | agent-governance-python/agent-lightning/src/agent_lightning_gov/emitter.py:56 | FlightRecorderEmitter |
| Hypervisor audit | agent-governance-python/agent-hypervisor/audit/delta.py | DeltaEngine |
| Shapley attribution | agent-governance-python/agent-hypervisor/src/hypervisor/liability/attribution.py | Shapley-value fault attribution |
| Joint liability | agent-governance-python/agent-hypervisor/src/hypervisor/liability/__init__.py | Joint liability module |
| Liability ledger | agent-governance-python/agent-hypervisor/src/hypervisor/liability/ledger.py | Liability tracking |
| Quarantine system | agent-governance-python/agent-hypervisor/src/hypervisor/liability/quarantine.py | Agent quarantine |
| RBAC | agent-governance-python/agent-os/src/agent_os/integrations/rbac.py | 4 roles: READER, WRITER, ADMIN, AUDITOR |
| DID-based attribution | agent-governance-python/agent-mesh/src/agentmesh/governance/audit.py | agent_did field per entry |

How AGT addresses this subcategory: Merkle hash chains provide tamper-evident audit trails where each entry is cryptographically linked to its predecessor. Shapley-value attribution enables mathematical fault attribution across multi-agent systems, a capability rare in governance toolkits. RBAC with four predefined roles (READER, WRITER, ADMIN, AUDITOR) enforces least-privilege access. DID-based agent identity ensures every action is traceable to a specific agent.
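The predecessor-linking idea can be illustrated with a minimal sketch. This is a simple linear hash chain written for illustration, not the MerkleAuditChain implementation; the function names, the entry layout, and the all-zero genesis hash are assumptions, and only the agent_did field name comes from the table above.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder predecessor hash for the first entry


def entry_hash(prev_hash: str, agent_did: str, action: str) -> str:
    """Hash an entry's fields together with its predecessor's hash."""
    payload = json.dumps(
        {"prev": prev_hash, "agent_did": agent_did, "action": action},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append(chain: list, agent_did: str, action: str) -> None:
    """Append an entry linked to the current tail of the chain."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    chain.append(
        {
            "prev": prev_hash,
            "agent_did": agent_did,
            "action": action,
            "hash": entry_hash(prev_hash, agent_did, action),
        }
    )


def verify(chain: list) -> bool:
    """Recompute every link; an edited or reordered entry fails the check."""
    prev_hash = GENESIS
    for entry in chain:
        expected = entry_hash(prev_hash, entry["agent_did"], entry["action"])
        if entry["prev"] != prev_hash or entry["hash"] != expected:
            return False
        prev_hash = entry["hash"]
    return True
```

Because each hash covers the previous hash, changing any earlier entry invalidates that entry and every later link, which is what makes the trail tamper-evident.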

Gaps: None identified.


GOVERN 3: Workforce Diversity and Expertise

Coverage: ⚠️ PARTIALLY ADDRESSED

AGT has community governance documentation but no code-level enforcement of diversity, expertise requirements, or contributor roles.

| Component | File | Notes |
| --- | --- | --- |
| Contributing guide | CONTRIBUTING.md | Contribution process, DCO, PR workflow |
| Code of conduct | CODE_OF_CONDUCT.md | Microsoft Open Source Code of Conduct |
| Community guide | COMMUNITY.md | Community structure, communication channels |
| Security policy | SECURITY.md | Vulnerability reporting process |

How AGT addresses this subcategory: Community documentation establishes contribution norms, inclusive conduct standards, and security reporting processes. The Microsoft Open Source Code of Conduct provides an organizational commitment to diversity and inclusion.

Gaps: No machine-readable role definitions, no expertise verification mechanisms, no diversity tracking. This is primarily an organizational obligation typically outside the scope of a governance toolkit.


GOVERN 4: Organizational Practices with Third-Party Entities

Coverage: ✅ FULLY ADDRESSED

AGT implements comprehensive supply chain security including plugin signing, trust tiers, MCP gateway controls, AI-BOM, and dependency confusion protection.

| Component | File | Key Class/Function |
| --- | --- | --- |
| MCP security scanner | agent-governance-python/agent-os/src/agent_os/mcp_security.py:324 | MCPSecurityScanner |
| MCP gateway | agent-governance-python/agent-os/src/agent_os/mcp_gateway.py:99 | MCPGateway |
| Plugin signing | agent-governance-python/agent-marketplace/src/agent_marketplace/signing.py:22 | PluginSigner (Ed25519) |
| Plugin manifest | agent-governance-python/agent-marketplace/src/agent_marketplace/manifest.py:36 | PluginManifest |
| MCP trust proxy | agent-governance-python/agent-mesh/packages/mcp-proxy/ | TypeScript proxy with policy enforcement |
| Trust tiers | agent-governance-python/agent-marketplace/src/agent_marketplace/trust_tiers.py | filter_capabilities() |
| Usage trust scoring | agent-governance-python/agent-marketplace/src/agent_marketplace/usage_trust.py:48 | UsageTrustScorer |
| Marketplace policy | agent-governance-python/agent-marketplace/src/agent_marketplace/marketplace_policy.py | MCPServerPolicy |
| Egress policy | agent-governance-python/agent-os/src/agent_os/egress_policy.py:50 | EgressPolicy |
| AI-BOM | agent-governance-python/agent-mesh/docs/RFC_AGENT_SBOM.md | AI Bill of Materials v2.0 |
| Federation | agent-governance-python/agent-mesh/src/agentmesh/governance/federation.py | Cross-org federation |

How AGT addresses this subcategory: Ed25519-signed plugins and manifest validation ensure supply chain integrity. The five-tier trust scoring system (0–1000) with filter_capabilities() restricts third-party agents to appropriate privilege levels. MCP gateway allowlist/blocklist controls, security scanning (tool poisoning and injection detection), and egress policies manage third-party data flows. AI-BOM v2.0 provides model provenance, dataset lineage, and weights versioning.
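As a rough illustration of score-based capability filtering, the sketch below maps a 0–1000 score to one of five tiers and drops capabilities the tier does not reach. The equal-width 200-point bands, the capability names, the minimum-tier table, and the function names are all hypothetical; the source does not state AGT's actual tier boundaries or the signature of filter_capabilities().

```python
def tier_for_score(score: int) -> int:
    """Map a 0-1000 trust score to a tier index 0-4 (assumed equal-width bands)."""
    if not 0 <= score <= 1000:
        raise ValueError("score must be between 0 and 1000")
    return min(score // 200, 4)


# Hypothetical mapping from capability name to the minimum tier that may use it.
MIN_TIER = {
    "read_docs": 0,
    "call_tools": 2,
    "write_files": 3,
    "manage_agents": 4,
}


def filter_capabilities_sketch(requested: list, score: int) -> list:
    """Keep only the requested capabilities that the score's tier reaches."""
    tier = tier_for_score(score)
    # Unknown capabilities are denied by treating them as above the top tier.
    return [cap for cap in requested if MIN_TIER.get(cap, 5) <= tier]
```

Denying unknown capabilities by default keeps the filter fail-closed, which suits a third-party trust boundary.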

Gaps: None identified.


GOVERN 5: Risk Management Processes Are Defined and Implemented

Coverage: ✅ FULLY ADDRESSED

| Component | File | Key Class/Function |
| --- | --- | --- |
| EU AI Act risk classifier | agent-governance-python/agent-mesh/src/agentmesh/governance/eu_ai_act.py | RiskLevel, RiskClassifier, AgentRiskProfile |
| Compliance framework | agent-governance-python/agent-mesh/src/agentmesh/governance/compliance.py | Multi-framework compliance |
| Control-plane compliance | agent-governance-python/agent-os/modules/control-plane/src/agent_control_plane/compliance.py | Compliance engine |
| Rogue agent detector | agent-governance-python/agent-sre/src/agent_sre/anomaly/rogue_detector.py:304 | RogueAgentDetector |

How AGT addresses this subcategory: EU AI Act four-tier risk classification (UNACCEPTABLE, HIGH, LIMITED, MINIMAL) provides structured risk assessment. AgentRiskProfile aggregates risk signals per agent. The compliance engine supports multi-framework verification, allowing organizations to define and enforce risk management processes declaratively.
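A four-tier classification with per-agent aggregation might look roughly like the sketch below. The class names and the "most severe signal wins" aggregation rule are assumptions made for illustration; only the four level names come from the text above, and the real RiskLevel and AgentRiskProfile types may be structured differently.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class RiskLevelSketch(IntEnum):
    """The four EU AI Act tiers, ordered so that a higher value is more severe."""

    MINIMAL = 0
    LIMITED = 1
    HIGH = 2
    UNACCEPTABLE = 3


@dataclass
class RiskProfileSketch:
    """Collects risk signals for one agent (hypothetical structure)."""

    agent_id: str
    signals: list = field(default_factory=list)

    def add_signal(self, level: RiskLevelSketch) -> None:
        self.signals.append(level)

    @property
    def overall(self) -> RiskLevelSketch:
        # Assumed rule: the most severe signal determines the overall level,
        # and an agent with no signals is treated as MINIMAL.
        return max(self.signals, default=RiskLevelSketch.MINIMAL)
```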

Gaps: None identified.


GOVERN 6: Policies and Procedures Aligned with Applicable Requirements

Coverage: ✅ FULLY ADDRESSED

AGT maintains dedicated compliance mapping documents for seven major frameworks.

| Framework | File | Status |
| --- | --- | --- |
| OWASP Agentic Top 10 | docs/OWASP-COMPLIANCE.md | 10/10 risks covered |
| EU AI Act | docs/compliance/eu-ai-act-checklist.md | 9/11 articles addressed |
| SOC 2 Type II | docs/compliance/soc2-mapping.md | 4/5 criteria addressed |
| ATF Conformance | docs/compliance/atf-conformance-assessment.md | 25/25 requirements (7 partial) |
| OWASP LLM Top 10 | docs/compliance/owasp-llm-top10-mapping.md | Full mapping |
| NIST RFI (2026) | docs/compliance/nist-rfi-2026-00206.md | Question-by-question mapping |
| South Korea AI Framework Act | agent-governance-python/agent-compliance/docs/compliance/south-korea-ai-framework-act.md | Mapped |

How AGT addresses this subcategory: Each compliance document systematically maps AGT capabilities to specific regulatory requirements, identifies gaps, and provides code citations. This document (NIST AI RMF alignment) extends coverage to the eighth framework.

Gaps: None identified.


4. MAP: Context and Risk Identification

MAP 1: Context Is Established

Coverage: ✅ FULLY ADDRESSED

| Component | File | Key Class/Function |
| --- | --- | --- |
| Execution context | agent-governance-python/agent-os/src/agent_os/execution_context_policy.py:62 | ContextualPolicyEngine |
| Stateless kernel context | agent-governance-python/agent-os/src/agent_os/stateless.py | ExecutionContext |
| Governance tiers | agent-governance-python/agent-hypervisor/src/hypervisor/models.py | Ring 0–3 privilege separation |
| Policy modes | agent-governance-python/agent-os/src/agent_os/policies/schema.py:34-41 | strict, permissive, audit |
| Context budget | agent-governance-python/agent-os/src/agent_os/context_budget.py | ContextScheduler |

How AGT addresses this subcategory: ContextualPolicyEngine binds policy evaluation to rich execution context including governance tiers, environment type, and operational mode. The four-ring privilege model (Ring 0: kernel through Ring 3: untrusted) establishes operational boundaries for each agent. ContextScheduler manages token budgets and resource allocation within context.

Gaps: None identified.


MAP 2: Categorization of AI Systems

Coverage: ✅ FULLY ADDRESSED

| Component | File | Key Class/Function |
| --- | --- | --- |
| EU AI Act risk classifier | agent-governance-python/agent-mesh/src/agentmesh/governance/eu_ai_act.py | RiskLevel enum |
| Agent risk profile | agent-governance-python/agent-mesh/src/agentmesh/governance/eu_ai_act.py | AgentRiskProfile dataclass |
| Compliance checker example | agent-governance-python/agent-mesh/examples/06-eu-ai-act-compliance/compliance_checker.py | Demo risk classifier |
| Trust tiers (5-tier) | docs/ARCHITECTURE.md | 0–1000 scale: Untrusted → Verified Partner |
| Execution rings (4-tier) | agent-governance-python/agent-hypervisor/src/hypervisor/models.py | Ring 0 (kernel) → Ring 3 (untrusted) |

How AGT addresses this subcategory: Dual categorization systems, EU AI Act risk levels (UNACCEPTABLE, HIGH, LIMITED, MINIMAL) and the five-tier trust score (0–1000), enable AI systems to be categorized by both regulatory risk and behavioral trust. The four-ring execution model further segments agents by privilege level.

Gaps: None identified.


MAP 3: Benefits and Costs Assessed

Coverage: ⚠️ PARTIALLY ADDRESSED

AGT provides comprehensive performance benchmarks quantifying governance overhead but lacks formal cost-benefit frameworks.

| Component | File | Key Metric |
| --- | --- | --- |
| Policy benchmarks | BENCHMARKS.md | 0.011 ms p50 (single rule), 47K ops/sec at 1K agents |
| Kernel benchmarks | agent-governance-python/agent-os/benchmarks/bench_kernel.py | 0.103 ms p50 full enforcement path |
| Audit benchmarks | agent-governance-python/agent-os/benchmarks/bench_audit.py | 2 µs per audit write |
| Adapter overhead | BENCHMARKS.md | 0.005–0.007 ms per adapter check |
| Circuit breaker | BENCHMARKS.md | 0.0005 ms (1.83M ops/sec) |
| SRE benchmarks | agent-governance-python/agent-sre/src/agent_sre/benchmarks/__init__.py | SRE-specific benchmarks |

How AGT addresses this subcategory: Governance overhead is quantified in latency and throughput terms. Sub-millisecond policy evaluation (0.011 ms p50 for a single rule, 0.103 ms p50 for the full enforcement path) and microsecond-level audit writes (about 2 µs each) indicate that governance adds little performance overhead.

Gaps: No formal ROI model or cost-benefit analysis framework. Overhead is quantified in technical terms (latency/throughput) but not in business value terms (risk reduction, compliance cost savings, incident prevention value).


MAP 4: Risks and Impacts Identified

Coverage: ✅ FULLY ADDRESSED

| Component | File | Key Content |
| --- | --- | --- |
| STRIDE threat model | docs/THREAT_MODEL.md | 4 trust boundaries, 6 attack surfaces, STRIDE analysis |
| OWASP Agentic Top 10 | docs/OWASP-COMPLIANCE.md | 10/10 risks mapped with mitigations |
| Blast radius containment | agent-governance-python/agent-hypervisor/src/hypervisor/models.py | Ring isolation, Ring 0–3 |
| Cascade detection | agent-governance-python/agent-sre/src/agent_sre/cascade/circuit_breaker.py:223 | CascadeDetector |
| Ring breach detection | agent-governance-python/agent-hypervisor/rings/breach_detector.py | Sliding-window anomaly detection |
| Prompt injection detector | agent-governance-python/agent-os/src/agent_os/prompt_injection.py:357 | PromptInjectionDetector (12+ patterns) |
| Memory guard | agent-governance-python/agent-os/src/agent_os/memory_guard.py:170 | MemoryGuard: memory poisoning defense |
| Adversarial evaluator | agent-governance-python/agent-sre/src/agent_sre/chaos/adversarial.py | Adversarial testing |
| Chaos testing | agent-governance-python/agent-sre/src/agent_sre/chaos/engine.py | Chaos engineering library |

How AGT addresses this subcategory: STRIDE-based threat modeling systematically identifies risks across four trust boundaries and six attack surfaces. Prompt injection detection (12+ pattern families), memory poisoning defense, and cascade detection provide defense-in-depth. Chaos engineering and adversarial evaluation proactively discover risks before production deployment.

Gaps: None identified.


MAP 5: Impacts to Individuals, Groups, and Communities

Coverage: ⚠️ PARTIALLY ADDRESSED

AGT has PII/PHI protection via regex patterns and GDPR policy templates but lacks ML-based bias detection or fairness evaluation.

| Component | File | Key Class/Function |
| --- | --- | --- |
| GDPR policy template | agent-governance-python/agent-os/templates/policies/gdpr.yaml | 10+ PII pattern categories, right to erasure, data minimization |
| Data protection template | agent-governance-python/agent-os/templates/policies/data-protection.yaml | Data protection rules |
| PII detection policy | agent-governance-python/agent-os/examples/shared-policies/no-pii.yaml | Shareable PII blocking policy |
| Memory guard PII redaction | agent-governance-python/agent-os/src/agent_os/memory_guard.py | PII redaction in context |
| Content governance | agent-governance-python/agent-os/src/agent_os/content_governance.py:78 | ContentQualityEvaluator |
| HIPAA example | agent-governance-python/agent-os/tutorials/hipaa-compliant-agent/demo.py | Healthcare compliance demo |
| Healthcare HIPAA example | agent-governance-python/agent-mesh/examples/03-healthcare-hipaa/main.py | PHI protection demo |

How AGT addresses this subcategory: GDPR policy templates provide declarative PII protection across 10+ categories with right-to-erasure and data minimization controls. Memory guard actively redacts PII from agent context. HIPAA-compliant agent tutorials demonstrate PHI protection patterns.
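Regex-based redaction of the kind described above can be sketched in a few lines. The two patterns, the placeholder format, and the function name are illustrative assumptions and are not taken from AGT's templates or from memory_guard.py; real coverage would need many more categories.

```python
import re

# Illustrative patterns only; these two categories are assumptions for the sketch.
PII_PATTERNS = {
    "EMAIL": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def redact_pii(text: str) -> str:
    """Replace each pattern match with a labeled placeholder."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[REDACTED:{label}]", text)
    return text
```

Pattern matching of this kind misses PII that does not follow a fixed surface format (names, free-text addresses), which is the regex-only limitation noted in the gaps for this subcategory.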

Gaps:

  • No ML-based NER (e.g., Presidio) for PII/PHI; detection is regex-only
  • No bias detection algorithms or fairness metrics
  • No demographic parity or equalized odds evaluation
  • No consent management system
  • No Data Subject Access Request (DSAR) workflow automation


5. MEASURE: Assessment, Analysis, and Tracking

MEASURE 1: Metrics Identified and Applied

Coverage: ✅ FULLY ADDRESSED

| Component | File | Key Class/Function |
| --- | --- | --- |
| SLO engine | agent-governance-python/agent-sre/src/agent_sre/slo/objectives.py:167 | SLO, ErrorBudget, SLOStatus |
| SLO spec | agent-governance-python/agent-sre/src/agent_sre/slo/spec.py:51 | SLOSpec, ErrorBudgetPolicy |
| SLO dashboard | agent-governance-python/agent-sre/src/agent_sre/slo/dashboard.py:73 | SLODashboard, SLOSnapshot |
| SLO validator | agent-governance-python/agent-sre/src/agent_sre/slo/validator.py:33 | SLODiff |
| .NET SLO engine | agent-governance-dotnet/src/AgentGovernance/Sre/SloEngine.cs | ErrorBudgetPolicy, ErrorBudgetTracker |
| SLO VS Code panel | agent-governance-typescript/agent-os-vscode/src/views/sloDashboardView.ts:38 | SLODashboardProvider |
| Trust score (AgentMesh) | agent-governance-python/agent-mesh/src/agentmesh/governance/ | 0–1000 scale, 5 tiers |
| Shift-left metrics | agent-governance-python/agent-os/src/agent_os/shift_left_metrics.py | ShiftLeftTracker, ViolationStage, ViolationRecord |
| Usage trust scorer | agent-governance-python/agent-marketplace/src/agent_marketplace/usage_trust.py:48 | UsageTrustScorer |
| OTel metrics | agent-governance-python/agent-sre/src/agent_sre/integrations/otel/metrics.py | OpenTelemetry metrics export |
| MCP metrics | agent-governance-python/agent-os/src/agent_os/_mcp_metrics.py | MCP-specific metrics |
| Langfuse SLO scores | agent-governance-python/agent-sre/src/agent_sre/integrations/langfuse/exporter.py:56 | SLOScore |

How AGT addresses this subcategory: SLI/SLO/error budget engine provides structured quantitative metrics with dashboard visualization. Trust scoring (0–1000, five tiers) quantifies agent trustworthiness. Shift-left metrics track governance violations by lifecycle stage (pre-commit, PR, CI, runtime). OpenTelemetry integration exports metrics to industry-standard observability platforms.
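The error-budget arithmetic behind an SLO can be shown with a short worked sketch. It uses the standard definition (the budget is the share of requests allowed to fail, 1 minus the SLO target); the function name and parameters are hypothetical and do not reflect the ErrorBudget class's actual interface.

```python
def error_budget_remaining(slo_target: float, total: int, failed: int) -> float:
    """Return the unspent fraction of the error budget (negative if overspent)."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total <= 0:
        return 1.0  # no traffic yet, so nothing has been consumed
    allowed_failures = (1.0 - slo_target) * total
    return 1.0 - failed / allowed_failures
```

For example, a 99% target over 10,000 requests allows about 100 failures, so 25 failures leave roughly three quarters of the budget, and 150 failures overspend it.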

Gaps: None identified.


MEASURE 2: AI Systems Evaluated

Coverage: ⚠️ PARTIALLY ADDRESSED

| Component | File | Key Class/Function |
| --- | --- | --- |
| Content quality evaluator | agent-governance-python/agent-os/src/agent_os/content_governance.py:78 | ContentQualityEvaluator |
| Plugin quality assessor | agent-governance-python/agent-marketplace/src/agent_marketplace/quality_assessment.py:120 | QualityAssessor |
| Red team dataset | agent-governance-python/agent-os/modules/control-plane/benchmark/red_team_dataset.py | Red-team benchmark data |
| Policy benchmark suite | agent-governance-python/agent-os/benchmarks/bench_policy.py | 30-scenario OWASP benchmark |
| CMVK verification | agent-governance-python/agent-os/modules/cmvk/src/cmvk/constitutional.py | Cross-Model Verification Kernel |

How AGT addresses this subcategory: Content quality evaluation and plugin quality assessment provide governance-level evaluation. Red-team datasets and 30-scenario OWASP benchmarks test governance enforcement under adversarial conditions. The Cross-Model Verification Kernel (CMVK) enables constitutional AI checks across models.

Gaps: No formal model accuracy or correctness evaluation pipeline. Quality assessment focuses on governance and content safety rather than model performance metrics (e.g., accuracy, calibration, hallucination rate).


MEASURE 3: Mechanisms for Tracking Identified AI Risks

Coverage: ✅ FULLY ADDRESSED

| Component | File | Key Class/Function |
| --- | --- | --- |
| Behavioral baseline | agent-governance-python/agent-sre/src/agent_sre/anomaly/detector.py:68 | BehaviorBaseline |
| Rogue agent detector | agent-governance-python/agent-sre/src/agent_sre/anomaly/rogue_detector.py:304 | RogueAgentDetector |
| Drift detector (Agent OS) | agent-governance-python/agent-os/src/agent_os/integrations/drift_detector.py:93 | DriftDetector, DriftType enum |
| MCP drift detector (SRE) | agent-governance-python/agent-sre/src/agent_sre/integrations/mcp/__init__.py:169 | DriftDetector |
| Flight recorder (control-plane) | agent-governance-python/agent-os/modules/control-plane/src/agent_control_plane/flight_recorder.py:33 | FlightRecorder |
| Ring breach detection | agent-governance-python/agent-hypervisor/rings/breach_detector.py | Sliding-window anomaly detection |
| Fleet monitoring | agent-governance-python/agent-sre/src/agent_sre/fleet/__init__.py | Fleet-wide health with AgentState.DEGRADED |

How AGT addresses this subcategory: Behavioral baselines establish normal operating patterns per agent. Drift detectors identify deviations from expected behavior. The rogue agent detector classifies agents exhibiting anomalous patterns. Flight recorders provide forensic-grade telemetry for post-incident analysis. Fleet monitoring aggregates health across agent populations.

Limitation: Behavioral baselines are in-memory only, with no durable cross-session persistence. Baselines are lost when agent sessions terminate.
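A baseline-and-deviation check can be sketched as a sliding window with a z-score test. This is a generic technique chosen for illustration; the class name, window size, warm-up count, and threshold are assumptions, and the source does not say which statistic BehaviorBaseline or DriftDetector actually use. Like the baselines described above, this sketch keeps its state in memory only.

```python
import statistics
from collections import deque


class BaselineSketch:
    """Sliding-window baseline that flags values far from the recent mean."""

    def __init__(self, window: int = 50, threshold: float = 3.0, warmup: int = 5):
        self.samples = deque(maxlen=window)  # in-memory only; lost on exit
        self.threshold = threshold
        self.warmup = warmup

    def observe(self, value: float) -> bool:
        """Return True if value deviates from the baseline, then record it."""
        anomalous = False
        if len(self.samples) >= self.warmup:
            mean = statistics.fmean(self.samples)
            stdev = statistics.pstdev(self.samples)
            if stdev > 0:
                anomalous = abs(value - mean) / stdev > self.threshold
            else:
                # A perfectly flat baseline: any different value is a deviation.
                anomalous = value != mean
        self.samples.append(value)
        return anomalous
```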


MEASURE 4: Feedback About Efficacy of Measurement

Coverage: ⚠️ PARTIALLY ADDRESSED

| Component | File | Key Class/Function |
| --- | --- | --- |
| Shift-left tracker | agent-governance-python/agent-os/src/agent_os/shift_left_metrics.py | ShiftLeftTracker: violations by lifecycle stage |
| SLO dashboard | agent-governance-python/agent-sre/src/agent_sre/slo/dashboard.py:73 | SLODashboard snapshots |
| VS Code SLO panel | agent-governance-typescript/agent-os-vscode/src/webviews/sidebar/panels/SLOSummary.tsx | Real-time SLO summary |
| OTel governance export | agent-governance-python/agent-mesh/src/agentmesh/observability/otel_governance.py | Governance telemetry |
| Langfuse exporter | agent-governance-python/agent-sre/src/agent_sre/integrations/langfuse/exporter.py | SLO scores to Langfuse |
| OpenLit integration | agent-governance-python/agent-sre/src/agent_sre/integrations/openlit.py | OpenLit observability |

How AGT addresses this subcategory: Shift-left metrics track violations by lifecycle stage (pre-commit, PR, CI, runtime), enabling measurement of where governance catches issues. SLO dashboards provide point-in-time compliance snapshots. Integration with Langfuse and OpenLit enables external measurement platforms.

Gaps: No time-series compliance trend analysis, no measurement-of-measurement loops, no formal reports on metric effectiveness. The toolkit provides raw measurement capabilities but does not yet evaluate whether those measurements are themselves effective.


6. MANAGE: Risk Response and Monitoring

MANAGE 1: Risks Prioritized and Responded To

Coverage: ✅ FULLY ADDRESSED

| Component | File | Key Class/Function |
| --- | --- | --- |
| Circuit breaker (SRE) | agent-governance-python/agent-sre/src/agent_sre/cascade/circuit_breaker.py:90 | CircuitBreaker (trip/open/half-open) |
| Circuit breaker (incidents) | agent-governance-python/agent-sre/src/agent_sre/incidents/circuit_breaker.py:59 | CircuitBreaker, CircuitBreakerRegistry |
| Circuit breaker (Agent OS) | agent-governance-python/agent-os/src/agent_os/_circuit_breaker_impl.py:82 | CircuitBreaker, CascadeDetector |
| .NET circuit breaker | agent-governance-dotnet/src/AgentGovernance/Sre/CircuitBreaker.cs:62 | CircuitBreaker |
| Kill switch | agent-governance-python/agent-hypervisor/src/hypervisor/security/kill_switch.py:69 | KillSwitch.kill(): 6 kill reasons |
| Rate limiter (hypervisor) | agent-governance-python/agent-hypervisor/src/hypervisor/security/rate_limiter.py:86 | AgentRateLimiter |
| Rate limiter (Agent Mesh) | agent-governance-python/agent-mesh/src/agentmesh/services/rate_limiter.py:93 | RateLimiter |
| Rate limiter (MCP sliding) | agent-governance-python/agent-os/src/agent_os/mcp_sliding_rate_limiter.py:17 | MCPSlidingRateLimiter |
| Rate limiter (TypeScript) | agent-governance-python/agent-mesh/packages/mcp-proxy/src/rate-limiter.ts:19 | RateLimiter |
| .NET rate limiter | agent-governance-dotnet/src/AgentGovernance/RateLimiting/RateLimiter.cs:11 | RateLimiter |
| Approval workflow | agent-governance-python/agent-os/extensions/mcp-server/src/services/approval-workflow.ts:18 | ApprovalWorkflow: quorum, expiration |
| Saga orchestrator | agent-governance-python/agent-hypervisor/saga/orchestrator.py | SagaOrchestrator: rollback compensation |
| Reversibility registry | agent-governance-python/agent-hypervisor/reversibility/registry.py | Undo/rollback registry |

How AGT addresses this subcategory: Multi-tier risk response: circuit breakers (with trip/open/half-open state machine) prevent cascade failures; kill switches provide immediate agent termination for six enumerated risk categories; rate limiters (sliding window, token bucket) control throughput across all language packages. Approval workflows with quorum requirements add human oversight. Saga orchestrators enable compensating transactions to roll back multi-step operations upon failure.
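The circuit breaker state machine mentioned above can be sketched as follows. This follows the conventional closed/open/half-open pattern, where "tripping" is the transition into the open state; the class name, thresholds, and injectable clock are assumptions for illustration and do not mirror any of AGT's CircuitBreaker classes.

```python
import time


class CircuitBreakerSketch:
    """Minimal closed -> open -> half-open breaker (hypothetical API)."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock  # injectable so the timeout can be tested
        self.state = "closed"
        self.failures = 0
        self.opened_at = 0.0

    def allow(self) -> bool:
        """Return True if a call may proceed in the current state."""
        if self.state == "open" and self.clock() - self.opened_at >= self.reset_timeout:
            self.state = "half-open"  # timeout elapsed: permit probe calls
        return self.state != "open"

    def record_success(self) -> None:
        """A successful call closes the breaker and clears the failure count."""
        self.failures = 0
        self.state = "closed"

    def record_failure(self) -> None:
        """Trip on a failed probe or once the failure threshold is reached."""
        self.failures += 1
        if self.state == "half-open" or self.failures >= self.failure_threshold:
            self.state = "open"
            self.opened_at = self.clock()
```

While the breaker is open, callers fail fast instead of piling more load onto a struggling dependency, which is how this pattern limits cascade failures.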

Gaps: None identified.


MANAGE 2: Strategies to Maximize AI Benefits

Coverage: ⚠️ PARTIALLY ADDRESSED

| Component | File | Key Class/Function |
| --- | --- | --- |
| Trust scoring (0–1000) | agent-governance-python/agent-mesh/src/agentmesh/governance/ | 5 tiers: Untrusted → Verified Partner |
| Trust decay | agent-governance-python/agent-mesh/ | Scores degrade without positive signals |
| Capability delegation | agent-governance-python/agent-mesh/identity/agent_id.py | delegate(), capability narrowing |
| Graduated rings | agent-governance-python/agent-hypervisor/src/hypervisor/models.py | Ring 0–3 privilege escalation/demotion |
| Ring demotion | agent-governance-python/agent-hypervisor/session/__init__.py | update_ring() |
| Trust-tier filtering | agent-governance-python/agent-marketplace/src/agent_marketplace/trust_tiers.py | filter_capabilities() |
| Progressive delivery | agent-governance-python/agent-sre/src/agent_sre/delivery/ | Canary deploys, GitOps |
| NoOp fallbacks | agent-governance-python/agent-os/src/agent_os/compat.py:37 | NoOpPolicyEvaluator |
| RL training governance | agent-governance-python/agent-lightning/ | Policy rewards for RL training |

How AGT addresses this subcategory: Trust-based capability delegation (child ≤ parent) ensures agents earn expanded privileges through demonstrated trustworthy behavior. Progressive delivery (canary deploys) minimizes risk when introducing governance changes. Trust decay ensures agents maintain good behavior to retain capabilities.
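The "child ≤ parent" rule amounts to a subset check on capability sets. The sketch below is a hypothetical illustration of that invariant; the function name, the use of string capability names, and the choice to raise PermissionError are assumptions, not the behavior of the delegate() method listed in the table.

```python
def delegate_sketch(parent_caps: frozenset, requested: set) -> frozenset:
    """Grant a child only capabilities the parent already holds."""
    extra = requested - parent_caps
    if extra:
        # Delegation may narrow capabilities but never widen them.
        raise PermissionError(f"cannot delegate capabilities not held: {sorted(extra)}")
    return frozenset(requested)
```

Because every delegated set is a subset of its parent's, the invariant holds transitively down a delegation chain: a grandchild can never hold more than the original parent.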

Gaps: No formal "benefit maximization" framework. Trust-based capability delegation exists but is framed as security controls rather than benefit optimization. No documented strategy for balancing governance overhead against agent utility.


MANAGE 3: Risks from Third-Party Entities Managed

Coverage: ✅ FULLY ADDRESSED

| Component | File | Key Class/Function |
| --- | --- | --- |
| MCP security scanner | agent-governance-python/agent-os/src/agent_os/mcp_security.py:324 | MCPSecurityScanner: tool poisoning, injection detection |
| MCP gateway | agent-governance-python/agent-os/src/agent_os/mcp_gateway.py:99 | MCPGateway: allowlist/blocklist |
| MCP trust proxy | agent-governance-python/agent-mesh/packages/mcp-proxy/ | TypeScript proxy with policy enforcement |
| Plugin signing | agent-governance-python/agent-marketplace/src/agent_marketplace/signing.py:22 | PluginSigner: Ed25519 |
| Plugin manifest validation | agent-governance-python/agent-marketplace/src/agent_marketplace/manifest.py:36 | PluginManifest: Pydantic validation |
| Marketplace policy | agent-governance-python/agent-marketplace/src/agent_marketplace/marketplace_policy.py | MCPServerPolicy, org-level policies |
| Trust tiers | agent-governance-python/agent-marketplace/src/agent_marketplace/trust_tiers.py | Plugin trust tier filtering |
| AI-BOM v2.0 | agent-governance-python/agent-mesh/docs/RFC_AGENT_SBOM.md | Model provenance, dataset lineage |
| Egress policy | agent-governance-python/agent-os/src/agent_os/egress_policy.py:50 | EgressPolicy: domain allow/deny |
| Schema adapters | agent-governance-python/agent-marketplace/src/agent_marketplace/schema_adapters.py | Copilot/Claude manifest normalization |

How AGT addresses this subcategory: Defense-in-depth for third-party risks: MCP security scanner detects tool poisoning and injection; gateway enforces allowlist/blocklist policies; plugin signing (Ed25519) and manifest validation prevent supply chain attacks. AI-BOM v2.0 tracks model provenance and dataset lineage. Egress policies control outbound data flows to authorized domains only.
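A domain allow/deny check for outbound requests could be sketched as below. The function name, the "deny overrides allow" precedence, and the subdomain-matching rule are assumptions for illustration; the source does not describe how EgressPolicy evaluates its rules.

```python
from urllib.parse import urlsplit


def egress_allowed(url: str, allowed_domains, denied_domains=frozenset()) -> bool:
    """Permit a URL only if its host is allowlisted and not denylisted."""
    host = (urlsplit(url).hostname or "").lower()

    def matches(domains) -> bool:
        # Match the domain itself or any subdomain, never a mere suffix overlap.
        return any(host == d or host.endswith("." + d) for d in domains)

    if not host or matches(denied_domains):
        return False  # assumed precedence: an explicit deny always wins
    return matches(allowed_domains)
```

Comparing against "." plus the domain avoids the common mistake of treating evil-example.com as a subdomain of example.com.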

Gaps: None identified.


MANAGE 4: Risks Monitored

Coverage: ✅ FULLY ADDRESSED

| Component | File | Key Class/Function |
| --- | --- | --- |
| Rogue agent detector | agent-governance-python/agent-sre/src/agent_sre/anomaly/rogue_detector.py:304 | RogueAgentDetector: scoring, classification |
| Fleet monitoring | agent-governance-python/agent-sre/src/agent_sre/fleet/__init__.py | Fleet-wide health, AgentState enum |
| OTel tracing (SRE) | agent-governance-python/agent-sre/src/agent_sre/tracing/spans.py | Distributed tracing spans |
| OTel metrics (SRE) | agent-governance-python/agent-sre/src/agent_sre/tracing/metrics.py | Metrics instrumentation |
| OTel exporters | agent-governance-python/agent-sre/src/agent_sre/tracing/exporters.py | OTLP/Jaeger/Zipkin exporters |
| OTel governance SDK | agent-governance-python/agent-mesh/src/agentmesh/observability/otel_sdk.py | Governance-aware OTel |
| OTel governance enrichment | agent-governance-python/agent-mesh/src/agentmesh/observability/otel_governance.py | Policy events as OTel spans |
| OTel saga sink | agent-governance-python/agent-sre/src/agent_sre/integrations/otel/saga_sink.py | Saga lifecycle as OTel spans |
| OTel events | agent-governance-python/agent-sre/src/agent_sre/integrations/otel/events.py | Governance event export |
| OpenLit integration | agent-governance-python/agent-sre/src/agent_sre/integrations/openlit.py | OpenLit observability |
| Agent OS observability | agent-governance-python/agent-os/modules/observability/src/agent_os_observability/tracer.py | Agent OS tracing |
| Hypervisor event bus | agent-governance-python/agent-hypervisor/src/hypervisor/observability/event_bus.py | Internal event bus |
| Cascade detector | agent-governance-python/agent-sre/src/agent_sre/cascade/circuit_breaker.py:223 | CascadeDetector |

How AGT addresses this subcategory: Deep observability stack: OpenTelemetry integration across all packages (spans, metrics, events) exports to OTLP/Jaeger/Zipkin. Rogue agent detector uses behavioral scoring to classify anomalous agents. Fleet monitoring provides population-level health dashboards. Governance-enriched OTel spans embed policy evaluation results directly into distributed traces, enabling governance-aware debugging.
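To make "behavioral scoring" concrete, the following is a minimal sketch of one common approach: scoring an observation by its distance from a baseline in standard deviations, then mapping the score to a label. This is an assumed technique for illustration only; the document does not specify how `RogueAgentDetector` computes its scores, and the function names and thresholds below are hypothetical.

```python
from statistics import mean, pstdev


def anomaly_score(baseline: list[float], observed: float) -> float:
    """Return how many baseline standard deviations `observed` is from the mean."""
    mu = mean(baseline)
    sigma = pstdev(baseline)
    if sigma == 0:
        # Flat baseline: any deviation at all is treated as maximally anomalous.
        return 0.0 if observed == mu else float("inf")
    return abs(observed - mu) / sigma


def classify(score: float, warn: float = 2.0, rogue: float = 4.0) -> str:
    """Map a score to a label; the thresholds are illustrative assumptions."""
    if score >= rogue:
        return "rogue"
    if score >= warn:
        return "suspicious"
    return "normal"


# Toy data: tool calls per minute observed for one agent during a quiet period.
baseline_calls = [10.0, 12.0, 11.0, 9.0, 13.0]

for observed in (12.0, 14.0, 40.0):
    print(observed, classify(anomaly_score(baseline_calls, observed)))
```

A production detector would likely combine several signals rather than a single metric; the single-metric form is kept here only to show the scoring and classification steps separately.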

Gaps: None identified.


7. Coverage Summary Matrix

| # | Subcategory | Coverage | Evidence Strength | Key Artifacts |
|---|---|---|---|---|
| 1 | GOVERN 1: Policies | ✅ Full | Strong | 10+ PolicyEngine implementations, OPA/Cedar backends |
| 2 | GOVERN 2: Accountability | ✅ Full | Strong | Merkle audit, Shapley attribution, RBAC, DID |
| 3 | GOVERN 3: Workforce | ⚠️ Partial | Moderate | CONTRIBUTING.md, CODE_OF_CONDUCT.md |
| 4 | GOVERN 4: Third-party practices | ✅ Full | Strong | Plugin signing, MCP scanner, AI-BOM, egress policy |
| 5 | GOVERN 5: Risk processes | ✅ Full | Strong | EU AI Act classifier, compliance engine |
| 6 | GOVERN 6: Requirements alignment | ✅ Full | Strong | 7 framework compliance mappings |
| 7 | MAP 1: Context | ✅ Full | Strong | ExecutionContext, 4-ring model, 3 policy modes |
| 8 | MAP 2: Categorization | ✅ Full | Strong | RiskLevel enum, AgentRiskProfile, 5-tier trust |
| 9 | MAP 3: Benefits/costs | ⚠️ Partial | Moderate | Latency/throughput benchmarks; no ROI model |
| 10 | MAP 4: Risks identified | ✅ Full | Strong | STRIDE threat model, OWASP 10/10, chaos testing |
| 11 | MAP 5: Individual impacts | ⚠️ Partial | Moderate | GDPR template, PII regex; no bias/fairness |
| 12 | MEASURE 1: Metrics | ✅ Full | Strong | SLO engine, trust scoring, shift-left, OTel |
| 13 | MEASURE 2: Evaluation | ⚠️ Partial | Moderate | Content quality, red team; no model eval pipeline |
| 14 | MEASURE 3: Risk tracking | ✅ Full | Strong | Drift detection, baselines, flight recorder |
| 15 | MEASURE 4: Measurement feedback | ⚠️ Partial | Moderate | Shift-left tracker, SLO dashboard |
| 16 | MANAGE 1: Risk response | ✅ Full | Strong | Circuit breakers, kill switch, rate limiters, sagas |
| 17 | MANAGE 2: Maximize benefits | ⚠️ Partial | Moderate | Trust scoring, graduated autonomy |
| 18 | MANAGE 3: Third-party risks | ✅ Full | Strong | MCP scanner, plugin signing, trust tiers, AI-BOM |
| 19 | MANAGE 4: Monitoring | ✅ Full | Strong | OTel, rogue detector, fleet monitoring, cascade |

Totals: 12 Fully Addressed · 7 Partially Addressed · 0 Gaps
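The totals can be re-derived from the matrix rows as a consistency check. The sketch below transcribes each row's coverage status into a dictionary and counts the values; the dictionary and variable names are illustrative and not part of AGT.

```python
from collections import Counter

# Coverage status per subcategory, transcribed row by row from the matrix above.
coverage = {
    "GOVERN 1": "full", "GOVERN 2": "full", "GOVERN 3": "partial",
    "GOVERN 4": "full", "GOVERN 5": "full", "GOVERN 6": "full",
    "MAP 1": "full", "MAP 2": "full", "MAP 3": "partial",
    "MAP 4": "full", "MAP 5": "partial",
    "MEASURE 1": "full", "MEASURE 2": "partial",
    "MEASURE 3": "full", "MEASURE 4": "partial",
    "MANAGE 1": "full", "MANAGE 2": "partial",
    "MANAGE 3": "full", "MANAGE 4": "full",
}

counts = Counter(coverage.values())
total = len(coverage)
# Whole-number percentages of the 19 subcategories.
percentages = {status: round(100 * n / total) for status, n in counts.items()}

print(total, dict(counts), percentages)
```

Tallied this way, the rows as listed give 13 full and 6 partial entries (68% and 32%) rather than the 12 and 7 stated in the totals, so either one row or the totals line likely needs reconciling. MEASURE 3 is a plausible candidate, since the gap analysis records an open item against it.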


8. Gap Analysis and Recommended Actions

Priority 1: HIGH

| Gap | Subcategory | Current State | Recommended Action |
|---|---|---|---|
| No bias/fairness evaluation | MAP 5 | Regex-only PII detection; no algorithmic bias testing | Integrate ML-based NER (e.g., Presidio); add FairnessEvaluator with demographic parity and equalized odds metrics |
| No consent/DSAR management | MAP 5 | GDPR template has data minimization but no consent workflow | Implement consent management and DSAR automation in agent-compliance |
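The two fairness metrics named in the first recommendation have standard definitions that can be sketched briefly. The proposed FairnessEvaluator does not exist yet, so the function names and signatures below are hypothetical, and the data is a toy example. Demographic parity difference is the gap in positive-prediction rate between groups; equalized odds difference is the larger of the between-group gaps in true-positive rate and false-positive rate.

```python
from collections import defaultdict


def _rate(values: list[int]) -> float:
    # Share of positive (1) entries; an empty list counts as 0.0.
    return sum(values) / len(values) if values else 0.0


def demographic_parity_difference(y_pred: list[int], group: list[str]) -> float:
    """Largest gap in positive-prediction rate between any two groups."""
    by_group: dict[str, list[int]] = defaultdict(list)
    for pred, g in zip(y_pred, group):
        by_group[g].append(pred)
    rates = [_rate(preds) for preds in by_group.values()]
    return max(rates) - min(rates)


def equalized_odds_difference(
    y_true: list[int], y_pred: list[int], group: list[str]
) -> float:
    """Larger of the between-group gaps in true-positive and false-positive rate."""
    pos: dict[str, list[int]] = defaultdict(list)  # predictions where truth is 1
    neg: dict[str, list[int]] = defaultdict(list)  # predictions where truth is 0
    for truth, pred, g in zip(y_true, y_pred, group):
        (pos if truth == 1 else neg)[g].append(pred)
    groups = set(group)
    tpr = [_rate(pos[g]) for g in groups]
    fpr = [_rate(neg[g]) for g in groups]
    return max(max(tpr) - min(tpr), max(fpr) - min(fpr))


# Toy data: four decisions each for two groups, "a" and "b".
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
group = ["a", "a", "a", "a", "b", "b", "b", "b"]

dp_gap = demographic_parity_difference(y_pred, group)
eo_gap = equalized_odds_difference(y_true, y_pred, group)
print(dp_gap, eo_gap)
```

A value of 0.0 for either metric means the groups are treated identically by that measure; acceptable thresholds are a policy decision and are not defined in this document.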

Priority 2: MEDIUM

| Gap | Subcategory | Current State | Recommended Action |
|---|---|---|---|
| No compliance trend analysis | MEASURE 4 | Point-in-time SLO snapshots only | Add ComplianceTrendAnalyzer to aggregate shift-left and SLO data over time; expose via SRE dashboard API |
| No model evaluation pipeline | MEASURE 2 | Content/plugin quality only | Add ModelEvaluator module or an LM Evaluation Harness/HELM integration for accuracy and calibration benchmarks |
| No benefit-maximization framing | MANAGE 2 | Trust delegation framed as security | Document governance ROI; reframe trust scoring as benefit optimization with measurable utility metrics |
| In-memory behavioral baselines | MEASURE 3 | Baselines lost on session end | Add BaselinePersistence backend (SQLite or file-backed) to agent-governance-python/agent-sre/src/agent_sre/anomaly/ |
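The last recommendation, persisting behavioral baselines across sessions, could look roughly like the sketch below, which uses the standard-library `sqlite3` module. The class name `SqliteBaselineStore`, the table schema, and the JSON encoding of metrics are all assumptions made for illustration; the proposed BaselinePersistence backend is not yet specified.

```python
import json
import os
import sqlite3
import tempfile
from typing import Optional


class SqliteBaselineStore:
    """Hypothetical store; the name and schema are illustrative, not AGT APIs."""

    def __init__(self, path: str) -> None:
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS baselines ("
            "agent_id TEXT PRIMARY KEY, metrics TEXT NOT NULL)"
        )
        self._conn.commit()

    def save(self, agent_id: str, metrics: dict[str, float]) -> None:
        # Replace any previous baseline for this agent; commit on success.
        with self._conn:
            self._conn.execute(
                "INSERT OR REPLACE INTO baselines (agent_id, metrics) VALUES (?, ?)",
                (agent_id, json.dumps(metrics)),
            )

    def load(self, agent_id: str) -> Optional[dict[str, float]]:
        row = self._conn.execute(
            "SELECT metrics FROM baselines WHERE agent_id = ?", (agent_id,)
        ).fetchone()
        return json.loads(row[0]) if row else None

    def close(self) -> None:
        self._conn.close()


# Write a baseline, close the store, then reopen it to simulate a new session.
db_path = os.path.join(tempfile.mkdtemp(), "baselines.db")
store = SqliteBaselineStore(db_path)
store.save("agent-1", {"calls_per_min": 11.0})
store.close()

reopened = SqliteBaselineStore(db_path)
restored = reopened.load("agent-1")
print(restored)
```

Storing metrics as a JSON string keeps the schema stable while the set of tracked metrics evolves, at the cost of not being able to query individual metrics in SQL.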

Priority 3: LOW

| Gap | Subcategory | Current State | Recommended Action |
|---|---|---|---|
| No ROI/cost-benefit model | MAP 3 | Technical benchmarks only | Add "Governance ROI" analysis to BENCHMARKS.md framing overhead in business value terms |
| No workforce role enforcement | GOVERN 3 | Documentation only | Consider machine-readable contributor role definitions (organizational scope) |

9. Cross-References to Other Compliance Frameworks

This alignment assessment complements AGT's other compliance documents. The table below shows where NIST AI RMF subcategories overlap with requirements in the ATF, OWASP, EU AI Act, and SOC 2 mappings.

| NIST AI RMF Subcategory | ATF Reference | OWASP Reference | EU AI Act Reference | SOC 2 Reference |
|---|---|---|---|---|
| GOVERN 1 (Policies) | A-1, A-2 (Policy definition & enforcement) | - | Art. 9 (Risk management system) | CC6.1 (Logical access) |
| GOVERN 2 (Accountability) | A-5 (Audit trails) | - | Art. 12 (Record-keeping) | CC4.1 (Monitoring) |
| GOVERN 3 (Workforce) | - | - | Art. 14 (Human oversight) | - |
| GOVERN 4 (Third-party) | D-1 through D-5 (Supply chain) | A-05 (Insecure Plugin Design) | Art. 28 (Obligations of deployers) | CC9.2 (Vendor mgmt) |
| GOVERN 5 (Risk processes) | A-3 (Risk assessment) | - | Art. 9 (Risk management system) | CC3.2 (Risk assessment) |
| GOVERN 6 (Requirements) | All sections | All risks | All articles | All criteria |
| MAP 1 (Context) | B-1 (Execution boundaries) | - | Art. 9.2 (Intended purpose) | - |
| MAP 2 (Categorization) | A-3 (Risk classification) | - | Art. 6 (Classification rules) | - |
| MAP 3 (Benefits/costs) | - | - | Art. 9.4 (Cost proportionality) | - |
| MAP 4 (Risks identified) | B-2, B-3 (Threat analysis) | A-01 through A-10 (All risks) | Art. 9.2 (Risk identification) | CC3.2 (Risk assessment) |
| MAP 5 (Individual impacts) | C-1, C-2 (Data protection) | A-08 (Excessive Agency) | Art. 10 (Data governance) | P1–P8 (Privacy criteria) |
| MEASURE 1 (Metrics) | E-1 (SLI/SLO) | - | Art. 9.7 (Testing/metrics) | CC4.1 (Monitoring) |
| MEASURE 2 (Evaluation) | E-2 (Quality assessment) | - | Art. 9.5 (Testing) | CC7.1 (System monitoring) |
| MEASURE 3 (Risk tracking) | B-3 (Behavioral baseline) | A-03 (Excessive Agency) | Art. 9.8 (Risk monitoring) | CC7.2 (Change monitoring) |
| MEASURE 4 (Feedback) | E-3 (Continuous improvement) | - | Art. 9.9 (Documentation updates) | CC4.2 (Deficiency mgmt) |
| MANAGE 1 (Risk response) | F-1, F-2 (Circuit breakers, kill switch) | A-06 (Excessive Agency) | Art. 14 (Human oversight) | CC7.3 (Change mgmt) |
| MANAGE 2 (Maximize benefits) | - | - | Recital 4 (Innovation balance) | - |
| MANAGE 3 (Third-party risks) | D-1 through D-5 (Supply chain) | A-05 (Insecure Plugin Design) | Art. 28 (Deployer obligations) | CC9.2 (Vendor mgmt) |
| MANAGE 4 (Monitoring) | E-1, F-3 (Observability) | A-09 (Overreliance) | Art. 72 (Post-market monitoring) | CC7.1 (System monitoring) |

This document was prepared for submission to the National Institute of Standards and Technology (NIST) in response to the AI Risk Management Framework (AI RMF 1.0) alignment assessment process. It reflects the state of the Agent Governance Toolkit as of 2026-07-14. For questions or clarifications, please refer to the project's SUPPORT.md or open an issue on GitHub.