Agent Governance Toolkit Threat Model

This document summarizes the security threat model for the Agent Governance Toolkit (AGT) using a STRIDE-oriented view of the main trust boundaries in the system.

For the current OWASP Agentic Top 10 mapping across all ASI risk categories, see docs/compliance/owasp-agentic-top10-architecture.md.

Scope

This threat model focuses on the runtime governance layer described in the repository README:

  • Agent OS: deterministic policy enforcement, approvals, MCP governance, context and policy controls
  • AgentMesh: identity, trust scoring, delegated trust, and inter-agent communication
  • Agent Runtime: execution rings, kill switch, sandbox boundaries, and saga controls
  • Agent SRE: circuit breakers, replay, error budgets, and cascade detection

Trust Boundaries

1. Human -> Agent

Users, operators, or reviewers provide prompts, approvals, policies, and configuration. This is the main entry point for prompt injection, social engineering, and unsafe approvals.

2. Agent -> Agent

Agents exchange requests, credentials, handoff context, and trust assertions. This boundary is vulnerable to spoofed identities, tampered trust signals, and over-broad delegation.

3. Agent -> Tool

Agents call MCP tools, file operations, shell commands, APIs, plugins, and external services. This is the highest-risk execution boundary because a successful bypass can lead to code execution, data exfiltration, or destructive side effects.

4. Agent -> Platform Control Plane

Agents and services interact with package registries, CI/CD, release pipelines, audit systems, and deployment targets. This boundary matters for supply chain, artifact provenance, and operational integrity.

High-Level Data Flow

Human / Operator
    |
    v
Agent OS policy + approval checks
    |
    +--> AgentMesh identity / trust validation
    |
    +--> Agent Runtime execution boundary
    |
    +--> Agent SRE monitoring / replay / rollback
    |
    v
Tools, plugins, APIs, storage, and external services

Primary Attack Surfaces

| Surface | Example threats |
| --- | --- |
| Prompts, retrieved context, memory | prompt injection, poisoned context, hidden instructions |
| Agent identity and delegation | spoofing, replay, forged credentials, trust laundering |
| Tool calls and plugins | code execution, shell abuse, dangerous file writes, privilege escalation |
| Policies and config files | unsafe defaults, policy drift, malformed policy documents |
| Audit and observability | log tampering, trace gaps, incomplete attribution |
| CI/CD and package publishing | supply chain tampering, unsigned artifacts, metadata confusion |

STRIDE Analysis

| STRIDE category | Example risk in AGT | Primary mitigations |
| --- | --- | --- |
| Spoofing | Malicious agent impersonates a trusted peer | AgentMesh Ed25519 identity, DID-style identities, challenge-response handshakes, trust scoring |
| Tampering | Policies, audit logs, or artifacts are altered in transit or at rest | Agent OS policy interception, signed attestations, Merkle/hash-chain audit trails, ESRP-oriented publishing controls |
| Repudiation | A user or agent denies having taken a high-risk action | Immutable audit trail, replay tooling, trust and approval metadata, SRE event correlation |
| Information Disclosure | Agent leaks secrets, PII, or internal context through tools or messages | Capability scoping, MCP governance, VFS-style access control, prompt/content sanitization, least-privilege runtime boundaries |
| Denial of Service | Cascading failures, expensive loops, or runaway agents | Agent SRE circuit breakers, error budgets, runtime kill switch, bounded execution rings, rate and token controls |
| Elevation of Privilege | Agent escapes its intended scope or performs unauthorized actions | Agent Runtime rings, Agent OS allow/deny rules, approval workflows, trust decay, constrained delegation |
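
The Tampering and Repudiation rows rely on hash-chained audit trails. The following is a minimal sketch of that general technique, not AGT's actual audit format: the names `GENESIS`, `entry_hash`, `append`, and `verify` are hypothetical, and SHA-256 over a canonical JSON encoding is an assumption made for illustration.

```python
import hashlib
import json

GENESIS = "0" * 64  # hypothetical placeholder hash that anchors the first entry


def entry_hash(prev_hash: str, event: dict) -> str:
    """Hash the previous entry's hash together with a canonical event encoding."""
    payload = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append(log: list, event: dict) -> None:
    """Append an event, linking it to the hash of the entry before it."""
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"event": event, "prev": prev, "hash": entry_hash(prev, event)})


def verify(log: list) -> bool:
    """Recompute every link; any edited, removed, or reordered entry breaks the chain."""
    prev = GENESIS
    for entry in log:
        if entry["prev"] != prev or entry["hash"] != entry_hash(prev, entry["event"]):
            return False
        prev = entry["hash"]
    return True


audit_log = []
append(audit_log, {"agent": "agent-a", "action": "read_file"})
append(audit_log, {"agent": "agent-a", "action": "send_email"})
print(verify(audit_log))  # True

audit_log[0]["event"]["action"] = "delete_file"  # simulate tampering at rest
print(verify(audit_log))  # False
```

A chain like this only detects tampering if the latest hash is also stored or signed somewhere the attacker cannot rewrite; otherwise the whole chain can be recomputed.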

Threats and Mitigations by Package

Agent OS

Main threats

  • Prompt injection or goal hijack causes unsafe tool execution
  • Agents call tools outside their approved scope
  • Policies are too weak, too broad, or bypassed through aliases or malformed requests
  • Hidden context or memory poisons future decisions

Mitigations

  • Deterministic policy evaluation before action execution
  • Capability allowlists / denylists and action interception
  • Approval workflows for sensitive actions
  • Prompt, tool-input, and context sanitization
  • Read-only policy and context controls for critical data paths
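
As a rough illustration of deterministic, deny-by-default evaluation, the sketch below resolves a tool name against a rule list with a fixed precedence (deny, then approval, then allow), so the result does not depend on rule order. The `Rule` type, the `evaluate` function, the glob patterns, and the tool names are all hypothetical and do not reflect the Agent OS policy schema.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class Rule:
    pattern: str  # glob pattern matched against the tool name
    effect: str   # "allow", "deny", or "require_approval"


def evaluate(rules: list, tool: str) -> str:
    """Return the decision for a tool call; explicit deny always wins."""
    effects = {r.effect for r in rules if fnmatchcase(tool, r.pattern)}
    if "deny" in effects:
        return "deny"
    if "require_approval" in effects:
        return "require_approval"
    if "allow" in effects:
        return "allow"
    return "deny"  # no rule matched: deny by default


RULES = [
    Rule("fs.read_*", "allow"),
    Rule("fs.write_*", "require_approval"),
    Rule("shell.*", "deny"),
]

for tool in ("fs.read_file", "fs.write_file", "shell.exec", "net.fetch"):
    print(tool, "->", evaluate(RULES, tool))
```

Because the function is pure and the precedence is fixed, the same request always yields the same decision, which is what makes the outcome auditable and replayable.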

AgentMesh

Main threats

  • Untrusted agents spoof trusted ones
  • Delegation chains become too broad or unverifiable
  • Inter-agent messages are replayed, forged, or accepted without validation
  • Supply chain metadata about models, tools, or registries becomes untrustworthy

Mitigations

  • Ed25519-backed identity and DID-style agent credentials
  • Trust scoring, trust decay, and revocation
  • Challenge-response handshake and signed trust attestations
  • AI-BOM / provenance tracking for models, data, and packages
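
To make trust decay and revocation concrete, here is a small sketch in which a trust score halves after each fixed interval without re-verification, and a revoked agent is rejected regardless of score. The exponential half-life model, the default values, and the names `decayed_trust` and `is_trusted` are assumptions for illustration; this document does not specify how AgentMesh computes or decays trust.

```python
def decayed_trust(score: float, age_s: float, half_life_s: float = 3600.0) -> float:
    """Halve the score for every half_life_s seconds since the last verification."""
    return score * 0.5 ** (age_s / half_life_s)


def is_trusted(agent_id: str, score: float, age_s: float,
               revoked: set, threshold: float = 0.5) -> bool:
    """Revocation overrides any score; otherwise compare the decayed score to a threshold."""
    if agent_id in revoked:
        return False
    return decayed_trust(score, age_s) >= threshold


revoked_agents = {"agent-x"}
print(is_trusted("agent-a", 0.9, age_s=0, revoked=revoked_agents))     # fresh score passes
print(is_trusted("agent-a", 0.9, age_s=3600, revoked=revoked_agents))  # stale score falls below threshold
print(is_trusted("agent-x", 1.0, age_s=0, revoked=revoked_agents))     # revoked agent is rejected
```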

Agent Runtime

Main threats

  • Tool execution leads to code execution or destructive side effects
  • Long-running sessions escape intended isolation
  • Compromised agents persist after unsafe behavior
  • Multi-step workflows leave partial state after failure

Mitigations

  • Ring-based execution isolation
  • Kill switch and termination controls
  • Saga orchestration / compensation for partial failures
  • Sandboxed runtime boundaries and auditable execution paths
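
The saga mitigation can be sketched as follows: each step is paired with a compensating action, and when a step fails, the steps that already completed are undone in reverse order. This illustrates the general saga pattern only; `run_saga`, the step tuple layout, and the example steps are hypothetical and are not the Agent Runtime API.

```python
from typing import Callable, List, Tuple

# (name, action, compensation)
Step = Tuple[str, Callable[[], None], Callable[[], None]]


def run_saga(steps: List[Step]) -> List[str]:
    """Run steps in order; on failure, compensate completed steps in reverse order."""
    journal: List[str] = []
    completed: List[Tuple[str, Callable[[], None]]] = []
    for name, action, compensate in steps:
        try:
            action()
        except Exception:
            journal.append(f"failed:{name}")
            for done_name, undo in reversed(completed):
                undo()
                journal.append(f"compensated:{done_name}")
            return journal
        completed.append((name, compensate))
        journal.append(f"done:{name}")
    return journal


state = {"reserved": False, "written": False}


def reserve() -> None:
    state["reserved"] = True


def release() -> None:
    state["reserved"] = False


def write() -> None:
    state["written"] = True


def unwrite() -> None:
    state["written"] = False


def notify() -> None:
    raise RuntimeError("downstream tool unavailable")


journal = run_saga([
    ("reserve", reserve, release),
    ("write", write, unwrite),
    ("notify", notify, lambda: None),
])
print(journal)
print(state)  # both flags are back to False after compensation
```

The returned journal doubles as an audit record of what ran and what was rolled back.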

Agent SRE

Main threats

  • One compromised or degraded agent causes cascading failures elsewhere
  • Operators lack enough telemetry to understand or contain incidents
  • Slow drift or anomalous behavior goes unnoticed

Mitigations

  • Circuit breakers and rollout controls
  • Error budgets and SLO-driven enforcement
  • Replay debugging and event correlation
  • Anomaly and cascade detection across agent fleets
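
A circuit breaker limits cascades by refusing calls to a dependency after repeated failures, then allowing trial calls once a cool-down has passed. The class below is a minimal sketch of that pattern; the `CircuitBreaker` name, its parameters, and the injectable clock are hypothetical and not taken from Agent SRE.

```python
import time


class CircuitBreaker:
    def __init__(self, max_failures: int = 3, reset_after_s: float = 30.0,
                 clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after_s = reset_after_s
        self.clock = clock          # injectable so the breaker can be tested without sleeping
        self.failures = 0
        self.opened_at = None       # None means the circuit is closed

    def allow(self) -> bool:
        """Closed: allow. Open: block until the cool-down passes, then allow trial calls."""
        if self.opened_at is None:
            return True
        return self.clock() - self.opened_at >= self.reset_after_s

    def record(self, success: bool) -> None:
        """A success closes the circuit; enough consecutive failures (re)open it."""
        if success:
            self.failures = 0
            self.opened_at = None
            return
        self.failures += 1
        if self.failures >= self.max_failures:
            self.opened_at = self.clock()


breaker = CircuitBreaker(max_failures=2)
breaker.record(False)
breaker.record(False)
print(breaker.allow())  # False: the circuit is open, so calls to the failing agent are skipped
```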

Threat-to-Control Mapping

Threat Agent OS AgentMesh Agent Runtime Agent SRE
Prompt injection Policy interception, approval gates Trusted handoff context Runtime containment Replay + anomaly signals
Capability escalation Policy rules, explicit denies Scoped trust / delegation Ring isolation Detection of unusual call patterns
Identity spoofing N/A Signed identity + handshake Runtime session binding Cross-service correlation
Data exfiltration MCP and policy controls Trust-aware peer gating Sandboxed execution Alerting on unusual transfer patterns
Rogue behavior Policy deny / approval Trust decay and revocation Kill switch Error budgets + cascade detection
Supply chain compromise Policy and config review AI-BOM / provenance Signed artifacts and controlled runtime Operational change monitoring

Residual Risks

AGT reduces risk but does not eliminate it. The main residual risks are:

  • Misconfigured policies that are syntactically valid but semantically too permissive
  • Human approvers making unsafe decisions under time pressure
  • External tools or plugins that behave unsafely inside their allowed scope
  • Gaps between documented controls and the exact deployment posture of a given organization
  • Knowledge flow risks: AGT governs tool calls but not the knowledge (documents, embeddings, context) that agents consume and propagate (see Limitations §7)
  • Credential persistence: AGT does not observe or revoke credentials that agents hold across tasks within a session, so accumulated permissions may exceed what the current task requires (see Limitations §8)
  • Physical AI scope: AGT governs software agents, not physical actuators, hardware interlocks, or real-time control loops (see Limitations §10)
  • Streaming data: AGT evaluates policies per action, not continuously over data streams, so data freshness and quality are not assured (see Limitations §11)
  • DID method inconsistency: the Python and .NET SDKs use did:mesh:* while the TS, Rust, and Go SDKs use did:agentmesh:*, so cross-SDK policy rules must account for both (see Limitations §12)
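
For the DID method inconsistency, one possible workaround is to write identity rules against the part of the DID after the method prefix and accept either prefix explicitly. The sketch below assumes that the identifier following the prefix is comparable across SDKs, which this document does not confirm; `KNOWN_PREFIXES`, `split_did`, `rule_matches`, and the agent names are hypothetical.

```python
from fnmatch import fnmatchcase

# The two method prefixes named in the residual-risk list above.
KNOWN_PREFIXES = ("did:mesh:", "did:agentmesh:")


def split_did(did: str):
    """Return (prefix, identifier) for a known method, or None for anything else."""
    for prefix in KNOWN_PREFIXES:
        if did.startswith(prefix):
            return prefix, did[len(prefix):]
    return None


def rule_matches(did: str, id_pattern: str) -> bool:
    """Match a glob pattern against the identifier, whichever known prefix is used."""
    parts = split_did(did)
    return parts is not None and fnmatchcase(parts[1], id_pattern)


print(rule_matches("did:mesh:billing-agent", "billing-*"))       # True
print(rule_matches("did:agentmesh:billing-agent", "billing-*"))  # True
print(rule_matches("did:web:billing-agent", "billing-*"))        # False: unknown method
```

Unknown methods are rejected rather than passed through, so a rule never matches an identity scheme it was not written for.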

Configuration Bypass Vectors

Governance enforcement depends on correct initialization. These configuration states can result in agents running without effective governance:

| Bypass Vector | Risk | Mitigation |
| --- | --- | --- |
| No policies loaded | Default action is allow, so all actions pass ungoverned | Always load policy files; use strict mode in production |
| Permissive mode in production | Permissive mode allows all actions by default | Reserve permissive mode for dev/test; enforce strict in deployment |
| Tool aliasing | Registering a tool under an unexpected name bypasses name-based policy rules | Use strict mode (deny-by-default) so unrecognized tools are blocked; use regex patterns in policy rules rather than exact tool names |
| Import-only governance | Importing the governance module without configuring policies creates false "governed" status | Use agt doctor and agt audit to verify effective enforcement state |

These vectors were identified in external red-team analysis by Periculo.
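
The first three vectors can be reproduced in a few lines. In this sketch, `is_allowed`, the `strict` flag, and the tool names are hypothetical stand-ins for the real configuration; the point is only to show why an exact-name deny rule fails against an alias in permissive mode, and how strict mode or a pattern-based rule closes the gap.

```python
import re


def is_allowed(tool: str, deny_patterns: list, allow_patterns: list, strict: bool) -> bool:
    """Deny rules win; allow rules come next; unmatched tools pass only in permissive mode."""
    if any(re.fullmatch(p, tool) for p in deny_patterns):
        return False
    if any(re.fullmatch(p, tool) for p in allow_patterns):
        return True
    return not strict


EXACT_DENY = ["shell_exec"]                       # exact tool name only
REGEX_DENY = [r"(sh|shell|bash)[_.-]?exec.*"]     # covers common alias spellings

# No policies loaded, permissive mode: everything passes ungoverned.
print(is_allowed("anything", [], [], strict=False))                     # True

# Tool aliasing: the alias slips past an exact-name deny rule in permissive mode.
print(is_allowed("sh_exec", EXACT_DENY, [], strict=False))              # True

# Strict mode blocks the unrecognized alias even with the same exact-name rule.
print(is_allowed("sh_exec", EXACT_DENY, ["read_file"], strict=True))    # False

# A pattern-based deny rule catches the alias even in permissive mode.
print(is_allowed("sh_exec", REGEX_DENY, [], strict=False))              # False
```

Pattern rules only cover the aliases someone anticipated, so strict mode remains the more robust of the two mitigations.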

Project Impersonation and Typo-Squatting

OSS projects face impersonation risks from third-party websites, packages, or repositories that use the project name to appear official. Common attack vectors:

| Vector | Description |
| --- | --- |
| Domain squatting | Registering your-project-name.com/.io/.dev with cloned README/docs |
| Package typo-squatting | Publishing agent-os-kernal (typo) or agent_os_kernel (underscore variant) to PyPI/npm |
| Repository cloning | Forking the repo, modifying install instructions to point to attacker-hosted binaries |
| Fake documentation sites | Hosting a lookalike docs site that injects malicious install commands |

How AGT Mitigates This

AGT's existing components address the root cause: identity should be cryptographic, not name-based.

| AGT Component | How It Helps |
| --- | --- |
| AgentMesh DID Identity (Tutorial 02) | Agents prove identity with Ed25519 credentials. An impersonator can clone the name but cannot forge the DID. |
| Ed25519 Artifact Signing (Tutorial 26) | Every release artifact carries a cryptographic signature. Tampered or repackaged artifacts fail verification. |
| Plugin Marketplace Verification (Tutorial 10) | Plugins are verified against a trusted-key ring before installation. Unsigned or wrongly-signed plugins are rejected. |
| SBOM Attestation (Tutorial 26) | GitHub attestations bind SBOMs to specific releases, proving provenance through the official build pipeline. |
| AI-BOM / Provenance Tracking | Supply chain metadata for models, tools, and packages is tracked and verifiable. |

Recommendations for Maintainers

  1. State the official source in README and docs. Add a clear note listing the official GitHub repository, official documentation site, and official package registry URLs. State that the team does not maintain or endorse third-party websites claiming to be official.
  2. Monitor for typo-squatted packages. Periodically search PyPI, npm, and crates.io for packages with names similar to yours (common substitutions: hyphens/underscores, transposed characters, added/dropped suffixes).
  3. Sign release artifacts. Use Ed25519 signing (AGT SDK) or Sigstore so users can verify authenticity before installing.
  4. Use GitHub attestations. Bind build provenance to releases so users can verify artifacts were built by the official CI pipeline.
  5. Register obvious domain variants. If your project is widely used, consider registering the .com/.io/.dev variants of your project name and redirecting to the official repository.
  6. Report impersonation. Use your organization's security reporting channels for takedown requests against impersonating sites or packages.
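
The monitoring step (item 2) can be partly automated by comparing registry search results against the official package name. This is a rough sketch using string similarity from the standard library; the `normalize` and `looks_like_squat` helpers and the 0.85 threshold are hypothetical choices, and the package names are the examples from the table above. It does not query any registry.

```python
import re
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase the name and collapse runs of '-', '_' and '.' into a single '-'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def looks_like_squat(candidate: str, official: str, threshold: float = 0.85) -> bool:
    """Flag names that differ only by separators or case, or are near-identical."""
    if candidate == official:
        return False  # the official package itself
    c, o = normalize(candidate), normalize(official)
    if c == o:
        return True   # separator or case variant
    return SequenceMatcher(None, c, o).ratio() >= threshold


OFFICIAL = "agent-os-kernel"
for name in ("agent-os-kernal", "agent_os_kernel", "agent-os-kernel", "requests"):
    print(name, "->", looks_like_squat(name, OFFICIAL))
```

A similarity threshold produces false positives for legitimately related packages, so flagged names are candidates for manual review rather than proof of impersonation.
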

General Recommendations

  • Keep policy scope narrow and prefer deny-by-default for high-risk tools
  • Require explicit approval for destructive, financial, or identity-sensitive actions
  • Rotate credentials and revoke trust aggressively when behavior changes
  • Treat release metadata, package publishing, and provenance as part of the runtime security boundary
  • Use SRE telemetry and replay tooling to investigate suspicious agent actions