Agent Governance Toolkit Threat Model

This document summarizes the security threat model for the Agent Governance Toolkit (AGT) using a STRIDE-oriented view of the main trust boundaries in the system.

For the current OWASP Agentic Top 10 mapping across all ASI risk categories, see docs/compliance/owasp-agentic-top10-architecture.md.

Scope

This threat model focuses on the runtime governance layer described in the repository README:

Agent OS: deterministic policy enforcement, approvals, MCP governance, context and policy controls
AgentMesh: identity, trust scoring, delegated trust, and inter-agent communication
Agent Runtime: execution rings, kill switch, sandbox boundaries, and saga controls
Agent SRE: circuit breakers, replay, error budgets, and cascade detection

Trust Boundaries

1. Human -> Agent

Users, operators, or reviewers provide prompts, approvals, policies, and configuration. This is the main entry point for prompt injection, social engineering, and unsafe approvals.

2. Agent -> Agent

Agents exchange requests, credentials, handoff context, and trust assertions. This boundary is vulnerable to spoofed identities, tampered trust signals, and over-broad delegation.

3. Agent -> Tool

Agents call MCP tools, file operations, shell commands, APIs, plugins, and external services. This is the highest-risk execution boundary because a successful bypass can lead to code execution, data exfiltration, or destructive side effects.

4. Agent -> Platform Control Plane

Agents and services interact with package registries, CI/CD, release pipelines, audit systems, and deployment targets. This boundary matters for supply chain security, artifact provenance, and operational integrity.

High-Level Data Flow

Human / Operator
    |
    v
Agent OS policy + approval checks
    |
    +--> AgentMesh identity / trust validation
    |
    +--> Agent Runtime execution boundary
    |
    +--> Agent SRE monitoring / replay / rollback
    |
    v
Tools, plugins, APIs, storage, and external services

Primary Attack Surfaces

| Surface | Example threats |
| --- | --- |
| Prompts, retrieved context, memory | prompt injection, poisoned context, hidden instructions |
| Agent identity and delegation | spoofing, replay, forged credentials, trust laundering |
| Tool calls and plugins | code execution, shell abuse, dangerous file writes, privilege escalation |
| Policies and config files | unsafe defaults, policy drift, malformed policy documents |
| Audit and observability | log tampering, trace gaps, incomplete attribution |
| CI/CD and package publishing | supply chain tampering, unsigned artifacts, metadata confusion |

STRIDE Analysis

| STRIDE category | Example risk in AGT | Primary mitigations |
| --- | --- | --- |
| Spoofing | Malicious agent impersonates a trusted peer | AgentMesh Ed25519 identity, DID-style identities, challenge-response handshakes, trust scoring |
| Tampering | Policies, audit logs, or artifacts are altered in transit or at rest | Agent OS policy interception, signed attestations, Merkle/hash-chain audit trails, ESRP-oriented publishing controls |
| Repudiation | A user or agent denies having taken a high-risk action | Immutable audit trail, replay tooling, trust and approval metadata, SRE event correlation |
| Information Disclosure | Agent leaks secrets, PII, or internal context through tools or messages | Capability scoping, MCP governance, VFS-style access control, prompt/content sanitization, least-privilege runtime boundaries |
| Denial of Service | Cascading failures, expensive loops, or runaway agents | Agent SRE circuit breakers, error budgets, runtime kill switch, bounded execution rings, rate and token controls |
| Elevation of Privilege | Agent escapes its intended scope or performs unauthorized actions | Agent Runtime rings, Agent OS allow/deny rules, approval workflows, trust decay, constrained delegation |
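
The hash-chain audit trail listed under Tampering can be illustrated with a minimal sketch. The record layout and the names `append_entry` and `verify_chain` are assumptions made for illustration and are not AGT's actual API. The idea is only that each entry commits to the hash of the previous one, so editing an earlier entry invalidates the chain.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash: str, event: dict) -> str:
    # Canonical JSON so the same event always produces the same hash.
    payload = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_entry(chain: list, event: dict) -> None:
    """Append an event, linking it to the hash of the previous entry."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    chain.append({"event": event, "prev": prev_hash, "hash": _digest(prev_hash, event)})


def verify_chain(chain: list) -> bool:
    """Return True only if every entry still matches its recorded hash."""
    prev_hash = GENESIS
    for entry in chain:
        if entry["prev"] != prev_hash or entry["hash"] != _digest(prev_hash, entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True


# Hypothetical events, used only to exercise the sketch.
log = []
append_entry(log, {"agent": "agent-a", "action": "file.read"})
append_entry(log, {"agent": "agent-a", "action": "shell.exec"})
intact = verify_chain(log)

log[0]["event"]["action"] = "file.delete"  # simulate tampering at rest
tampered_detected = not verify_chain(log)
```

A chain like this detects modification but does not prevent it. In practice the latest hash would also need to be anchored somewhere the attacker cannot rewrite, for example in a signed attestation.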

Threats and Mitigations by Package

Agent OS

Main threats

Prompt injection or goal hijack causes unsafe tool execution
Agents call tools outside their approved scope
Policies are too weak, too broad, or bypassed through aliases or malformed requests
Hidden context or memory poisons future decisions

Mitigations

Deterministic policy evaluation before action execution
Capability allowlists / denylists and action interception
Approval workflows for sensitive actions
Prompt, tool-input, and context sanitization
Read-only policy and context controls for critical data paths
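
As a rough illustration of deterministic, deny-by-default evaluation, the sketch below matches a tool name against regex rules and applies a fixed precedence. The `Rule` type, the effect names, and the example tool names are hypothetical and do not reflect AGT's real policy schema.

```python
import re
from dataclasses import dataclass

# Fixed precedence: an explicit deny always wins, then approval, then allow.
PRECEDENCE = ("deny", "require_approval", "allow")


@dataclass(frozen=True)
class Rule:
    pattern: str  # regex that must match the full tool name
    effect: str   # one of the values in PRECEDENCE


def evaluate(tool_name: str, rules: list) -> str:
    """Return the strongest matching effect, or "deny" if nothing matches."""
    matched = {rule.effect for rule in rules if re.fullmatch(rule.pattern, tool_name)}
    for effect in PRECEDENCE:
        if effect in matched:
            return effect
    return "deny"  # deny-by-default for unrecognized tools


# Hypothetical rule set for illustration.
RULES = [
    Rule(r"file\..*", "allow"),
    Rule(r"file\.delete", "deny"),
    Rule(r"payments\..*", "require_approval"),
]
```

Because the result depends only on the tool name and the rule set, the same request always yields the same decision, and rule order does not matter. With these rules, `evaluate("file.delete", RULES)` is denied even though the broader `file\..*` rule allows it, and an unregistered name falls through to the default deny.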

AgentMesh

Main threats

Untrusted agents spoof trusted ones
Delegation chains become too broad or unverifiable
Inter-agent messages are replayed, forged, or accepted without validation
Supply chain metadata about models, tools, or registries becomes untrustworthy

Mitigations

Ed25519-backed identity and DID-style agent credentials
Trust scoring, trust decay, and revocation
Challenge-response handshake and signed trust attestations
AI-BOM / provenance tracking for models, data, and packages
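
Trust decay and revocation can be pictured with a small sketch. The exponential half-life model, the default values, and the function names below are assumptions chosen for illustration; the actual AgentMesh scoring formula is not described in this document.

```python
def decayed_trust(score: float, seconds_since_verified: float,
                  half_life_seconds: float = 3600.0) -> float:
    """Halve the score for every half-life that passes without re-verification."""
    return score * 0.5 ** (seconds_since_verified / half_life_seconds)


def is_trusted(agent_id: str, score: float, seconds_since_verified: float,
               revoked: set, threshold: float = 0.5) -> bool:
    """Revocation always wins; otherwise compare the decayed score to a threshold."""
    if agent_id in revoked:
        return False
    return decayed_trust(score, seconds_since_verified) >= threshold
```

Under these assumed defaults, an agent verified with a score of 0.8 drops to 0.4 after one hour and falls below the 0.5 threshold unless it re-verifies, while a revoked agent is rejected regardless of its score.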

Agent Runtime

Main threats

Tool execution leads to code execution or destructive side effects
Long-running sessions escape intended isolation
Compromised agents persist after unsafe behavior
Multi-step workflows leave partial state after failure

Mitigations

Ring-based execution isolation
Kill switch and termination controls
Saga orchestration / compensation for partial failures
Sandboxed runtime boundaries and auditable execution paths
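
The saga pattern mentioned above pairs each step with a compensating action, so a failure part-way through does not leave partial state behind. The following is a generic sketch of that pattern, with a hypothetical `run_saga` helper and made-up steps, and is not the Agent Runtime implementation.

```python
def run_saga(steps) -> bool:
    """Run (action, compensation) pairs; undo completed steps if one fails."""
    completed = []
    try:
        for action, compensation in steps:
            action()
            completed.append(compensation)
    except Exception:
        # Roll back in reverse order so later steps are undone first.
        for compensation in reversed(completed):
            compensation()
        return False
    return True


state = []


def failing_step():
    raise RuntimeError("step failed")


# Hypothetical three-step workflow whose last step fails.
ok = run_saga([
    (lambda: state.append("reserve"), lambda: state.remove("reserve")),
    (lambda: state.append("charge"), lambda: state.remove("charge")),
    (failing_step, lambda: None),
])
```

After this run `ok` is `False` and `state` is empty again. The sketch does not handle a compensation that itself raises; a production orchestrator would need to record and retry such failures.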

Agent SRE

Main threats

One compromised or degraded agent causes cascading failures elsewhere
Operators lack enough telemetry to understand or contain incidents
Slow drift or anomalous behavior goes unnoticed

Mitigations

Circuit breakers and rollout controls
Error budgets and SLO-driven enforcement
Replay debugging and event correlation
Anomaly and cascade detection across agent fleets
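
A circuit breaker limits cascading failures by refusing calls to a dependency that keeps failing. The class below is a minimal, generic sketch; the thresholds, the method names, and the injectable `clock` parameter are assumptions for illustration rather than the Agent SRE API.

```python
import time


class CircuitBreaker:
    def __init__(self, max_failures: int = 3, reset_after: float = 30.0,
                 clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after  # cool-down in seconds
        self.clock = clock              # injectable so tests can control time
        self.failures = 0
        self.opened_at = None           # None means the breaker is closed

    def allow(self) -> bool:
        """Return True if a call may proceed."""
        if self.opened_at is None:
            return True
        # After the cool-down, trial calls are let through again;
        # a failed trial re-opens the breaker via record_failure().
        return self.clock() - self.opened_at >= self.reset_after

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.max_failures:
            self.opened_at = self.clock()
```

A caller would check `allow()` before invoking a downstream agent or tool and report the outcome with `record_success()` or `record_failure()`.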

Threat-to-Control Mapping

| Threat | Agent OS | AgentMesh | Agent Runtime | Agent SRE |
| --- | --- | --- | --- | --- |
| Prompt injection | Policy interception, approval gates | Trusted handoff context | Runtime containment | Replay + anomaly signals |
| Capability escalation | Policy rules, explicit denies | Scoped trust / delegation | Ring isolation | Detection of unusual call patterns |
| Identity spoofing | N/A | Signed identity + handshake | Runtime session binding | Cross-service correlation |
| Data exfiltration | MCP and policy controls | Trust-aware peer gating | Sandboxed execution | Alerting on unusual transfer patterns |
| Rogue behavior | Policy deny / approval | Trust decay and revocation | Kill switch | Error budgets + cascade detection |
| Supply chain compromise | Policy and config review | AI-BOM / provenance | Signed artifacts and controlled runtime | Operational change monitoring |

Residual Risks

AGT reduces risk but does not eliminate it. The main residual risks are:

Misconfigured policies that are syntactically valid but semantically too permissive
Human approvers making unsafe decisions under time pressure
External tools or plugins that behave unsafely inside their allowed scope
Gaps between documented controls and the exact deployment posture of a given organization
Knowledge flow risks: AGT governs tool calls but not the knowledge (documents, embeddings, context) that agents consume and propagate (see Limitations §7)
Credential persistence: AGT does not observe or revoke credentials that agents hold across tasks within a session, so accumulated permissions may exceed what the current task requires (see Limitations §8)
Physical AI scope: AGT governs software agents, not physical actuators, hardware interlocks, or real-time control loops (see Limitations §10)
Streaming data: AGT evaluates policies per action, not continuously over data streams, so data freshness and quality are not assured (see Limitations §11)
DID method inconsistency: the Python and .NET SDKs use did:mesh:* while the TS, Rust, and Go SDKs use did:agentmesh:*, so cross-SDK policy rules must account for both (see Limitations §12)
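
One way a cross-SDK policy rule could account for both DID methods is to compare only the method-specific identifier. The sketch below rests on a strong assumption that is not confirmed by this document: that the same agent carries the same identifier after the `did:mesh:` or `did:agentmesh:` prefix in every SDK. The helper names are hypothetical.

```python
import re

# Accept only the two DID methods named in this document.
_DID_RE = re.compile(r"^did:(mesh|agentmesh):(.+)$")


def agent_key(did: str) -> str:
    """Return the method-specific identifier for either AGT DID method."""
    match = _DID_RE.match(did)
    if match is None:
        raise ValueError(f"unsupported DID: {did!r}")
    return match.group(2)


def same_agent(did_a: str, did_b: str) -> bool:
    # ASSUMPTION: identical suffixes denote the same agent across SDKs.
    return agent_key(did_a) == agent_key(did_b)
```

If that assumption does not hold in a given deployment, policy rules should instead list both full DIDs explicitly for each agent.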

Configuration Bypass Vectors

Governance enforcement depends on correct initialization. These configuration states can result in agents running without effective governance:

| Bypass Vector | Risk | Mitigation |
| --- | --- | --- |
| No policies loaded | Default action is `allow`, so all actions pass ungoverned | Always load policy files; use `strict` mode in production |
| Permissive mode in production | `permissive` mode allows all actions by default | Reserve `permissive` mode for dev/test; enforce `strict` in deployment |
| Tool aliasing | Registering a tool under an unexpected name bypasses name-based policy rules | Use `strict` mode (deny-by-default) so unrecognized tools are blocked; use regex patterns in policy rules rather than exact tool names |
| Import-only governance | Importing the governance module without configuring policies creates false "governed" status | Use `agt doctor` and `agt audit` to verify effective enforcement state |

These vectors were identified in an external red-team analysis by Periculo.
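
A deployment can also guard against the first two vectors with a pre-flight check that refuses to start when governance is not effectively configured. The sketch below is illustrative only: `GovernanceConfig`, its fields, and `bypass_findings` are hypothetical names, and the check complements rather than replaces `agt doctor` and `agt audit`.

```python
from dataclasses import dataclass, field


@dataclass
class GovernanceConfig:
    mode: str = "permissive"       # "strict" or "permissive"
    policies: list = field(default_factory=list)
    environment: str = "dev"       # "dev", "test", or "production"


def bypass_findings(config: GovernanceConfig) -> list:
    """List configuration states that would leave agents effectively ungoverned."""
    findings = []
    if not config.policies:
        findings.append("no policies loaded")
    if config.environment == "production" and config.mode != "strict":
        findings.append("non-strict mode in production")
    return findings


def require_effective_governance(config: GovernanceConfig) -> None:
    """Fail closed at startup instead of running with a bypassable configuration."""
    findings = bypass_findings(config)
    if findings:
        raise RuntimeError("governance not effective: " + "; ".join(findings))
```

Calling `require_effective_governance` during service startup turns a silent misconfiguration into a visible failure.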

Project Impersonation and Typo-Squatting

OSS projects face impersonation risks from third-party websites, packages, or repositories that use the project name to appear official. Common attack vectors:

| Vector | Description |
| --- | --- |
| Domain squatting | Registering `your-project-name.com/.io/.dev` with cloned README/docs |
| Package typo-squatting | Publishing `agent-os-kernal` (typo) or `agent_os_kernel` (underscore variant) to PyPI/npm |
| Repository cloning | Forking the repo, modifying install instructions to point to attacker-hosted binaries |
| Fake documentation sites | Hosting a lookalike docs site that injects malicious install commands |

How AGT Mitigates This

AGT's existing components address the root cause: identity should be cryptographic, not name-based.

| AGT Component | How It Helps |
| --- | --- |
| AgentMesh DID Identity (Tutorial 02) | Agents prove identity with Ed25519 credentials. An impersonator can clone the name but cannot forge the DID. |
| Ed25519 Artifact Signing (Tutorial 26) | Every release artifact carries a cryptographic signature. Tampered or repackaged artifacts fail verification. |
| Plugin Marketplace Verification (Tutorial 10) | Plugins are verified against a trusted-key ring before installation. Unsigned or wrongly-signed plugins are rejected. |
| SBOM Attestation (Tutorial 26) | GitHub attestations bind SBOMs to specific releases, proving provenance through the official build pipeline. |
| AI-BOM / Provenance Tracking | Supply chain metadata for models, tools, and packages is tracked and verifiable. |

Recommended Defenses for All OSS Projects

State the official source in README and docs. Add a clear note listing the official GitHub repository, official documentation site, and official package registry URLs. State that the team does not maintain or endorse third-party websites claiming to be official.
Monitor for typo-squatted packages. Periodically search PyPI, npm, and crates.io for packages with names similar to yours (common substitutions: hyphens/underscores, transposed characters, added/dropped suffixes).
Sign release artifacts. Use Ed25519 signing (AGT SDK) or Sigstore so users can verify authenticity before installing.
Use GitHub attestations. Bind build provenance to releases so users can verify artifacts were built by the official CI pipeline.
Register obvious domain variants. If your project is widely used, consider registering the .com/.io/.dev variants of your project name and redirecting to the official repository.
Report impersonation. Use your organization's security reporting channels for takedown requests against impersonating sites or packages.
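
The typo-squat monitoring recommended above can be partly automated by comparing candidate package names against the official one. The sketch below uses only the standard library; the 0.85 similarity threshold and the function names are assumptions for illustration, and fetching candidate names from a registry is out of scope here.

```python
import difflib
import re


def normalize(name: str) -> str:
    """Lowercase and collapse runs of '-', '_' and '.' into a single hyphen."""
    return re.sub(r"[-_.]+", "-", name.lower())


def looks_like_squat(candidate: str, official: str, threshold: float = 0.85) -> bool:
    """Flag names that differ from the official one only slightly."""
    a, b = normalize(candidate), normalize(official)
    if a == b:
        # Same name apart from case or separators: suspicious unless identical.
        return candidate != official
    # Otherwise fall back to a character-level similarity ratio.
    return difflib.SequenceMatcher(None, a, b).ratio() >= threshold
```

With `agent-os-kernel` as the official name, both examples from the table above, `agent-os-kernal` and `agent_os_kernel`, are flagged, while an unrelated name is not. A similarity threshold will produce false positives and negatives, so flagged names still need human review.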

Recommended Operational Practices

Keep policy scope narrow and prefer deny-by-default for high-risk tools
Require explicit approval for destructive, financial, or identity-sensitive actions
Rotate credentials and revoke trust aggressively when behavior changes
Treat release metadata, package publishing, and provenance as part of the runtime security boundary
Use SRE telemetry and replay tooling to investigate suspicious agent actions