ADR 0012: Cost Governance via Observability Policies in Agent SRE¶
- Status: accepted
- Date: 2026-05-06
Context¶
AI agents incur real costs through LLM API calls, tool invocations, and external service usage. Organizations need visibility into and control over agent spending, especially in multi-agent orchestration where costs can escalate quickly through cascading calls.
AGT's agent-sre module already tracks cost_per_task as a post-hoc metric alongside SLOs and error budgets. However, there is no policy-driven cost governance: no budget enforcement, no alerts when spending trends upward, and no way to define per-agent or global cost limits.
Industry analysts identify cost governance as a gap in agent security platforms. Our thought leadership work identified "observability policies that understand: this is a paid call, this is an unpaid call" as a key capability.
Design constraints (from architecture review):¶
- Cost metadata should come from all sources (tool annotations, policy mappings, runtime metering), layered
- Budget enforcement should be tiered: soft caps at warning thresholds, hard caps at absolute maximums
- Support both per-agent budgets and global (organization-wide) budgets
- This belongs in agent-sre, not in the policy evaluator or a new module
- Enforcement should be post-action (observe cost, alert if trending over), not pre-action prediction
- Start with Python, track porting to other SDK languages
Decision¶
Implement cost governance as an extension of the existing agent-sre module, using a tiered budget model with post-action enforcement.
Cost Metadata Model (Layered)¶
Cost information flows from three sources, with later sources overriding earlier ones:
- Tool annotations (tool author provides): tool definitions include a
cost_hintfield with estimated cost per invocation - Policy cost mappings (governance author provides): policy YAML maps tool names to cost values, overriding tool-provided hints
- Runtime metering (actual observation): after execution, the actual cost is captured from billing APIs or SDK cost headers, correcting any estimates
# Example: cost policy in agent-sre
cost_governance:
budgets:
- scope: agent
agent_id: "analyst-*"
soft_cap: 5.00 # USD, fires alert
hard_cap: 25.00 # USD, blocks further actions
window: 1h # Rolling window
- scope: global
soft_cap: 500.00
hard_cap: 2000.00
window: 24h
cost_map:
gpt-4-turbo: 0.03 # per 1K tokens (estimate)
database_query: 0.001
send_email: 0.00
web_search: 0.005
Enforcement Model¶
Post-action enforcement: after each action completes, the cost observer records the cost and checks budget status.
- Under soft cap: no action, cost recorded in telemetry
- Soft cap exceeded: alert fires (webhook, OTel event), action is not blocked
- Hard cap exceeded: subsequent actions are blocked until the window rolls over or an administrator raises the cap
This is deliberately post-action rather than pre-action because: - Pre-action cost prediction is unreliable (LLM token counts vary, tool costs depend on parameters) - Post-action observation uses actual costs, providing accurate data - The tiered model ensures the first action that exceeds a soft cap triggers an alert before reaching the hard cap
Integration Point¶
The cost observer hooks into agent-sre's existing SLO/error budget pipeline:
Action executed -> Cost observer records cost -> Budget checker evaluates
|
Under soft cap: log only
Over soft cap: alert
Over hard cap: block next action
Data Model¶
@dataclass
class CostRecord:
agent_id: str
action: str
estimated_cost: float # From tool hint or policy mapping
actual_cost: float # From runtime metering (if available)
source: str # "tool_hint", "policy_map", "runtime_meter"
timestamp: datetime
request_id: str
@dataclass
class BudgetStatus:
scope: str # "agent" or "global"
identifier: str # agent_id pattern or "global"
window_start: datetime
total_spent: float
soft_cap: float
hard_cap: float
status: str # "ok", "soft_exceeded", "hard_exceeded"
Consequences¶
Benefits¶
- Organizations gain visibility and control over agent spending
- Tiered model provides early warning before hard limits are hit
- Post-action approach gives accurate cost data rather than estimates
- Fits naturally into agent-sre's existing observability pipeline
- No changes to the hot path (policy evaluator) or core kernel
Tradeoffs¶
- Post-action means the action that crosses the hard cap still executes (one action overshoot)
- Runtime metering requires integration with billing APIs (provider-specific)
- Cost estimation accuracy depends on governance authors maintaining cost maps
Follow-up work¶
- Implement CostObserver in agent-sre (Python first)
- Add cost_hint field to tool registration schema
- Add cost governance section to SRE tutorial
- Track porting to .NET, Rust, Go, TypeScript SDKs
- Consider pre-action budget check as optional "strict mode" in future iteration