Tutorial 51: Cost Governance and Budget Enforcement

Package: agent-sre · Time: 15 minutes · Level: Intermediate

What You'll Learn

Setting per-agent and organization-wide spending limits
Tiered alerts (warn from 50%, throttle at 85%, kill at 95%)
Auto-throttling agents approaching budget limits
Kill-switch integration for runaway cost scenarios

Prerequisites: Install AGT with the SRE package:

pip install agent-sre

Why Cost Governance?

Without cost governance, a single runaway agent can consume your entire monthly LLM budget in hours. Multi-agent orchestration makes this worse: cascading calls across agents multiply costs unpredictably.

AGT's CostGuard provides:

Capability	What It Does
Per-agent budgets	Daily spending limits per agent
Per-task limits	Maximum cost per single task
Organization budget	Global monthly cap across all agents
Tiered alerts	Warnings at 50%, 75%, 90%, 95% utilization
Auto-throttle	Slow down agents at 85% budget
Kill switch	Stop agents at 95% budget
Cost anomaly detection	Flag unusual spending patterns

Core Concepts

Post-Action Enforcement

CostGuard uses post-action enforcement: after each action completes, it records the actual cost and checks budget status. This is more accurate than pre-action prediction because:

LLM token counts vary per request
Tool costs depend on parameters
Actual costs come from billing APIs, not estimates

Action executed -> CostGuard.record_cost() -> Budget check
                                                  |
                                          Under soft cap: log only
                                          Over soft cap: alert
                                          Over hard cap: kill agent
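
The flow above can be sketched in plain Python without the library. This is only an illustration under stated assumptions: `PostActionLedger`, its `soft_cap` and `hard_cap` fields, and the choice to treat "at the cap" the same as "over the cap" are hypothetical and are not part of the `agent_sre` API.

```python
from dataclasses import dataclass, field


@dataclass
class PostActionLedger:
    """Hypothetical ledger: record the actual cost first, then check the caps."""

    soft_cap: float
    hard_cap: float
    spent: float = 0.0
    killed: bool = False
    log: list = field(default_factory=list)

    def record_cost(self, cost_usd: float) -> str:
        # The action has already run, so its cost is always recorded.
        self.spent += cost_usd
        # Assumption: reaching a cap exactly counts as being over it.
        if self.spent >= self.hard_cap:
            self.killed = True
            outcome = "kill"
        elif self.spent >= self.soft_cap:
            outcome = "alert"
        else:
            outcome = "log"
        self.log.append((cost_usd, outcome))
        return outcome


ledger = PostActionLedger(soft_cap=40.0, hard_cap=50.0)
outcomes = [ledger.record_cost(cost) for cost in (10.0, 25.0, 10.0, 10.0)]
print(outcomes)  # ['log', 'log', 'alert', 'kill']
```

Because the check runs after the spend is recorded, the final total ($55) can overshoot the hard cap by up to one action's cost. That overshoot is the trade-off post-action enforcement accepts in exchange for using real costs.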

Tiered Budget Model

0%        50%       75%       85%      90%      95%     100%
|----------|---------|---------|--------|--------|--------|
   OK       WARN     WARN    THROTTLE  CRIT    KILL    BLOCKED
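
One way to read the diagram is as a lookup from utilization to a tier label. The sketch below encodes the boundaries shown above; the `TIERS` table and the `tier_for` helper are hypothetical names used for illustration, and the rule that a boundary value belongs to the higher tier is an assumption.

```python
# Hypothetical tier boundaries read off the diagram: (lower bound in %, label).
# Ordered from highest to lowest so the first match wins.
TIERS = [
    (100, "BLOCKED"),
    (95, "KILL"),
    (90, "CRIT"),
    (85, "THROTTLE"),
    (50, "WARN"),
    (0, "OK"),
]


def tier_for(utilization_percent: float) -> str:
    """Return the label of the highest tier whose lower bound has been reached."""
    for lower_bound, label in TIERS:
        if utilization_percent >= lower_bound:
            return label
    return "OK"  # negative utilization should not occur; treat it as OK


for pct in (10, 50, 80, 85, 92, 95, 100):
    print(f"{pct:>3}% -> {tier_for(pct)}")
```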

Step 1: Basic Budget Setup

from agent_sre.cost import CostGuard

# Create a cost guard with budget limits
guard = CostGuard(
    per_task_limit=2.00,          # Max $2 per task
    per_agent_daily_limit=50.00,  # Max $50/day per agent
    org_monthly_budget=1000.00,   # Max $1000/month total
    auto_throttle=True,           # Auto-throttle at 85%, kill at 95%
)

print(f"Per-task limit:    ${guard.per_task_limit:.2f}")
print(f"Daily agent limit: ${guard.per_agent_daily_limit:.2f}")
print(f"Org monthly:       ${guard.org_monthly_budget:.2f}")

Step 2: Pre-Task Budget Check

Before running an expensive task, check if the budget allows it:

# Check if a task can proceed
allowed, reason = guard.check_task("analyst-agent", estimated_cost=1.50)
print(f"Task allowed: {allowed} ({reason})")
# Output: Task allowed: True (ok)

# Check a task that exceeds per-task limit
allowed, reason = guard.check_task("analyst-agent", estimated_cost=5.00)
print(f"Expensive task: {allowed} ({reason})")
# Output: Expensive task: False (Estimated cost $5.00 exceeds per-task limit $2.00)
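
To make the decision logic concrete, here is a standalone sketch of what a pre-check of this kind might do. `check_task_sketch` is a hypothetical helper, not the library's implementation; the order of the checks, the daily-limit check, and all reason strings other than the per-task message shown above are assumptions.

```python
def check_task_sketch(estimated_cost, per_task_limit, spent_today, daily_limit,
                      killed=False):
    """Hypothetical pre-check returning an (allowed, reason) tuple."""
    # A killed agent is refused regardless of how small the estimate is.
    if killed:
        return False, "agent killed"
    # The estimate alone must fit within the per-task limit.
    if estimated_cost > per_task_limit:
        return False, (
            f"Estimated cost ${estimated_cost:.2f} exceeds "
            f"per-task limit ${per_task_limit:.2f}"
        )
    # Assumption: the estimate must also fit in what is left of the daily budget.
    if spent_today + estimated_cost > daily_limit:
        return False, "estimate would exceed the daily limit"
    return True, "ok"


print(check_task_sketch(1.50, per_task_limit=2.00, spent_today=0.0, daily_limit=50.0))
print(check_task_sketch(5.00, per_task_limit=2.00, spent_today=0.0, daily_limit=50.0))
```

A pre-check only sees an estimate, so it complements the post-action recording rather than replacing it.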

Step 3: Record Costs and Get Alerts

After each task, record the actual cost:

# Record a normal task
alerts = guard.record_cost("analyst-agent", "task-001", cost_usd=0.50)
print(f"Task 001: $0.50, alerts: {len(alerts)}")

# Record with cost breakdown
alerts = guard.record_cost(
    "analyst-agent",
    "task-002",
    cost_usd=1.20,
    breakdown={"gpt-4": 1.00, "web-search": 0.20},
)
print(f"Task 002: $1.20, alerts: {len(alerts)}")

# Check budget status
budget = guard.get_budget("analyst-agent")
print(f"Spent today:  ${budget.spent_today_usd:.2f}")
print(f"Remaining:    ${budget.remaining_today_usd:.2f}")
print(f"Utilization:  {budget.utilization_percent:.1f}%")
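
The three status fields can be checked by hand. The arithmetic below assumes that utilization is spend divided by the daily limit and that remaining budget is the limit minus spend, which is how the field descriptions read; it uses the two costs recorded above and the $50 daily limit from Step 1.

```python
daily_limit = 50.00
recorded = [0.50, 1.20]  # task-001 and task-002

spent = sum(recorded)
remaining = daily_limit - spent
utilization = spent / daily_limit * 100

print(f"Spent today:  ${spent:.2f}")        # $1.70
print(f"Remaining:    ${remaining:.2f}")    # $48.30
print(f"Utilization:  {utilization:.1f}%")  # 3.4%
```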

Step 4: Watch Alerts Escalate

As spending increases, alerts escalate through the tiers:

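# Simulate spending that triggers alerts. Step 3 already recorded $1.70, so
# 23 tasks at $2.00 bring the total to $47.70 (95.4% of the $50 daily limit),
# past the 85% throttle and 95% kill thresholds.
for i in range(23):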
    task_id = f"task-{i + 10:03d}"
    alerts = guard.record_cost("analyst-agent", task_id, cost_usd=2.00)
    if alerts:
        for alert in alerts:
            print(f"  [{alert.severity.value.upper()}] {alert.message}")
            if alert.action.value != "alert":
                print(f"    Action: {alert.action.value}")

# Check final budget status
budget = guard.get_budget("analyst-agent")
print("\nFinal status:")
print(f"  Spent:     ${budget.spent_today_usd:.2f}")
print(f"  Throttled: {budget.throttled}")
print(f"  Killed:    {budget.killed}")

Once an agent is killed, all subsequent tasks are blocked:

allowed, reason = guard.check_task("analyst-agent", estimated_cost=0.01)
print(f"After kill: allowed={allowed}, reason={reason}")
# Output: After kill: allowed=False, reason=Agent killed — budget exhausted
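
It helps to work out how many recorded tasks it takes to reach each tier. The sketch below assumes the thresholds apply to the $50 daily limit, that the $1.70 recorded in Step 3 is already on the books, and that each task costs $2.00; `tasks_to_reach` is a hypothetical helper, not a library function.

```python
import math


def tasks_to_reach(threshold_pct, daily_limit, already_spent, cost_per_task):
    """Smallest number of tasks whose cost brings spend to the threshold or beyond."""
    target = threshold_pct * daily_limit / 100
    needed = target - already_spent
    return max(0, math.ceil(needed / cost_per_task))


for pct in (50, 75, 85, 90, 95):
    n = tasks_to_reach(pct, daily_limit=50.00, already_spent=1.70, cost_per_task=2.00)
    print(f"{pct}% reached after {n} tasks")
# 50% after 12, 75% after 18, 85% after 21, 90% after 22, 95% after 23
```

Under these assumptions the throttle engages on the 21st $2.00 task and the kill switch on the 23rd.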

Step 5: Organization-Wide Budget

The global budget tracks spending across ALL agents:

guard = CostGuard(
    per_agent_daily_limit=100.00,
    org_monthly_budget=200.00,  # Low for demo
    kill_switch_threshold=0.95,
)

# Multiple agents spending: 3 x $65 = $195, which is 97.5% of the $200 org
# budget and past the 95% kill threshold
for agent in ["agent-a", "agent-b", "agent-c"]:
    alerts = guard.record_cost(agent, "task-1", cost_usd=65.00)
    if alerts:
        for alert in alerts:
            print(f"  [{alert.severity.value.upper()}] {alert.message}")

# Once org budget is killed, ALL agents are blocked
for agent in ["agent-a", "agent-b", "agent-c"]:
    allowed, reason = guard.check_task(agent, estimated_cost=0.01)
    print(f"  {agent}: allowed={allowed}")
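
The key property of an organization-wide cap is that it is computed from the sum of all agents' spend, so an agent can be blocked even if its own budget is barely touched. The sketch below illustrates that idea only; `OrgLedger` and its members are hypothetical names, and the library's real bookkeeping may differ.

```python
from collections import defaultdict


class OrgLedger:
    """Hypothetical org-level ledger with a kill switch on total spend."""

    def __init__(self, org_monthly_budget, kill_switch_threshold=0.95):
        self.org_monthly_budget = org_monthly_budget
        self.kill_switch_threshold = kill_switch_threshold
        self.per_agent = defaultdict(float)

    def record_cost(self, agent_id, cost_usd):
        self.per_agent[agent_id] += cost_usd

    @property
    def org_spent(self):
        # The org total is the sum over every agent.
        return sum(self.per_agent.values())

    @property
    def org_killed(self):
        return self.org_spent >= self.kill_switch_threshold * self.org_monthly_budget

    def check_task(self, agent_id):
        # Once the org kill switch trips, every agent is refused.
        return not self.org_killed


org = OrgLedger(org_monthly_budget=200.00)
org.record_cost("agent-a", 65.00)
org.record_cost("agent-b", 65.00)
print(org.org_spent, org.check_task("agent-c"))  # 130.0 True
org.record_cost("agent-c", 65.00)
print(org.org_spent, org.check_task("agent-d"))  # 195.0 False
```

Note that `agent-d` has spent nothing and is still blocked, because $195 exceeds 95% of the $200 budget.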

Step 6: Cost Anomaly Detection

CostGuard includes built-in anomaly detection for unusual spending:

from agent_sre.cost import CostAnomalyDetector

detector = CostAnomalyDetector()

# Feed normal cost history
for i in range(20):
    detector.ingest(1.0 + (i % 3) * 0.2, agent_id="data-agent")

# Check an anomalous cost - returns AnomalyResult if anomaly detected
result = detector.ingest(50.0, agent_id="data-agent")
if result:
    print("Anomaly detected!")
    print(f"Severity: {result.severity.value}")
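
This tutorial does not say which statistical method `CostAnomalyDetector` uses. As a stand-in, the sketch below applies a simple z-score test to the same cost history; `is_anomalous` and the `z_threshold` of 3.0 are assumptions chosen for illustration, not the library's algorithm.

```python
import statistics


def is_anomalous(history, new_cost, z_threshold=3.0):
    """Flag new_cost if it lies more than z_threshold deviations from the mean."""
    mean = statistics.mean(history)
    stdev = statistics.pstdev(history)
    if stdev == 0:
        # A perfectly flat history: any different value is unusual.
        return new_cost != mean
    z_score = abs(new_cost - mean) / stdev
    return z_score > z_threshold


# Same synthetic history as above: costs cycle through 1.0, 1.2, 1.4.
history = [1.0 + (i % 3) * 0.2 for i in range(20)]

print(is_anomalous(history, 50.0))  # True: far outside the usual range
print(is_anomalous(history, 1.3))   # False: within normal variation
```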

Step 7: Cost Optimization Suggestions

The cost optimizer suggests cheaper alternatives:

from agent_sre.cost import CostOptimizer, ModelConfig, TaskProfile

optimizer = CostOptimizer()

# Register model options
optimizer.add_model(ModelConfig(
    name="gpt-4",
    cost_per_1k_tokens=0.03,
    quality_score=0.95,
))
optimizer.add_model(ModelConfig(
    name="gpt-3.5-turbo",
    cost_per_1k_tokens=0.002,
    quality_score=0.80,
))

# Get optimization for a task
result = optimizer.optimize(TaskProfile(
    task_type="summarization",
    estimated_tokens=2000,
    quality_requirement=0.75,
))
print(f"Recommended: {result.recommended_model}")
print(f"Estimated cost: ${result.estimated_cost:.4f}")
print(f"Savings vs default: {result.savings_percent:.0f}%")
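
The numbers behind such a recommendation can be reproduced by hand. The sketch below assumes the rule is "cheapest model whose quality score meets the requirement" and that savings are measured against the most expensive registered model; both are assumptions, and `cheapest_qualifying` is a hypothetical helper rather than part of `CostOptimizer`.

```python
models = {
    "gpt-4": {"cost_per_1k_tokens": 0.03, "quality_score": 0.95},
    "gpt-3.5-turbo": {"cost_per_1k_tokens": 0.002, "quality_score": 0.80},
}


def cheapest_qualifying(models, estimated_tokens, quality_requirement):
    """Return (name, cost) of the cheapest model meeting the quality bar, or None."""
    candidates = {
        name: cfg for name, cfg in models.items()
        if cfg["quality_score"] >= quality_requirement
    }
    if not candidates:
        return None
    name = min(candidates, key=lambda n: candidates[n]["cost_per_1k_tokens"])
    cost = estimated_tokens / 1000 * candidates[name]["cost_per_1k_tokens"]
    return name, cost


name, cost = cheapest_qualifying(models, estimated_tokens=2000, quality_requirement=0.75)

# Baseline: the most expensive registered model at the same token count.
baseline = 2000 / 1000 * max(cfg["cost_per_1k_tokens"] for cfg in models.values())
savings_percent = (baseline - cost) / baseline * 100

print(f"Recommended: {name}")                     # gpt-3.5-turbo
print(f"Estimated cost: ${cost:.4f}")             # $0.0040
print(f"Savings vs baseline: {savings_percent:.0f}%")  # 93%
```

Raising `quality_requirement` above 0.80 would exclude the cheaper model and leave `gpt-4` as the only candidate.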

Budget Configuration Reference

CostGuard(
    per_task_limit=2.0,          # Max cost per single task
    per_agent_daily_limit=100.0, # Max daily spend per agent
    org_monthly_budget=5000.0,   # Global monthly cap
    anomaly_detection=True,      # Enable anomaly detection
    auto_throttle=True,          # Auto-throttle and kill
    kill_switch_threshold=0.95,  # Kill at 95% utilization
    alert_thresholds=[           # Alert at these percentages
        0.50, 0.75, 0.90, 0.95
    ],
)
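
A list like `alert_thresholds` is typically evaluated by checking which thresholds a new cost record has just crossed, so that each alert fires once rather than on every subsequent record. The sketch below shows that idea; `newly_crossed` is a hypothetical helper, and whether `CostGuard` deduplicates alerts this way is an assumption.

```python
def newly_crossed(thresholds, previous_utilization, current_utilization):
    """Thresholds passed by moving from the previous to the current utilization."""
    return [
        t for t in sorted(thresholds)
        if previous_utilization < t <= current_utilization
    ]


thresholds = [0.50, 0.75, 0.90, 0.95]

print(newly_crossed(thresholds, 0.48, 0.52))  # [0.5]
print(newly_crossed(thresholds, 0.52, 0.60))  # []
print(newly_crossed(thresholds, 0.70, 0.96))  # [0.75, 0.9, 0.95]
```

Under this scheme most records produce an empty list, and a single large cost can trigger several tiers at once.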

API Reference

CostGuard

Method	Description
`check_task(agent_id, estimated_cost)`	Pre-check if task is within budget
`record_cost(agent_id, task_id, cost_usd)`	Record actual cost, returns alerts
`get_budget(agent_id)`	Get agent's budget status
`get_all_budgets()`	Get all agents' budgets
`get_alerts()`	Get all triggered alerts
`get_summary()`	Get org-wide cost summary

AgentBudget

Field	Type	Description
`spent_today_usd`	`float`	Total spent today
`remaining_today_usd`	`float`	Budget remaining
`utilization_percent`	`float`	0-100% of daily limit
`throttled`	`bool`	Auto-throttled at 85%
`killed`	`bool`	Killed at 95%

What's Next

Tutorial 05 - Agent Reliability (SRE): Cost governance alongside SLOs and error budgets
Tutorial 49 - Multi-Agent Policies: Combine cost limits with collective rate limiting
Tutorial 13 - Observability & Tracing: Correlate cost events with OTel traces