Skip to content

Chapter 5: Approval Workflows

In Chapters 1โ€“4, every policy decision was instant: allow or deny. The system evaluated the rules, picked a winner, and the agent moved on. But some actions are too important for any automatic answer. Deleting a customer's account, transferring money, or sending a mass email โ€” these are cases where you want a human to review and approve before the agent proceeds.

What you'll learn:

Section Topic
The problem Why allow/deny is not enough for high-stakes actions
The three-outcome model Adding ESCALATE between ALLOW and DENY
Create an escalation request Building a "ticket" for human review
Human approves What happens when the reviewer says yes
Human denies What happens when the reviewer says no
Timeout What happens when nobody responds
Try it yourself Exercises

The problem

Think of a bank teller. A customer walks up and asks to deposit a check โ€” the teller processes it immediately. But when a customer asks to wire $50,000 overseas, the teller does not press the button. They call a manager. The manager reviews the request, checks the account, and either approves or rejects it.

The agent is the teller. The policy system is the rule book that says which transactions need a manager. And the EscalationHandler is the manager.

In Chapters 1โ€“4, every tool call was like a deposit โ€” the system gave an instant answer. This chapter adds the wire transfer: actions where the system says "I cannot decide this alone" and routes the request to a human.

The system goes from two possible outcomes (allow, deny) to three (allow, deny, escalate). This is the first time a policy decision is not immediate โ€” the agent must pause until a human responds or a timeout triggers.


Step 1: The three-outcome model

The policy (05_approval_policy.yaml)

version: "1.0"
name: approval-workflow-policy
description: >
  Policy with three decision tiers: allow, deny, and escalate.
  Tools whose message says "requires human approval" are routed
  to a human reviewer before the agent may proceed.

rules:
  # Tier 1: Always allowed โ€” safe, read-only actions
  - name: allow-search-documents
    condition:
      field: tool_name
      operator: eq
      value: search_documents
    action: allow
    priority: 80
    message: "Safe action: searching documents is always allowed"

  # Tier 2: Always denied โ€” irreversibly destructive
  - name: block-delete-database
    condition:
      field: tool_name
      operator: eq
      value: delete_database
    action: deny
    priority: 100
    message: "Destructive action: deleting databases is never allowed"

  # Tier 3: Escalate โ€” needs human review
  - name: escalate-transfer-funds
    condition:
      field: tool_name
      operator: eq
      value: transfer_funds
    action: deny
    priority: 90
    message: "Sensitive action: transfer_funds requires human approval"

  - name: escalate-send-email
    condition:
      field: tool_name
      operator: eq
      value: send_email
    action: deny
    priority: 85
    message: "Sensitive action: send_email requires human approval"

defaults:
  action: allow
  max_tool_calls: 10

Three tiers:

  • Tier 1 (allow): search_documents is read-only โ€” always safe.
  • Tier 2 (deny): delete_database is irreversibly destructive โ€” always blocked, no point asking a human.
  • Tier 3 (escalate): transfer_funds and send_email are neither always safe nor always dangerous. Their message says "requires human approval" โ€” the Python code sees that phrase and routes the tool call to a human reviewer instead of blocking it outright.

Evaluating the policy

from agent_os.policies import PolicyEvaluator
from agent_os.policies.schema import PolicyDocument

policy = PolicyDocument.from_yaml("05_approval_policy.yaml")
evaluator = PolicyEvaluator(policies=[policy])

ESCALATION_KEYWORD = "requires human approval"

for tool in ["search_documents", "delete_database", "transfer_funds"]:
    decision = evaluator.evaluate({"tool_name": tool})
    if decision.allowed:
        print(f"{tool}: allow")
    elif ESCALATION_KEYWORD in decision.reason.lower():
        print(f"{tool}: escalate โ€” needs human approval")
    else:
        print(f"{tool}: deny")

Example output

  Tool                   Decision       Reason
  -------------------------------------------------------
  search_documents       โœ… allow     Safe action: searching documents is always allowed
  write_file             โœ… allow     No rules matched; default action applied
  delete_database        ๐Ÿšซ deny      Destructive action: deleting databases is never allowed
  transfer_funds         โณ escalate  Sensitive action: transfer_funds requires human approval
  send_email             โณ escalate  Sensitive action: send_email requires human approval

Step 2: Create an escalation request

When the policy says "requires human approval", the agent cannot proceed on its own. It creates an escalation request โ€” a ticket that contains the agent's ID, the action it wants to take, and why it needs approval.

from agent_os.integrations.escalation import (
    EscalationHandler,
    InMemoryApprovalQueue,
)

queue = InMemoryApprovalQueue()
handler = EscalationHandler(backend=queue, timeout_seconds=300)

request = handler.escalate(
    agent_id="finance-agent",
    action="transfer_funds",
    reason="Agent wants to transfer $5,000 to vendor account",
)

print(request.request_id)  # a1b2c3d4-...  (unique ID)
print(request.decision)    # EscalationDecision.PENDING
print(len(queue.list_pending()))  # 1

What just happened?

  1. handler.escalate() created an EscalationRequest with a unique ID.
  2. It submitted the request to the InMemoryApprovalQueue.
  3. The request's status is PENDING โ€” the agent is now suspended.
  4. The queue has one pending item. Someone needs to review it.

Example output

  Request ID:  a1b2c3d4...
  Agent:       finance-agent
  Action:      transfer_funds
  Reason:      Agent wants to transfer $5,000 to vendor account
  Status:      โณ PENDING

  Pending requests in queue: 1

The agent is waiting. What happens next depends on the human reviewer.


Step 3: Human approves

A manager reviews the pending request and decides the transfer is legitimate:

queue.approve(request.request_id, approver="manager@corp.com")
final = handler.resolve(request.request_id)

print(final)  # EscalationDecision.ALLOW

queue.approve() marks the request as approved and records who approved it. handler.resolve() checks the queue and returns the final decision.

In production, the agent and the reviewer are on different sides. The agent calls handler.resolve(), which blocks and waits. A human uses a separate interface โ€” a dashboard, a Slack bot, or an API endpoint โ€” to call queue.approve() or queue.deny(). Here we simulate both sides in one script for simplicity.

Example output

  manager@corp.com approved request a1b2c3d4...
  Final decision: โœ… ALLOW

  The manager approved the transfer. The agent can proceed.

The audit trail now records: the finance-agent requested transfer_funds, it was escalated, and manager@corp.com approved it. Every step is traceable.


Step 4: Human denies

Not every escalation gets approved. The compliance team reviews a request to send a promotional email to 10,000 customers and says no:

request2 = handler.escalate(
    agent_id="support-agent",
    action="send_email",
    reason="Agent wants to email 10,000 customers a promotional offer",
)

queue.deny(request2.request_id, approver="compliance@corp.com")
final2 = handler.resolve(request2.request_id)

print(final2)  # EscalationDecision.DENY

Example output

  compliance@corp.com denied request e5f6g7h8...
  Final decision: ๐Ÿšซ DENY

  Compliance reviewed the request and said no.
  The agent must not send the email.

This is different from the automatic deny in Step 1. delete_database is blocked by the policy โ€” no human is involved. But send_email was reviewed by a person, and the audit trail records who made the call and when.


Step 5: Timeout โ€” nobody responds

What if the human reviewer is on vacation? Or the request lands in a queue nobody checks? The agent cannot wait forever.

EscalationHandler has a timeout with a configurable default action. When the timeout expires with no response, the system picks the default:

from agent_os.integrations.escalation import DefaultTimeoutAction

timeout_handler = EscalationHandler(
    backend=InMemoryApprovalQueue(),
    timeout_seconds=300,            # 5 minutes in production
    default_action=DefaultTimeoutAction.DENY,  # safe default
)

If nobody approves or denies within 300 seconds, the system defaults to DENY. The agent does not proceed.

Why DENY is the safe default

An unanswered escalation should not silently become an approval. If a human is supposed to review an action and nobody does, the safest assumption is that the action should not happen. You can set DefaultTimeoutAction.ALLOW for less critical actions, but that is rare โ€” if the action were not important, it would not need escalation in the first place.

Example output

  Request i9j0k1l2... is waiting for approval...
  Timeout after 0s with no response.
  Final decision: ๐Ÿšซ DENY (safe default)

  When nobody responds, the system fails safe.
  DefaultTimeoutAction.DENY ensures unanswered
  requests do not become silent approvals.

Full example

python docs/tutorials/policy-as-code/examples/05_approval_workflows.py
============================================================
  Chapter 5: Approval Workflows
============================================================

--- Part 1: The three-outcome model ---

  Tool                   Decision       Reason
  -------------------------------------------------------
  search_documents       โœ… allow     Safe action: searching documents is always allowed
  write_file             โœ… allow     No rules matched; default action applied
  delete_database        ๐Ÿšซ deny      Destructive action: deleting databases is never allowed
  transfer_funds         โณ escalate  Sensitive action: transfer_funds requires human approval
  send_email             โณ escalate  Sensitive action: send_email requires human approval

  For the first time, the system has three outcomes.
  transfer_funds and send_email are not automatically
  allowed or denied โ€” they are paused until a human reviews.

--- Part 2: Creating an escalation request ---

  Request ID:  a1b2c3d4...
  Agent:       finance-agent
  Action:      transfer_funds
  Reason:      Agent wants to transfer $5,000 to vendor account
  Status:      โณ PENDING

  Pending requests in queue: 1

  The agent is now suspended. A ticket exists in the
  approval queue. Someone needs to review it.

--- Part 3: Human approves ---

  manager@corp.com approved request a1b2c3d4...
  Final decision: โœ… ALLOW

  The manager approved the transfer. The agent can proceed.

--- Part 4: Human denies ---

  compliance@corp.com denied request e5f6g7h8...
  Final decision: ๐Ÿšซ DENY

  Compliance reviewed the request and said no.
  The agent must not send the email.

--- Part 5: Timeout โ€” nobody responds ---

  Request i9j0k1l2... is waiting for approval...
  Timeout after 0s with no response.
  Final decision: ๐Ÿšซ DENY (safe default)

  When nobody responds, the system fails safe.
  DefaultTimeoutAction.DENY ensures unanswered
  requests do not become silent approvals.

  In production, timeouts are typically 300 seconds
  (5 minutes) or more โ€” enough time for a human to
  review, but not so long that the agent waits forever.

============================================================
  The policy system now has three tiers: allow, deny,
  and escalate. Sensitive actions wait for a human
  before the agent proceeds โ€” or time out safely.
============================================================

Note: Request IDs are randomly generated UUIDs โ€” your output will show different IDs than the examples above.


How does it work?

Here is the lifecycle of an escalation, from the agent's tool call to the final decision:

  Agent wants to call transfer_funds
        โ”‚
        โ–ผ
  โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
  โ”‚  Policy evaluator says:      โ”‚
  โ”‚  "requires human approval"   โ”‚
  โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
             โ”‚
             โ–ผ
  โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
  โ”‚  EscalationHandler           โ”‚
  โ”‚  creates request (PENDING)   โ”‚
  โ”‚  submits to approval queue   โ”‚
  โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
             โ”‚
      โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
      โ–ผ      โ–ผ          โ–ผ
   Human   Human     Nobody
   approves denies   responds
      โ”‚      โ”‚          โ”‚
      โ–ผ      โ–ผ          โ–ผ
   โœ… ALLOW ๐Ÿšซ DENY  โณ TIMEOUT
                        โ”‚
                        โ–ผ
                   Safe default
                   (usually DENY)

The key classes:

Class Role
InMemoryApprovalQueue Stores pending requests; humans call approve() or deny() on it
EscalationHandler Creates requests, manages the timeout, resolves the final decision
EscalationRequest The "ticket" โ€” carries the agent ID, action, reason, and current status
EscalationDecision Enum with five states: ALLOW, DENY, ESCALATE, PENDING, TIMEOUT
DefaultTimeoutAction What to do when the timeout expires: DENY (safe) or ALLOW (rare)

In production, the approval queue would not be in-memory โ€” it would be backed by a database, a message broker, or a webhook that pings a Slack channel or email inbox. The library also provides WebhookApprovalBackend for HTTP-based notification workflows. For critical actions that need multiple sign-offs, the library supports quorum-based approval via QuorumConfig, where M of N reviewers must agree before an action proceeds.


Try it yourself

  1. Change the safe default. In Part 5, swap DefaultTimeoutAction.DENY for DefaultTimeoutAction.ALLOW and re-run. What changes? When would this be appropriate?

  2. Add a new escalation rule. Add delete_account to the YAML policy as an escalation-worthy tool (Tier 3). Update the Python script to include it in the tools list and verify it shows โณ escalate.


What's missing?

We now have policies that allow, deny, and escalate. But how do you know they work correctly together? What if someone edits the YAML and accidentally removes the "requires human approval" message from the transfer_funds rule? That tool would silently flip from "escalate" to "deny" โ€” and nobody would notice until a real transfer fails.

Manual checking does not scale. You need automated tests that verify every tool gets the right decision in every environment. That is policy testing.

Previous: Chapter 4 โ€” Conditional Policies Next: Chapter 6 โ€” Policy Testing โ€” verify that every policy rule works correctly, automatically.