
Request-Based Communication in Multi-Agent Systems

Last updated: 2025-06-04

In a request-based model, all interactions—between clients, orchestrators, and expert agents—are initiated by explicit requests. A central orchestrator receives the client's request, plans the workflow, and delegates tasks to specialized agents. Each agent processes its task independently and returns results to the orchestrator, which compiles and returns the final output.


Non-Streaming vs Streaming Interactions

Most client-to-agent communication falls into two categories; a minimal client-side sketch of both follows this list:

  • Non-Streaming (Single-Response): The client sends a request to the orchestrator to initiate the multi-agent workflow and waits for a single, complete response—well suited for short-lived or atomic tasks where only the final result matters.
    • Advantages: Simplicity, straightforward error handling, and easier observability.
    • Limitations: Not ideal for long-running or incremental workloads; the client is blocked until the task completes.
  • Streaming (Incremental Response): Results or updates are sent incrementally during processing (e.g., via server-sent events, gRPC streaming, or WebSockets). This enables real-time progress feedback and delivery of large outputs as they are generated, which is common in LLM-driven use cases.
    • Advantages: Immediate progress visibility, lower perceived latency for users, and support for scalable delivery to multiple consumers.
    • Limitations: More complex protocol handling, error management, and observability; potential network compatibility issues.

Streaming recommendation

In practice, for multi-agent architectures, it is most effective to stream responses only from the orchestrator agent to the end client—not between the orchestrator and internal expert agents—for the following reasons (a minimal sketch follows this list):

  • Workflow Control: The orchestrator is responsible for the workflow's consistency and correctness. If expert agents stream content directly, the orchestrator cannot guarantee ordering, error handling, or workflow structure.
  • Error Handling: Streaming across multiple hops makes error recovery more complex—if an expert agent fails or returns incomplete data, the orchestrator must reconcile fragmented streams, resulting in fragile and error-prone logic.
  • Increased Complexity: Maintaining streaming protocols, session state, and message ordering across all agent-to-agent boundaries increases architectural complexity and operational risk.
  • Observability & Traceability: When multiple agents stream data independently, it becomes challenging to trace events, maintain version control, and audit workflows in a predictable sequence.
  • Value vs. Effort: Direct streaming between agents rarely delivers end-user value that justifies the development and maintenance overhead. The orchestrator is the only agent with enough context to assemble, filter, and sequence partial results before presenting them to users.
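
As a concrete (hypothetical) illustration of this recommendation, the FastAPI sketch below calls each expert with a plain request-reply exchange and streams only the orchestrator's assembled output to the client via SSE. The endpoint name, agent names, and the call_expert helper are placeholders:

import asyncio

from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()

async def call_expert(name: str, task: str) -> str:
    # Placeholder for a plain (non-streaming) request-reply call to an expert.
    await asyncio.sleep(0.1)  # simulate network and inference latency
    return f"[{name}] result for: {task}"

@app.post("/task/stream")
async def run_task(payload: dict):
    async def event_stream():
        # The orchestrator retains workflow control: it waits for each
        # expert's complete answer, then sequences and streams the parts.
        for expert in ("research-agent", "summarizer-agent"):
            result = await call_expert(expert, payload["prompt"])
            yield f"data: {result}\n\n"  # SSE-framed partial update
        yield "data: [DONE]\n\n"         # completion signal
    return StreamingResponse(event_stream(), media_type="text/event-stream")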

Communication Patterns

1. Synchronous Request-Reply

Non-Streaming

sequenceDiagram
    participant Client
    participant Orchestrator Agent
    participant Specialized Agent

    Client->>Orchestrator Agent: Request
    activate Orchestrator Agent
    Note over Orchestrator Agent: Planning
    Orchestrator Agent->>Specialized Agent: Request
    activate Specialized Agent
    Specialized Agent-->>Orchestrator Agent: Response
    deactivate Specialized Agent
    Orchestrator Agent-->>Client: Response
    deactivate Orchestrator Agent

Key Characteristics
  • Direct Feedback: Results or errors are delivered immediately in the same session.
  • Straightforward Observability: Request flows are linear, making tracing and debugging easy to follow.
  • Blocking Operation: The client waits for the response, pausing its workflow until the agent completes.

Tradeoffs
  • Temporal coupling: Both client and agents must be available and responsive at the same time.
  • Scalability: Each request consumes resources until completion, constraining throughput under high concurrency.
  • Latency Sensitivity: Downstream slowness or outages have immediate upstream impact.
  • Availability Risk: Agent outages or delays directly affect the client.
  • Challenges with long-running tasks: Increased risk of timing out and overloading system resources.
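
To make the blocking chain concrete, here is a minimal orchestrator-side sketch (FastAPI plus httpx; the agent URL and payload shape are assumptions). The client blocks on the orchestrator, which in turn blocks on the specialized agent, so downstream slowness surfaces immediately upstream:

import httpx
from fastapi import FastAPI

app = FastAPI()
AGENT_URL = "http://specialized-agent:8001/run"  # hypothetical expert agent

@app.post("/task")
async def handle_task(payload: dict) -> dict:
    # One blocking hop per request: the handler holds the client connection
    # open while it waits for the specialized agent's complete response.
    async with httpx.AsyncClient(timeout=30.0) as client:
        agent_resp = await client.post(AGENT_URL, json={"task": payload["prompt"]})
        agent_resp.raise_for_status()
    return {"result": agent_resp.json()}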

Server-Streaming

sequenceDiagram
    participant Client
    participant Orchestrator Agent
    participant Specialized Agent(s)
    Client->>Orchestrator Agent: Initiate streaming request
    Note over Orchestrator Agent: Planning
    Orchestrator Agent->>Specialized Agent(s): Request
    activate Specialized Agent(s)
    Specialized Agent(s)-->>Orchestrator Agent: Response
    deactivate Specialized Agent(s)
    loop Stream
        Orchestrator Agent-->>Client: Partial response / update
    end
    Orchestrator Agent-->>Client: Completion signal

Common streaming connection patterns:

  • Server-Sent Events (SSE): The orchestrator agent streams incremental updates to the client over a single HTTP connection using the text/event-stream format—well-suited for browsers and lightweight consumers.

  • HTTP Chunked Responses: The server delivers result fragments as they become available via HTTP chunked transfer encoding, keeping the connection open for the duration of the task.
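
A minimal SSE consumer might look like the following sketch, assuming the hypothetical /task/stream endpoint above and a [DONE] sentinel as the completion signal:

import requests

def consume_sse(url: str, payload: dict) -> None:
    # Open the stream and read SSE frames line by line.
    with requests.post(url, json=payload, stream=True, timeout=60) as resp:
        resp.raise_for_status()
        for raw in resp.iter_lines(decode_unicode=True):
            if not raw or not raw.startswith("data:"):
                continue  # skip blank separators and comment/keep-alive lines
            data = raw[len("data:"):].strip()
            if data == "[DONE]":
                break  # completion signal
            print("update:", data)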

Key Characteristics
  • Incremental delivery: The orchestrator pushes partial results and progress updates over a single long-lived connection as they become available, instead of buffering the complete result.

  • Lower perceived latency: The client can render or process each fragment as soon as it arrives—particularly valuable for LLM-generated output.

  • Connection-bound session: The stream lives for the duration of the request. The client is not blocked waiting on one monolithic response, but it must keep the connection open (or reconnect) until the completion signal arrives.

Tradeoffs
  • Client/Transport Support: Not all clients and intermediaries (e.g., legacy proxies, older web clients) support long-lived connections or every protocol (gRPC, SSE, etc.).
  • Error Handling Complexity: Client must handle mid-stream errors, partial or interrupted streams, and implement reconnection logic to continue receiving updates without data loss or duplication.
  • Potential Networking Restrictions: Firewalls, proxies, or load balancers may block or prematurely terminate long-lived HTTP connections; mobile or low-bandwidth networks can disrupt streams; TLS termination proxies might interfere with HTTPS streaming.
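
One common mitigation for mid-stream failures is SSE's built-in resume mechanism: the client remembers the last event id it received and presents it on reconnect via the Last-Event-ID header. A rough sketch, assuming the server emits id: lines and a [DONE] sentinel:

import time

import requests

def consume_with_resume(url: str) -> None:
    last_id = None
    while True:
        # Present the last seen event id so the server can resume the stream.
        headers = {"Last-Event-ID": last_id} if last_id else {}
        try:
            with requests.get(url, headers=headers, stream=True, timeout=60) as resp:
                resp.raise_for_status()
                for raw in resp.iter_lines(decode_unicode=True):
                    if raw.startswith("id:"):
                        last_id = raw[len("id:"):].strip()  # remember resume point
                    elif raw.startswith("data:"):
                        data = raw[len("data:"):].strip()
                        if data == "[DONE]":
                            return  # stream finished normally
                        print("update:", data)
        except requests.RequestException:
            time.sleep(1)  # back off, then reconnect and resume from last_id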

2. Asynchronous Request-Reply

Non-Streaming

sequenceDiagram
    participant Client
    participant Orchestrator Agent

    Client->>Orchestrator Agent: Submit Request
    Orchestrator Agent-->>Client: Ack with Task ID

    Note over Orchestrator Agent: Workflow executes long-running task(s)

    loop Polling loop
        Client->>Orchestrator Agent: GET /status/{task_id}
        Orchestrator Agent-->>Client: Task pending/in progress
    end
    Client->>Orchestrator Agent: GET /status/{task_id}
    Orchestrator Agent-->>Client: Task completed + Result

Key Characteristics
  • Non-blocking Interaction: The client sends a request to trigger a long-running process and receives an immediate acknowledgment, usually containing a reference ID. The actual result is delivered later, allowing the client to continue without waiting.

    While the overall pattern is asynchronous, the initial request often involves a brief synchronous exchange to initiate processing and retrieve a task reference.

  • Deferred Responses: Results are retrieved later via an asynchronous polling endpoint (see the client-side sketch after this list).

  • Temporal Decoupling: The client and orchestrator do not need to block while waiting for each other. However, polling requires the client to remain available or reconnect periodically.
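
A client-side sketch of this submit-then-poll flow follows. The /task and /status/{task_id} endpoints and the state values are assumptions:

import time

import requests

BASE = "http://localhost:8000"  # hypothetical orchestrator address

def submit_and_poll(prompt: str, interval: float = 2.0):
    # Submit: a short synchronous exchange that returns only a task reference.
    ack = requests.post(f"{BASE}/task", json={"prompt": prompt}, timeout=10)
    ack.raise_for_status()
    task_id = ack.json()["task_id"]

    # Poll: the client is free between checks and can reconnect at any time.
    while True:
        status = requests.get(f"{BASE}/status/{task_id}", timeout=10)
        status.raise_for_status()
        body = status.json()
        if body["state"] == "completed":
            return body["result"]
        if body["state"] == "failed":
            raise RuntimeError(body.get("error", "task failed"))
        time.sleep(interval)  # tune to balance latency vs. orchestrator load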

Tradeoffs
  • Increased Complexity: Requires mechanisms for reliable delivery, idempotency (processing the same request more than once has the same effect as processing it exactly once), correlation IDs, and robust failure handling on both the client and orchestrator sides (a server-side idempotency sketch follows this list).

  • Management Overhead: Polling introduces additional load on the orchestrator and may result in inefficient resource usage if polling intervals aren't well-tuned.

  • Eventual Consistency: State about long-running tasks may briefly diverge between client expectations and orchestrator status, especially under failures or network partitions.

  • Observability Challenges: To accurately view workflow progress, distributed tracing and proper correlation of request/response cycles (e.g., using trace or context IDs) are essential.

  • Duplicate/Out-of-Order Processing: Clients may issue redundant requests or receive delayed/out-of-order responses, requiring deduplication and idempotency strategies.

  • Delayed Feedback: Clients must rely on repeated polling for updates rather than receiving instantaneous notifications—potentially increasing perceived latency.
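
As one way to address duplicate submissions, the sketch below has the client send an Idempotency-Key header that the orchestrator uses to return the original task ID for retries instead of starting duplicate work. The header name and in-memory stores are illustrative; a production system would need durable state:

import uuid

from fastapi import FastAPI, Header

app = FastAPI()
_seen: dict[str, str] = {}    # idempotency key -> task id
_tasks: dict[str, dict] = {}  # task id -> task state

@app.post("/task")
async def submit(payload: dict, idempotency_key: str = Header(...)) -> dict:
    # Retried submissions with the same key get the original task id back
    # instead of spawning duplicate work.
    if idempotency_key in _seen:
        return {"task_id": _seen[idempotency_key], "duplicate": True}
    task_id = str(uuid.uuid4())
    _seen[idempotency_key] = task_id
    _tasks[task_id] = {"state": "pending", "request": payload}
    # ... enqueue the actual long-running work here ...
    return {"task_id": task_id, "duplicate": False}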

Message-driven alternatives (such as webhooks, message queues, and brokers) will be covered in Message-driven Communication.

Server-Streaming

Similar to the non-streaming approach, the client sends a request to trigger a long-running process and receives an immediate acknowledgment, usually containing a reference ID. The client then opens a separate streaming connection that references the task ID to receive updates as the task progresses.

Streamed messages can include a TaskID or similar identifier to help the client associate updates with the correct ongoing request.

sequenceDiagram
    participant Client
    participant Orchestrator
    participant EventStream
    Client->>Orchestrator: Submit request (async)
    Orchestrator-->>Client: Ack (task ID)
    Note over Orchestrator: Asynchronous task + streaming output
    loop Stream updates
        Orchestrator-->>EventStream: Task ID, partial result
        EventStream-->>Client: Update (Task ID, partial result)
    end
    Orchestrator-->>EventStream: Task ID, final result
    EventStream-->>Client: Completion event (Task ID, final result)
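
A server-side sketch of this handshake follows (FastAPI; the endpoints, in-memory queues, and sentinel framing are assumptions, and a real system would persist tasks and support replay):

import asyncio
import uuid

from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()
_queues: dict[str, asyncio.Queue] = {}  # task id -> pending updates

async def _run(task_id: str, payload: dict) -> None:
    # Stand-in for the real multi-agent workflow producing partial results.
    for step in ("planned", "delegated", "compiled"):
        await asyncio.sleep(1)
        await _queues[task_id].put(f"{task_id}: {step}")
    await _queues[task_id].put(None)  # sentinel: task complete

@app.post("/task")
async def submit(payload: dict) -> dict:
    task_id = str(uuid.uuid4())
    _queues[task_id] = asyncio.Queue()
    asyncio.create_task(_run(task_id, payload))
    return {"task_id": task_id}  # immediate acknowledgment

@app.get("/stream/{task_id}")
async def stream(task_id: str) -> StreamingResponse:
    async def events():
        while (item := await _queues[task_id].get()) is not None:
            yield f"data: {item}\n\n"
        yield "data: [DONE]\n\n"
    return StreamingResponse(events(), media_type="text/event-stream")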

Key Characteristics
  • Truly Decoupled Streaming: Neither client nor agent is blocked; updates are pushed as they become available using HTTP-based streaming mechanisms such as Server-Sent Events (SSE) or chunked transfer encoding.
  • High Scalability: Well-suited for distributed, high-throughput systems—whether using lightweight server-push methods like SSE or message brokers.

Tradeoffs
  • Higher Complexity: Requires robust handling for message ordering, deduplication, replay, and idempotency across distributed systems.
  • Potentially “Lost” Events: Ensuring reliable delivery (e.g., at-least-once, at-most-once, exactly-once) demands careful design of brokers and consumers, especially with transient connections like SSE.
  • Stream-to-Task Correlation: All result messages must include correlation metadata such as TaskID, RequestID, or similar to allow downstream processing to track and associate updates correctly.
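
On the consumer side, correlation means filtering the shared stream by task ID. A rough sketch, assuming JSON-framed SSE messages with task_id, chunk, and final fields:

import json

import requests

def follow_task(stream_url: str, task_id: str) -> None:
    # Read the shared stream and keep only updates for the task we care about.
    with requests.get(stream_url, stream=True, timeout=300) as resp:
        resp.raise_for_status()
        for raw in resp.iter_lines(decode_unicode=True):
            if not raw or not raw.startswith("data:"):
                continue
            msg = json.loads(raw[len("data:"):])
            if msg.get("task_id") != task_id:
                continue  # update belongs to a different in-flight task
            if msg.get("final"):
                print("final:", msg["result"])
                return
            print("partial:", msg["chunk"])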

Summary Table

| Aspect | Synchronous (Non-Streaming) | Synchronous Streaming | Asynchronous (Non-Streaming) | Asynchronous Streaming |
|---|---|---|---|---|
| Responsiveness | Immediate, single response | Immediate start, incremental results | Deferred response | Deferred start, incremental/partial results |
| Scalability | Low (client blocking) | Low–Medium [1] | High [2] | Very High |
| Complexity | Low | Medium (connection mgmt, reconnections) | High (queues, retries, correlation) | Very High (ordering, replay, deduplication, idempotency) |
| Best Fit | Short tasks, APIs, low latency | Progress bars, live updates, partials | Batch jobs, background tasks, events | Long-running tasks with feedback, event workflows |
| Failure Handling | Simple | Medium (stream errors, reconnects) | Advanced (timeouts, retries, idempotency) | Complex (stream replay, correlation IDs, lost messages) |

[1] Partial results are flushed and transferred as they are produced. This means the server can free memory sooner, and the client can process each chunk without needing to buffer everything until the end.

[2] Enables time decoupling between the client and orchestrator. Leverages background jobs, improving scalability, fault tolerance, and supporting offline processing.


Recommendations

  • Start Simple: Start with the simplest communication pattern that effectively serves your use case—typically synchronous or asynchronous request-reply without streaming. These approaches are easier to implement, debug, and observe, making them ideal during early development or when validating your core business logic.

  • Adopt Streaming Incrementally: Introduce streaming only when clearly justified—such as when delivering incremental results, providing real-time feedback, or handling large outputs that would otherwise block or degrade performance. Avoid starting with streaming unless it solves a proven bottleneck.

  • Use Standards-Based Protocols: Favor open and emerging protocols to future-proof your architecture and benefit from built-in security, governance, and interoperability.

    Examples include Agent-to-Agent Protocol (A2A), Agent Network Protocol (ANP), and Agent Communication Protocol (ACP), which are particularly suited for structured request-based communication between agents.

  • Prioritize observability for async and streaming: Asynchronous and streaming systems introduce additional operational complexity. Invest early in tracing, correlation IDs, error handling, and monitoring tools to maintain visibility and quickly resolve issues.

  • Iterate based on real feedback: Let actual production signals and user experience guide your evolution from simple to more complex communication patterns. This iterative, feedback-driven approach helps avoid premature optimization and aligns well with agile development principles.
