How Agents Communicate

Messages, shared state, and handoffs

Advanced 13 minBuilder

What you'll be able to do

Distinguish the three core communication mechanics — direct message passing, shared state/blackboard, and explicit handoffs — and when each fits
Implement a handoff in a real framework (LangGraph Command, OpenAI Agents SDK handoff tool) with the correct message history
Choose a context-passing strategy (full history, structured object, or summary) and reason about its cost and reliability trade-offs
Pass enough context for the receiving agent to succeed without bloating the context window
Diagnose the dominant real-world failure modes from the MAST taxonomy — inter-agent misalignment, context loss, and duplicated work

At a glance

When you split a job across multiple agents, the hard part is no longer the reasoning inside any one agent — it is what passes between them. This lesson covers the three mechanics of inter-agent communication (message passing, shared state, and handoffs), the critical choice of how much context to transfer at each boundary, and why most real multi-agent failures are miscommunication, not model weakness.

1The bottleneck is the wire, not the agent
2Message passing: agents talk to agents
3Shared state and the blackboard
4Handoffs: transferring control and context
5Passing enough context without bloat
6How miscommunication breaks systems

The bottleneck is the wire, not the agent

In a single-agent system, everything the model needs lives in one context window. The moment you split work across agents, you introduce a boundary — and every boundary is a place where information must be packaged, transmitted, and unpacked. The quality of your multi-agent system is decided far more by what crosses those boundaries than by how clever any individual agent is.

This is counterintuitive. Teams reach for multi-agent architectures expecting more capability, then discover their bottleneck is coordination. The 2025 MAST study (UC Berkeley et al., 1,600+ annotated traces across 7 frameworks) found that the dominant causes of failure were inter-agent misalignment and specification issues — not raw model capability. Agents reset conversations, propagated wrong assumptions, withheld information, and duplicated each other's work.

Three mechanics carry information between agents: message passing (an agent sends a structured message to another), shared state (agents read and write a common workspace), and handoffs (an agent transfers control, along with context, to a specialist). Most production systems use all three. Master these and you control the variable that actually determines whether a multi-agent system works.

Key insight

The reframe

Adding agents does not automatically add capability — it adds communication boundaries. Each boundary is a potential point of context loss. Design the boundaries first; the agents are the easy part.

Message passing: agents talk to agents

The most common mechanic is message passing: one agent sends another a structured message — natural language plus typed metadata — either directly (as a function/tool call) or through an orchestrator that routes it. This mirrors human delegation: a manager hands a worker a task with the relevant details attached.

The message is rarely just raw text. It carries metadata the receiver needs to act: the task, constraints, priority, and a place to put the result. A clean message contract is what keeps a system debuggable.

python

from pydantic import BaseModel
from typing import Literal

class AgentMessage(BaseModel):
    sender: str
    recipient: str
    task: str                       # what to do
    context: dict                   # only the relevant fields
    priority: Literal["low", "normal", "high"] = "normal"
    reply_to: str | None = None     # where the result goes

msg = AgentMessage(
    sender="orchestrator",
    recipient="research_agent",
    task="Find the 2026 pricing for competitor X's enterprise tier",
    context={"company": "X", "region": "EU"},
    priority="high",
)

Frameworks differ in framing. AutoGen (and its 2025 successor, Microsoft Agent Framework) treats agents as conversational participants exchanging chat messages in topologies like round-robin or nested group chats. OpenAI's Agents SDK and LangGraph treat inter-agent calls as tool invocations. Both are message passing; the difference is whether the model sees a conversation or a function call.

Tip

Type your messages

Define a message schema (Pydantic, Zod, a TypedDict) and validate it at every boundary. A typed contract turns a whole class of silent miscommunication bugs into loud, catchable validation errors.

Shared state and the blackboard

Message passing connects agents point-to-point. Shared state flips the model: instead of agents addressing each other, they read and write a common workspace. The classic form is the blackboard — a communal space where each agent posts findings and reads what others posted, asynchronously, with no direct peer-to-peer connection. Think of a whiteboard in a war room: nobody routes messages, everyone watches the board and contributes when they can help.

This decouples agents. A new specialist can be added without rewiring who-talks-to-whom; it simply watches the board for work it can do. A 2025 paper (arXiv:2510.01285) showed a blackboard system beating master–slave coordination by 13–57% on data-science benchmarks — decentralized, capability-based task pickup outperformed central routing.

The two patterns are not mutually exclusive, and the best systems combine them: an orchestrator uses message passing to delegate tasks, while all agents share a persistent state object to track progress.

In LangGraph this shared workspace is a typed State object persisted across nodes:

python

from typing import Annotated, TypedDict
from operator import add

class ResearchState(TypedDict):
    query: str
    findings: Annotated[list[str], add]   # each agent appends
    citations: Annotated[list[str], add]
    status: str

Each node receives the state, returns a partial update, and the framework merges it — the blackboard, made concrete.

Example

Filesystem as shared state

Anthropic's multi-agent research system uses the filesystem as shared state for heavy data. Subagents store large outputs externally (files, git) and pass only a lightweight reference back to the coordinator. This keeps multi-kilobyte tool results out of the conversation history — shared state that never touches the context window.

Handoffs: transferring control and context

A handoff is a special move: one agent doesn't just send a message, it transfers control to a specialist, along with the context that specialist needs. A triage agent recognizes a refund request and hands the conversation to the Refund Agent. The defining question of a handoff is not which agent receives control — it is how much context travels with it. That choice (next section) is the single biggest design decision in this lesson.

Frameworks implement handoffs differently:

OpenAI Agents SDK (March 2025) exposes handoffs as tools the LLM can call. A handoff to the Refund Agent becomes a callable transfer_to_refund_agent tool. It provides input_filter to prune history, on_handoff callbacks to capture structured metadata (reason, language, priority), and a recommended prompt prefix to steer the model.
LangGraph uses either conditional edges (add_conditional_edges() for static routing) or a Command object that updates state and names the next agent in one move, enabling dynamic handoffs.

python

from langgraph.types import Command
from typing import Literal

def triage_agent(state) -> Command[Literal["refund_agent", "support_agent"]]:
    intent = classify(state["messages"][-1])
    target = "refund_agent" if intent == "refund" else "support_agent"
    return Command(
        goto=target,
        update={"handoff_reason": intent, "priority": "high"},
    )

The Command both routes control and records why the handoff happened — exactly the metadata the next agent needs.

Watch out

LangGraph: pass BOTH messages

When implementing a handoff in a LangGraph subgraph, you must pass both the AIMessage containing the tool call and a matching ToolMessage (same tool_call_id) to the receiving agent. Omit either and the receiver sees a malformed conversation history and behaves unpredictably. This is one of the most common implementation bugs.

Passing enough context without bloat

Think of a handoff like briefing a colleague who is taking over your task. You can dump every email you ever exchanged on them, hand them a tidy one-page summary, or fill out a short form with just the facts they need. The middle path is usually right — and the same is true for agents. Here is the decision that makes or breaks a multi-agent system: at each handoff, how much context do you transfer? There are three strategies, each with a sharp trade-off.

Strategy	Token cost	Reliability	Use when
Full history forwarding	High	High fidelity, but degrades with length	Short chains where every detail matters
Structured context object	Low	High — receiver gets exactly the typed fields it needs	Default for most production systems
LLM-generated summary	Very low (70–90% reduction)	Lossy; adds latency	Long histories where detail is dispensable

The naive default — forward the entire conversation — is deceptively expensive. A 50-message thread passed through four handoffs means the fifth agent processes roughly 200 messages, most irrelevant. Worse, long contexts dilute attention and reduce reliability. The framework-recommended approach is the structured context object: the orchestrator passes only the typed fields the receiver needs.

Context isolation is now a first-class principle. Anthropic's system spawns fresh subagents with clean contexts and maintains continuity through carefully written task descriptions — not full-history dumps. Each agent gets only what it needs to do its job.

python

# Structured object: pass relevant fields, not the whole transcript.
handoff_context = {
    "task": "Process refund for order #4471",
    "customer_tier": "enterprise",
    "order_total": 1299.00,
    "reason": "defective unit, verified",
}
# vs. forwarding 50 messages of small talk and unrelated tickets.

Watch out

"Full history just works" is a myth at scale

Anthropic had to invent external filesystem state plus lightweight references precisely because forwarding full context between agents caused context-window overflow and compounding errors. Default full-history forwarding is expensive and often counterproductive.

How miscommunication breaks systems

When a multi-agent system fails, the instinct is to blame the model — "it just isn't smart enough." That instinct is almost always wrong. The agents reason fine in isolation; they break at the seams between them, the same way a capable team fails when nobody writes things down or two people unknowingly do the same job. Most multi-agent failures are orchestration failures, not capability failures. The MAST taxonomy (arXiv:2503.13657) catalogued 14 failure modes across three clusters — specification issues, inter-agent misalignment, and task verification — from 1,600+ traces (κ=0.88). The communication-specific ones recur everywhere:

Wrong assumptions propagated — agent A states something uncertain as fact; agent B builds on it; the error compounds downstream.
Information withholding — an agent knows something relevant but never surfaces it across the boundary.
Conversation reset / context loss — a handoff drops the thread; the receiver re-derives or contradicts prior work.
Task derailment & duplicated work — vague task boundaries cause two agents to do the same thing, or drift off-goal. Anthropic's own system once spawned 50 subagents for a simple query.

A subtler 2026 finding (Silo-Bench) named the Communication–Reasoning Gap: agents can exchange information and form the right coordination topology, yet still fail because they don't correctly integrate the distributed state into the final answer. The wire worked; the synthesis didn't.

The fixes are mostly about discipline at the boundary: explicit, detailed task descriptions with output-format specs; typed message contracts that fail loud; passing structured context, not raw history; and clear task boundaries so agents don't overlap. Communication quality is an engineering practice, not a model property.

Tip

Specify the output, not just the task

Anthropic found that telling a subagent what format and shape of result to return — not just what to do — was essential to avoid duplicated and derailed work. A precise output contract is the cheapest reliability upgrade you can ship.

Try it: Build a triage handoff with a structured context object

Build a two-agent system in LangGraph or the OpenAI Agents SDK: a Triage Agent that classifies an incoming customer message and hands off to either a Refund Agent or a Support Agent.

Requirements:

Define a typed handoff context (Pydantic/TypedDict) with task, reason, priority, and only the order/customer fields the receiver needs — do not forward the full conversation transcript.
Implement the handoff: in LangGraph use a Command(goto=..., update=...); in the Agents SDK expose handoffs as tools and capture metadata in an on_handoff callback.
Run two scenarios (a refund request and a how-to question) and log, for each, the exact context object that crossed the boundary.

Then run the same flow a second way, forwarding the entire message history instead of the structured object. Compare token counts for both. Write three sentences: how many tokens you saved, and one failure you can imagine the full-history version causing at scale (e.g., context overflow after several handoffs, or the receiver acting on irrelevant prior tickets).

Key takeaways

1Three mechanics carry information between agents — message passing, shared state/blackboard, and handoffs — and production systems usually combine all three.
2The biggest design decision is the context-passing strategy at each handoff: full history (expensive), structured object (recommended default), or summary (lossy).
3Forwarding full conversation history does not scale — it inflates cost, dilutes attention, and reduces reliability; pass only the typed fields the receiver needs.
4The blackboard/shared-state pattern is modern, not legacy: it decouples agents and beat central coordination by 13–57% on 2025 data-science benchmarks.
5Most multi-agent failures are miscommunication — wrong assumptions, withheld info, context loss, duplicated work — not model capability gaps.

Quiz

Lock in what you learned

Check your understanding

0 / 4 answered

1.At a handoff, which context-passing strategy is the framework-recommended default for most production systems?

2.What does the 2025 blackboard-system research (arXiv:2510.01285) demonstrate about shared-state coordination?

3.In a LangGraph multi-agent subgraph, what must you pass to the receiving agent for a tool-style handoff to be valid?

4.According to the MAST taxonomy, what most often causes multi-agent systems to fail?

Go deeper

Hand-picked sources to keep learning

OpenAI Agents SDK — Handoffs Documentation

Official 2025 docs on the handoffs primitive: input_filter, on_handoff callbacks, and structured handoff input via Pydantic.

LangChain / LangGraph — Multi-Agent Handoffs

Conditional edges vs. Command objects, with the warning about message-history completeness requirements.

Why Do Multi-Agent LLM Systems Fail? (MAST) — arXiv:2503.13657

Berkeley study introducing the 14-failure-mode taxonomy across 1,600+ traces; the authoritative empirical account of multi-agent failure.

LLM-Based Multi-Agent Blackboard System — arXiv:2510.01285

Applies the blackboard pattern to LLM agents; 13–57% improvement over master–slave coordination on data-science tasks.

How We Built Our Multi-Agent Research System — Anthropic Engineering

First-hand account of context-management failures, filesystem-as-shared-state, and why precise task/output specs are essential.

How Agent Handoffs Work in Multi-Agent Systems — Towards Data Science

Accessible walkthrough of the three context-passing strategies and their cost/reliability trade-offs.