How Agents Communicate
Messages, shared state, and handoffs
- Distinguish the three core communication mechanics — direct message passing, shared state/blackboard, and explicit handoffs — and when each fits
- Implement a handoff in a real framework (LangGraph Command, OpenAI Agents SDK handoff tool) with the correct message history
- Choose a context-passing strategy (full history, structured object, or summary) and reason about its cost and reliability trade-offs
- Pass enough context for the receiving agent to succeed without bloating the context window
- Diagnose the dominant real-world failure modes from the MAST taxonomy — inter-agent misalignment, context loss, and duplicated work
When you split a job across multiple agents, the hard part is no longer the reasoning inside any one agent — it is what passes between them. This lesson covers the three mechanics of inter-agent communication (message passing, shared state, and handoffs), the critical choice of how much context to transfer at each boundary, and why most real multi-agent failures are miscommunication, not model weakness.
- 1The bottleneck is the wire, not the agent
- 2Message passing: agents talk to agents
- 3Shared state and the blackboard
- 4Handoffs: transferring control and context
- 5Passing enough context without bloat
- 6How miscommunication breaks systems
The bottleneck is the wire, not the agent
In a single-agent system, everything the model needs lives in one context window. The moment you split work across agents, you introduce a boundary — and every boundary is a place where information must be packaged, transmitted, and unpacked. The quality of your multi-agent system is decided far more by what crosses those boundaries than by how clever any individual agent is.
This is counterintuitive. Teams reach for multi-agent architectures expecting more capability, then discover their bottleneck is coordination. The 2025 MAST study (UC Berkeley et al., 1,600+ annotated traces across 7 frameworks) found that the dominant causes of failure were inter-agent misalignment and specification issues — not raw model capability. Agents reset conversations, propagated wrong assumptions, withheld information, and duplicated each other's work.
Three mechanics carry information between agents: message passing (an agent sends a structured message to another), shared state (agents read and write a common workspace), and handoffs (an agent transfers control, along with context, to a specialist). Most production systems use all three. Master these and you control the variable that actually determines whether a multi-agent system works.
Key insight
The reframe
Adding agents does not automatically add capability — it adds communication boundaries. Each boundary is a potential point of context loss. Design the boundaries first; the agents are the easy part.
Message passing: agents talk to agents
The most common mechanic is message passing: one agent sends another a structured message — natural language plus typed metadata — either directly (as a function/tool call) or through an orchestrator that routes it. This mirrors human delegation: a manager hands a worker a task with the relevant details attached.
The message is rarely just raw text. It carries metadata the receiver needs to act: the task, constraints, priority, and a place to put the result. A clean message contract is what keeps a system debuggable.
from pydantic import BaseModel
from typing import Literal
class AgentMessage(BaseModel):
sender: str
recipient: str
task: str # what to do
context: dict # only the relevant fields
priority: Literal["low", "normal", "high"] = "normal"
reply_to: str | None = None # where the result goes
msg = AgentMessage(
sender="orchestrator",
recipient="research_agent",
task="Find the 2026 pricing for competitor X's enterprise tier",
context={"company": "X", "region": "EU"},
priority="high",
)Frameworks differ in framing. AutoGen (and its 2025 successor, Microsoft Agent Framework) treats agents as conversational participants exchanging chat messages in topologies like round-robin or nested group chats. OpenAI's Agents SDK and LangGraph treat inter-agent calls as tool invocations. Both are message passing; the difference is whether the model sees a conversation or a function call.
Tip
Type your messages
Define a message schema (Pydantic, Zod, a TypedDict) and validate it at every boundary. A typed contract turns a whole class of silent miscommunication bugs into loud, catchable validation errors.
Handoffs: transferring control and context
A handoff is a special move: one agent doesn't just send a message, it transfers control to a specialist, along with the context that specialist needs. A triage agent recognizes a refund request and hands the conversation to the Refund Agent. The defining question of a handoff is not which agent receives control — it is how much context travels with it. That choice (next section) is the single biggest design decision in this lesson.
Frameworks implement handoffs differently:
- OpenAI Agents SDK (March 2025) exposes handoffs as tools the LLM can call. A handoff to the Refund Agent becomes a callable
transfer_to_refund_agenttool. It providesinput_filterto prune history,on_handoffcallbacks to capture structured metadata (reason, language, priority), and a recommended prompt prefix to steer the model. - LangGraph uses either conditional edges (
add_conditional_edges()for static routing) or aCommandobject that updates state and names the next agent in one move, enabling dynamic handoffs.
from langgraph.types import Command
from typing import Literal
def triage_agent(state) -> Command[Literal["refund_agent", "support_agent"]]:
intent = classify(state["messages"][-1])
target = "refund_agent" if intent == "refund" else "support_agent"
return Command(
goto=target,
update={"handoff_reason": intent, "priority": "high"},
)The Command both routes control and records why the handoff happened — exactly the metadata the next agent needs.
Watch out
LangGraph: pass BOTH messages
When implementing a handoff in a LangGraph subgraph, you must pass both the AIMessage containing the tool call and a matching ToolMessage (same tool_call_id) to the receiving agent. Omit either and the receiver sees a malformed conversation history and behaves unpredictably. This is one of the most common implementation bugs.
Passing enough context without bloat
Think of a handoff like briefing a colleague who is taking over your task. You can dump every email you ever exchanged on them, hand them a tidy one-page summary, or fill out a short form with just the facts they need. The middle path is usually right — and the same is true for agents. Here is the decision that makes or breaks a multi-agent system: at each handoff, how much context do you transfer? There are three strategies, each with a sharp trade-off.
| Strategy | Token cost | Reliability | Use when |
|---|---|---|---|
| Full history forwarding | High | High fidelity, but degrades with length | Short chains where every detail matters |
| Structured context object | Low | High — receiver gets exactly the typed fields it needs | Default for most production systems |
| LLM-generated summary | Very low (70–90% reduction) | Lossy; adds latency | Long histories where detail is dispensable |
The naive default — forward the entire conversation — is deceptively expensive. A 50-message thread passed through four handoffs means the fifth agent processes roughly 200 messages, most irrelevant. Worse, long contexts dilute attention and reduce reliability. The framework-recommended approach is the structured context object: the orchestrator passes only the typed fields the receiver needs.
Context isolation is now a first-class principle. Anthropic's system spawns fresh subagents with clean contexts and maintains continuity through carefully written task descriptions — not full-history dumps. Each agent gets only what it needs to do its job.
# Structured object: pass relevant fields, not the whole transcript.
handoff_context = {
"task": "Process refund for order #4471",
"customer_tier": "enterprise",
"order_total": 1299.00,
"reason": "defective unit, verified",
}
# vs. forwarding 50 messages of small talk and unrelated tickets.Watch out
"Full history just works" is a myth at scale
Anthropic had to invent external filesystem state plus lightweight references precisely because forwarding full context between agents caused context-window overflow and compounding errors. Default full-history forwarding is expensive and often counterproductive.
How miscommunication breaks systems
When a multi-agent system fails, the instinct is to blame the model — "it just isn't smart enough." That instinct is almost always wrong. The agents reason fine in isolation; they break at the seams between them, the same way a capable team fails when nobody writes things down or two people unknowingly do the same job. Most multi-agent failures are orchestration failures, not capability failures. The MAST taxonomy (arXiv:2503.13657) catalogued 14 failure modes across three clusters — specification issues, inter-agent misalignment, and task verification — from 1,600+ traces (κ=0.88). The communication-specific ones recur everywhere:
- Wrong assumptions propagated — agent A states something uncertain as fact; agent B builds on it; the error compounds downstream.
- Information withholding — an agent knows something relevant but never surfaces it across the boundary.
- Conversation reset / context loss — a handoff drops the thread; the receiver re-derives or contradicts prior work.
- Task derailment & duplicated work — vague task boundaries cause two agents to do the same thing, or drift off-goal. Anthropic's own system once spawned 50 subagents for a simple query.
A subtler 2026 finding (Silo-Bench) named the Communication–Reasoning Gap: agents can exchange information and form the right coordination topology, yet still fail because they don't correctly integrate the distributed state into the final answer. The wire worked; the synthesis didn't.
The fixes are mostly about discipline at the boundary: explicit, detailed task descriptions with output-format specs; typed message contracts that fail loud; passing structured context, not raw history; and clear task boundaries so agents don't overlap. Communication quality is an engineering practice, not a model property.
Tip
Specify the output, not just the task
Anthropic found that telling a subagent what format and shape of result to return — not just what to do — was essential to avoid duplicated and derailed work. A precise output contract is the cheapest reliability upgrade you can ship.
Try it: Build a triage handoff with a structured context object
Build a two-agent system in LangGraph or the OpenAI Agents SDK: a Triage Agent that classifies an incoming customer message and hands off to either a Refund Agent or a Support Agent.
Requirements:
- Define a typed handoff context (Pydantic/TypedDict) with
task,reason,priority, and only the order/customer fields the receiver needs — do not forward the full conversation transcript. - Implement the handoff: in LangGraph use a
Command(goto=..., update=...); in the Agents SDK expose handoffs as tools and capture metadata in anon_handoffcallback. - Run two scenarios (a refund request and a how-to question) and log, for each, the exact context object that crossed the boundary.
Then run the same flow a second way, forwarding the entire message history instead of the structured object. Compare token counts for both. Write three sentences: how many tokens you saved, and one failure you can imagine the full-history version causing at scale (e.g., context overflow after several handoffs, or the receiver acting on irrelevant prior tickets).
Key takeaways
- 1Three mechanics carry information between agents — message passing, shared state/blackboard, and handoffs — and production systems usually combine all three.
- 2The biggest design decision is the context-passing strategy at each handoff: full history (expensive), structured object (recommended default), or summary (lossy).
- 3Forwarding full conversation history does not scale — it inflates cost, dilutes attention, and reduces reliability; pass only the typed fields the receiver needs.
- 4The blackboard/shared-state pattern is modern, not legacy: it decouples agents and beat central coordination by 13–57% on 2025 data-science benchmarks.
- 5Most multi-agent failures are miscommunication — wrong assumptions, withheld info, context loss, duplicated work — not model capability gaps.
Quiz
Lock in what you learned
Check your understanding
0 / 4 answered
1.At a handoff, which context-passing strategy is the framework-recommended default for most production systems?
2.What does the 2025 blackboard-system research (arXiv:2510.01285) demonstrate about shared-state coordination?
3.In a LangGraph multi-agent subgraph, what must you pass to the receiving agent for a tool-style handoff to be valid?
4.According to the MAST taxonomy, what most often causes multi-agent systems to fail?
Go deeper
Hand-picked sources to keep learning
Official 2025 docs on the handoffs primitive: input_filter, on_handoff callbacks, and structured handoff input via Pydantic.
Conditional edges vs. Command objects, with the warning about message-history completeness requirements.
Berkeley study introducing the 14-failure-mode taxonomy across 1,600+ traces; the authoritative empirical account of multi-agent failure.
Applies the blackboard pattern to LLM agents; 13–57% improvement over master–slave coordination on data-science tasks.
First-hand account of context-management failures, filesystem-as-shared-state, and why precise task/output specs are essential.
Accessible walkthrough of the three context-passing strategies and their cost/reliability trade-offs.