Agent Teams: Coordinated Peer Sessions

A lead, teammates, a shared task list, and a mailbox

Advanced · 15 min · Builder

What you'll be able to do

Enable the experimental agent-teams system and explain how a lead session spawns independent teammate sessions, each with its own full fresh context window
Describe the architecture — fixed lead, per-role teammates, a shared task list with auto dependencies and file locking, and a mailbox for real inter-agent messaging — and its on-disk persistence
Choose between in-process and tmux split-pane display modes and know which terminals support each
Drive a team in natural language: create it, spawn teammates from a subagent definition, assign and claim tasks, talk to a teammate, and clean up FROM THE LEAD
Apply the TeammateIdle / TaskCreated / TaskCompleted quality-gate hooks and reason about the linear cost model and known limitations

At a glance

Agent teams are Claude Code's experimental system for coordinating MANY independent sessions as peers. A lead session spawns teammate sessions — each with its own full, fresh context window — that self-coordinate through a shared task list and a real inter-agent mailbox. This is the multi-SESSION collaboration that subagents and workflows deliberately do NOT provide: teammates actually message each other and claim shared tasks. The lesson covers enabling the experiment, the lead+teammates+task-list+mailbox architecture, the two display modes, spawning teammates from subagent definitions, driving and cleaning up a team in natural language, quality-gate hooks, and the linear cost model with its known limitations.

1. Beyond one subagent: peers that actually talk
2. Enabling the experiment
3. Architecture and persistence: lead, teammates, task list, mailbox
4. Two display modes: in-process vs. tmux split panes
5. Driving a team: all natural language to the lead
6. Quality-gate hooks: enforce 'done' means done
7. Cost model and known limitations

Beyond one subagent: peers that actually talk

You already know subagents: helpers that run inside your one session, do isolated work, and report a summary back. You may also know dynamic workflows: a script that fans hundreds of subagents out, holding intermediate results in script variables. Both share one limitation by design — the agents never talk to each other. A subagent hands a summary up to the parent; a workflow stitches strings together with pure concatenation. There is no message bus in either.

Agent teams is the paradigm that breaks that rule. It is multi-session peer collaboration: a lead session (your main Claude Code session) spawns several independent teammate sessions, each a separate Claude Code instance with its own full, fresh context window. They coordinate through two shared resources you don't get anywhere else: a shared task list they claim work from, and a mailbox through which they send each other real messages. Teammates self-coordinate around the tasks rather than reporting up to a parent.

text

        ┌──────────────────────────────────────────┐
        │  LEAD session (you drive this one)         │
        └───────────────┬────────────────────────────┘
           spawns        │   shared task list  +  mailbox
        ┌────────────────┼─────────────────┐
        ▼                ▼                 ▼
  ┌───────────┐   ┌───────────┐    ┌───────────┐
  │ Teammate A│   │ Teammate B│    │ Teammate C│   each = full,
  │ full ctx  │◄─►│ full ctx  │◄──►│ full ctx  │   fresh context
  └───────────┘   └───────────┘    └───────────┘   window
      claim tasks · message each other · self-coordinate

The trade is plain: every teammate is a full context window, so this is the most expensive paradigm per agent, but it's the only one where agents divide a fuzzy problem and hand off to one another. Reach for it when the work isn't a clean fan-out but a genuine collaboration.

Watch out

Experimental, off by default — defer to /help and the live docs

Agent teams is an experimental feature, disabled by default, requiring Claude Code v2.1.32+. Its behavior, flags, and exact tool names shift week to week. Treat everything here as a snapshot: for your installed version, always confirm with /help and the live docs (https://code.claude.com/docs/en/agent-teams.md). Do not run it for anything load-bearing without checking current behavior first.

Key insight

This is the thing workflows deliberately DON'T do

Dynamic workflows scale to ~1,000 agents but communicate only by string concatenation — no peer messaging, on purpose, to keep cost and coordination simple. Agent teams trades that scale away (3-5 teammates) to buy real inter-agent messaging and a shared task list. Pick teams when agents must talk; pick a workflow when they only need to fan out.

Enabling the experiment

Because agent teams is experimental and off by default, you must opt in explicitly. Set the environment variable, either in your shell or in a settings.json env block:

bash

# Shell (current session)
export CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1

json

// ~/.claude/settings.json — persistent across sessions
{
  "env": {
    "CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS": "1"
  }
}

That's the whole switch. Once enabled, your normal Claude Code session becomes capable of acting as a lead: you ask it (in plain English) to create a team and spawn teammates, and it provisions the independent sessions for you. The variable gates the feature entirely — with it unset, none of the team tools or commands exist.

Agent teams requires v2.1.32+; check with claude --version. If the team commands don't appear after setting the variable, you're likely on an older build.
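
The two preconditions above (a new enough build and the opt-in variable) can be checked with a small script. This is a minimal sketch, not an official tool: version_at_least, first_version, and teams_enabled are hypothetical helper names, and the sketch assumes that claude --version prints a dotted x.y.z version somewhere in its output.

```shell
#!/usr/bin/env bash
# Sketch: check the preconditions from this section before asking for a team.
# All three functions are hypothetical helpers, not part of Claude Code.

# Succeeds (exit 0) when dotted version $1 is >= dotted version $2.
version_at_least() {
  local IFS=.
  local -a have=($1) want=($2)   # split "2.1.32" into 2 1 32
  local i h w
  for i in 0 1 2; do
    h=${have[i]:-0}
    w=${want[i]:-0}
    # 10# forces base 10 so a component like "08" is not read as octal.
    if (( 10#$h > 10#$w )); then return 0; fi
    if (( 10#$h < 10#$w )); then return 1; fi
  done
  return 0
}

# Prints the first x.y.z token found on stdin.
# Assumption: the version output contains such a token.
first_version() {
  grep -Eo '[0-9]+\.[0-9]+\.[0-9]+' | head -n 1
}

# Succeeds when the opt-in variable is set to 1 in the current environment.
teams_enabled() {
  [ "${CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS:-}" = "1" ]
}
```

A typical use would be to capture v=$(claude --version | first_version) and then run version_at_least "$v" 2.1.32 && teams_enabled. Note that teams_enabled only sees the shell environment; a variable set through the settings.json env block will not show up there.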

Tip

settings.json env block vs. exporting

export lasts only for the current shell. Putting the variable in the env block of ~/.claude/settings.json (user) or .claude/settings.json (project) makes it persistent and, for the project file, shareable with collaborators via git. Use the env block if you want teams available by default; use export for a one-off experiment.

Architecture and persistence: lead, teammates, task list, mailbox

A team has four moving parts. The shared state lives on disk so that independent sessions can all read and write it.

The lead. Your main session. It is fixed — there is no promoting a teammate to lead and no transferring lead to another session. The lead provisions teammates, can assign tasks, and is the only session allowed to clean the team up.

The teammates. Separate Claude Code instances, one per role. Each has its own complete context window and runs independently — it is not a subagent living inside the lead.

The shared task list. Stored at ~/.claude/tasks/{team-name}/. Tasks move through three states — pending → in-progress → completed. Dependencies are managed automatically (a task blocked on another won't be claimable until its prerequisite completes), and file locking prevents two teammates from grabbing the same task at once.

The mailbox. The inter-agent messaging channel — how teammates and the lead actually send each other text, not just status. This is the piece that makes a team a team.

Configuration lives at ~/.claude/teams/{team-name}/config.json. It is auto-generated — its members array tracks each teammate's name, agent ID, and agent type. Do not hand-edit it; let the lead manage it. And there is no nesting: a teammate cannot spawn its own team or further teammates. One lead, one flat team.

text

~/.claude/
├── tasks/
│   └── {team-name}/        ← shared task list (pending/in-progress/completed)
│                              auto dependencies + file locking
└── teams/
    └── {team-name}/
        └── config.json     ← AUTO-GENERATED members array; do NOT hand-edit

Note

File locking is what makes self-claiming safe

Because teammates run as independent processes, two of them could race to claim the same pending task. The shared task list uses file locking to serialize claims, so a task goes to exactly one teammate. This is also why the task list lives on disk under ~/.claude/tasks/{team-name}/ rather than in any single session's memory — it's the shared substrate all the sessions read and write.
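
To see why an on-disk lock makes self-claiming safe, here is an illustration of the general technique. It is not Claude Code's implementation and says nothing about its real file format: claim_task and the .claim directory layout are hypothetical, and the sketch uses an atomic mkdir as a stand-in for a file lock.

```shell
#!/usr/bin/env bash
# Illustration only: NOT Claude Code's implementation or on-disk format.
# claim_task is a hypothetical helper showing how one atomic filesystem
# operation lets exactly one process win a race for a task.

# Usage: claim_task <tasks-dir> <task-id> <claimant>
# Returns 0 if this caller claimed the task, 1 if it was already claimed.
claim_task() {
  local dir=$1 task=$2 who=$3
  # mkdir either creates the directory or fails; two racing callers
  # cannot both succeed, so only one of them records itself as owner.
  if mkdir "$dir/$task.claim" 2>/dev/null; then
    printf '%s\n' "$who" > "$dir/$task.claim/owner"
    return 0
  fi
  return 1
}
```

If two processes call claim_task on the same task at the same moment, one gets exit status 0 and the other gets 1 and moves on to the next pending task. That is the property the shared task list needs from its locking.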

Two display modes: in-process vs. tmux split panes

Because teammates are separate sessions, you need a way to see and talk to them. There are two display modes, plus an auto default that picks for you.

in-process: all teammates run inside your main terminal. You cycle between them with Shift+Down. This mode works anywhere, in any terminal, with no extra tooling. It's the safe choice when you're unsure what your terminal supports.

tmux split panes: each teammate gets its own visible pane; you click a pane to interact with that teammate. This needs tmux, or iTerm2 with the it2 CLI installed. It's the richer view when you want to watch several teammates at once.

auto (the default) uses split panes if you're already inside tmux, and falls back to in-process otherwise.

Set the mode with teammateMode in settings.json, or the --teammate-mode in-process|tmux flag at launch.

Mode	How you switch teammates	Requirements	Works everywhere?
in-process	Shift+Down cycles	none	Yes
tmux split panes	click a pane	tmux, or iTerm2 + `it2` CLI	No
auto (default)	split if inside tmux, else in-process	—	falls back gracefully

Split-pane mode is NOT supported in the VS Code integrated terminal, Windows Terminal, or Ghostty. In those, use in-process (or run inside tmux).
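
If you'd rather choose the mode explicitly than rely on auto, the rules above can be sketched as a shell function. pick_teammate_mode is a hypothetical helper, and the environment variables it reads (TMUX set inside tmux, TERM_PROGRAM=iTerm.app in iTerm2) are assumptions about what your terminal exports, so verify them in your own shell.

```shell
#!/usr/bin/env bash
# Sketch: suggest a --teammate-mode value for the current terminal.
# pick_teammate_mode is a hypothetical helper; TMUX and TERM_PROGRAM
# are assumptions about the environment, not documented inputs.
pick_teammate_mode() {
  if [ -n "${TMUX:-}" ]; then
    # Already inside tmux: split panes are available.
    echo tmux
  elif [ "${TERM_PROGRAM:-}" = "iTerm.app" ] && command -v it2 >/dev/null 2>&1; then
    # iTerm2 with the it2 CLI on PATH also supports split panes.
    echo tmux
  else
    # Everything else, including VS Code, Windows Terminal, and Ghostty.
    echo in-process
  fi
}
```

You could then launch with claude --teammate-mode "$(pick_teammate_mode)". Defaulting to in-process in the final branch mirrors the lesson's advice: it is the mode that works in every terminal.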

Watch out

Split panes won't work in VS Code, Windows Terminal, or Ghostty

If you're in the VS Code integrated terminal, Windows Terminal, or Ghostty, tmux split-pane mode is unsupported — set teammateMode to in-process (or --teammate-mode in-process) and use Shift+Down to cycle. The in-process mode is deliberately universal precisely so the feature works regardless of your terminal.

Driving a team: all natural language to the lead

You don't script a team — you talk to the lead in plain English, and it provisions and coordinates the teammates. The vocabulary:

Create + spawn. "Create an agent team to migrate our API to v2. Spawn three teammates: a migrator, a test-writer, and a reviewer." The lead sets up the team and the independent sessions.

Spawn from a subagent definition. You can name an existing subagent type: "Spawn a teammate using the security-reviewer agent type." When you do, the definition's system prompt + tools allowlist + model carry over to the teammate. But there's a sharp caveat (below): the definition's skills and mcpServers do NOT apply — teammates load those from project/user settings instead. Team-coordination tools (sending messages, task management) are always available to a teammate even if the definition restricts its tools.

Assign or claim. The lead can assign a task to a specific teammate, or teammates can self-claim any unblocked (dependency-satisfied) task from the shared list.

Talk to a teammate. Switch to it — Shift+Down in in-process mode, or click its pane in split mode — and type.

Shut down gracefully. The lead sends a shutdown request; the teammate approves its own exit (this can be slow — see limitations).

Clean up — from the lead. "Clean up the team." This removes the shared resources (task list, config, sessions) consistently. Cleanup MUST run from the lead, not a teammate — running it elsewhere won't tear things down cleanly.
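
After cleanup, you can confirm from a shell that nothing was left behind. A minimal sketch: team_leftovers is a hypothetical helper, and the two paths it checks are the ones this lesson gives for the task list and the team config.

```shell
#!/usr/bin/env bash
# Sketch: after "Clean up the team", confirm nothing is left on disk.
# team_leftovers is a hypothetical helper; the paths come from this lesson.

# Usage: team_leftovers <team-name>
# Prints any surviving team path and returns 1; returns 0 when both are gone.
team_leftovers() {
  local team=$1 found=0 path
  for path in "$HOME/.claude/tasks/$team" "$HOME/.claude/teams/$team"; do
    if [ -e "$path" ]; then
      echo "still present: $path"
      found=1
    fi
  done
  return "$found"
}
```

If team_leftovers reports a surviving path, the cleanup probably did not run from the lead. Go back to the lead session and ask it to clean up again rather than deleting the directories by hand.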

Watch out

Subagent definition → teammate: skills and mcpServers are DROPPED

Spawning a teammate from a subagent definition carries over its system prompt, tools allowlist, and model — but NOT the definition's skills or mcpServers frontmatter. Teammates load skills and MCP servers from project/user settings instead. If your security-reviewer subagent relied on a skill or an MCP server declared in its own frontmatter, that capability silently won't be present in the teammate. Put it in project/user settings, or don't assume it's there.

Example

A three-teammate migration team, start to finish

To the lead: "Create an agent team for the v2 API migration. Spawn a migrator teammate, a teammate using the test-writer agent type, and a teammate using the security-reviewer agent type. Migrator: convert the endpoints in src/api/. Test-writer: claim each migrated endpoint and add coverage. Security-reviewer: review every change for auth regressions." Watch progress, Shift+Down to nudge a stalled teammate, then end with "Clean up the team" — issued to the lead.

Quality-gate hooks: enforce 'done' means done

Left alone, agents declare victory once work looks finished. Agent teams give you three team-specific hooks to wire deterministic gates around the task lifecycle. Each is a shell command; exit code 2 is the lever — it sends feedback and either blocks the transition or keeps a teammate working.

Hook	Fires when	Exit code 2 effect
TeammateIdle	a teammate is about to go idle	sends feedback and keeps it working (don't stop yet)
TaskCreated	a task is created	prevents creation + returns feedback
TaskCompleted	a task is marked complete	blocks completion + returns feedback (e.g. demand tests/docs first)

The canonical use is TaskCompleted as a quality gate: a teammate tries to mark a task done; your hook runs the test suite (or a security scan, or a docs check); if it fails, the hook exits 2, the completion is blocked, and the teammate gets told what to fix. "Done" now provably means tests pass — not just that the teammate believes it's finished.

bash

#!/usr/bin/env bash
# TaskCompleted hook: block 'done' unless tests pass
if ! npm test --silent; then
  echo "Tests failing — task is not complete. Fix and re-run." >&2
  exit 2   # blocks completion + sends this feedback to the teammate
fi
exit 0

TeammateIdle is the counterpart for the other failure mode — a teammate quitting while there's still claimable work; exit 2 nudges it back to the task list.
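
When one gate needs several checks, the same exit-2 pattern can be wrapped in a helper. This is a sketch under assumptions: run_gate is a hypothetical function, and the commands you pass to it stand in for your project's own test, lint, or docs checks.

```shell
#!/usr/bin/env bash
# Sketch: a reusable gate for TaskCompleted-style hooks.
# run_gate is a hypothetical helper, not part of Claude Code.

# Usage: run_gate <label> <command> [args...]
# Returns 0 when the command succeeds. Otherwise it prints feedback on
# stderr and returns 2, the exit code that blocks the transition.
run_gate() {
  local label=$1
  shift
  if "$@"; then
    return 0
  fi
  echo "$label failed: task is not complete. Fix and re-run." >&2
  return 2
}
```

In a hook script you would chain the checks and stop at the first failure, for example run_gate "tests" npm test --silent || exit $? followed by a second run_gate line for your linter. Because each failure names its label, the teammate is told which gate to fix.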

Key insight

Hooks are the deterministic backbone under a best-effort team

Teammates are model-driven and best-effort — they might skip a check. The three team hooks are deterministic: they always fire on their lifecycle event, and exit 2 is non-negotiable. That's how you convert 'please write tests before marking done' from a hope into a guarantee. If a step MUST happen before 'done', encode it in a TaskCompleted hook, not in a prompt.

Cost model and known limitations

Cost scales linearly with concurrent teammates. Every teammate is a full context window running independently, so two teammates cost roughly twice one, three cost three times, and so on — there's no context-sharing discount the way subagents (summaries) or workflows (script variables) give you. This makes teams the most expensive per-agent paradigm. The practical sizing: 3-5 teammates, ~5-6 tasks each, and don't leave a team running unattended — costs accrue while you're away and there's no auto-stop.

How far can it go? Anthropic's engineers used coordinated Claude Code sessions to build a C compiler — roughly 2,000 sessions over about two weeks, ~$20,000 in API cost, producing a ~100,000-line compiler that builds Linux on x86/ARM/RISC-V. Treat that as an order-of-magnitude anecdote, not a target: it shows the ceiling of what coordinated sessions can do, and how the cost scales when you push it.
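
The arithmetic behind both paragraphs is simple enough to sketch in shell. team_cost and avg_cost_per_session are hypothetical helpers working in whole dollars, and any per-teammate figure you pass in is your own estimate, not a published price.

```shell
#!/usr/bin/env bash
# Sketch: the linear cost arithmetic from this section, in whole dollars.
# Both functions are hypothetical helpers for back-of-envelope estimates.

# Usage: team_cost <teammates> <estimated-cost-per-teammate>
# Linear model: total = teammates x cost per teammate, with no discount.
team_cost() {
  echo $(( $1 * $2 ))
}

# Usage: avg_cost_per_session <total-cost> <sessions>
avg_cost_per_session() {
  echo $(( $1 / $2 ))
}
```

For example, avg_cost_per_session 20000 2000 prints 10, so the compiler anecdote averages out to roughly $10 per session. By the same linear model, team_cost 4 3 prints 12: four teammates at an estimated $3 each cost about four times what one would.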

Known limitations (be honest with yourself):

It's experimental — expect rough edges.
/resume and /rewind do NOT restore in-process teammates — restoring a session won't bring its teammates back.
Task status can lag — teammates sometimes fail to mark a task complete; nudge them manually.
Shutdown is slow — graceful exit requires the teammate to approve, which takes time.
One team at a time per lead — you can't run multiple teams from one lead.
No nested teams, no lead transfer — the structure is flat and the lead is fixed.

Watch out

Don't leave a team running unattended

Because cost scales linearly and there's no auto-stop, an idle or stuck team keeps burning a full context window per teammate. Size to 3-5 teammates with ~5-6 tasks each, stay attended while it runs, and finish with Clean up the team from the lead. The ~$20k C-compiler figure is what sustained, large-scale coordination costs — a useful reminder that this paradigm bills per full session.

Tip

When task status lags, nudge by hand

A common rough edge: a teammate finishes work but doesn't flip its task to completed, so dependents stay blocked. If progress stalls, Shift+Down (or click the pane) to that teammate and ask it to mark the task complete. Pair this with a TaskCompleted hook so 'complete' still means your gate passed.

Try it: Run a small, attended agent team end to end

Practice coordinating peers on a throwaway git repo. Keep it small and STAY ATTENDED — cost scales linearly per teammate.

Enable the experiment. Confirm claude --version is v2.1.32+. Add CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1 to the env block of your ~/.claude/settings.json (or export it). Start claude and verify team behavior is available; if anything looks off, check /help and the live agent-teams docs for your version.
Pick a display mode. If you're in VS Code, Windows Terminal, or Ghostty, set --teammate-mode in-process (Shift+Down to cycle). If you have tmux or iTerm2 with it2, try split panes.
Create the team. To the lead: "Create an agent team to add input validation across src/. Spawn two teammates: a validator and a reviewer." Spawn the reviewer from an existing subagent type if you have one — note that its skills/mcpServers won't carry over.
Watch coordination. Observe tasks move pending → in-progress → completed in ~/.claude/tasks/{team}/. Let teammates self-claim. Shift+Down (or click a pane) to talk to one directly.
Add a quality gate. Wire a TaskCompleted hook that runs your test command and exits 2 on failure. Have a teammate try to mark a task done with a failing test — confirm the completion is blocked and feedback is returned.
Handle a lag. If a teammate finishes but doesn't mark its task complete, nudge it by hand (Shift+Down → "mark your task complete").
Clean up — from the lead. Issue "Clean up the team" to the LEAD session and confirm the shared resources under ~/.claude/tasks/ and ~/.claude/teams/ are removed.

Deliverable: a short note on (a) one thing the mailbox/task-list enabled that a single subagent could not have done, (b) what your TaskCompleted hook blocked and why that's better than a CLAUDE.md instruction, and (c) one limitation you hit (e.g. lagging status or slow shutdown).

Key takeaways

1. Agent teams is experimental and off by default (v2.1.32+); enable it with CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1 (shell export or a settings.json env block), and defer to /help and the live docs for your version.
2. A lead session spawns independent teammate sessions, each with its OWN full fresh context window. This is multi-SESSION peer collaboration, not subagents inside one context.
3. Teammates self-coordinate through a shared task list at ~/.claude/tasks/{team} (pending/in-progress/completed, auto dependencies, file locking) and a mailbox for real inter-agent messaging; config at ~/.claude/teams/{team}/config.json is auto-generated, so don't hand-edit it; no nesting, no lead transfer.
4. Two display modes: in-process (Shift+Down cycles, works anywhere) and tmux split panes (needs tmux or iTerm2 it2; NOT in VS Code terminal, Windows Terminal, or Ghostty); set via teammateMode or --teammate-mode.
5. Drive in natural language to the lead: create, spawn (a subagent type's prompt/tools/model carry over, but its skills & mcpServers do NOT), assign/claim, talk, and shut down; 'Clean up the team' MUST run from the lead.
6. Cost scales linearly per teammate (each a full window): keep to 3-5 teammates and ~5-6 tasks each, don't run unattended; gate quality with TeammateIdle/TaskCreated/TaskCompleted hooks (exit 2 blocks/keeps working).

Quiz

Lock in what you learned

Check your understanding

1. What most fundamentally distinguishes agent teams from subagents and dynamic workflows?

2. You spawn a teammate using your security-reviewer subagent definition, which declares a custom skill and an MCP server in its frontmatter. Which of its settings actually apply to the teammate?

3. You're running Claude Code in the VS Code integrated terminal and want to coordinate three teammates. What's the correct display setup?

4. A teammate marks its task complete, but you want 'complete' to mean the test suite passes. What's the right mechanism, and how does it behave?

Go deeper

Hand-picked sources to keep learning

Agent teams (official docs)

The primary source: enabling the experiment, lead/teammates/task-list/mailbox architecture, display modes, driving and cleaning up a team, the team hooks, and known limitations.

Subagents (official docs)

Subagent definitions — system prompt, tools, model, skills, mcpServers frontmatter — and which fields carry over when a definition is spawned as a teammate.

Building a C compiler with Claude Code (Anthropic Engineering)

The real-world scale anecdote: ~2,000 coordinated sessions over ~2 weeks, ~$20k API, a ~100k-line compiler that builds Linux on x86/ARM/RISC-V.

Hooks reference (official docs)

How lifecycle hooks work and the exit-code-2 convention that the TeammateIdle / TaskCreated / TaskCompleted quality gates rely on.

Settings (official docs)

The settings.json env block (for CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS) and teammateMode; where teammates load skills/MCP servers from.

claude-code on GitHub

Source, changelog, and issues — the place to confirm agent-teams behavior and flags for your installed version, since the feature is experimental and fast-moving.