Anatomy of an Agent

The four parts under every framework

Beginner 13 minBuilderDecision-maker

What you'll be able to do

Name and explain the five components of an agent — model, instructions, tools, memory, and orchestration — and what each one is responsible for
Trace a single user request as it flows through all five components in order
Explain why separating these components cleanly makes an agent debuggable and testable
Map the anatomy onto real frameworks (LangGraph, CrewAI, OpenAI Agents SDK, Google ADK)
Diagnose which component is at fault when an agent misbehaves

At a glance

An agent is not a model — it is a system with five interlocking parts: a model, instructions, tools, memory, and an orchestration loop that ties them together. This lesson dissects each part, traces how a single request flows through them, and shows that the same anatomy lives under every framework you'll meet, from LangGraph to the OpenAI Agents SDK. Once you can name the parts, you can debug them, because each one fails in its own distinct way.

1An agent is a system, not a model
2The five components
3How a request flows through the parts
4Separation of concerns: why clean parts are debuggable
5The same anatomy, every framework
6Where each component goes wrong

An agent is a system, not a model

Start with the trap almost everyone falls into: believing the model is the agent. It isn't. The model is one part — the reasoning core — and on its own it can only produce text. What turns a model into an agent is the machinery wrapped around it that lets it act, remember, and keep going until a goal is met.

The cleanest way to picture this is a body. The model is the brain: it decides. But a brain in a jar can't do anything. It needs instructions (its values and goals), tools (hands to act on the world), memory (to carry context from one moment to the next), and an orchestration loop (the nervous system that connects perception to action and back). Take any part away and the agent stops being an agent — no tools and it can only talk; no loop and it acts once and halts.

This framing isn't academic. As of 2026 it maps cleanly onto every major framework and even onto the formal research literature, where an agent is written as a tuple of policy, memory, tools, verifiers, and environment. The practical payoff is large: when you see an agent as five separable parts rather than one inscrutable blob, you can reason about it, test each piece, and find bugs fast.

The five components

In plain terms: each component answers one question. Who is doing the thinking? The model. What is it trying to do and how should it behave? The instructions. What can it actually touch in the world? The tools. What does it remember? The memory. What keeps it running and stops it? The orchestration. Hold those five questions in your head and you have the whole anatomy.

Here it is in one table. Internalize it — the rest of the course is detail on each row.

Component	What it is	Its one job
Model	The LLM at the core	Map the current context to the next decision (think or act)
Instructions	System prompt + policy + tool descriptions	Tell the model who it is, what to pursue, and how to behave
Tools	Typed, schema-validated functions	Let the agent sense and change the outside world
Memory	Context window → conversation store → long-term knowledge	Carry information across steps and across runs
Orchestration	The runtime loop	Run the perceive→reason→act→observe cycle, enforce budgets, handle errors

Three subtleties to notice. Instructions are broader than "the system prompt" — in production they include tool descriptions, injected examples, and any retrieved context, because all of it shapes the model's behavior. Memory is layered, not a single thing: the immediate context window, a conversation store, and long-term knowledge are three different mechanisms. And orchestration is far more than a while loop — it owns stopping conditions, token and cost budgets, retries, and loop detection. Each row is a place where an agent can succeed or fail independently of the others.

Key insight

Why "instructions" deserve their own box

Anthropic's research found that the system-prompt/instructions layer has a disproportionate effect on reliability: vague instructions are a leading cause of goal drift and hallucinated actions. Instructions aren't decoration on top of the model — they are the agent's policy.

How a request flows through the parts

So far the five parts look like a static diagram. They aren't — they're a pipeline a single request runs through, often many times in a single turn. Picture passing a baton: each component does its job, then hands the growing message history to the next. Here is the exact path:

Instructions are injected into the context window: the system prompt, the available tool schemas, and any relevant memory.
The model reasons over that context and emits one of two things: a final text answer, or a tool call (a structured request like search(query="...")).
If it's a tool call, the orchestrator executes the tool and captures the result.
The result is appended to memory (the message history) so the model can see what happened.
The loop repeats from step 2, now with the new observation in context.

It stops when the model produces a text-only answer (it's done) or when a budget guard fires — max iterations, max tokens, max cost, or a timeout. That stopping logic lives in orchestration, not in the model, which is exactly why it's reliable: you never trust a non-deterministic model to decide when to quit.

Notice how each component touches the request in turn, hands off cleanly, and reads from a shared, growing message history. That history is the working memory, and keeping it clean is half the battle in real agents.

Example

One turn, concretely

Goal: "What's the weather in Tokyo, and should I pack an umbrella?"

Instructions + the get_weather tool schema enter the context.
Model decides it needs data → emits get_weather(city="Tokyo").
Orchestrator runs the tool → {"temp": 14, "rain_chance": 0.8}.
Result appended to history.
Model loops, now sees 80% rain, and produces a text answer: "14°C and likely rain — pack the umbrella." Text-only response → loop stops.

Separation of concerns: why clean parts are debuggable

Why bother keeping these five parts distinct? Because when something breaks, you want to know which thing broke — not stare at a wall of logs guessing. That's the whole architectural payoff: clean separation makes failures isolatable. A tool failure is not a model failure is not a memory failure. When each component has defined inputs and outputs, you can point at the broken one.

The practical test is mockability: if your parts are cleanly separated, you can swap any one for a fake and test the rest. A "mock" here just means a stand-in that returns a fixed, predictable result instead of doing the real work.

python

# Test the orchestration loop without spending tokens or hitting APIs
def fake_model(messages):
    # Always asks for the calculator once, then answers
    if not any(m["role"] == "tool" for m in messages):
        return ToolCall("calculator", {"expr": "2+2"})
    return FinalAnswer("The result is 4.")

def fake_calculator(expr):
    return "4"

# Now exercise the loop, budget guards, and history handling in isolation
run_agent(model=fake_model, tools={"calculator": fake_calculator}, goal="2+2?")

Because run_agent only depends on interfaces (a callable model, a dict of tools), you can verify stopping conditions and history management with zero network calls. The same logic lets you mock a flaky tool to test error handling, or replay a fixed message history to test the model's behavior deterministically. Agents are non-deterministic; clean seams are how you make them testable anyway.

The same anatomy, every framework

Here's the news that makes every framework suddenly easy: they are all the same five parts in different clothes. Each framework is just an opinionated way of assembling model, instructions, tools, memory, and orchestration. The vocabulary changes; the anatomy doesn't.

Framework	Model	Instructions	Tools	Memory	Orchestration
LangGraph	any LLM	node prompts	tool nodes	shared `state` object + checkpoints	directed graph (nodes + conditional edges)
CrewAI	any LLM	role / goal / backstory	agent tools	crew memory (pluggable vector store)	sequential / hierarchical process
OpenAI Agents SDK	OpenAI-first	agent instructions + guardrails	function tools	sessions	the runner + handoffs between agents
Google ADK	Gemini-first	agent config	tool registry	built-in state	workflow agents (Sequential / Parallel / Loop)
AG2 / AutoGen	any LLM	conversable-agent system message	registered functions	conversation history	structured agent conversations

The lesson is liberating: learn the anatomy once and every framework becomes legible. Open the OpenAI Agents SDK and see Agents, Handoffs, Guardrails, Sessions, Tracing, and you can immediately place each one — Guardrails and Sessions are instructions and memory; Handoffs are an orchestration pattern. Frameworks earn their keep by providing the hard parts — state management, checkpointing, tracing, human-in-the-loop — that are tedious to build correctly. Anthropic's caution isn't "avoid frameworks"; it's "don't use them as black boxes you can't reason about."

Watch out

Two names that trip people up in 2026

OpenAI's production agent framework is the Agents SDK (released March 2025) — Swarm was an experimental prototype and is deprecated; the Swarm README explicitly directs users to migrate. And AutoGen entered maintenance mode at Microsoft in late 2025 as Microsoft pivoted to the Microsoft Agent Framework; the active community continuation of AutoGen 0.2 is AG2, maintained by the original creators outside Microsoft. Citing "Microsoft AutoGen" (active) or "OpenAI Swarm" (current) is outdated.

Where each component goes wrong

Now the real reason to learn the anatomy: debugging. When an agent misbehaves, the symptom usually points straight at one component — because each part fails in its own characteristic way. Learn the signatures and you stop guessing.

Model — hallucination, goal drift, and compounding errors that grow over many turns. Research on long-horizon tasks shows context reliability degrades significantly as the number of steps increases, with earlier instructions and observations becoming progressively less influential by the final turn.
Instructions — ambiguous or conflicting goals that cause wrong actions or, worse, infinite loops where the agent can never decide it's done.
Tools — malformed arguments, schema mismatches, API timeouts, and dangerous side effects. Anthropic's “Building Effective Agents” guide found that teams spent more time on tool schemas than on prompts; typed (Pydantic/JSON Schema) arguments sharply cut malformed calls.
Memory — context overflow, stale embeddings, low-quality retrieval, and prompt injection hidden inside retrieved content.
Orchestration — missing budget guards causing runaway loops, routing errors in multi-agent setups, and race conditions.

So when an agent misbehaves, resist debugging "the agent." Ask: which component? A wrong-but-confident answer points at the model or instructions; a crash on a tool result points at tools or schemas; an agent that forgets what it did points at memory; an agent that never stops points at orchestration. The symptom is the map.

Tip

Observability is the emerging sixth concern

You can't fix a component you can't see. Production runtimes now emit OpenTelemetry-style spans for every LLM call, tool call, retrieval, and handoff. Tools like LangSmith, Langfuse, and Arize Phoenix turn the five-part anatomy into a trace you can actually read.

Try it: Label the anatomy

Take one agent you can inspect — your own toy loop, a LangGraph tutorial, or the OpenAI Agents SDK quickstart — and produce a one-page dissection.

Identify all five components in the code or config: where is the model called, where do instructions live, how are tools defined, what stores memory, and where is the loop?
Draw the request flow as five numbered steps for a single example task, naming which component owns each step.
Predict one failure per component: write the specific bug you'd expect from the model, the instructions, a tool, memory, and orchestration — then say how you'd detect it (which span or log line).
Bonus — prove separation: replace the real model call with a hard-coded fake (return a fixed tool call, then a fixed answer) and confirm the loop still runs end-to-end with no API calls. If it can't, your components aren't cleanly separated yet.

Key takeaways

1An agent is a system of five parts — model, instructions, tools, memory, orchestration — not just an LLM.
2A request flows in order: instructions enter context → model reasons → tool runs → result appends to memory → loop, until a final answer or a budget guard stops it.
3Clean separation of components makes failures isolatable and lets you mock any one part to test the others.
4Every framework — LangGraph, CrewAI, OpenAI Agents SDK, Google ADK, AG2 — is just these five parts under different names.
5Each component fails in its own way, so naming the broken part is the fastest path to a fix.

Quiz

Lock in what you learned

Check your understanding

0 / 4 answered

1.Which statement best captures the core idea of agent anatomy?

2.In a single agent turn, what is the correct order of events?

3.Why does cleanly separating the components make an agent easier to debug?

4.An agent runs forever and never returns a final answer. Which component is the most likely culprit?

Go deeper

Hand-picked sources to keep learning

Anthropic — Building Effective Agents

The canonical source on workflows vs agents, component design, and why tool (ACI) design matters as much as prompts.

The Anatomy of an Agent Loop — Steve Kinney

A concise, code-forward walkthrough of the minimal loop and how model, tools, history, and orchestration fit together.

AI Agent Architecture Patterns — Redis Engineering

Component breakdown with request-flow diagrams and a per-component failure-modes table.

LLM Agent Architectures in 2026 — FutureAGI

Current overview of model options, the three-layer memory taxonomy, the tool/MCP layer, and framework-to-anatomy mapping.

Definitive Guide to Agentic Frameworks in 2026 — SoftmaxData

Side-by-side mapping of how LangGraph, CrewAI, OpenAI Agents SDK, AG2, and Google ADK implement the five components.

AI Agent Systems: Architectures, Applications, and Evaluation (arXiv 2601.01743)

Academic survey formalizing the agent tuple (policy, memory, tools, verifiers, environment) and cataloging failure modes per component.