The Agent Loop: A First Look

How a simple loop turns a text predictor into an actor

Beginner · 11 min · Builder · Decision-maker
What you'll be able to do
  • Trace a single pass through the perceive → reason → act → observe loop and explain what happens at each stage
  • Explain why the loop — not the model — is what creates agentic behavior
  • Read and reason about a minimal agent loop in under 10 lines of pseudocode
  • Identify the correct stopping signal (stop_reason / absence of a tool call) and the secondary safeguards that prevent runaway loops
  • Connect the loop to the ReAct pattern and preview how tools, memory, and planning extend it
At a glance

A language model, on its own, does exactly one thing: read text and predict the next chunk of text. The agent loop is the small, almost embarrassingly simple wrapper that turns that one-shot predictor into something that acts — by running it again and again, feeding it the results of its own actions until the job is done. This lesson shows you the loop, why it (not the model) is the source of agency, and the stopping conditions that keep it from running forever.

  1. One prediction is not an agent — a loop is
  2. The four stages of one turn
  3. How the loop knows to continue (or stop)
  4. The whole thing in under 10 lines
  5. A worked example: one tool, two turns
  6. Why the loop, not the model, is the agent
  7. Stopping conditions and runaway loops

One prediction is not an agent — a loop is

Here is the simplest way to see it. A raw language model is like a brilliant analyst locked in a room. You slide a question under the door, they write an answer, slide it back, and that is the end of the interaction: one input in, one block of text out, no way to do anything. Useful, but inert.

Now give the analyst a phone and an assistant standing outside. The analyst can write "call this number, get me the balance, and bring it back." The assistant runs the errand, slides the result back under the door, and the analyst keeps working with that new information. Repeat until the analyst writes "done, here's the answer."

That assistant-and-loop is the agent loop. The model still only predicts text — but now some of that text is a request to act, an outside harness (your code) performs the action, and the result is handed back so the model can decide what to do next. The whole of agentic AI is this one cycle:

Perceive the current state → Reason about what to do → Act by calling a tool → Observe the result → repeat.

No loop, no agent. The model is the brain; the loop is the body that lets the brain affect the world and learn what happened.

The four stages of one turn

Each pass through the loop has four stages. In plain terms: the model gets caught up on what's happened, decides one next move, your code carries that move out, and the result is written back so the model can react. Walk through them once and the whole pattern clicks.

  1. Perceive. The model receives everything it needs to decide: the user's goal, the available tools, and the full history so far — including the results of every previous action. Concretely, this is just the context window being assembled before the call.
  2. Reason. The model predicts its next move. Crucially, it chooses between two kinds of output: a tool call ("run the calculator on 12 * 47") or a final answer ("the total is 564").
  3. Act. If the model asked for a tool, the harness — your code, not the model — actually executes it: it runs the function, hits the API, queries the database.
  4. Observe. The harness appends the tool's result back into the conversation history, then loops to step 1 so the model can see what happened.

The magic is in the handoff between Act and Observe. The model never runs anything itself; it only ever emits text. Your harness is the muscle that turns a requested action into a real one and reports back. That separation — model decides, harness executes — is the foundation every framework builds on.
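To make the "model decides, harness executes" split concrete, here is a minimal sketch of the Act stage on its own. It is an illustration, not any framework's API: the JSON shape of the tool request, the `TOOLS` registry, and the `execute_tool_request` helper are all assumptions made for this example.

```python
import json
import operator

# Hypothetical tool: an ordinary Python function the harness is allowed to run.
OPERATIONS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}

def calculator(a, b, op):
    """Apply one arithmetic operation to two numbers."""
    return OPERATIONS[op](a, b)

# Hypothetical registry mapping tool names to real functions.
TOOLS = {"calculator": calculator}

def execute_tool_request(request_text, tools):
    """Act: parse the model's text request and run the matching real function."""
    request = json.loads(request_text)       # the model produced only this text
    tool = tools[request["name"]]            # the harness looks up the real function
    return tool(**request["arguments"])      # the harness, not the model, runs it

# Text a model might emit when it wants the calculator (assumed format).
model_output = '{"name": "calculator", "arguments": {"a": 12, "b": 47, "op": "mul"}}'
result = execute_tool_request(model_output, TOOLS)
print(result)  # 564
```

Notice that nothing in `model_output` can run by itself. It only becomes an action because the harness chooses to look up the function and call it.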

How the loop knows to continue (or stop)

The loop needs a precise rule for when to keep going and when to quit. The intuition is simple: the model keeps the loop alive by asking for more help, and ends it by simply answering. Beginners often assume it stops when the model writes something like "I'm finished now." That is an anti-pattern — parsing natural language for a completion phrase is unreliable and breaks constantly.

The real signal is structural: does the model's response contain a tool call?

  • If the response requests a tool, the task isn't done — execute it, append the result, and loop again.
  • If the response is a plain final answer with no tool call, the model is signaling it's done — return that answer and stop.

Modern APIs expose this as an explicit field so you never have to guess. Each model call comes back with a short status field telling you why the model stopped talking:

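| Provider  | Continue the loop             | Stop the loop             |
|-----------|-------------------------------|---------------------------|
| Anthropic | stop_reason == "tool_use"     | stop_reason == "end_turn" |
| OpenAI    | finish_reason == "tool_calls" | finish_reason == "stop"   |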

Your loop reads that one field, not the prose. In effect, the model steers the loop itself through that field.
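The branch on that field is small enough to write down directly. This is a minimal sketch using only the four values listed above. The `should_continue` helper and the `CONTINUE_VALUES` mapping are hypothetical names, and real APIs return other stop values too (for example when a token limit is hit), which this sketch simply treats as "do not continue".

```python
# Values that mean "the model asked for a tool", keyed by provider.
# Anthropic reports them in stop_reason, OpenAI in finish_reason.
CONTINUE_VALUES = {
    "anthropic": "tool_use",
    "openai": "tool_calls",
}

def should_continue(provider: str, stop_value: str) -> bool:
    """Return True when the structured stop field says the model wants a tool."""
    return stop_value == CONTINUE_VALUES[provider]

print(should_continue("anthropic", "tool_use"))   # True
print(should_continue("anthropic", "end_turn"))   # False
print(should_continue("openai", "tool_calls"))    # True
print(should_continue("openai", "stop"))          # False
```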

Watch out

Never parse prose for 'done'

Detecting completion by searching the model's text for phrases like "task complete" or "final answer" is fragile and will fail in production. Always branch on the structured stop_reason / finish_reason field (or the structural presence of a tool call). It's reliable; prose is not.
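Here is a small sketch of how the prose approach goes wrong. The response dictionary is a made-up example of a mid-task turn, and both helper functions are hypothetical; the point is only that the same response gives opposite answers depending on what you inspect.

```python
def looks_done_by_prose(text: str) -> bool:
    """Anti-pattern: guess completion by scanning for a phrase."""
    return "final answer" in text.lower()

def is_done_by_structure(stop_reason: str) -> bool:
    """Preferred: the turn is final only when no tool was requested."""
    return stop_reason != "tool_use"

# A hypothetical mid-task response: the model is still working and wants a tool.
response = {
    "stop_reason": "tool_use",
    "text": "I can't give a final answer yet; I need to run the calculator first.",
}

print(looks_done_by_prose(response["text"]))          # True  (wrong: would stop early)
print(is_done_by_structure(response["stop_reason"]))  # False (right: keep looping)
```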

The whole thing in under 10 lines

The agent loop is famous for how little code it takes — once you've seen the four stages, the code is just those stages written down. Strip away the niceties and the entire control flow fits in a single while:

python
# `model` is a hypothetical client wrapper; `tools` maps tool names to callables.
def run_agent(model, user_goal, tools):
    messages = [{"role": "user", "content": user_goal}]  # Perceive: history so far
    while True:
        response = model.call(messages, tools=tools)   # Reason
        messages.append(response.message)              # remember the model's turn
        if response.stop_reason != "tool_use":          # no tool call -> done
            return response.text                        # final answer
        for call in response.tool_calls:                # Act
            result = tools[call.name](**call.args)      # run the real function
            messages.append({                           # Observe
                "role": "tool",
                "tool_call_id": call.id,
                "content": str(result),
            })

That's it. Send the conversation to the model; if it asked for a tool, run the tool and append the result; otherwise return the answer. Everything you'll learn later — planning, memory, multi-agent systems — is an elaboration on this skeleton, not a replacement for it.

Notice there's no separate memory store. The loop's short-term memory is implicit: every tool result is appended to messages, so on the next pass the model sees its entire history within the context window. For the basic loop, that's all the memory you need.
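If you want to watch that implicit memory grow, you can run the same skeleton against a stand-in model. The sketch below is an assumption-heavy toy: `ScriptedModel`, `Response`, `ToolCall`, and the `get_balance` tool (with its made-up return value) are all hypothetical, and the scripted model simply replays canned turns instead of predicting anything.

```python
from dataclasses import dataclass, field

@dataclass
class ToolCall:
    id: str
    name: str
    args: dict

@dataclass
class Response:
    stop_reason: str
    text: str = ""
    tool_calls: list = field(default_factory=list)

class ScriptedModel:
    """Stand-in for a real model: replays canned responses and records what it saw."""
    def __init__(self, responses):
        self._responses = iter(responses)
        self.history_sizes = []

    def call(self, messages, tools):
        self.history_sizes.append(len(messages))  # how much history was visible
        return next(self._responses)

def get_balance(account):
    return 1250  # made-up value for the example

def run_agent(model, user_goal, tools):
    messages = [{"role": "user", "content": user_goal}]
    while True:
        response = model.call(messages, tools=tools)
        messages.append({"role": "assistant", "content": response.text})
        if response.stop_reason != "tool_use":
            return response.text, messages
        for call in response.tool_calls:
            result = tools[call.name](**call.args)
            messages.append(
                {"role": "tool", "tool_call_id": call.id, "content": str(result)}
            )

model = ScriptedModel([
    Response("tool_use", tool_calls=[ToolCall("call_1", "get_balance", {"account": "checking"})]),
    Response("end_turn", text="Your checking balance is $1,250."),
])
answer, messages = run_agent(model, "What is my checking balance?", {"get_balance": get_balance})
print(answer)               # Your checking balance is $1,250.
print(model.history_sizes)  # [1, 3]
print(len(messages))        # 4
```

On the first call the model sees one message (the goal). On the second it sees three: the goal, its own tool request, and the tool result. That growing list is the only memory this loop has.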

A worked example: one tool, two turns

Theory clicks fastest on a concrete run. Give the agent a single tool, calculator, and the goal "What is 18% of last month's revenue of $94,500?" Watch the loop run.

Turn 1 — Reason → Act. The model can't do reliable arithmetic in its head, so instead of guessing it emits a tool call:

text
stop_reason: tool_use
tool: calculator(expression="94500 * 0.18")

The harness runs it. Observe: it appends 17010 to the history and loops.

Turn 2 — Reason. Now the model perceives the original question and the result 17010 in its context. It has what it needs, so it returns a plain answer with no tool call:

text
stop_reason: end_turn
text: "18% of $94,500 is $17,010."

No tool call means the loop stops and returns that sentence. Two model calls, one tool execution, done.

This is the "aha": the model decided for itself that it needed the calculator, used the result, and decided for itself that the task was complete. Nobody scripted "call the calculator on step 1." That self-directed control flow — choosing actions and choosing when to stop — is exactly what we mean by agency.
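For the Act step in this example, the harness needs a real `calculator` function behind the tool name. One possible sketch is below. It assumes the tool receives a plain arithmetic string, and it walks the parsed expression with Python's `ast` module rather than calling `eval()`, so that text produced by a model can't run arbitrary code. The supported operators are a choice made for this example.

```python
import ast
import operator

# Only these arithmetic operators are allowed; anything else is rejected.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}

def calculator(expression: str) -> float:
    """Evaluate a plain arithmetic expression without calling eval()."""
    def evaluate(node):
        # Numeric literals such as 94500 or 0.18
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        # Binary operations such as left * right
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](evaluate(node.left), evaluate(node.right))
        # Unary minus such as -5
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -evaluate(node.operand)
        raise ValueError("unsupported expression")

    return evaluate(ast.parse(expression, mode="eval").body)

result = calculator("94500 * 0.18")
print(round(result, 2))  # 17010.0
```

The result comes back as a float, so a harness might round or format it before appending it to the history.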

Why the loop, not the model, is the agent

It's tempting to say "the model is the agent." It isn't. Think of the model as a calculator key: it does nothing until something presses it, and it remembers nothing between presses. It is a stateless function, text in and text out (and, with sampling enabled, not even guaranteed to return the same text twice), with nothing autonomous about a single prediction.

What makes the system agentic is that the loop delegates control flow to the model. In a traditional program, you — the engineer — hardcode the sequence of steps. In an agent, you hand that decision to the model at every iteration: should I act again, or am I done? The loop is what makes that delegation possible by re-invoking the model with fresh observations and respecting its decision to continue or stop.
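The difference between hardcoding the steps and delegating them can be shown without any model at all. In this toy sketch, `choose_next` is a hypothetical stand-in for the model's decision, and the step functions are arbitrary; only the shape of the control flow matters.

```python
def fixed_pipeline(steps, state):
    """Traditional program: the engineer fixes the order of steps in advance."""
    for step in steps:
        state = step(state)
    return state

def delegated_loop(choose_next, state):
    """Agent style: a decision function is asked, every iteration, what to do next."""
    while True:
        step = choose_next(state)  # in a real agent, this decision is a model call
        if step is None:           # the decider signals "done"
            return state
        state = step(state)

def double(x):
    return x * 2

def add_one(x):
    return x + 1

def choose_next(state):
    """Hypothetical decider: keep doubling until the value reaches 20."""
    return double if state < 20 else None

print(fixed_pipeline([double, add_one], 3))  # 7
print(delegated_loop(choose_next, 3))        # 24
```

In `fixed_pipeline` the number and order of steps is known before the program runs. In `delegated_loop` neither is known until the decider has seen each intermediate result.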

This is why the same minimal loop sits underneath the major agent frameworks in 2026: LangGraph wraps it in a stateful graph, the OpenAI Agents SDK adds handoffs between agents, AWS's Strands SDK adds cancellation and guardrails, and Claude Code runs a single master loop at scale. The while-loop backbone underneath is the same in each. Learn the loop and you've learned the load-bearing wall of the entire field.

Key insight

The loop is the agent

Take away the model and you have an empty harness. Take away the loop and you have a chatbot that answers once and stops. Agency lives in the loop — the part that observes results, re-invokes the model, and lets it decide when the work is finished.

Stopping conditions and runaway loops

A healthy loop stops on its own; an unhealthy one has to be caught. The primary stopping condition is the one you've already met: the model returns a final answer with no tool call (end_turn / stop). That's the loop working as designed.

But while True is dangerous in the wild. A model can get stuck, calling the same failing tool over and over or chasing a goal it can't reach. A stuck agent can run for hundreds of iterations, paying for every model call and making no progress. So production loops add secondary safeguards as safety nets, never as the main control:

  • Max-iteration cap (commonly 50–100): hard stop after N turns.
  • Wall-clock and cost limits: stop after T seconds or D dollars.
  • Stall detection: if the same tool is called 3× with no progress, break and escalate.

These are guardrails, not the steering wheel. The model's stop_reason is how a healthy loop ends; the caps are how an unhealthy one gets caught.
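A rough sketch of how those safety nets can wrap the loop is below. Everything here is an assumption for illustration: `next_tool_call` is a hypothetical stand-in for one model turn, the default limits are arbitrary, a cost limit is left out for brevity, and "stall" is approximated crudely as the identical tool call repeated several times in a row.

```python
import time

def run_with_guardrails(next_tool_call, max_iterations=50, max_seconds=60.0, stall_limit=3):
    """Drive a loop until the model stops on its own or a safety net trips.

    next_tool_call() stands in for one model turn: it returns a
    (tool_name, args) tuple when the model wants a tool, or None when the
    model has given its final answer. Returns (reason, turns_used).
    """
    started = time.monotonic()
    last_call, repeats = None, 0
    for iteration in range(1, max_iterations + 1):
        if time.monotonic() - started > max_seconds:
            return "time_limit", iteration - 1     # wall-clock safety net
        call = next_tool_call()
        if call is None:
            return "end_turn", iteration           # primary, healthy stop
        repeats = repeats + 1 if call == last_call else 1
        last_call = call
        if repeats >= stall_limit:
            return "stalled", iteration            # same call repeated: escalate
    return "max_iterations", max_iterations        # hard cap reached

def stuck():
    """A model that keeps requesting the same failing call."""
    return ("fetch_page", {"url": "https://example.com"})

healthy_turns = iter([("search", {"q": "a"}), ("calculator", {"expression": "1+1"}), None])

def healthy():
    """A model that uses two tools and then answers."""
    return next(healthy_turns)

counter = iter(range(1000))

def wandering():
    """A model that never repeats itself but never finishes either."""
    return ("search", {"q": str(next(counter))})

print(run_with_guardrails(stuck))                        # ('stalled', 3)
print(run_with_guardrails(healthy))                      # ('end_turn', 3)
print(run_with_guardrails(wandering, max_iterations=5))  # ('max_iterations', 5)
```

Only the healthy run ends through the model's own stop signal. The other two are caught by a guardrail, which is the cue to log the failure and escalate rather than return an answer.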

That's the whole pattern. From here the course expands each piece: how tools are defined and called, how memory grows beyond one context window, and how planning lets agents tackle long-horizon goals. But every one of those is a refinement of the perceive–reason–act–observe loop you now understand.

Try it: Trace the loop by hand

Pick a goal that needs exactly two tools, e.g. "Find the population of France and tell me 0.5% of it." with a web_search tool and a calculator tool. On paper (no code yet), write out every turn of the loop:

  1. For each turn, note what the model perceives (the history so far), what it reasons (which tool it calls, or the final answer), what the harness acts on, and what gets observed back.
  2. Mark the stop_reason for each turn: tool_use (continue) or end_turn (stop).
  3. Confirm the loop terminates because the model returns a final answer with no tool call — not because it wrote the word 'done'.

You should end up with 3 turns: search, calculate, answer. Then add one sentence on what a max-iteration cap of 5 would protect you from. This pen-and-paper trace builds the single most useful instinct for the rest of the course: seeing the loop run in your head before you write a line of code.

Key takeaways

  1. The agent loop is the perceive → reason → act → observe cycle that turns a one-shot text predictor into a system that takes actions and adapts.
  2. A tool call is the continuation signal: if the model requests a tool the loop runs it and repeats; a plain answer with no tool call ends the loop.
  3. Loop termination must be driven by the API's structured stop_reason / finish_reason field, never by parsing the model's prose for a 'done' phrase.
  4. Agency comes from the loop, not the model: the loop delegates control-flow decisions (act again, or stop) to the model at every iteration.
  5. Production loops keep the model's stop signal as primary control but add secondary safeguards (max iterations, time/cost limits, stall detection) to catch runaway behavior.

Quiz

Lock in what you learned

Check your understanding

1. What are the four stages of one pass through the agent loop, in order?

2. In a correctly built agent loop, what tells the harness whether to keep looping or stop?

3. Why is it accurate to say 'the loop, not the model, is the agent'?

4. Which is the PRIMARY stopping condition for a healthy agent loop, and which are secondary safeguards?
