
Reasoning & the ReAct Pattern

Interleaving thought and action

Intermediate · 14 min · Builder
What you'll be able to do
  • Explain the ReAct loop — Thought → Action → Observation → repeat — and why grounding in real observations reduces hallucination
  • Read and write a ReAct transcript and map each part onto a model API's structured tool calls
  • Show how OpenAI tool_calls, Anthropic tool_use, and LangGraph's create_react_agent implement ReAct natively
  • Choose between ReAct and plan-and-execute based on a task's adaptability and cost needs
  • Identify ReAct's known limitations and how reasoning models change where the 'Thought' lives
At a glance

ReAct is the pattern that turns a text predictor into an agent: the model interleaves a reasoning trace (Thought) with a tool call (Action) and the result it gets back (Observation), looping until it has an answer. This lesson shows you the transcript format, the intuition behind why interleaving thought and action beats both pure reasoning and pure acting, and how every modern function-calling API quietly implements ReAct under the hood.

  1. Reason and act, together
  2. The Thought → Action → Observation cycle
  3. What a ReAct transcript looks like
  4. How function-calling implements ReAct under the hood
  5. ReAct vs. plan-and-execute
  6. Limits, and where the 'Thought' lives now

Reason and act, together

Give a model a hard question and you can do one of two things. You can let it reason — "think step by step" — but pure reasoning runs on whatever is in its head, so it confidently invents facts. Or you can let it act — call a search tool, run code — but pure action with no reasoning is blind: it can't plan, can't recover from a surprise, and you can't see why it did anything.

ReAct's insight is that you should not choose. You interleave them. The model thinks a little, takes one action, looks at what actually came back, then thinks again with that new fact in hand. Reasoning decides the next action; the observation grounds the next thought.

That single move — closing the loop between thinking and the real world on every step — is what makes an agent an agent. The reasoning keeps the actions purposeful and recoverable. The observations keep the reasoning honest, because each thought is now built on a real result rather than a guess. ReAct (Reasoning + Acting), introduced by Yao et al. at ICLR 2023, is the canonical name for this loop, and it is the backbone of essentially every tool-using agent you will build.

Key insight

Why interleaving wins

Pure chain-of-thought can hallucinate because nothing checks it. Pure acting can't plan or explain itself. ReAct = reasoning that chooses actions + observations that correct reasoning. The two halves cover each other's weaknesses, step by step.

The Thought → Action → Observation cycle

The mechanics are simple. The agent repeats three moves until it can answer:

  1. Thought — a free-form reasoning trace: what do I know, what's missing, what should I do next?
  2. Action — an invocation of a tool or the environment, with arguments (Search["..."], calculator(2+2), a database query).
  3. Observation — the result the tool or environment returns, fed straight back into the next Thought.

The loop runs Thought → Action → Observation → Thought → ... until the model decides it's done and emits a Final Answer. A controller (your code, or a framework) drives it: it sends the running transcript to the model, executes whatever action the model emits, appends the observation, and asks the model again.
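To make the controller's job concrete, here is a minimal sketch in plain Python. It assumes a hypothetical `scripted_model` function standing in for the LLM (it returns either an action or a final answer) and a stub `search` tool backed by a canned dictionary; neither comes from any library. The loop is the part that matters: call the model, execute the action, append the observation, and repeat until a final answer or the step cap.

```python
from typing import Callable

# Hypothetical stub tool: a canned lookup table instead of a real search API.
FACTS = {
    "director of Oppenheimer": "Christopher Nolan",
    "Christopher Nolan birth year": "1970",
}

def search(query: str) -> str:
    return FACTS.get(query, "No result.")

TOOLS: dict[str, Callable[[str], str]] = {"search": search}

def scripted_model(transcript: list[dict]) -> dict:
    """Hypothetical stand-in for an LLM: chooses the next step from the transcript."""
    observations = [s["observation"] for s in transcript]
    if not observations:
        return {"thought": "I need the director first.",
                "action": ("search", "director of Oppenheimer")}
    if len(observations) == 1:
        return {"thought": f"Now I need {observations[0]}'s birth year.",
                "action": ("search", f"{observations[0]} birth year")}
    return {"thought": "I have what I need.",
            "final_answer": f"{observations[0]}, born {observations[1]}"}

def react_loop(model, tools, max_steps: int = 5):
    transcript = []  # one dict per Thought/Action/Observation step
    for _ in range(max_steps):
        step = model(transcript)                  # Thought (+ Action or Final Answer)
        if "final_answer" in step:
            return step["final_answer"], transcript
        name, arg = step["action"]
        observation = tools[name](arg)            # execute the Action
        transcript.append({**step, "observation": observation})  # record the Observation
    return None, transcript                       # hit the cap without an answer
```

Calling `react_loop(scripted_model, TOOLS)` returns `"Christopher Nolan, born 1970"` after two tool calls; with `max_steps=1` the controller stops early and returns `None` for the answer.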

The original ReAct paper grounded reasoning in a simple Wikipedia API and beat strong baselines exactly where grounding matters. On interactive decision-making it outperformed imitation- and RL-trained baselines by an absolute 34% success rate on ALFWorld and 10% on WebShop, using only one or two in-context examples. On knowledge tasks (HotpotQA, FEVER) it reduced hallucination by anchoring each step in a real lookup instead of the model's memory.

Note

Stopping the loop

Every ReAct agent needs a termination condition: the model emits a Final Answer, or the controller hits a max-iteration cap. Always set the cap — an ungrounded thought can send the loop in circles, and you don't want an unbounded bill.
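One way a controller might enforce termination, sketched under assumptions: the `stop_reason` helper below is hypothetical, and its repeated-action check is an extra heuristic layered on top of the max-iteration cap, not part of ReAct itself. It flags an agent that keeps issuing the identical tool call, a common sign that it is going in circles.

```python
from collections import Counter
from typing import Optional

def stop_reason(actions: list[tuple[str, str]], max_steps: int = 10,
                max_repeats: int = 2) -> Optional[str]:
    """Return why the controller should stop, or None to keep looping.

    `actions` is the history of (tool_name, argument) pairs emitted so far.
    """
    if len(actions) >= max_steps:
        return "max_steps"        # hard iteration cap: always set one
    if actions and Counter(actions)[actions[-1]] > max_repeats:
        return "repeated_action"  # same call issued more than max_repeats times
    return None
```

A controller would call this after each Action and exit the loop (returning whatever partial answer it has) on any non-`None` result.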

What a ReAct transcript looks like

The pattern is easiest to grasp as a transcript. Here a multi-hop question forces the agent to gather a fact, then use it:

```text
Question: Who is older, the director of "Oppenheimer" or the director of "Dune"?

Thought: I need the directors of both films, then their birth years.
Action: Search["director of Oppenheimer (2023 film)"]
Observation: Christopher Nolan directed Oppenheimer (2023).

Thought: Now I need the director of Dune.
Action: Search["director of Dune (2021 film)"]
Observation: Denis Villeneuve directed Dune (2021).

Thought: I need both birth years.
Action: Search["Christopher Nolan birth year; Denis Villeneuve birth year"]
Observation: Nolan born 1970; Villeneuve born 1967.

Thought: Villeneuve (1967) is older than Nolan (1970).
Final Answer: Denis Villeneuve, the director of Dune, is older.
```

Notice three things. Each Thought narrates the plan, so the transcript is self-documenting. Each Action is concrete and parameterized. Each Observation is a real fact that the next Thought reasons over: the agent never has to recall Nolan's birth year from memory, so it is not relying on a remembered fact it could get wrong. This visible, step-by-step trace is why ReAct agents are far easier to debug and audit than black-box approaches: every reasoning step and tool call is there to inspect.
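Because the original paper used this plain-text format, the controller had to parse actions out of the model's output. The hypothetical `parse_transcript` below is a minimal sketch of that parsing for the `Search["..."]` style shown in the transcript; it assumes each step begins with a `Thought:` line. Any deviation in the Action line raises an error, which is exactly the brittleness that structured tool calls remove.

```python
import re

# Matches text-format Action lines such as: Action: Search["director of Dune"]
ACTION_RE = re.compile(r'^Action:\s*(\w+)\["(.*)"\]\s*$')

def parse_transcript(text: str) -> list[dict]:
    """Split a text ReAct transcript into Thought/Action/Observation steps.

    Assumes every step starts with a 'Thought:' line; other lines
    (e.g. 'Question:') are ignored.
    """
    steps, current = [], {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("Thought:"):
            current = {"thought": line[len("Thought:"):].strip()}
            steps.append(current)
        elif line.startswith("Action:"):
            match = ACTION_RE.match(line)
            if match is None:
                # A slightly different format breaks the parser: the legacy weakness.
                raise ValueError(f"Unparseable action: {line!r}")
            current["action"] = (match.group(1), match.group(2))
        elif line.startswith("Observation:"):
            current["observation"] = line[len("Observation:"):].strip()
        elif line.startswith("Final Answer:"):
            current["final_answer"] = line[len("Final Answer:"):].strip()
    return steps
```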

How function-calling implements ReAct under the hood

Here's the punchline first: every time you use a tool-calling API, you are already running ReAct — you just don't see the labels. The original paper parsed actions out of text like Search["query"], which was brittle. Modern agents skip the parsing and use function calling instead, and that is simply ReAct with a reliable wire format. The mapping is exact:

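| ReAct | Modern API |
| --- | --- |
| Thought | the model's internal / chain-of-thought reasoning (plus any text it emits alongside the tool call) |
| Action | a structured tool_call (OpenAI) / tool_use block (Anthropic): JSON with a name and typed args |
| Observation | the tool result you append: a "tool" role message (OpenAI) / a tool_result block in a user message (Anthropic) |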
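The wire formats behind the table can be sketched as plain dictionaries. This assumes a hypothetical `search` tool; the field names follow the OpenAI Chat Completions and Anthropic Messages tool-use formats, but check the current API references before relying on them. One difference worth noticing: OpenAI delivers an Action's arguments as a JSON-encoded string (`call.function.arguments`, which is why the loop that follows calls `json.loads`), while Anthropic's `tool_use` block carries `input` as an already-parsed object.

```python
# Illustrative JSON Schema for a hypothetical `search` tool.
SEARCH_SCHEMA = {
    "type": "object",
    "properties": {"query": {"type": "string", "description": "What to look up"}},
    "required": ["query"],
}

# Tool definitions. OpenAI Chat Completions nests the schema under "function";
# Anthropic's Messages API uses a flat object with "input_schema".
openai_tool = {
    "type": "function",
    "function": {"name": "search", "description": "Look up a fact.",
                 "parameters": SEARCH_SCHEMA},
}
anthropic_tool = {"name": "search", "description": "Look up a fact.",
                  "input_schema": SEARCH_SCHEMA}

def openai_observation(tool_call_id: str, result: str) -> dict:
    """Observation for OpenAI: a role="tool" message echoing the tool_call id."""
    return {"role": "tool", "tool_call_id": tool_call_id, "content": result}

def anthropic_observation(tool_use_id: str, result: str) -> dict:
    """Observation for Anthropic: a user message holding a tool_result block."""
    return {"role": "user",
            "content": [{"type": "tool_result", "tool_use_id": tool_use_id,
                         "content": result}]}
```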

The orchestration loop is the same one the paper describes (shown here using the OpenAI Python SDK idiom):

python
import json

messages = [{"role": "user", "content": question}]
for _ in range(MAX_STEPS):
    resp = client.chat.completions.create(
        model="gpt-4o", messages=messages, tools=tools
    )
    msg = resp.choices[0].message
    if not msg.tool_calls:
        break  # no Action -> this is the Final Answer
    messages.append(msg)                         # the model's Thought + Action
    for call in msg.tool_calls:                  # execute each Action
        args = json.loads(call.function.arguments)
        result = TOOLS[call.function.name](**args)
        messages.append({"role": "tool",         # the Observation
                         "tool_call_id": call.id,
                         "content": str(result)})
final = msg.content
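The same loop against Anthropic's Messages API differs in the details: the Action arrives as a `tool_use` content block, the loop keys off `stop_reason`, and Observations go back as `tool_result` blocks inside a user message. This is a sketch, not an official recipe: `run_react_anthropic` and `tool_fns` are hypothetical names, and `client` is assumed to be an `anthropic.Anthropic()` instance (or any object with the same `messages.create` shape, which also lets you test the loop offline). You supply the model ID.

```python
def run_react_anthropic(client, model: str, question: str, tools: list,
                        tool_fns: dict, max_steps: int = 8):
    """ReAct loop over an Anthropic-style Messages API (sketch).

    client   -- assumed to expose client.messages.create(...) like anthropic.Anthropic()
    tools    -- tool definitions in the {"name", "description", "input_schema"} form
    tool_fns -- hypothetical mapping from tool name to a Python callable
    """
    messages = [{"role": "user", "content": question}]
    for _ in range(max_steps):
        resp = client.messages.create(model=model, max_tokens=1024,
                                      messages=messages, tools=tools)
        if resp.stop_reason != "tool_use":
            # No Action requested (e.g. end_turn): the text blocks are the Final Answer.
            return "".join(b.text for b in resp.content if b.type == "text")
        # Keep the whole assistant turn (text/Thought + tool_use/Action) in history.
        messages.append({"role": "assistant", "content": resp.content})
        results = []
        for block in resp.content:
            if block.type == "tool_use":             # execute each Action
                output = tool_fns[block.name](**block.input)
                results.append({"type": "tool_result", "tool_use_id": block.id,
                                "content": str(output)})
        messages.append({"role": "user", "content": results})  # the Observations
    return None  # hit max_steps without a Final Answer
```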

LangGraph's create_react_agent (from langgraph.prebuilt) is this loop, productionized: call model → if tool_calls present, run the tools and append results, repeat; if none, return the response. Same Thought/Action/Observation semantics — now with structured JSON, type safety, and tracing.

Watch out

Use structured tool-calling, not legacy text-parsing

LangChain's old ReActChain / ReActDocstoreAgent parsed Action: strings out of text, which was brittle and easy to break. The 2025+ practice is ReAct with structured tool-calling: use the function-calling API for actions and keep the same loop. Use create_react_agent from langgraph.prebuilt (newer LangChain 1.x releases expose the same loop as create_agent in langchain.agents), not the legacy text-parsing agents.

ReAct vs. plan-and-execute

Think of two ways to take a road trip. ReAct is driving with live traffic: decide each turn only after seeing what's ahead — it's reactive. The main alternative, plan-and-execute, is mapping the whole route before you leave: a planner LLM writes the full sequence of steps up front, then an executor runs them one by one (often with a smaller, cheaper model).

The trade-off is adaptability versus predictability:

| | ReAct | Plan-and-execute |
| --- | --- | --- |
| Decides steps | one at a time, from observations | all up front |
| Adapts mid-task | naturally | needs an expensive re-plan |
| Cost / model | every step hits the big model | cheap executor for steps |
| Accuracy | adapts to surprises; can loop | higher on predictable tasks; degrades when the plan is wrong |
| Reviewable plan | no; it emerges as it goes | yes; a human can approve it first |

Use ReAct when intermediate results change what's needed next — research, debugging, anything in a messy or partially observable environment. Use plan-and-execute when the whole step sequence is knowable in advance and you want cost predictability or a human checkpoint before anything runs. Neither is strictly better: if the environment shifts mid-task, plan-and-execute pays for a costly re-planning call, while ReAct simply adjusts on its next thought.
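For contrast, here is a minimal plan-and-execute skeleton. `planner` and `executor` are hypothetical callables (in practice a large model and a cheaper model or plain tool calls), and treating a result that starts with `"ERROR"` as a failed step is an assumption made to keep the sketch small. The planner runs once up front and again only when a step fails, which is where the re-planning cost shows up; a ReAct loop would call the model on every step instead.

```python
from typing import Callable, List, Optional, Tuple

Planner = Callable[[str, list], List[str]]  # (task, history) -> ordered steps
Executor = Callable[[str], str]             # step -> result

def plan_and_execute(task: str, planner: Planner, executor: Executor,
                     max_replans: int = 1) -> Tuple[Optional[str], list, int]:
    """Plan every step up front, run them in order, re-plan only on failure."""
    history: list = []                      # (step, result) pairs
    plan = list(planner(task, history))     # one planner call maps the whole route
    planner_calls = 1
    while plan:
        step = plan.pop(0)
        result = executor(step)             # the (cheaper) executor runs each step
        history.append((step, result))
        if result.startswith("ERROR"):      # assumed failure signal for this sketch
            if planner_calls > max_replans:  # re-plan budget spent: give up
                return None, history, planner_calls
            plan = list(planner(task, history))  # the costly re-plan
            planner_calls += 1
    final = history[-1][1] if history else None
    return final, history, planner_calls
```

Counting `planner_calls` makes the trade-off visible: a plan that survives contact with the environment costs one planner call, and each failure adds another.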

Limits, and where the 'Thought' lives now

ReAct is foundational, not flawless. Its core weakness: the generated thoughts aren't guaranteed to be grounded in the agent's real history or goal. In complex, partially observable tasks a thought can drift from what actually happened, and because each step builds on the last, that misalignment compounds. The 2025 ReflAct framework (arXiv:2505.15182) targets exactly this, replacing action-prediction thoughts with explicit goal-state reflection at each step.

There's also a subtler shift. With modern reasoning models — OpenAI o3/o4-mini, DeepSeek-R1, Claude extended thinking — the Thought step is handled internally by an RL-trained reasoning process before the model emits its action. You usually should not hand-write Thought: scaffolding in the prompt for these models; explicit thought prompts are often unnecessary and can hurt performance. The ReAct loop is unchanged — Action and Observation still flow through tool calls — but the Thought has moved inside the model.

Finally, remember ReAct doesn't always win. In the original paper, chain-of-thought with self-consistency (CoT-SC) actually scored higher than ReAct alone on HotpotQA, and the best results came from combining the two. ReAct's edge is specifically on tasks that need external information or actions; if no tool is involved, it has nothing to ground on.

Tip

Practical default for 2026

Reach for ReAct via structured tool-calling (create_react_agent or a hand-rolled loop with the function-calling API) as your starting point. Add a planner only when steps are knowable up front and you need a reviewable plan or a cheaper executor.

Try it: Hand-trace, then build a ReAct loop

Part 1 — trace it (no code). Pick a multi-hop question that needs two lookups (e.g., "Which has more UN member states, Africa or the Americas?"). Write out the full transcript by hand: alternate Thought:, Action:, Observation: until a Final Answer:. Mark where each Thought depends on the previous Observation — that dependency is the whole point of ReAct.

Part 2 — build it (code). Implement the loop from the function-calling section against any model with tool calling (OpenAI tool_calls or Anthropic tool_use). Give it two tools: a calculator(expression) and a search(query) (a stub returning canned facts is fine). Run a question that needs both, and print the messages list after every step so you can see the Thought/Action/Observation transcript the model actually produced. Add a MAX_STEPS cap.
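If you want a starting point for the two tools, here is one possible stub pair. Both are illustrative assumptions: `calculator` walks Python's `ast` instead of calling `eval()` on model-supplied text, so it accepts only basic arithmetic, and `search` returns canned facts taken from the transcript earlier in this lesson. The `TOOLS` dict matches the name-to-callable mapping the function-calling loop dispatches on.

```python
import ast
import operator

# Only these arithmetic operators are allowed; anything else is rejected.
_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul,
    ast.Div: operator.truediv, ast.Pow: operator.pow, ast.USub: operator.neg,
}

def calculator(expression: str) -> float:
    """Evaluate basic arithmetic by walking the AST (no eval on model output)."""
    def ev(node):
        if isinstance(node, ast.Expression):
            return ev(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.operand))
        raise ValueError(f"Unsupported expression: {expression!r}")
    return ev(ast.parse(expression, mode="eval"))

# Canned facts (from the transcript above) stand in for a real search backend.
_FACTS = {
    "christopher nolan birth year": "1970",
    "denis villeneuve birth year": "1967",
}

def search(query: str) -> str:
    """Stub search: exact-match lookup on a normalized query."""
    return _FACTS.get(query.strip().lower(), "No result found.")

# Name -> callable mapping, the shape the function-calling loop dispatches on.
TOOLS = {"calculator": calculator, "search": search}
```

A question such as "How many years older is Villeneuve than Nolan?" then needs two `search` calls followed by `calculator("1970 - 1967")`.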

Stretch. Swap in LangGraph's create_react_agent with the same two tools and confirm the behavior matches your hand-rolled loop — proof that the framework is just this loop, productionized.

Key takeaways

  1. ReAct interleaves a reasoning Thought, a tool Action, and a real Observation in a loop until a Final Answer: reasoning chooses actions, observations correct reasoning.
  2. Grounding each thought in a real observation is what cuts hallucination; the visible transcript is what makes the agent debuggable and auditable.
  3. Modern function-calling implements ReAct directly: tool_call = Action, tool result message = Observation, internal reasoning = Thought. Use structured tool calls, not legacy text parsing.
  4. Plan-and-execute trades ReAct's adaptability for predictability and cheaper executors; pick it only when the full step sequence is knowable up front.
  5. ReAct's thoughts can drift from the real goal in complex tasks, and in reasoning models the Thought now lives inside the model, so you often shouldn't hand-write Thought prompts.

Quiz


1. In the ReAct loop, what is an 'Observation'?

2. How do modern function-calling APIs relate to ReAct?

3. When should you prefer plan-and-execute over ReAct?

4. What is a key limitation of ReAct, and how do reasoning models change it?
