Reasoning & the ReAct Pattern

Interleaving thought and action

Intermediate · 14 min · Builder

What you'll be able to do

Explain the ReAct loop — Thought → Action → Observation → repeat — and why grounding in real observations reduces hallucination
Read and write a ReAct transcript and map each part onto a model API's structured tool calls
Show how OpenAI tool_calls, Anthropic tool_use, and LangGraph's create_react_agent implement ReAct natively
Choose between ReAct and plan-and-execute based on a task's adaptability and cost needs
Identify ReAct's known limitations and how reasoning models change where the 'Thought' lives

At a glance

ReAct is the pattern that turns a text predictor into an agent: the model interleaves a reasoning trace (Thought) with a tool call (Action) and the result it gets back (Observation), looping until it has an answer. This lesson shows you the transcript format, the intuition behind why interleaving thought and action beats both pure reasoning and pure acting, and how every modern function-calling API quietly implements ReAct under the hood.

1. Reason and act, together
2. The Thought → Action → Observation cycle
3. What a ReAct transcript looks like
4. How function-calling implements ReAct under the hood
5. ReAct vs. plan-and-execute
6. Limits, and where the 'Thought' lives now

Reason and act, together

Give a model a hard question and you can do one of two things. You can let it reason — "think step by step" — but pure reasoning runs on whatever is in its head, so it confidently invents facts. Or you can let it act — call a search tool, run code — but pure action with no reasoning is blind: it can't plan, can't recover from a surprise, and you can't see why it did anything.

ReAct's insight is that you should not choose. You interleave them. The model thinks a little, takes one action, looks at what actually came back, then thinks again with that new fact in hand. Reasoning decides the next action; the observation grounds the next thought.

That single move — closing the loop between thinking and the real world on every step — is what makes an agent an agent. The reasoning keeps the actions purposeful and recoverable. The observations keep the reasoning honest, because each thought is now built on a real result rather than a guess. ReAct (Reasoning + Acting), introduced by Yao et al. at ICLR 2023, is the canonical name for this loop, and it is the backbone of essentially every tool-using agent you will build.

Key insight

Why interleaving wins

Pure chain-of-thought can hallucinate because nothing checks it. Pure acting can't plan or explain itself. ReAct = reasoning that chooses actions + observations that correct reasoning. The two halves cover each other's weaknesses, step by step.

The Thought → Action → Observation cycle

The mechanics are simple. The agent repeats three moves until it can answer:

Thought — a free-form reasoning trace: what do I know, what's missing, what should I do next?
Action — an invocation of a tool or the environment, with arguments (Search["..."], calculator(2+2), a database query).
Observation — the result the tool or environment returns, fed straight back into the next Thought.

The loop runs Thought → Action → Observation → Thought → ... until the model decides it's done and emits a Final Answer. A controller (your code, or a framework) drives it: it sends the running transcript to the model, executes whatever action the model emits, appends the observation, and asks the model again.
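The controller described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the `ToolName["argument"]` action format, the `run_react` function, and the scripted stand-in for the model are all assumptions made so the example runs offline.

```python
import re
from typing import Callable, Dict

# Hypothetical action format for this sketch: ToolName["argument"]
ACTION_RE = re.compile(r'^Action:\s*(\w+)\["(.*)"\]\s*$', re.MULTILINE)
FINAL_RE = re.compile(r'^Final Answer:\s*(.+)$', re.MULTILINE)


def run_react(model: Callable[[str], str],
              tools: Dict[str, Callable[[str], str]],
              question: str,
              max_steps: int = 5) -> str:
    """Drive Thought -> Action -> Observation until a Final Answer or the cap."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = model(transcript)            # the model's next Thought (+ Action)
        transcript += step + "\n"
        final = FINAL_RE.search(step)
        if final:                           # termination: the model says it is done
            return final.group(1).strip()
        action = ACTION_RE.search(step)
        if action is None:
            observation = "Could not parse an action."
        else:
            name, arg = action.groups()
            tool = tools.get(name)
            observation = tool(arg) if tool else f"Unknown tool: {name}"
        transcript += f"Observation: {observation}\n"   # feed the result back
    return "Stopped: reached max_steps without a final answer."


def scripted_model(transcript: str) -> str:
    """Stand-in for an LLM: searches once, then answers from the observation."""
    if "Observation:" not in transcript:
        return ('Thought: I should look up the capital.\n'
                'Action: Search["capital of France"]')
    return "Thought: The observation answers it.\nFinal Answer: Paris"


def fake_search(query: str) -> str:
    """Canned search result; a real tool would call an API here."""
    return "Paris is the capital of France."


answer = run_react(scripted_model, {"Search": fake_search},
                   "What is the capital of France?")
print(answer)
```

Swapping `scripted_model` for a real model call leaves the controller unchanged. Note the `max_steps` cap: a model that never emits a Final Answer still terminates.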

The original ReAct paper beat strong baselines exactly where grounding matters. On interactive decision-making, it improved over imitation- and RL-trained baselines by 34% absolute success rate on ALFWorld and 10% on WebShop, using only one or two in-context examples. On knowledge tasks (HotpotQA, FEVER), it grounded reasoning in a Wikipedia API and cut hallucination by anchoring each step in a real lookup instead of the model's memory.

Note

Stopping the loop

Every ReAct agent needs a termination condition: the model emits a Final Answer, or the controller hits a max-iteration cap. Always set the cap — an ungrounded thought can send the loop in circles, and you don't want an unbounded bill.

What a ReAct transcript looks like

The pattern is easiest to grasp as a transcript. Here a multi-hop question forces the agent to gather a fact, then use it:

text

Question: Who is older, the director of "Oppenheimer" or the director of "Dune"?

Thought: I need the directors of both films, then their birth years.
Action: Search["director of Oppenheimer (2023 film)"]
Observation: Christopher Nolan directed Oppenheimer (2023).

Thought: Now I need the director of Dune.
Action: Search["director of Dune (2021 film)"]
Observation: Denis Villeneuve directed Dune (2021).

Thought: I need both birth years.
Action: Search["Christopher Nolan birth year; Denis Villeneuve birth year"]
Observation: Nolan born 1970; Villeneuve born 1967.

Thought: Villeneuve (1967) is older than Nolan (1970).
Final Answer: Denis Villeneuve, the director of Dune, is older.

Notice three things. Each Thought narrates the plan, so the transcript is self-documenting. Each Action is concrete and parameterized. Each Observation is a real fact that the next Thought reasons over: the agent never has to recall Nolan's birth year from memory, so it has no occasion to invent one (it can still misread an observation, but the error is then visible in the trace). This visible, step-by-step trace is why ReAct agents are far more debuggable and auditable than black-box approaches: every reasoning step and tool call is right there to inspect.
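Because the transcript is plain labeled text, it is also easy to turn into data for auditing. The sketch below groups the first two steps of the transcript above into one record per Thought. The `Step` class and `parse_transcript` function are illustrative names, not part of any library.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Step:
    thought: str
    action: Optional[str] = None        # stays None on a final, answer-only step
    observation: Optional[str] = None


def parse_transcript(text: str) -> List[Step]:
    """Group Thought/Action/Observation lines into one Step per Thought."""
    steps: List[Step] = []
    for line in text.splitlines():
        label, _, value = line.partition(":")   # split at the first colon only
        value = value.strip()
        if label == "Thought":
            steps.append(Step(thought=value))
        elif label == "Action" and steps:
            steps[-1].action = value
        elif label == "Observation" and steps:
            steps[-1].observation = value
    return steps


transcript = '''\
Thought: I need the directors of both films, then their birth years.
Action: Search["director of Oppenheimer (2023 film)"]
Observation: Christopher Nolan directed Oppenheimer (2023).

Thought: Now I need the director of Dune.
Action: Search["director of Dune (2021 film)"]
Observation: Denis Villeneuve directed Dune (2021).
'''

steps = parse_transcript(transcript)
for i, step in enumerate(steps, start=1):
    print(i, step.action, "->", step.observation)
```

Each printed row pairs an Action with the Observation it produced, which is the unit you inspect when debugging an agent run.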

How function-calling implements ReAct under the hood

Here's the punchline first: every time you use a tool-calling API, you are already running ReAct — you just don't see the labels. The original paper parsed actions out of text like Search["query"], which was brittle. Modern agents skip the parsing and use function calling instead, and that is simply ReAct with a reliable wire format. The mapping is exact:

ReAct	Modern API
Thought	the model's internal / chain-of-thought reasoning
Action	a structured `tool_call` (OpenAI) / `tool_use` block (Anthropic) — JSON with a name and typed args
Observation	the `tool`/`tool_result` message you append with the result
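To make the Action and Observation rows concrete, here is a sketch of the message shapes as plain dictionaries, with no SDK required. The shapes are abbreviated, and the IDs, the `search` tool name, and the two helper functions are made up for illustration.

```python
import json

# Action, as each API represents it (abbreviated; IDs are made up).
openai_action = {
    "id": "call_1",
    "type": "function",
    "function": {
        "name": "search",
        # OpenAI delivers the arguments as a JSON string
        "arguments": json.dumps({"query": "Dune director"}),
    },
}
anthropic_action = {
    "type": "tool_use",
    "id": "toolu_1",
    "name": "search",
    # Anthropic delivers the arguments as an already-parsed object
    "input": {"query": "Dune director"},
}


def openai_observation(call: dict, result: str) -> dict:
    """Observation for Chat Completions: a message with role 'tool'."""
    return {"role": "tool", "tool_call_id": call["id"], "content": result}


def anthropic_observation(block: dict, result: str) -> dict:
    """Observation for the Messages API: a tool_result block in a user message."""
    return {
        "role": "user",
        "content": [
            {"type": "tool_result", "tool_use_id": block["id"], "content": result}
        ],
    }


result = "Denis Villeneuve directed Dune (2021)."
print(openai_observation(openai_action, result))
print(anthropic_observation(anthropic_action, result))
```

In both APIs the Observation carries the ID of the Action it answers, which is how the model matches results to calls when it issues several at once.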

The orchestration loop is the same one the paper describes (shown here using the OpenAI Python SDK idiom):

python

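import json

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
MAX_STEPS = 8      # hard cap so the loop cannot run forever

# `question`, `tools` (JSON tool schemas), and `TOOLS` (name -> Python callable)
# are assumed to be defined by your application.
messages = [{"role": "user", "content": question}]
final = None
for _ in range(MAX_STEPS):
    resp = client.chat.completions.create(
        model="gpt-4o", messages=messages, tools=tools
    )
    msg = resp.choices[0].message
    if not msg.tool_calls:
        final = msg.content                      # no Action -> this is the Final Answer
        break
    messages.append(msg)                         # the model's Thought + Action
    for call in msg.tool_calls:                  # execute each Action
        args = json.loads(call.function.arguments)
        result = TOOLS[call.function.name](**args)
        messages.append({"role": "tool",         # the Observation
                         "tool_call_id": call.id,
                         "content": str(result)})
if final is None:                                # cap reached while still calling tools
    raise RuntimeError("hit MAX_STEPS without a final answer")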
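The loop refers to two names it does not define: `tools` (the schemas sent to the model) and `TOOLS` (the callables your code runs). One plausible way to fill them in is sketched below. The calculator and the canned search facts are assumptions for illustration; the schema layout follows the Chat Completions function-tool format.

```python
import ast
import operator
from typing import Union

_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv}


def calculator(expression: str) -> Union[int, float]:
    """Evaluate +, -, *, / arithmetic by walking the AST instead of calling eval()."""
    def ev(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -ev(node.operand)
        raise ValueError("unsupported expression")
    return ev(ast.parse(expression, mode="eval").body)


def search(query: str) -> str:
    """Stub: canned facts instead of a real search backend."""
    facts = {"nolan birth year": "1970", "villeneuve birth year": "1967"}
    return facts.get(query.lower(), "no result")


# Name -> callable: what the controller executes for each Action.
TOOLS = {"calculator": calculator, "search": search}


def _schema(name: str, description: str, param: str) -> dict:
    """Build a one-string-parameter function-tool schema."""
    return {"type": "function", "function": {
        "name": name,
        "description": description,
        "parameters": {"type": "object",
                       "properties": {param: {"type": "string"}},
                       "required": [param]}}}


# JSON schemas: what the model sees when deciding which Action to take.
tools = [
    _schema("calculator", "Evaluate an arithmetic expression.", "expression"),
    _schema("search", "Look up a fact.", "query"),
]

print(TOOLS["calculator"](expression="1970 - 1967"))
```

Keeping the schema parameter names identical to the Python argument names is what lets the loop call `TOOLS[name](**args)` directly.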

LangGraph's create_react_agent (from langgraph.prebuilt) is this loop, productionized: call model → if tool_calls present, run the tools and append results, repeat; if none, return the response. Same Thought/Action/Observation semantics — now with structured JSON, type safety, and tracing.

Watch out

Use structured tool-calling, not legacy text-parsing

LangChain's old ReActChain / ReActDocstoreAgent parsed Action: strings out of text — brittle and easy to break. The 2025+ practice is ReAct with structured tool-calling: use the function-calling API for actions, keep the same loop. Use create_react_agent from langgraph.prebuilt, not the legacy text-parsing agents.

ReAct vs. plan-and-execute

Think of two ways to take a road trip. ReAct is driving with live traffic: decide each turn only after seeing what's ahead — it's reactive. The main alternative, plan-and-execute, is mapping the whole route before you leave: a planner LLM writes the full sequence of steps up front, then an executor runs them one by one (often with a smaller, cheaper model).

The trade-off is adaptability versus predictability:

	ReAct	Plan-and-execute
Decides steps	one at a time, from observations	all up front
Adapts mid-task	naturally	needs an expensive re-plan
Cost / model	every step hits the big model	cheap executor for steps
Accuracy	adapts to surprises; can loop	higher on predictable tasks; degrades when plan is wrong
Reviewable plan	no — emerges as it goes	yes — a human can approve it first

Use ReAct when intermediate results change what's needed next — research, debugging, anything in a messy or partially observable environment. Use plan-and-execute when the whole step sequence is knowable in advance and you want cost predictability or a human checkpoint before anything runs. Neither is strictly better: if the environment shifts mid-task, plan-and-execute pays for a costly re-planning call, while ReAct simply adjusts on its next thought.

Limits, and where the 'Thought' lives now

ReAct is foundational, not flawless. Its core weakness: the generated thoughts aren't guaranteed to be grounded in the agent's real history or goal. In complex, partially observable tasks a thought can drift from what actually happened, and because each step builds on the last, that misalignment compounds. The 2025 ReflAct framework (arXiv:2505.15182) targets exactly this, replacing action-prediction thoughts with explicit goal-state reflection at each step.

There's also a subtler shift. With modern reasoning models (OpenAI o3/o4-mini, DeepSeek-R1, Claude with extended thinking), the Thought step is handled internally by an RL-trained reasoning process before the model emits its action. For these models you usually should not add hand-written "Thought:" scaffolding to the prompt; explicit thought prompts are often unnecessary and can hurt performance. The ReAct loop is unchanged: Action and Observation still flow through tool calls, but the Thought has moved inside the model.

Finally, remember that ReAct doesn't always win. In the original paper, chain-of-thought with self-consistency scored higher than ReAct alone on HotpotQA, and the best results came from combining the two. ReAct's edge is specifically on tasks that need external information or action; if no tool is involved, it has nothing to ground on.

Tip

Practical default for 2026

Reach for ReAct via structured tool-calling (create_react_agent or a hand-rolled loop with the function-calling API) as your starting point. Add a planner only when steps are knowable up front and you need a reviewable plan or a cheaper executor.

Try it: Hand-trace, then build a ReAct loop

Part 1 — trace it (no code). Pick a multi-hop question that needs two lookups (e.g., "Which has more UN member states, Africa or the Americas?"). Write out the full transcript by hand: alternate Thought:, Action:, Observation: until a Final Answer:. Mark where each Thought depends on the previous Observation — that dependency is the whole point of ReAct.

Part 2 — build it (code). Implement the loop from the function-calling section against any model with tool calling (OpenAI tool_calls or Anthropic tool_use). Give it two tools: a calculator(expression) and a search(query) (a stub returning canned facts is fine). Run a question that needs both, and print the messages list after every step so you can see the Thought/Action/Observation transcript the model actually produced. Add a MAX_STEPS cap.

Stretch. Swap in LangGraph's create_react_agent with the same two tools and confirm the behavior matches your hand-rolled loop — proof that the framework is just this loop, productionized.

Key takeaways

1. ReAct interleaves a reasoning Thought, a tool Action, and a real Observation in a loop until a Final Answer: reasoning chooses actions, observations correct reasoning.
2. Grounding each thought in a real observation is what cuts hallucination; the visible transcript is what makes the agent debuggable and auditable.
3. Modern function-calling implements ReAct directly: tool_call = Action, tool result message = Observation, internal reasoning = Thought. Use structured tool calls, not legacy text parsing.
4. Plan-and-execute trades ReAct's adaptability for predictability and cheaper executors; pick it only when the full step sequence is knowable up front.
5. ReAct's thoughts can drift from the real goal in complex tasks, and in reasoning models the Thought now lives inside the model, so often you shouldn't hand-write Thought prompts.

Quiz

Lock in what you learned

Check your understanding

1. In the ReAct loop, what is an 'Observation'?

2. How do modern function-calling APIs relate to ReAct?

3. When should you prefer plan-and-execute over ReAct?

4. What is a key limitation of ReAct, and how do reasoning models change it?

Go deeper

Hand-picked sources to keep learning

ReAct: Synergizing Reasoning and Acting in Language Models (arXiv:2210.03629, ICLR 2023)

The primary source by Yao et al. — defines the Thought/Action/Observation format and the benchmark results.

ReAct Project Page (react-lm.github.io)

Official page with paper, code, and the original prompt examples. Maintained by Shunyu Yao.

ReAct Prompting — Prompt Engineering Guide (DAIR.AI)

Accessible reference on the format, examples, and how ReAct compares with chain-of-thought. Updated for 2025.

LangGraph create_react_agent (from scratch, functional API)

Shows the ReAct loop built on native tool_calls — exactly how function-calling implements ReAct under the hood.

Plan-and-Execute Agents — LangChain Blog

Canonical explanation of plan-and-execute as an alternative to ReAct, with the planner/executor split and trade-offs.

ReflAct: World-Grounded Decision Making via Goal-State Reflection (arXiv:2505.15182, 2025)

Identifies ReAct's ungrounded-thought limitation and proposes goal-state reflection — useful for current research directions.