Planning & Task Decomposition
Turning a big goal into doable steps
- Decompose a complex goal into subgoals using sequential, parallel, and as-needed strategies
- Compare reactive (ReAct) loops with deliberative plan-and-execute and pick the right one per task
- Model a task as a dependency graph (DAG) and explain why graphs enable parallelism and partial recovery
- Implement dynamic, localized re-planning that fixes a failed step without restarting the whole task
- Judge when planning helps and when it adds latency and cost without benefit
A single language model call cannot reliably book a five-leg trip, refactor a codebase, or research fifty sources — the goal is too big to hold in one thought. Planning and decomposition are how an agent breaks a goal into ordered, doable subgoals, then executes them with the ability to re-plan when reality disagrees. This lesson contrasts reactive loops with deliberative planning, shows how modern systems model tasks as dependency graphs, and teaches you when an upfront plan earns its cost — and when it is pure overhead.
- 1. Big goals don't fit in one thought
- 2. Two paradigms: react vs. plan-ahead
- 3. Sequential, parallel, and as-needed
- 4. Plans are graphs, not just lists
- 5. When a step fails: re-plan, don't restart
- 6. When planning earns its cost
Big goals don't fit in one thought
Think about how you'd tackle "plan a three-city research trip and book it under $2,000." You don't do it in one move — you pick cities, check flight prices, compare hotels, sequence the route, then reserve. An agent faces the same wall: there is no single action that satisfies the whole goal, so the goal has to be split. That splitting is task decomposition — turning one large, vague goal into a set of smaller subgoals, each small enough that the agent can actually execute it.
The mental model is a manager and a to-do list. The manager (the planning step) doesn't do the work; it figures out what work exists and in what order. Each item on the list is concrete enough that a worker — a tool call, a sub-agent, or a single focused model turn — can finish it.
Decomposition matters for two reasons. First, reliability: models reason far better over a sequence of small, well-scoped steps than over one sprawling instruction. Second, structure: once a goal is a list of subgoals, you can track progress, parallelize independent work, retry a single failed step, and inspect the agent's strategy before it spends money acting. Without decomposition, a complex task is an opaque, all-or-nothing gamble.
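To make the "structure" benefit concrete, here is a minimal sketch of a decomposed goal held as plain data. The `Subgoal` record, the `Status` enum, and the helper names are hypothetical, chosen only for illustration; they do not come from any particular framework.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    PENDING = "pending"
    DONE = "done"
    FAILED = "failed"


@dataclass
class Subgoal:
    """One item on the to-do list (hypothetical structure for illustration)."""
    name: str
    status: Status = Status.PENDING
    attempts: int = 0


def progress(subgoals):
    """Fraction of subgoals completed so far."""
    done = sum(1 for s in subgoals if s.status is Status.DONE)
    return done / len(subgoals)


def retry_candidates(subgoals):
    """Only failed subgoals need another attempt; finished work is kept."""
    return [s.name for s in subgoals if s.status is Status.FAILED]


# Example state partway through the trip-planning goal.
trip = [
    Subgoal("pick cities", Status.DONE, attempts=1),
    Subgoal("check flight prices", Status.DONE, attempts=1),
    Subgoal("compare hotels", Status.FAILED, attempts=1),
    Subgoal("reserve", Status.PENDING),
]
```

With the goal in this form, `progress(trip)` reports how far along the agent is, and `retry_candidates(trip)` names the single step worth retrying, which is not possible when the task is one undivided instruction.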
Two paradigms: react vs. plan-ahead
There are two fundamentally different ways an agent can sequence its work, and the difference is basically improvise-as-you-go versus map-it-out-first.
Reactive (ReAct-style) decides one step at a time. The loop is Thought → Action → Observation, repeated: the model reasons, takes a single action, sees the result, and only then decides the next step. It never commits to a full plan — it improvises with constant feedback, like a driver taking each turn as the road reveals it.
Deliberative (plan-and-execute) writes the whole plan first. One LLM call produces an ordered list of steps; then an executor — often a smaller, cheaper model or plain tool runner — works through them, calling back to the planner only when needed. This is like printing turn-by-turn directions before leaving the house.
| | Reactive (ReAct) | Deliberative (Plan-and-Execute) |
|---|---|---|
| Decides | One step at a time | Whole plan upfront |
| Adapts to surprises | Immediately | Needs explicit re-planning |
| LLM calls | One per action (expensive on long tasks) | One plan + cheap execution |
| Global view | None — no overall map | Yes — inspectable strategy |
| Best for | Short, unpredictable tasks | Long, structured, multi-tool tasks |
Neither wins universally. ReAct is robust and cheap on short tasks but bleeds tokens over long chains and has no map of where it's going. Plan-and-execute is cheaper at scale and inspectable, but a plan written before the first observation can be wrong from step one.
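The call-count difference between the two paradigms can be sketched with a stand-in model. Everything here is an assumption for illustration: `StubLLM` replaces a real model, `run_tool` replaces real tools, and the three-step task is invented.

```python
class StubLLM:
    """Stand-in for a real model; counts calls instead of generating text."""

    def __init__(self):
        self.calls = 0

    def next_action(self, goal, history):
        # Reactive: one call per step, decided from what has been observed so far.
        self.calls += 1
        steps = ["search", "compare", "book"]
        return steps[len(history)] if len(history) < len(steps) else None

    def full_plan(self, goal):
        # Deliberative: one call returns the whole ordered plan.
        self.calls += 1
        return ["search", "compare", "book"]


def run_tool(action):
    """Hypothetical tool runner; returns a fake observation."""
    return f"{action}:ok"


def react(llm, goal):
    history = []
    # Thought -> Action -> Observation, repeated until the model says stop.
    while (action := llm.next_action(goal, history)) is not None:
        history.append(run_tool(action))
    return history


def plan_and_execute(llm, goal):
    plan = llm.full_plan(goal)                # commit to the plan up front
    return [run_tool(step) for step in plan]  # execution makes no LLM calls


reactive_llm, planning_llm = StubLLM(), StubLLM()
reactive_result = react(reactive_llm, "book a trip")
planned_result = plan_and_execute(planning_llm, "book a trip")
```

Both runs produce the same three observations, but the reactive loop consults the model four times (three actions plus the call that decides to stop) while the planner is consulted once. The sketch leaves out the other half of the trade-off: the planner here never has to recover from a wrong plan.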
Key insight
They aren't mutually exclusive
The mainstream 2025–2026 approach is hybrid: high-level deliberative planning for the overall route, plus a small reactive loop inside each subtask to handle local surprises. You get the global map of plan-and-execute and the adaptability of ReAct at the same time.
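A hybrid can be sketched as an outer plan with a bounded reactive loop inside each subtask. The planner output, the `flaky_tool` stub, and the `max_steps` limit are all hypothetical choices for this example.

```python
def plan_skeleton(goal):
    """Hypothetical planner output: high-level subtasks in order."""
    return ["find flights", "find hotel"]


def react_within(subtask, attempt_fn, max_steps=3):
    """Small reactive loop: try, observe, adjust, up to max_steps times."""
    observations = []
    for step in range(max_steps):
        ok, obs = attempt_fn(subtask, step)
        observations.append(obs)
        if ok:
            return True, observations
    return False, observations


def flaky_tool(subtask, step):
    """Stub tool: the hotel search fails once before succeeding."""
    if subtask == "find hotel" and step == 0:
        return False, "hotel site timed out"
    return True, f"{subtask}: done"


def run_hybrid(goal):
    results = {}
    for subtask in plan_skeleton(goal):                       # deliberative outer plan
        results[subtask] = react_within(subtask, flaky_tool)  # reactive inner loop
    return results


hybrid_results = run_hybrid("plan a trip")
```

The timeout on the hotel search is absorbed inside its own subtask, so the outer plan never needs to change. Only a subtask that exhausts `max_steps` would have to be escalated back to the planner.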
Sequential, parallel, and as-needed
Once you decide to split a goal, the next question is how the pieces relate to each other — and that relationship decides your architecture. Decomposition is not one technique; subgoals fit three patterns, and recognizing which you have changes how you run them.
- Sequential: each subgoal depends on the previous one's output. Book the flight, then book the hotel near the arrival airport, then schedule meetings around the flight times. You must run these in order.
- Parallel: subgoals are independent and can run simultaneously. "Get the weather for three cities" is three calls that don't wait on each other, so running them in parallel is a major latency win.
- Asynchronous fan-out / fan-in: independent branches launch together, but a later phase must wait for all of them. Research five competitors in parallel, then write one comparison once every branch returns.
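The three patterns above map directly onto `asyncio`. This is a minimal sketch: `call_tool` is a stub whose sleep stands in for network latency, and the tool names are made up.

```python
import asyncio


async def call_tool(name, delay=0.01):
    """Stub tool call; the sleep stands in for network latency."""
    await asyncio.sleep(delay)
    return f"{name}:ok"


async def sequential():
    # Each step waits for the previous one (its output would feed the next).
    flight = await call_tool("book_flight")
    hotel = await call_tool("book_hotel")
    return [flight, hotel]


async def parallel():
    # Independent calls start together; gather preserves argument order.
    cities = ["Tokyo", "Seoul", "Taipei"]
    return await asyncio.gather(*(call_tool(f"weather_{c}") for c in cities))


async def fan_out_fan_in():
    # Fan out: research branches run concurrently.
    reports = await asyncio.gather(*(call_tool(f"research_{i}") for i in range(5)))
    # Fan in: the comparison starts only after every branch has returned.
    return f"comparison of {len(reports)} reports"
```

Run any of them with `asyncio.run(parallel())`. With real tools, the parallel version finishes in roughly the time of the slowest call rather than the sum of all three.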
A fourth idea governs when you decompose at all. The intuition: don't pre-chop a task you might be able to do in one bite. ADaPT (As-Needed Decomposition and Planning, NAACL 2024) showed exactly this — the agent tries a subtask directly and only decomposes it recursively when it fails to execute. This demand-driven approach beat static plan-and-execute baselines substantially on ALFWorld, WebShop, and TextCraft — because it spends decomposition effort only where the task is actually hard.
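The try-first, decompose-on-failure idea can be sketched as a short recursion. This illustrates the general principle only and is not ADaPT's implementation: `try_execute` and `decompose` are stubs, the depth limit is an assumption, and every subtask is treated as required.

```python
def try_execute(task):
    """Stub executor: only short, concrete tasks succeed directly."""
    return " and " not in task


def decompose(task):
    """Stub planner: split a compound task on 'and'."""
    return [part.strip() for part in task.split(" and ")]


def solve(task, depth=0, max_depth=3, log=None):
    """Try the task directly; decompose only if direct execution fails."""
    log = [] if log is None else log
    if try_execute(task):
        log.append(("executed", task, depth))
        return True, log
    if depth >= max_depth:
        log.append(("gave_up", task, depth))
        return False, log
    log.append(("decomposed", task, depth))
    for sub in decompose(task):
        ok, _ = solve(sub, depth + 1, max_depth, log)
        if not ok:
            return False, log
    return True, log


ok_simple, log_simple = solve("boil water")
ok_compound, log_compound = solve("boil water and steep tea")
```

The simple task is executed at depth 0 with no planning at all, while the compound task is split once and its parts run at depth 1. Decomposition effort is spent only on the task that needed it.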
Watch out
Match granularity to the executor
Over-decomposition (trivial micro-steps like "open the browser," "focus the search box") creates coordination overhead with no benefit. Under-decomposition leaves the executor a vague instruction it can't act on. The right grain is the smallest step your executor can reliably complete in one shot — no finer.
Plans are graphs, not just lists
A flat numbered list tells you the steps but hides which ones actually depend on each other — so you end up running everything in a slow single-file line. Modern planners fix this by modeling a task as a directed acyclic graph (DAG): nodes are subtasks, edges are dependencies, and "acyclic" just means the arrows never loop back on themselves. Sequential subtasks form a chain; independent ones fan out as parallel branches. The graph makes three things possible that a list can't: scheduling independent nodes in parallel, re-running only a failed subtree instead of the whole task, and reasoning explicitly about what blocks what.
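Python's standard library can check the "acyclic" property and group tasks into parallel batches. The sketch below uses `graphlib.TopologicalSorter` (Python 3.9+); the task names and the `parallel_batches` and `is_valid_plan` helpers are made up for illustration.

```python
from graphlib import CycleError, TopologicalSorter

# Map each task to the set of tasks it depends on (its predecessors).
deps = {
    "pick_cities": set(),
    "flight_prices": {"pick_cities"},
    "hotel_prices": {"pick_cities"},
    "book": {"flight_prices", "hotel_prices"},
}


def parallel_batches(graph):
    """Group tasks into batches; tasks in a batch have no unmet dependencies."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()                  # raises CycleError if the graph loops
    batches = []
    while sorter.is_active():
        ready_now = sorted(sorter.get_ready())
        batches.append(ready_now)
        sorter.done(*ready_now)       # mark the batch finished to unblock successors
    return batches


def is_valid_plan(graph):
    """A plan is usable only if its dependency graph has no cycle."""
    try:
        TopologicalSorter(graph).prepare()
        return True
    except CycleError:
        return False
```

For `deps`, the batches come out as one task, then two independent price lookups, then the booking. Validating a generated plan with `is_valid_plan` before executing it is a cheap guard against a planner that emits circular dependencies.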
LLMCompiler (arXiv 2312.04511) made this concrete. A planner emits a dependency graph where steps reference prior outputs with symbols like $1, $2; a Task Fetching Unit dispatches every ready (unblocked) task in parallel; executors run them. The paper reports up to a 3.7× latency speedup over ReAct, which issues its calls one at a time, largely by exploiting the parallelism the DAG exposes.
```python
# A plan as a DAG. Each task lists the tasks it depends on.
plan = {
    "t1": {"action": "flight_price", "args": {"city": "Tokyo"}, "deps": []},
    "t2": {"action": "flight_price", "args": {"city": "Seoul"}, "deps": []},
    "t3": {"action": "flight_price", "args": {"city": "Taipei"}, "deps": []},
    # t5 fans in: it needs all three prices before it can compare.
    "t5": {"action": "pick_cheapest", "args": {"of": ["$t1", "$t2", "$t3"]}, "deps": ["t1", "t2", "t3"]},
}


def ready(tasks, done):
    return [tid for tid, t in tasks.items()
            if tid not in done and all(d in done for d in t["deps"])]

# t1, t2, t3 are ready immediately and run in parallel; t5 waits for the fan-in.
```

Frameworks like LangGraph turn this idea into production infrastructure: a stateful graph of nodes and edges with checkpoints, human-in-the-loop interrupts, and dedicated re-planning nodes.
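A scheduler built on a `ready` function like this can be sketched in a few lines of `asyncio`. The version below runs the plan in waves, which is simpler than dispatching each task the moment it unblocks. The `FAKE_PRICES` table and the `run_task` stub are invented for the example, and the plan is a shortened two-city variant.

```python
import asyncio

plan = {
    "t1": {"action": "flight_price", "args": {"city": "Tokyo"}, "deps": []},
    "t2": {"action": "flight_price", "args": {"city": "Seoul"}, "deps": []},
    "t3": {"action": "pick_cheapest", "args": {}, "deps": ["t1", "t2"]},
}
FAKE_PRICES = {"Tokyo": 820, "Seoul": 640}  # made-up data for the stub tool


def ready(tasks, done):
    return [tid for tid, t in tasks.items()
            if tid not in done and all(d in done for d in t["deps"])]


async def run_task(task, results):
    await asyncio.sleep(0)  # yield control, as a real network call would
    if task["action"] == "flight_price":
        return FAKE_PRICES[task["args"]["city"]]
    # pick_cheapest: read the outputs of the tasks this one depends on
    return min(results[d] for d in task["deps"])


async def execute(tasks):
    results, waves = {}, []
    while len(results) < len(tasks):
        batch = ready(tasks, results)
        if not batch:
            raise RuntimeError("no runnable task: cycle or missing dependency")
        # Every task in the batch is unblocked, so run them concurrently.
        outputs = await asyncio.gather(*(run_task(tasks[t], results) for t in batch))
        results.update(zip(batch, outputs))
        waves.append(batch)
    return results, waves
```

Calling `asyncio.run(execute(plan))` runs the two price lookups in the first wave and the comparison in the second. The empty-batch check matters: without it, a plan with a cycle would loop forever.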
When a step fails: re-plan, don't restart
Any plan written before the agent takes its first action is really just a guess about how the world will behave — and the world doesn't always cooperate. A flight sells out, a page won't load, a tool returns garbage. Dynamic (closed-loop) re-planning is the agent watching its own execution and revising the plan the moment an observation contradicts it.
The critical design choice is how much to re-plan. The PlanGenLLMs survey (Feb 2025) splits closed-loop planning into two modes:
- Implicit / localized — fix only the failed step (or its subtree), keeping every completed node intact. Cheap, and the default best practice.
- Explicit / full — regenerate the entire plan from scratch. Prevents error accumulation when a failure invalidates everything downstream, but burns far more tokens.
A naïve agent that restarts from the beginning on every failure wastes all prior work. Confining re-planning to the failed sub-task node — localized re-planning — has been shown to cut token consumption dramatically (one 2025 task-graph paper reports up to 82%). The graph structure is what makes this possible: completed nodes stay done; only the broken branch is regenerated.
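Localized re-planning can be sketched as an executor that swaps out only the failed node's spec and leaves stored results alone. The price table, the simulated sold-out city, and the `replan_node` substitution rule are all hypothetical; a real system would ask the planner for the replacement.

```python
PRICES = {"Tokyo": 820, "Seoul": 640, "Osaka": 700}  # made-up data
SOLD_OUT = {"Seoul"}                                  # simulated failure


def run(spec, results):
    if spec["action"] == "flight_price":
        city = spec["args"]["city"]
        if city in SOLD_OUT:
            raise RuntimeError(f"{city} sold out")
        return PRICES[city]
    return min(results[d] for d in spec["deps"])      # pick_cheapest


def replan_node(spec):
    """Hypothetical local fix: substitute an alternate city for the failed lookup."""
    return {**spec, "args": {"city": "Osaka"}}


def execute(tasks, max_replans=2):
    results, attempts, replans = {}, [], 0
    pending = dict(tasks)
    while pending:
        # Pick the first task whose dependencies all have results.
        tid = next((t for t, s in pending.items()
                    if all(d in results for d in s["deps"])), None)
        if tid is None:
            raise RuntimeError("blocked: unmet dependencies")
        attempts.append(tid)
        try:
            results[tid] = run(pending[tid], results)
            del pending[tid]
        except RuntimeError:
            replans += 1
            if replans > max_replans:
                raise
            pending[tid] = replan_node(pending[tid])  # completed nodes untouched
    return results, attempts


tasks = {
    "t1": {"action": "flight_price", "args": {"city": "Tokyo"}, "deps": []},
    "t2": {"action": "flight_price", "args": {"city": "Seoul"}, "deps": []},
    "t3": {"action": "pick_cheapest", "args": {}, "deps": ["t1", "t2"]},
}
results, attempts = execute(tasks)
```

The attempt log shows `t1` ran once and `t2` ran twice: the Tokyo result was reused while only the Seoul lookup was replaced. A full restart would have repeated `t1` as well.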
Watch out
Guard against failure loops
Re-planning that never makes progress — fix, fail, re-plan, fail again — is a classic agent death spiral. Defend with a max re-plan count, immutable plan versions so you can diff and detect non-progress, and explicit backtracking rules that abandon a dead branch instead of retrying it forever.
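One way to combine these defenses is a small guard object that fingerprints each plan version and refuses a re-plan that repeats an earlier one or exceeds the cap. `ReplanGuard` and `plan_fingerprint` are hypothetical names, and hashing the JSON form of the plan is just one possible way to compare versions.

```python
import hashlib
import json


def plan_fingerprint(plan):
    """Stable hash of a plan so two versions can be compared cheaply."""
    return hashlib.sha256(json.dumps(plan, sort_keys=True).encode()).hexdigest()


class ReplanGuard:
    """Tracks plan versions and decides whether another re-plan is allowed."""

    def __init__(self, max_replans=3):
        self.max_replans = max_replans
        self.versions = []  # fingerprints of every accepted plan version, in order

    def allow(self, new_plan):
        fp = plan_fingerprint(new_plan)
        if fp in self.versions:
            return False, "no progress: plan repeats an earlier version"
        if len(self.versions) >= self.max_replans:
            return False, "re-plan cap reached: abandon this branch"
        self.versions.append(fp)
        return True, "ok"


guard = ReplanGuard(max_replans=2)
first = guard.allow({"t2": {"city": "Seoul"}})
second = guard.allow({"t2": {"city": "Osaka"}})
repeat = guard.allow({"t2": {"city": "Seoul"}})   # same plan as the first version
capped = guard.allow({"t2": {"city": "Busan"}})   # new plan, but over the cap
```

When `allow` returns `False`, the caller backtracks: it drops the branch and continues with whatever results it already has, rather than asking the planner again.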
When planning earns its cost
Here's the catch nobody mentions in the demos: planning is not free. Before the agent does anything useful, an upfront planning phase adds an extra LLM call, latency while you wait for the plan, and the risk that the plan is stale the moment it's written. So treat planning as a cost you have to justify, not a default. That overhead pays off only for complex, multi-step, multi-tool tasks with a fairly predictable structure — the kind where a global map genuinely helps.
For short, simple, or highly unpredictable tasks, skip it. A one-or-two-tool query ("what's the weather, then convert to Fahrenheit") is faster and cheaper with a plain ReAct loop, or even direct prompting. Anthropic's Building Effective Agents makes this the headline rule: start simple, and add planning layers only when simpler solutions demonstrably fall short. Their orchestrator-worker pattern — a central LLM decomposes a task, delegates subtasks to workers, and synthesizes the results — is exactly the deliberative pattern, recommended only when the task warrants it.
A practical decision rule:
- ≤ 2 tools, predictable path → direct call or ReAct.
- Many steps, clear dependencies, parallelism available → plan-and-execute over a DAG.
- Long-horizon and messy → hybrid: plan the skeleton, react within each subtask, re-plan locally on failure.
When in doubt, reach for the least planning that reliably solves the task.
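The decision rule can be written down as a small function. This is a simplification under stated assumptions: the three inputs (`num_tools`, `predictable`, `long_horizon`) are a hypothetical reduction of the criteria above, and the fallback for short but unpredictable tasks follows the "skip planning" advice.

```python
def choose_strategy(num_tools, predictable, long_horizon=False):
    """Return the least planning that plausibly fits the task."""
    if num_tools <= 2 and predictable:
        return "direct call or ReAct"
    if long_horizon and not predictable:
        return "hybrid"
    if predictable:
        return "plan-and-execute over a DAG"
    # Short but unpredictable: an upfront plan would likely be stale.
    return "ReAct"
```

For example, `choose_strategy(2, True)` selects a direct call or ReAct, while `choose_strategy(6, False, long_horizon=True)` selects the hybrid. Real tasks rarely reduce to three flags, so treat the function as a checklist rather than a classifier.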
Try it: Build a plan-and-execute trip planner with localized re-planning
Write a small Python agent that plans a 3-city trip and survives a failure without restarting.
- Plan. Prompt an LLM to emit a JSON plan as a list of tasks, each with an `id`, an `action`, `args`, and a `deps` list (a DAG). Include at least one fan-out (three independent `flight_price` lookups) and one fan-in (`pick_cheapest`).
- Schedule. Write a `ready(tasks, done)` function that returns every task whose dependencies are all complete, and run those tasks, executing the independent ones concurrently (e.g., `asyncio.gather`).
- Inject a failure. Make one `flight_price` tool raise on its first call. Implement localized re-planning: re-plan ONLY the failed node (retry or substitute an alternate city), keep every completed node's result, and continue. Do NOT restart the whole plan.
- Guard the loop. Add a `max_replans` cap so a persistently failing node can't spin forever; on exceeding it, backtrack: drop that branch and proceed with what you have.
- Reflect (2–3 sentences). How many LLM calls did localized re-planning save versus a full restart? For this task, would a plain ReAct loop have been simpler? Why or why not?
Key takeaways
- 1. Decomposition turns one opaque, all-or-nothing goal into small, executable subgoals you can track, parallelize, retry, and inspect.
- 2. Reactive (ReAct) loops adapt one step at a time; deliberative plan-and-execute writes the whole plan upfront. Hybrids combine both and dominate in practice.
- 3. Modeling a plan as a dependency DAG unlocks parallel execution and partial recovery; LLMCompiler reported up to a 3.7× latency speedup over ReAct's step-by-step execution.
- 4. On failure, re-plan locally (regenerate only the failed subtree) instead of restarting, and guard against failure loops with re-plan caps and backtracking.
- 5. Planning only pays off for complex, multi-step, structured tasks; for short or unpredictable ones it just adds latency and token cost.
Quiz
Lock in what you learned
Check your understanding
1. What is the defining difference between reactive (ReAct) and deliberative (plan-and-execute) agents?
2. Why do modern planners model tasks as directed acyclic graphs (DAGs) instead of flat lists?
3. A plan step fails midway through a long task. What is the recommended best practice as of 2024–2025?
4. When does adding an upfront planning phase typically NOT pay off?
Go deeper
Hand-picked sources to keep learning
Canonical guidance on orchestrator-worker decomposition, prompt chaining, and the start-simple rule for adding planning.
Recursive, demand-driven decomposition — decompose a subtask only when the model fails to execute it directly.
DAG-based task planning with a Task Fetching Unit for parallel execution; up to 3.7× latency speedup over ReAct.
Comprehensive 2025 survey of open-loop vs closed-loop (implicit/explicit) planning, decomposition modes, and evaluation.
Practical walkthrough of the planner / executor / replanner architecture, DAG advantages, and LangGraph patterns.
Taxonomy of agent planning approaches: decomposition, multi-plan selection, external modules, reflection, and memory.