Your First Agent From Scratch

A complete agent loop in ~40 lines, no framework

Intermediate · 16 min · Builder
What you'll be able to do
  • Write a complete, runnable agent loop in Python from scratch, with no framework
  • Define tools with correct JSON Schema and wire them to Python functions
  • Parse the model's tool calls, execute them, and feed results back into message history
  • Implement robust stopping conditions: the model's own signal plus a hard max-iterations cap
  • Explain what this minimal loop lacks and why production teams reach for frameworks
At a glance

Strip away every framework and an agent is one small while-loop: call the model, run the tools it asks for, feed the results back, repeat until it stops asking. In this lesson you build that loop in about 40 lines of Python with real tool calling, then see exactly which production concerns it leaves on the table — the gaps that frameworks exist to fill.

  1. The entire agent is a four-step loop
  2. Step 1 — Define tools and their schemas
  3. Step 2 — Map names to functions and run them
  4. Step 3 — The loop, in real Python
  5. Step 4 — Stopping: the signal and the safety cap
  6. Anthropic vs. OpenAI — same loop, different envelope
  7. What this loop lacks — and why frameworks exist

The entire agent is a four-step loop

Before any code, hold the whole thing in your head. A from-scratch agent is a loop that repeats exactly four steps until the model is done:

  1. Call the model with the current message history and your tool definitions.
  2. Check whether the response contains tool calls.
  3. Execute every tool the model asked for, and append each result back to the history.
  4. Stop or continue: if there were no tool calls, the model's text is your final answer; otherwise, loop.

That's it. Everything else — calculators, web search, multi-agent systems, billion-dollar coding assistants — is this loop with better tools and more safety rails bolted on.

The one thing that surprises people: the API is stateless. The model does not remember the previous turn. On every single call you re-send the entire conversation — your messages, the model's tool requests, and the tool results. The history is the agent's memory. Lose track of it and the agent forgets what it just did.
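To make "the history is the memory" concrete, here is roughly what the message list holds after one tool round trip in Anthropic's format, written as plain dicts. The id "toolu_example_01" is a made-up placeholder (real ids come from the API), and unanswered_tool_calls is a hypothetical helper for illustration, not part of any SDK.

```python
# Shape of the history after one tool round trip (Anthropic format).
# "toolu_example_01" is a placeholder; real ids are issued by the API.
messages = [
    {"role": "user", "content": "What is 12 * 7?"},
    {   # The model's turn: optional text plus a tool_use request.
        "role": "assistant",
        "content": [
            {"type": "text", "text": "I'll use the calculator."},
            {"type": "tool_use", "id": "toolu_example_01",
             "name": "calculator", "input": {"expression": "12 * 7"}},
        ],
    },
    {   # Your turn: the tool's output, tied back to the request by tool_use_id.
        "role": "user",
        "content": [
            {"type": "tool_result", "tool_use_id": "toolu_example_01",
             "content": "84"},
        ],
    },
]

def unanswered_tool_calls(history: list[dict]) -> set[str]:
    """Hypothetical helper: return tool_use ids that have no matching tool_result."""
    requested, answered = set(), set()
    for msg in history:
        if isinstance(msg["content"], str):
            continue  # plain-text turns carry no tool blocks
        for block in msg["content"]:
            if block["type"] == "tool_use":
                requested.add(block["id"])
            elif block["type"] == "tool_result":
                answered.add(block["tool_use_id"])
    return requested - answered

print(unanswered_tool_calls(messages))  # set() -> every request has a result
```

On the next API call, this whole list is sent again; nothing on the server side remembers it for you.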

Keep this mental model close. The next sections are just a faithful, line-by-line translation of these four steps into Python.

Key insight

The loop is the agent

An agent is not a special model. It's an ordinary model placed in a while loop that runs its tool calls and hands back the results. If you can write the loop, you understand every agent framework in existence.

Step 1 — Define tools and their schemas

A tool is two things: a schema that tells the model the tool exists and how to call it, and a Python function that actually runs. The tool definition has three parts that matter, and the arguments themselves are described in JSON Schema:

  • name — a short identifier (regex ^[a-zA-Z0-9_-]{1,64}$).
  • description — natural language explaining when and why to use the tool. This is the single biggest lever on tool-selection quality. Treat it as prompt engineering, not a code comment.
  • input_schema — a JSON Schema object describing the arguments.
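The three rules above are easy to check locally before you ever call the API. A minimal sketch: check_tool is a hypothetical pre-flight helper (the API still does its own validation), and the 20-character minimum for descriptions is an arbitrary threshold chosen for illustration.

```python
import re

# Name pattern quoted in the list above.
TOOL_NAME_RE = re.compile(r"^[a-zA-Z0-9_-]{1,64}$")

def check_tool(tool: dict) -> list[str]:
    """Hypothetical pre-flight check; returns a list of problems (empty if none)."""
    problems = []
    if not TOOL_NAME_RE.fullmatch(tool.get("name", "")):
        problems.append("name must match ^[a-zA-Z0-9_-]{1,64}$")
    # Arbitrary threshold: a one- or two-word description rarely guides the model.
    if len(tool.get("description", "").strip()) < 20:
        problems.append("description is missing or too short to guide the model")
    schema = tool.get("input_schema", {})
    if schema.get("type") != "object":
        problems.append('input_schema must have "type": "object"')
    # Every required argument should actually be declared under properties.
    missing = set(schema.get("required", [])) - set(schema.get("properties", {}))
    if missing:
        problems.append(f"required but undeclared: {sorted(missing)}")
    return problems
```

Running it over your tools list at startup turns a confusing API error (or a silently confused model) into a clear message before the first request.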
python
import anthropic
client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from env

tools = [{
    "name": "calculator",
    "description": "Evaluate a basic arithmetic expression like '12 * (3 + 4)'. "
                   "Use this for any exact math; never compute it yourself.",
    "input_schema": {
        "type": "object",
        "properties": {
            "expression": {"type": "string",
                           "description": "A math expression, e.g. '2 ** 10'"}
        },
        "required": ["expression"],
    },
}]

Note: Anthropic uses the key input_schema; OpenAI nests the same JSON Schema under function.parameters. Same idea, different envelope. A vague description ("does math") tends to make the model pick the wrong tool or pass garbage arguments, so be specific.
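Because only the envelope differs, converting a tool definition is mechanical. A minimal sketch, assuming OpenAI Chat Completions' {"type": "function", "function": {...}} wrapper; to_openai_tool is a hypothetical helper name.

```python
def to_openai_tool(tool: dict) -> dict:
    """Hypothetical helper: re-wrap an Anthropic-style tool for OpenAI Chat Completions."""
    return {
        "type": "function",
        "function": {
            "name": tool["name"],
            "description": tool["description"],
            # Same JSON Schema object, just under a different key.
            "parameters": tool["input_schema"],
        },
    }

calculator_tool = {
    "name": "calculator",
    "description": "Evaluate a basic arithmetic expression like '12 * (3 + 4)'.",
    "input_schema": {
        "type": "object",
        "properties": {"expression": {"type": "string"}},
        "required": ["expression"],
    },
}
openai_tools = [to_openai_tool(calculator_tool)]
```

The schema dict is shared rather than copied, which is fine as long as you don't mutate it afterwards.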

Watch out

Never use eval() for a real calculator

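Tutorials show eval(expression) because it's one line, but it executes arbitrary code and is a serious security hole. In anything real, use a safe evaluator such as the simpleeval library, or a small evaluator that parses the expression with Python's ast module and allows only numbers and arithmetic operators. ast.literal_eval on its own won't do: it accepts only literals, so it rejects an expression like '12 * 7'.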
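If you'd rather not add a dependency, a whitelist evaluator over the ast parse tree is short. This is a sketch, not a hardened library: safe_arith and MAX_EXPONENT are hypothetical names, and the exponent cap of 100 is an arbitrary guard against inputs like 9 ** 9 ** 9 that would otherwise hang the process.

```python
import ast
import operator

# Whitelist: only these operators are allowed; anything else is rejected.
_BIN_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
    ast.FloorDiv: operator.floordiv, ast.Mod: operator.mod,
    ast.Pow: operator.pow,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}
MAX_EXPONENT = 100  # arbitrary cap so huge powers can't exhaust CPU or memory

def safe_arith(expression: str) -> float:
    """Evaluate +, -, *, /, //, %, ** over int/float literals; reject everything else.

    SyntaxError and ZeroDivisionError still propagate; the tool executor's
    try/except turns them into error strings for the model.
    """
    def ev(node):
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
            left, right = ev(node.left), ev(node.right)
            if isinstance(node.op, ast.Pow) and abs(right) > MAX_EXPONENT:
                raise ValueError("exponent too large")
            return _BIN_OPS[type(node.op)](left, right)
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](ev(node.operand))
        # Names, calls, attribute access, lists, booleans, ...: all refused.
        raise ValueError(f"disallowed syntax: {type(node).__name__}")

    return ev(ast.parse(expression, mode="eval").body)
```

Swapping it in is a one-line change inside the calculator function: return str(safe_arith(expression)).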

Step 2 — Map names to functions and run them

The model can't run your code; it only emits a request: call calculator with {"expression": "12 * 7"}. You execute it. The clean pattern is a dictionary that maps each tool name to a Python callable, so dispatch is a single lookup.

python
from simpleeval import simple_eval  # pip install simpleeval

def calculator(expression: str) -> str:
    return str(simple_eval(expression))

TOOL_MAP = {"calculator": calculator}

def run_tool(name: str, args: dict) -> str:
    try:
        return TOOL_MAP[name](**args)
    except Exception as e:
        # Return the error to the model instead of crashing.
        return f"Error running {name}: {e}"

The try/except matters more than it looks. Tool failures should come back to the model as a descriptive error string, not as a raised exception. If a tool throws and you let it crash, the run is over. If you hand the model "Error: division by zero", it can read that, reason about it, and try a different approach — which is exactly the adaptive behavior you wanted an agent for. Returning errors as observations is what separates a brittle script from a resilient agent.

Tip

Errors are just observations

Catch every exception in your tool executor and return it as text. The model treats a tool error like any other result and can recover from it — a missing file, a 404, a bad argument — all become recoverable instead of fatal.
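Here is a slightly extended version of Step 2's run_tool, sketched with a hypothetical divide tool so it runs without simpleeval. It also turns an unknown tool name into an observation and includes the exception type, which helps the model tell a bad argument from a genuine failure.

```python
def divide(a: float, b: float) -> str:
    return str(a / b)  # raises ZeroDivisionError when b == 0

TOOL_MAP = {"divide": divide}

def run_tool(name: str, args: dict) -> str:
    if name not in TOOL_MAP:
        # A hallucinated tool name is just another observation the model can react to.
        return f"Error: unknown tool {name!r}. Available: {sorted(TOOL_MAP)}"
    try:
        return str(TOOL_MAP[name](**args))
    except Exception as e:
        # Include the exception type so the model can tell failures apart.
        return f"Error running {name}: {type(e).__name__}: {e}"

print(run_tool("divide", {"a": 1, "b": 0}))   # runtime failure
print(run_tool("divide", {"a": 1}))           # missing argument (TypeError)
print(run_tool("search", {"q": "x"}))         # unknown tool
```

On Anthropic's API you can additionally set "is_error": true on the tool_result block that carries such a string, so the model knows the content describes a failure.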

Step 3 — The loop, in real Python

Now assemble the four steps. With Anthropic's API, the signal to keep looping is stop_reason == "tool_use". The model returns a content array that can mix text blocks and tool_use blocks; each tool_use block has an id, a name, and an input dict. You send results back as tool_result blocks on a user message, each referencing the original tool_use_id.

python
def run_agent(user_prompt: str, max_iterations: int = 10) -> str:
    messages = [{"role": "user", "content": user_prompt}]
    for _ in range(max_iterations):
        resp = client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=1024,
            tools=tools,
            messages=messages,
        )
        # Re-append the model's full turn (text + tool_use blocks).
        messages.append({"role": "assistant", "content": resp.content})

        if resp.stop_reason != "tool_use":
            return "".join(b.text for b in resp.content if b.type == "text")

        results = []
        for block in resp.content:
            if block.type == "tool_use":
                output = run_tool(block.name, block.input)
                results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": output,
                })
        messages.append({"role": "user", "content": results})
    return "Stopped: hit max_iterations without a final answer."

That is a complete agent. Notice that the loop iterates over all tool_use blocks: the model can ask for several tools in one response, and every tool_use block needs a matching tool_result in the next user message. Skip one and the next API call is rejected.
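You can exercise this loop without an API key by swapping the client for a scripted fake. In this sketch ScriptedClient, text_block, tool_block, and the "placeholder-model" string are all hypothetical stand-ins; SimpleNamespace objects mimic the attributes the loop reads (stop_reason, content, and each block's type, text, id, name, input). The loop itself is the one above, with client, tools, and dispatch passed in as arguments.

```python
from types import SimpleNamespace as NS

def text_block(text):
    return NS(type="text", text=text)

def tool_block(block_id, name, args):
    return NS(type="tool_use", id=block_id, name=name, input=args)

class ScriptedClient:
    """Hypothetical stand-in for anthropic.Anthropic(): replays canned responses
    and records each messages list it receives, so the loop runs offline."""
    def __init__(self, responses):
        self.responses = list(responses)
        self.calls = []
        # Expose .messages.create(...) like the real client does.
        self.messages = NS(create=self._create)

    def _create(self, **kwargs):
        self.calls.append(list(kwargs["messages"]))  # snapshot; the loop keeps appending
        return self.responses.pop(0)

def run_agent(client, tools, tool_map, user_prompt, max_iterations=10):
    """The lesson's loop with client, tools, and dispatch passed in as arguments."""
    messages = [{"role": "user", "content": user_prompt}]
    for _ in range(max_iterations):
        resp = client.messages.create(model="placeholder-model", max_tokens=1024,
                                      tools=tools, messages=messages)
        messages.append({"role": "assistant", "content": resp.content})
        if resp.stop_reason != "tool_use":
            return "".join(b.text for b in resp.content if b.type == "text")
        results = []
        for block in resp.content:
            if block.type == "tool_use":
                try:
                    output = str(tool_map[block.name](**block.input))
                except Exception as e:
                    output = f"Error running {block.name}: {e}"
                results.append({"type": "tool_result",
                                "tool_use_id": block.id, "content": output})
        messages.append({"role": "user", "content": results})
    return "Stopped: hit max_iterations without a final answer."

# Script: the "model" asks for one tool call, then answers in plain text.
client = ScriptedClient([
    NS(stop_reason="tool_use", content=[tool_block("t1", "add", {"a": 2, "b": 3})]),
    NS(stop_reason="end_turn", content=[text_block("2 + 3 = 5")]),
])
answer = run_agent(client, tools=[], tool_map={"add": lambda a, b: a + b},
                   user_prompt="What is 2 + 3?")
print(answer)  # 2 + 3 = 5
```

To check the safety cap the same way, script more tool_use responses than max_iterations allows: the loop makes exactly max_iterations calls and returns the "Stopped" message.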

Watch out

Re-append the assistant turn before the results

You must add the model's tool_use turn back to messages before the tool_result. With OpenAI it's identical: re-append the assistant message (with its tool_calls) before any role:"tool" result, or the API rejects the request with a validation error.

Step 4 — Stopping: the signal and the safety cap

Stopping has two independent parts, and you need both.

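The primary exit is the model's own signal. When the response has no tool calls (stop_reason != "tool_use", normally "end_turn"), the model is telling you it's finished and its text is the final answer. This is how a healthy run ends. One caveat: stop_reason == "max_tokens" means the reply was cut off at your max_tokens limit, so a careful loop flags or retries that case instead of returning truncated text as the answer.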

The backstop is a hard max_iterations cap. Models can get stuck: they retry a failing tool forever, ping-pong between two tools, or chase a goal they can't reach. Without a cap, a confused agent loops until you kill it, burning tokens and money the whole time. A cap of 10–15 is reasonable for simple tasks; tune it to the work. Treat it as mandatory: every production loop should have a hard iteration limit.

python
print(run_agent("What is 17% of 2,340, then add 95?"))
# The model calls calculator once or twice, reads the result,
# then returns a plain-text answer -> stop_reason != 'tool_use' -> done.
# Expected final answer: 17% of 2,340 is 397.8, plus 95 is 492.8.

Think of it as two doors out of the loop: the front door (the model finishes) and the emergency exit (the cap trips). You hope to leave by the front door every time, but you never remove the emergency exit.

Tip

Build it incrementally

Test each piece alone before wiring the loop: confirm the tool function returns the right value, confirm one API call produces a tool_use block, then close the loop. Debugging a silent agent loop top-to-bottom is far harder than verifying the parts.

Anthropic vs. OpenAI — same loop, different envelope

Here's the reassuring part: the four steps are identical on every provider. Only the envelope — the exact field names you read and write — changes. Think of it like the same letter mailed in two different envelopes. Knowing both keeps you from assuming a Claude loop drops straight onto OpenAI; the logic ports, the field names don't.

| | Anthropic (Claude) | OpenAI (Chat Completions) |
|---|---|---|
| Loop signal | stop_reason == "tool_use" | finish_reason == "tool_calls" |
| Where calls live | content blocks of type tool_use | message.tool_calls list |
| Arguments | block.input (a dict) | call.function.arguments (a JSON string; you json.loads it) |
| Schema key | input_schema | nested under function.parameters |
| Sending results | tool_result block on a user message | a message with role: "tool" and a matching tool_call_id |

Both accept standard JSON Schema (type, properties, required, enum, per-property description). Both support parallel tool calls — multiple calls in one response — so always iterate over the full list.
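On the OpenAI side, the dispatch step differs only in field names and in the json.loads call. A minimal sketch, assuming the Chat Completions message shape from the table (content plus tool_calls, each with id, function.name, function.arguments); handle_openai_tool_calls is a hypothetical helper, and the SimpleNamespace object stands in for the SDK's response message so the example runs offline.

```python
import json
from types import SimpleNamespace as NS

def handle_openai_tool_calls(message, messages: list, tool_map: dict) -> None:
    """Sketch: append the assistant turn, then one role:"tool" message per call."""
    # Re-append the assistant turn (with its tool_calls) BEFORE any tool results.
    messages.append({
        "role": "assistant",
        "content": message.content,
        "tool_calls": [
            {"id": c.id, "type": "function",
             "function": {"name": c.function.name,
                          "arguments": c.function.arguments}}
            for c in message.tool_calls
        ],
    })
    for call in message.tool_calls:  # parallel calls: answer every one
        try:
            args = json.loads(call.function.arguments)  # a JSON string on OpenAI
            output = str(tool_map[call.function.name](**args))
        except Exception as e:
            output = f"Error running {call.function.name}: {e}"
        messages.append({"role": "tool", "tool_call_id": call.id, "content": output})

# A fake message with two parallel calls, shaped like the SDK object.
fake = NS(content=None, tool_calls=[
    NS(id="call_1", function=NS(name="add", arguments='{"a": 2, "b": 3}')),
    NS(id="call_2", function=NS(name="add", arguments='{"a": 10, "b": 5}')),
])
history = [{"role": "user", "content": "Add 2+3 and 10+5."}]
handle_openai_tool_calls(fake, history, {"add": lambda a, b: a + b})
```

After this runs, history ends with one assistant message followed by two role:"tool" messages, ready for the next call.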

For production on OpenAI, enable strict schema mode ("strict": true in the tool definition). It guarantees that the model's arguments conform exactly to your schema, so you can skip validating them (you still json.loads the string). Strict mode requires "additionalProperties": false on every object and every property listed in required. Don't assume an identical guarantee on Anthropic without checking its current documentation; Claude generally follows the schema closely, but validating inputs in your tool executor is cheap insurance. OpenAI's newer Responses API is its recommended path for new projects; Chat Completions remains fully supported and uses the structure above.
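Strict mode's two schema requirements can be applied mechanically. A sketch under stated assumptions: make_strict is a hypothetical helper that recurses into object-typed properties but not into arrays of objects, and the "strict" flag sits inside the "function" object next to "parameters".

```python
import copy

def make_strict(schema: dict) -> dict:
    """Hypothetical helper: return a copy of an object schema adjusted for strict mode.

    Sets additionalProperties to False and lists every property as required,
    recursing into object-typed properties (arrays of objects are not handled).
    """
    strict = copy.deepcopy(schema)  # leave the caller's schema untouched
    props = strict.get("properties", {})
    strict["additionalProperties"] = False
    strict["required"] = list(props)
    for name, sub in props.items():
        if sub.get("type") == "object":
            props[name] = make_strict(sub)
    return strict

calc_params = {"type": "object",
               "properties": {"expression": {"type": "string"}},
               "required": ["expression"]}

openai_tool = {
    "type": "function",
    "function": {
        "name": "calculator",
        "description": "Evaluate a basic arithmetic expression like '12 * (3 + 4)'.",
        "strict": True,  # OpenAI-specific flag, beside "parameters"
        "parameters": make_strict(calc_params),
    },
}
```

Because strict mode makes every property required, an argument that should be optional is expressed by allowing null instead, e.g. "type": ["string", "null"].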

Watch out

OpenAI arguments are a string, not a dict

A common bug: OpenAI returns function.arguments as a JSON string, so you must json.loads() it before calling your function. Anthropic gives you input already parsed as a dict. Mixing these up causes a TypeError that's easy to misread.

What this loop lacks — and why frameworks exist

Your 40-line agent is real and useful. It is also deliberately bare. Run it in production and you'll quickly want things it doesn't have:

  • State persistence & resumability — if the process dies mid-run, everything is lost; there's no checkpoint to resume from.
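  • Retries & back-off: the official SDKs retry a couple of transient errors by default, but nothing here handles a longer outage or a burst of rate limits; once those retries run out, a single API error kills the run.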
  • Context management — long runs overflow the context window; nothing prunes or summarizes.
  • Observability — no tracing, no per-step cost/latency, so debugging a bad run is guesswork.
  • Human-in-the-loop gates — no way to pause and ask before a destructive action.
  • Multi-agent coordination — no handoffs, no sub-agents, no shared state.
  • Async / parallel execution — tools run one at a time, synchronously.
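Retries are the easiest of these gaps to patch by hand, which makes them a good illustration of what frameworks automate. A minimal sketch: with_retries is a hypothetical helper, the defaults (4 attempts, 0.5 s base delay, up to 50% jitter) are arbitrary, and sleep is injectable so the helper can be tested without waiting.

```python
import random
import time

def with_retries(fn, *, attempts=4, base_delay=0.5, retry_on=(Exception,),
                 sleep=time.sleep):
    """Hypothetical helper: call fn() and retry failures with exponential backoff.

    Waits base_delay * 2**attempt seconds (0.5, 1, 2, ... by default), stretched
    by up to 50% random jitter so many clients don't retry in lockstep. After
    the final attempt the last exception is re-raised.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise
            delay = base_delay * 2 ** attempt
            sleep(delay * (1 + 0.5 * random.random()))
```

In the loop you would wrap the model call, for example with_retries(lambda: client.messages.create(...), retry_on=(anthropic.APIConnectionError, anthropic.RateLimitError)). Narrow retry_on to transient errors: retrying a malformed request just fails the same way four times.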

These exact gaps are what frameworks fill. LangGraph adds checkpointing, persistence, and human-in-the-loop interrupts via a state-machine model. CrewAI adds role-based multi-agent teams. The OpenAI Agents SDK (March 2025) adds handoffs, guardrails, and built-in tracing. Google's ADK adds hierarchical agents and the A2A protocol.

The lesson isn't "always use a framework." For one agent with one or two tools, this raw loop is simpler, cheaper, and easier to debug. Frameworks earn their complexity only when you actually need persistence, coordination, or approval gates.

Try it: Build the loop, then break it on purpose

Implement the agent from this lesson end to end. (1) Define a calculator tool (use simpleeval, not eval) and a second tool of your choice — e.g. get_current_time() or a stubbed web_search(query) that returns a fixed string. (2) Write the run_agent(prompt, max_iterations) loop with the tool-name → function dispatch dict and a try/except that returns errors as strings. (3) Test it with a prompt that needs two tool calls, like "What's 17% of 2,340, and what time is it?" — confirm the model calls both tools and returns a final text answer.
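If you want a starting point for the second tool, here is one possible sketch of both suggested options. The names get_current_time, web_search, EXTRA_TOOLS, and EXTRA_TOOL_MAP are hypothetical, and web_search is a deliberate stub that returns canned text so runs are reproducible.

```python
from datetime import datetime, timezone

def get_current_time() -> str:
    """Return the current UTC time as an ISO 8601 string."""
    return datetime.now(timezone.utc).isoformat(timespec="seconds")

def web_search(query: str) -> str:
    """Stub: always returns the same canned text; no network access."""
    return f"(stub) No live search available. You searched for: {query}"

EXTRA_TOOLS = [
    {
        "name": "get_current_time",
        "description": "Get the current date and time in UTC. Use whenever the "
                       "user asks about the current time or date.",
        "input_schema": {"type": "object", "properties": {}},  # no arguments
    },
    {
        "name": "web_search",
        "description": "Search the web for a query and return a short text summary.",
        "input_schema": {
            "type": "object",
            "properties": {"query": {"type": "string", "description": "Search terms"}},
            "required": ["query"],
        },
    },
]

# Keys must match the "name" fields above exactly, or dispatch fails.
EXTRA_TOOL_MAP = {"get_current_time": get_current_time, "web_search": web_search}
```

Merge them into the lesson's definitions with tools += EXTRA_TOOLS and TOOL_MAP.update(EXTRA_TOOL_MAP). A no-argument tool arrives with an empty input dict, and calling get_current_time(**{}) works as expected.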

Now break it deliberately to feel the failure modes: (a) comment out the line that re-appends the assistant turn and observe the API validation error; (b) set max_iterations=1 on a multi-step task and watch it stop early; (c) make your calculator raise an exception and confirm the model recovers when you return the error as a string. Write two sentences on what each failure taught you about why frameworks exist.

Key takeaways

  1. An agent from scratch is a four-step while-loop: call the model, check for tool calls, execute them and append results, then stop or continue.
  2. The API is stateless: you must re-send the entire message history, including the assistant's tool-use turn, on every call.
  3. Tool descriptions are prompt engineering: a precise description is the biggest lever on whether the model picks the right tool with the right arguments.
  4. Stopping needs both the model's own signal (no tool calls) and a mandatory max-iterations cap to prevent infinite loops and runaway cost.
  5. The raw loop intentionally lacks persistence, retries, observability, and coordination; those gaps are exactly what frameworks like LangGraph and the Agents SDK fill.

Quiz

Lock in what you learned

Check your understanding


1. In the agent loop, what is the primary signal that the agent should STOP and return its answer?

2. Why must you re-send the entire message history (including the assistant's tool-use turn) on every API call?

3. What is the recommended way to handle a tool that throws an exception during execution?

4. Which capability is NOT provided by the raw ~40-line loop and is a key reason teams adopt frameworks?
