Memory & Context Engineering/Lesson 3 of 4

Managing the Context Window

compact, clear, microcompact, and the 60% rule

Intermediate 13 minBuilder

What you'll be able to do

Explain the 200K window, what eats into it, and why ~150-170K is actually usable
Read a context breakdown with /context [all] and a cost breakdown with /usage
Choose correctly between /compact (summarize, keep instructions) and /clear (reset entirely)
Distinguish microcompaction, auto-compaction, and a manual /compact — and when each fires
Adopt the 60% rule and protect critical context by telling CLAUDE.md what to preserve

At a glance

Context is the fundamental constraint in Claude Code: a 200K-token window that degrades in quality long before it runs out. This lesson teaches the visibility tools (/context, /usage), the compression tools (/compact, /clear, microcompaction, auto-compaction), and the single best habit — compact early at ~60%, not at 95%.

1The mental model: context is a budget, and quality drops before it's full
2What's actually in the 200K — and the 2026 buffer change
3Seeing the budget: /context and /usage
4Compact vs. clear: compress or reset
5Three kinds of compaction: micro, auto, and manual
6The 60% rule: compact early, on your terms
7Protecting critical context across compaction

The mental model: context is a budget, and quality drops before it's full

Everything Claude Code knows in a session lives in one place: the context window. It is a fixed-size budget of tokens that holds the system prompt, your tool definitions, every loaded CLAUDE.md and skill, all the files Claude has read, every command it has run, and the entire back-and-forth of your conversation. When that budget is gone, something has to be dropped or compressed.

The number to anchor on is 200K tokens — roughly 1.6 million characters. But you never get all of it for your conversation. The system prompt, tool schemas, your memory files, and a safety buffer all reserve space up front, so the usable window for actual work is closer to 150K-170K tokens.

Here is the part most people miss: the real enemy is quality, not size. Performance degrades as context fills, well before you hit the hard limit. Raw command logs, the same file read five times, a long-finished debugging tangent — all of it stays in the window competing for the model's attention with the three lines that actually matter right now. A window that is 70% full of noise is worse than one that is 40% full of signal. Managing context is really about keeping the signal-to-noise ratio high, not just avoiding the ceiling.

text

200K-TOKEN WINDOW  (~1.6M chars)
├── System prompt + tool definitions   ┐
├── CLAUDE.md / memory / skills         ├─ reserved overhead
├── Safety buffer (~33K)                ┘
└── YOUR WORKING SPACE  (~150-170K usable)
      files read · commands run · the conversation
      ↑ degrades in QUALITY as it fills, not just at the limit

Key insight

Size is the ceiling; quality is the floor

You will feel context pressure as sloppier answers, forgotten earlier decisions, and re-reading of files long before /context shows you near 100%. Treat a window that is mostly stale logs and repeated reads as already 'full' for practical purposes, even if there's headroom on paper.

What's actually in the 200K — and the 2026 buffer change

The window is divided into fixed overhead plus your working space. Knowing the split tells you where your budget actually goes.

Region	What lives here	Roughly
System prompt + tools	Claude Code's instructions and tool schemas	fixed overhead
CLAUDE.md / memory / skills	Your loaded instruction and memory files	fixed overhead
Safety buffer	Reserved headroom near the limit	~33K (16.5%)
Working space	Files, command output, the conversation	~150-170K usable

A notable 2026 change: the reserved safety buffer was reduced to about 33K tokens (16.5%) of the window. The previous, larger buffer left less room for work; the smaller one gives you roughly 12K more usable tokens than the 2025 default — a free upgrade in working space without changing the 200K headline number.

Note that some configurations support a larger window (up to 1M tokens), but the 200K window is the default you should plan around. Don't assume the big window; design your workflow for 200K and treat anything larger as a bonus.

Note

The overhead is mostly yours to tune

You can't shrink the system prompt, but the CLAUDE.md / skills / memory slice is under your control. A bloated 400-line CLAUDE.md or a pile of always-loaded skills eats working space on every single turn. /context shows exactly how much each is costing — covered next.

Seeing the budget: /context and /usage

You can't manage what you can't see. Two commands give you the full picture — one for tokens, one for money.

/context [all] renders a colored grid of where your tokens are going right now: the system prompt, tool definitions, CLAUDE.md, skills, MCP servers, and the conversation itself, each as a slice of the 200K. It also surfaces optimization tips. Run it whenever a session starts feeling heavy — it's how you find the surprise 30K skill or the screenshot that quietly ate a quarter of your window. Pass all for the fully expanded breakdown.

text

/context        # quick token breakdown + tips
/context all    # fully expanded view of every contributor

/usage (aliases /cost, /stats) reports the cost and consumption side: this session's spend, your plan usage against limits, and — critically — a breakdown by skill, subagent, plugin, and MCP server. That per-source breakdown is how you discover that one chatty MCP server or one runaway subagent is responsible for most of your token (and dollar) burn.

text

/usage          # session cost + plan usage + per-source breakdown

Think of it as two dashboards: /context answers "what is filling my window right now?" and /usage answers "what is this costing me, and which component is to blame?"

Tip

Make /context a reflex

Run /context at the start of a fresh task and again when answers start drifting. The colored grid makes outliers obvious instantly — an image-heavy section, an MCP server loading huge schemas, or a CLAUDE.md that has grown past its useful size. You fix what you can see.

Compact vs. clear: compress or reset

When the window gets heavy you have two fundamentally different tools. Picking the wrong one either loses context you needed or keeps clutter you wanted gone.

/compact [instructions] summarizes the conversation. Claude replaces the long history with a compressed summary that preserves the important parts — code patterns, file states, key decisions — and keeps your CLAUDE.md, skills, and memory intact. You can steer it with an optional focus, e.g. /compact focus on the auth refactor and the failing tests, which tells Claude what to prioritize keeping. Use compact when you want to continue the same task with a lighter, sharper context.

/clear (aliases /reset, /new) resets the context entirely — an empty window, as if you just started. It does not undo file edits (your code on disk is untouched) and the prior conversation is still resumable via /resume. Use clear when you're switching to an unrelated task; carrying the old conversation's residue into new work is the classic 'kitchen-sink' anti-pattern that bloats context for no benefit.

	`/compact [focus]`	`/clear`
Effect	Summarizes history into a compressed form	Empties the context window
Keeps CLAUDE.md / skills / memory?	Yes	Reloads fresh (they re-inject)
File edits on disk	Untouched	Untouched
Prior conversation	Replaced by summary (still on disk)	Resumable via `/resume`
Use when	Continuing the same task	Starting an unrelated task

The rule of thumb: same task, getting heavy → compact. New, unrelated task → clear.

Example

Two everyday calls

Compact: You've spent an hour implementing a feature; the window is full of file reads and test output but you're not done. Run /compact focus on the remaining TODOs and the API contract to keep going lighter.

Clear: You finish the feature and now need to fix an unrelated bug in a different module. Run /clear and start clean — there's no reason for the new task to inherit the old one's history.

Watch out

Compaction can drop hard boundaries

A summary is lossy. Instructions you only said in chat — like 'don't push to main' — can be summarized away. For guarantees that must survive compaction, put them in a deny permission rule or a hook, not just a message. Compaction is for context, not for enforcing rules.

Three kinds of compaction: micro, auto, and manual

Compaction isn't one thing — three distinct mechanisms operate at different times, and understanding the differences is what separates fumbling from control.

Microcompaction clears stale tool results — old command output and file reads that are no longer relevant — without making a model call. It's cheap, automatic, and surgical: it trims the low-value clutter (the noise) while leaving your conversation and decisions fully intact. No summarization, no cost, no lost reasoning.

Auto-compaction kicks in automatically as you approach the limit. It summarizes the conversation — code patterns, file states, key decisions — into a compressed form so the session can keep going. It's the safety net that stops you from hitting a hard wall mid-task, but because it fires near 95% full and you don't control its timing, the summary is often coarser and the model is already operating in a degraded, crowded window.

Manual /compact is you choosing the moment. And the moment matters enormously.

Mechanism	Trigger	Model call?	What it does
Microcompaction	Automatic, ongoing	No	Clears stale tool results (old logs, repeated reads)
Auto-compaction	Automatic, near the limit (~95%)	Yes	Summarizes the conversation to keep going
Manual `/compact`	You run it (best at ~60%)	Yes	Summarizes on your terms, with optional focus

Key insight

Microcompaction trims noise; auto/manual compaction summarizes signal

Microcompaction is free and conservative — it only removes tool results that have gone stale, so it never costs you a model call or your reasoning. Auto- and manual compaction are the heavier move: they replace conversation history with a summary. Knowing which is which tells you why some context shrinks silently (micro) and some shrinks with a noticeable 'compacting...' step (auto/manual).

The 60% rule: compact early, on your terms

This is the single most valuable habit in the lesson. Run /compact manually at around 60% full — not at 95%.

Why early beats late:

Sharper summaries. At 60%, Claude has plenty of headroom to write a thorough, well-organized summary. At 95%, auto-compaction is summarizing under pressure in an already-degraded window — the result is coarser and loses more.
Cheaper. Summarizing a 60%-full window costs less than summarizing a 95%-full one, and you avoid the quality tax of working in a crowded window in the meantime.
You control the focus. A manual /compact focus on X lets you tell Claude exactly what to preserve. Auto-compaction makes that call for you, and it can guess wrong.
No surprise stalls. Waiting for auto-compaction means it fires mid-task at an unpredictable moment. Compacting proactively keeps you in command of when the pause happens.

The practical loop: keep /context handy, and when working space crosses roughly 60%, run /compact with a focus instruction. Treat auto-compaction as the emergency backstop you'd rather never hit — not your default behavior.

text

0% ────────── 60% ────────── 95% ───── 100%
               ▲                ▲
         compact HERE     auto-compaction
         (manual, sharp)   (backstop, coarse)

Tip

Pair compacting with clearing

The two habits work together. Within a long single task, /compact at ~60% to stay sharp. Between unrelated tasks, /clear so the next task starts clean. Compact keeps the same work lean; clear stops cross-task contamination. Most heavy users do both, constantly.

Protecting critical context across compaction

Because any summary is lossy, you sometimes need to guarantee that a specific piece of context survives a compaction. The lightweight way to do this is to tell CLAUDE.md what to keep.

Add a line to your project's CLAUDE.md in the form 'when compacting, preserve X' — for example, naming the API contract, the migration plan, or the invariant the whole task depends on. Because the project-root CLAUDE.md is re-injected from disk after a /compact, that instruction is present when Claude builds the summary, nudging it to retain what you flagged.

## Context management
- When compacting, preserve: the database migration order, the
  public API contract in api/contract.md, and any decision marked DECISION:.
- Do not summarize away open TODO items; keep them verbatim.

This is advisory, not a hard guarantee — CLAUDE.md is delivered as context, so compliance is a strong nudge rather than a lock. For things that absolutely must not be lost or violated (like 'never push to main'), reach for a deny permission rule or a hook instead, which are enforced deterministically. Use the CLAUDE.md preserve-line for content you want carried forward, and rules/hooks for boundaries that must hold.

Watch out

Advisory, not enforced

A 'when compacting, preserve X' line increases the odds X survives — it does not guarantee it. Critical, non-negotiable constraints belong in deny rules or hooks. Keep the CLAUDE.md preserve-line for important information, not for safety boundaries.

Try it: Watch the budget, then take control of it

Practice seeing and shaping the context window on a real session.

Baseline: start claude in a real project and immediately run /context. Note the slices — system prompt, tools, CLAUDE.md, skills, MCP. Record roughly how much working space you have before doing anything.
Fill it: ask Claude to read several files and run a few commands (a build, a test run, a git log). Re-run /context and watch the conversation slice grow. Notice raw command output piling up.
Find the cost: run /usage and read the per-source breakdown. Identify which skill, MCP server, or subagent (if any) is the biggest single contributor.
Compact early: once working space is around 60%, run /compact focus on the current task and any open TODOs. Re-run /context and confirm working space dropped — and that your CLAUDE.md/skills are still loaded.
Protect context: add a line to the project CLAUDE.md such as When compacting, preserve: the list of files we have changed and any DECISION: notes. Force another /compact and check whether that information survived in the summary.
Reset cleanly: when you switch to a different, unrelated task, run /clear and confirm with /context that the window is fresh — then verify your earlier conversation is still reachable via /resume.

Deliverable: a short note recording (a) your usable working space at baseline, (b) the single biggest contributor /usage surfaced, (c) the before/after working-space numbers around your 60% compact, and (d) whether your 'when compacting, preserve X' line actually survived — and one sentence on when you'd reach for a hook or deny rule instead.

Key takeaways

1The context window is 200K tokens (~1.6M chars); after system prompt, tools, memory, and a buffer, ~150-170K is usable for actual work.
2Degradation is about quality, not just size — stale logs, repeated reads, and forgotten history compete with what matters, so high signal-to-noise beats raw headroom.
3`/context [all]` shows the token breakdown; `/usage` (`/cost`, `/stats`) shows cost plus a per-skill/subagent/plugin/MCP breakdown.
4`/compact [focus]` summarizes and keeps CLAUDE.md/skills/memory; `/clear` resets entirely (file edits preserved, prior conversation resumable) — compact for the same task, clear for a new one.
5Microcompaction clears stale tool results with no model call; auto-compaction summarizes near the limit; the best practice is a manual `/compact` at ~60%, not 95%.
6The 2026 buffer dropped to ~33K (16.5%), giving ~12K more usable; protect must-keep context with a 'when compacting, preserve X' line in CLAUDE.md (advisory — use deny rules/hooks for hard boundaries).

Quiz

Lock in what you learned

Check your understanding

0 / 4 answered

1.Your session is about 70% full, mostly with long command logs and the same three files read several times, and Claude is starting to forget earlier decisions. Why is this a problem even though /context shows headroom remaining?

2.You've just finished an authentication feature and now need to fix a completely unrelated bug in the billing module. Which command best prepares your context, and what happens to your work?

3.What is the key difference between microcompaction and auto-compaction?

4.Why does the best practice say to run /compact manually at around 60% rather than letting auto-compaction handle it at ~95%?

Go deeper

Hand-picked sources to keep learning

Manage Claude's context window (official docs)

The 200K window, what fills it, /context, /compact, /clear, microcompaction, and auto-compaction.

Manage costs and usage (official docs)

How /usage reports session cost, plan usage, and the per-skill/subagent/plugin/MCP breakdown.

Claude Code compaction explained

Deep dive on how compaction works in practice and why compacting early produces better results.

Memory & CLAUDE.md (official docs)

Where to add a 'when compacting, preserve X' line and how CLAUDE.md re-injects after /compact.

Slash command reference (official docs)

Exact syntax and aliases for /context, /usage (/cost, /stats), /compact, and /clear (/reset, /new).

claude-code on GitHub

Source, issues, and changelog tracking context-window and compaction behavior across versions.