Context engineering: the hidden lever for agent quality
Prompt engineering tutorials peaked around 2023. Context engineering is the underdiscussed lever in 2026 — what's in the agent's working memory, what's not, what's at the start vs middle vs end. Same model, different context, dramatically different output quality.
This post is the practical playbook.
What context engineering is
The agent's "context" is everything in its window when it generates a response:
- System prompt (CLAUDE.md, instructions).
- Tool definitions (what tools are available).
- Loaded files (anything the agent has read).
- Tool output (results of previous tool calls).
- Conversation history (previous turns).
Each of these contributes. Together they shape the agent's behavior more than the individual prompt does.
Why it matters
Three observations from running large numbers of agent sessions:
1. The same agent with different contexts performs very differently
Give Claude (or GPT-5) a vague CLAUDE.md and a giant codebase dump → mediocre work.
Give the same model a tight CLAUDE.md, a focused 5-file context, and a clear prompt → excellent work.
The model is identical. The context isn't.
2. Context window size is a budget, not a license
A 1M+ token window doesn't mean "load everything." Models attend more to recent and opening tokens; mid-context content gets less weight (the "lost in the middle" effect). A 500k-token context with the relevant content scattered between tokens 50k and 450k may underperform a 50k-token context where the same content is concentrated.
3. Context cost is real
Every token is processed on every call. Inflated context means more cost and higher latency. Caching helps but doesn't eliminate either.
The practical playbook
1. Scope discipline
Decide what files the agent needs before the agent decides.
For a focused task:
- List the files explicitly in the prompt.
- "Read only X, Y, Z. Don't read other files unless you ask."
For an exploratory task:
- Specify the directory boundary.
- "Look only under src/auth/. Don't read src/users/ or test fixtures."
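As a sketch, the scoping instruction can be generated mechanically rather than retyped per task. The function name, wording, and file paths below are illustrative, not a fixed convention:

```python
def scoped_prompt(task: str, allowed: list[str]) -> str:
    """Build a prompt that names the in-scope files explicitly,
    mirroring the "read only X, Y, Z" pattern above."""
    file_list = "\n".join(f"- {p}" for p in allowed)
    return (
        f"{task}\n\n"
        "Read only these files:\n"
        f"{file_list}\n"
        "Don't read other files unless you ask first."
    )

prompt = scoped_prompt(
    "Fix the apostrophe bug in login.",
    ["src/auth/login.ts", "src/auth/login.test.ts"],
)
```

The point is that the scope boundary is decided in code, before the agent sees anything.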
2. Summary-first
For larger codebases, give the agent summaries instead of full files:
- File-level: a notes/architecture.md with 200-word descriptions of each module.
- Function-level: docstrings or type signatures, not full implementations (until needed).
- Cross-cutting: a notes/conventions.md with the codebase's grammar.
The agent uses summaries to decide which files to actually read.
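One minimal version of that decision step: match the task against module descriptions before any full file is loaded. The SUMMARIES dict and naive keyword match below are illustrative stand-ins for a real notes/architecture.md:

```python
SUMMARIES = {  # hypothetical module descriptions, as notes/architecture.md would hold
    "src/auth": "Login, session tokens, middleware guards.",
    "src/users": "User CRUD and profile settings.",
}

def pick_modules(task_keywords: set[str]) -> list[str]:
    """Crude keyword overlap between task and summaries: decide which
    modules are worth reading in full, without loading any of them."""
    hits = []
    for module, desc in SUMMARIES.items():
        words = set(desc.lower().replace(",", "").replace(".", "").split())
        if task_keywords & words:
            hits.append(module)
    return hits
```

A real setup would let the agent do this matching itself; the budget win is the same either way.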
3. Tool output management
Tool outputs (especially terminal outputs) bloat context fast. Mitigation:
- After a tool call, summarize the output explicitly: "The test output shows 5 failures, all in auth/middleware.test.ts. Discard the full output; we'll work with the summary."
- The agent then has a 50-token summary instead of a 5000-token raw output.
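When you can't ask the model to summarize, a mechanical fallback is to truncate oversized output while keeping the head and tail, which usually hold the command echo and the final error. A minimal sketch (the 2000-char threshold is arbitrary):

```python
def compact_tool_output(output: str, max_chars: int = 2000) -> str:
    """Keep head and tail of oversized tool output; elide the middle.
    A crude stand-in for an explicit model-written summary."""
    if len(output) <= max_chars:
        return output
    half = max_chars // 2
    omitted = len(output) - max_chars
    return output[:half] + f"\n... [{omitted} chars omitted] ...\n" + output[-half:]
```

Run this on every tool result before it enters the transcript, not after the context is already bloated.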
4. Conversation pruning
Long sessions accumulate turns. Periodically:
- "Summarize what we've established so far in 200 words. Then we'll continue from that summary."
The agent produces a compact synthesis. Subsequent turns ride on the summary; the early turns can be conceptually discarded.
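Mechanically, pruning amounts to replacing the early turns with one summary message and keeping only the recent tail. A sketch, assuming the common role/content message shape and a summary the model already produced:

```python
def prune_history(messages: list[dict], summary: str, keep_last: int = 4) -> list[dict]:
    """Drop all but the last few turns; lead with a summary message
    so the next generation rides on the synthesis, not the raw history."""
    recent = messages[-keep_last:] if keep_last else []
    header = {"role": "user", "content": f"Summary of the session so far:\n{summary}"}
    return [header] + recent
```

The early turns still exist in your logs; they just stop costing tokens on every subsequent call.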
5. Context budget discipline
For long-running sessions, set yourself a budget:
- "I want this session's context to stay under 100k tokens before generation."
- Track via the API's response metadata.
- When approaching the budget, prune (summarize, drop old tool outputs, restart with summary).
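The budget check itself is one comparison against the input-token count the API reports back (in the Anthropic SDK this arrives as usage metadata on the response). The 100k figure and 90% trigger below are the illustrative numbers from this section, not recommendations:

```python
BUDGET = 100_000  # tokens of context before generation, per this section's example

def over_budget(input_tokens: int, budget: int = BUDGET) -> bool:
    """input_tokens comes from the API response's usage metadata.
    Trigger pruning before the cap, not at it."""
    return input_tokens >= int(budget * 0.9)
```

Check after every call; when it trips, summarize, drop old tool outputs, or restart from a summary.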
6. Position matters
Critical content goes at the start (system prompt, CLAUDE.md) or the end (current task). Mid-context content gets less attention.
If you have a 50k-token reference document, it competes for attention with everything in the middle. Consider:
- Reference at the start (cached system content).
- Current file just before the user prompt.
- User prompt at the end.
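The three-position layout above can be expressed directly in how the message list is assembled. A sketch, again assuming the common role/content shape:

```python
def assemble_context(reference: str, current_file: str, user_prompt: str) -> list[dict]:
    """Order context so stable reference material leads (cacheable, high
    attention) and the actual task lands last, where attention is strongest."""
    return [
        {"role": "system", "content": reference},    # start: cached reference
        {"role": "user", "content": current_file},   # just before the prompt
        {"role": "user", "content": user_prompt},    # end: the current task
    ]
```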
7. Selective inclusion
The agent doesn't need everything. For a task:
- Code being changed: include.
- Tests for the code being changed: include.
- Code that calls the code being changed: include if signatures change.
- Adjacent unrelated code: don't include.
- Build configs, lockfiles: don't include unless directly relevant.
Be aggressive about exclusion.
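Those inclusion rules are simple enough to encode as a filter. The exclusion patterns and the .test.ts naming convention below are illustrative assumptions, not universal:

```python
import fnmatch

# Illustrative exclusion patterns: lockfiles, build configs, build output.
EXCLUDE = ["*.lock", "package-lock.json", "*.config.*", "dist/*"]

def in_scope(path: str, changed: set[str], callers: set[str]) -> bool:
    """Apply the rules above: changed code, its tests, and affected
    callers are in; lockfiles and configs are out; everything else is out."""
    if any(fnmatch.fnmatch(path, pat) for pat in EXCLUDE):
        return False
    if path in changed or path in callers:
        return True
    # Include tests for changed files (assumes foo.ts -> foo.test.ts naming).
    return path.endswith(".test.ts") and path.replace(".test.ts", ".ts") in changed
```

Note the default is exclusion: a path gets in only by matching a rule, never by failing to match one.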
What good context looks like
For "fix this bug in auth/login.ts":
[system: CLAUDE.md global preferences, ~50 lines]
[system: project CLAUDE.md, ~100 lines]
[system: session CLAUDE.md "scope is auth/", ~20 lines]
[file: src/auth/login.ts (the file being changed)]
[file: src/auth/login.test.ts (its tests)]
[file: src/auth/types.ts (shared types if relevant)]
[user: "There's a bug where login fails for usernames with apostrophes. Fix."]
Maybe 5k tokens total. Tight, focused. The agent's attention is concentrated.
What bad context looks like
For the same task:
[system: 500-line generic CLAUDE.md from a template]
[file: entire src/ tree (50 files, 30k tokens)]
[file: README.md, CHANGELOG.md, CONTRIBUTING.md]
[file: package.json, tsconfig.json, eslint.config.mjs]
[file: 10 unrelated test files]
[user: "Fix the login bug"]
50k+ tokens. The agent's attention is diluted across 60 files; it'll spend half its turns deciding what to read.
The same model produces noticeably worse output. Context engineering is the difference.
Strategies for specific scenarios
Scenario 1: large codebase exploration
Goal: understand a 500k-token codebase well enough to make a change.
Approach:
- Surface map first (top-level dirs, 200 words).
- Hot files (5-10 most relevant, ~3k tokens).
- Conventions doc (extracted from hot files, ~1k tokens).
- Specific change task with relevant files only.
Total budget: ~10k tokens of context for exploration; ~5k for the actual change.
Scenario 2: long-running refactor
Goal: refactor a pattern across 30 files over 2 hours.
Approach:
- Strong CLAUDE.md scoping which files are in/out.
- After every 5 file edits, summarize what's been done.
- Drop old tool outputs (test runs from earlier).
- Refresh context periodically with a "we've changed files A, B, C; pattern is settled; continue with D, E, F" reset.
Without this discipline, the session drifts into chaos by hour 1.5.
Scenario 3: documentation task
Goal: write a 2000-word post from a 5000-line research thread.
Approach:
- Summarize the thread first (extraction step).
- Use the summary as context for the post (compose step).
- Don't load the full thread into context for the compose step.
Two passes. Each pass has tight context.
Scenario 4: agent loops with iteration
Goal: have the agent iterate on its own output.
Approach:
- After each iteration, the agent reviews its previous attempt explicitly.
- Drop the attempt's reasoning trace; keep only the artifact.
- The next iteration sees: prompt + previous artifact + critique. Not the whole history.
Compounds well; otherwise context bloats fast.
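The "artifact + critique, not history" rule maps to a context builder like this sketch (role/content message shape assumed; the critique framing is illustrative):

```python
def next_iteration_context(prompt: str, artifact: str, critique: str) -> list[dict]:
    """Carry only the prompt, the latest artifact, and its critique
    forward; reasoning traces and earlier attempts are dropped."""
    return [
        {"role": "user", "content": prompt},
        {"role": "assistant", "content": artifact},
        {"role": "user", "content": f"Critique of the attempt above:\n{critique}\nRevise."},
    ]
```

Context stays roughly constant per iteration instead of growing with each attempt.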
Tools that help
Anthropic's prompt caching
Cache stable context (CLAUDE.md, references) at 10% input cost. See the prompt caching post for mechanics.
Claude Code's automatic context management
Claude Code summarizes automatically when the context gets long, so you don't have to prune by hand. Trust this for short-to-medium sessions; intervene manually for very long ones.
Aider's /add and /drop
Aider lets you explicitly add/drop files from context. Direct control. Good if you want to manage manually.
Custom summarization scripts
For very large codebases, write a script that produces summaries on demand. The agent uses the summary; loads specific files when needed.
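A minimal sketch of such a script: walk the tree and emit a one-line-per-file index (path, length, first line) that the agent scans instead of reading files. The extension list and output format are illustrative choices:

```python
from pathlib import Path

def summarize_tree(root: str, exts: tuple[str, ...] = (".py", ".ts")) -> str:
    """One line per source file: path, line count, and the first line
    (often a docstring, import, or signature). Cheap but surprisingly useful."""
    lines = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in exts:
            text = path.read_text(errors="ignore").splitlines()
            head = text[0].strip() if text else ""
            lines.append(f"{path} ({len(text)}L): {head}")
    return "\n".join(lines)
```

Richer versions extract docstrings or ask a cheap model to write the 200-word module descriptions; the shape is the same.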
Common context engineering mistakes
Loading "just in case"
Adding a file because it might be relevant burns context budget. If it isn't directly needed, exclude.
Forgetting tool output bloat
A git log of 1000 commits is 50k tokens. Truncate or summarize before letting the agent process it.
Stale summaries
A summary written 6 months ago may not reflect the current code. Periodically regenerate.
Position-blind context
Putting critical instructions in the middle. Move them to the start or end.
Not cleaning between sessions
Starting a new session with the previous session's context = context pollution. Reset.
What models do (and don't) infer about context
To keep expectations honest:
- Models do weight recent and opening tokens more than middle.
- Models do sometimes miss critical mid-context content.
- Models don't automatically realize when context is too small (they'll generate plausibly without it).
- Models don't know what they don't know — they'll guess if context is missing.
The implication: you can't trust the model to ask for missing context. You have to provide it correctly upfront.
File-manager setup for context-aware workflows
mq-dir's structure helps with context engineering:
- Pane 1: source code (the actual files in scope).
- Pane 2: notes/summaries (the summaries the agent uses).
- Pane 3: cmux session (the agent).
- Pane 4: scratchpad (your synthesis of what's been established).
Visible separation between "actual code" and "summaries" reinforces the practice of curating, not dumping.
Verdict
Context engineering is the 2026 leverage point. The model is mostly fixed; the context is what you control.
The patterns:
- Scope discipline.
- Summary-first for large codebases.
- Tool output management.
- Conversation pruning.
- Context budget awareness.
- Position-aware structuring.
- Selective inclusion (be aggressive about exclusion).
Same model, different context = different quality. Spend the engineering effort on context, not on incrementally tweaking prompt phrasing.
mq-dir pairs naturally — the file manager is where you decide what's in scope, the summaries pane is where you curate, the agent operates within the bounds you set.
mq-dir is fully open source.
MIT licensed, zero telemetry. Read the source, file an issue, send a PR.