
When the agent is wrong: a debugging protocol

AI agents fail in specific ways. The right debugging response depends on the failure mode. Here's the structured protocol that catches drift fast.

Honam Kang · 6 min read

AI agents go wrong in patterns. Recognizing the failure mode lets you respond efficiently — a pure retry, a context fix, a scope tightening, or a human takeover. Here's the structured debugging protocol.

The five failure modes

After hundreds of agent sessions, I've found that agent failures reduce to five patterns:

1. Hallucinated APIs / methods

The agent calls a method that doesn't exist, references a function that was renamed, or imports a package that isn't in package.json.

Diagnostic: error mentions "is not a function" or "module not found."

Fix: tell the agent the actual API. "There's no setupUser method; the right method is createUser from users-service.ts."

2. Wrong pattern matched

The agent looked at the codebase and pattern-matched to something similar but not right. New code is plausible but doesn't fit this case.

Diagnostic: code looks fine, but it doesn't match the actual convention.

Fix: cite the right pattern explicitly. "We use Repository pattern, not direct ORM calls. Look at users-repo.ts for the convention."

3. Scope creep

The agent did what was asked plus "helpful" adjacent changes.

Diagnostic: PR has more files than expected.

Fix: revert the extra changes; re-prompt with sharper scope. See the runaway agent post.

4. Confidently wrong explanation

The agent claims something works, claims it understands the bug, claims the fix is right — but it isn't.

Diagnostic: agent's confidence doesn't match observed reality (tests still fail; behavior unchanged).

Fix: ask for verification. "You said you fixed it. Run the tests and confirm."

5. Drift over long sessions

After several hours, the agent slowly loses track of CLAUDE.md scope, original goal, or earlier decisions.

Diagnostic: agent's later actions contradict earlier ones, or it asks questions it should know answers to.

Fix: restart the session with a fresh context summary. Don't try to course-correct in-place.

The debugging protocol

When the agent's output is wrong, run the protocol top to bottom:

Step 1 (1 min): identify the failure mode

Read the agent's last action and result. Match to one of the five modes above.

If you can't classify, treat as "drift" by default.
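As a rough sketch of the Step 1 triage (the mode names, symptom fields, and heuristics below are my own illustration, not an established API), the diagnostics above can be encoded as a simple classifier:

```typescript
// Hypothetical triage helper: maps observable symptoms to one of the
// five failure modes, mirroring the diagnostics listed earlier.
type FailureMode =
  | "hallucinated-api"
  | "wrong-pattern"
  | "scope-creep"
  | "confidently-wrong"
  | "drift";

interface Symptoms {
  errorText: string;           // last error output, "" if none
  filesTouched: number;        // files changed in the diff
  filesExpected: number;       // files you expected the task to touch
  agentClaimsSuccess: boolean; // did the agent declare victory?
  testsPass: boolean;          // do the tests actually pass?
}

function classifyFailure(s: Symptoms): FailureMode {
  // Mode 1: runtime errors about missing symbols or modules.
  if (/is not a function|module not found|cannot find module/i.test(s.errorText)) {
    return "hallucinated-api";
  }
  // Mode 3: the diff is larger than the task warrants.
  if (s.filesTouched > s.filesExpected) {
    return "scope-creep";
  }
  // Mode 4: confidence without evidence.
  if (s.agentClaimsSuccess && !s.testsPass) {
    return "confidently-wrong";
  }
  // Mode 2: everything is green, but you still flagged the output --
  // the code runs yet violates a convention.
  if (s.testsPass) {
    return "wrong-pattern";
  }
  // Default per Step 1: if you can't classify, treat as drift.
  return "drift";
}
```

The exact keyword list is a judgment call; the point is to force yourself through the same checks in the same order every time.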

Step 2 (3 min): take corrective action based on mode

Mode → response:

  • Hallucinated API → cite the real API; re-prompt.
  • Wrong pattern → cite the real pattern, with an example file path.
  • Scope creep → revert; re-prompt with an explicit not-in-scope list.
  • Confidently wrong → ask for verification ("run the test and confirm").
  • Drift → restart the session with a fresh summary.

Step 3 (variable): retry with the correction

Re-prompt. The corrective context should specifically address what went wrong.

Step 4 (2 min): verify

Run tests, read the diff, confirm the fix is real. Don't trust the agent's claim of success.
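Step 4 can be partially automated. A minimal sketch (the helper name and the default command are illustrative — substitute your project's real verification command):

```typescript
import { execSync } from "node:child_process";

// Trust only the exit code of the project's own verification command,
// never the agent's claim of success.
function verified(command: string = "npm test -- --bail"): boolean {
  try {
    execSync(command, { stdio: "pipe" }); // throws on non-zero exit
    return true;
  } catch {
    return false;
  }
}

// Example: gate the "done" state on real evidence.
// if (!verified()) { /* back to Step 2 with a corrective prompt */ }
```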

Step 5: if it still fails after 2-3 retries → take over

If the agent still can't get it after two or three attempts, usually one of the following is true:

  • Context is missing (you can't infer what; pause and audit).
  • Codebase has unusual patterns the agent's training didn't see.
  • The bug is genuinely subtle (a 4th attempt won't help).

Do it yourself. Faster than another retry.
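The retry budget above can be expressed as a small loop. A sketch, where `attempt`, `diagnose`, and the budget of 3 stand in for your own re-prompt step, corrective-context step, and tolerance:

```typescript
// Hypothetical retry loop: re-prompt with corrective context up to
// maxRetries times, then hand the task back to the human.
function runWithBudget(
  attempt: (correction: string | null) => boolean, // true = verified success
  diagnose: () => string, // produce corrective context for the next try
  maxRetries: number = 3,
): "agent-succeeded" | "human-takeover" {
  let correction: string | null = null;
  for (let i = 0; i < maxRetries; i++) {
    if (attempt(correction)) return "agent-succeeded";
    correction = diagnose(); // never retry with zero new information
  }
  return "human-takeover"; // faster than a 4th attempt
}
```

The key property is that every iteration carries new corrective context — a bare "try again" never enters the loop.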

Specific corrective prompts

For hallucinated APIs

You said `setupUser()` exists in users-service.ts. It doesn't.

The actual methods in users-service.ts are:
- createUser(input: CreateUserInput): User
- updateUser(id: string, patch: UpdateUserInput): User
- deleteUser(id: string): void

Use the right one. Re-read the file if you're unsure.

The specific contradiction + the actual API + the instruction to re-read.

For wrong pattern

You wrote a service that calls the database directly. Our convention is
Repository pattern — services call repos, repos do DB work.

Example: src/users/users-service.ts → users-repo.ts.

Move the DB calls into auth-repo.ts. Service should only call the repo.

Reference the right pattern + cite an existing example.

For scope creep

The change to src/users/UserCard.tsx is out of scope for this session.

Revert it. Then re-do the auth changes only.

In scope: src/auth/* + tests/auth/*.
Out of scope: everything else.

Specify the revert + re-state scope.
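A scope statement like the one above can also be checked mechanically before you accept the diff. A sketch, assuming simple trailing-star prefix globs like src/auth/* (the function name is mine):

```typescript
// Hypothetical scope guard: flag any changed file that falls outside
// the allowed prefixes. Feed it `git diff --name-only` output.
function outOfScope(changedFiles: string[], allowedGlobs: string[]): string[] {
  // "src/auth/*" -> "src/auth/" (only trailing-star globs supported here)
  const prefixes = allowedGlobs.map((g) => g.replace(/\*+$/, ""));
  return changedFiles.filter((f) => !prefixes.some((p) => f.startsWith(p)));
}
```

Anything the function returns is a scope-creep candidate: revert it, then re-prompt.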

For confidently wrong

You said the test passes. Run `npm test -- --bail` and paste the output.

Force evidence. Don't accept the claim without verification.

For drift

Stop. We've drifted. Restart this conversation.

Context: [paste a 200-word summary of what was decided]

Continue from there.

Reset with a clean summary. Trying to course-correct in-place rarely works on a long-drifted session.

Distinguishing failure modes

Sometimes the symptom is ambiguous. Two diagnostics:

Did the agent verify?

If the agent never ran tests / never read the file it's modifying / never checked its work — that's confidence-without-verification (mode 4).

If the agent did verify but still got the wrong answer — that's hallucination or wrong pattern (mode 1 or 2).

Is the symptom local or systemic?

Local symptom (one wrong API) → mode 1. Systemic symptom (many wrong APIs, many wrong patterns) → context is broken; restart.
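The local-vs-systemic call can be made semi-mechanical by counting distinct missing symbols across the session's error output. A sketch (the threshold of one is a judgment call, not a standard):

```typescript
// Hypothetical heuristic: zero or one distinct missing symbol is a
// local mode-1 slip; several suggest broken context -- restart.
function symptomSpread(errorLog: string): "local" | "systemic" {
  const re = /(\w+) is not a function/g;
  const missing = new Set<string>();
  let m: RegExpExecArray | null;
  while ((m = re.exec(errorLog)) !== null) {
    missing.add(m[1]); // collect each distinct hallucinated symbol
  }
  return missing.size <= 1 ? "local" : "systemic";
}
```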

Anti-patterns

"Try again"

If you say only "try again," the agent has no new information. It'll likely produce a similar wrong answer.

Always provide specific corrective context.

Apologetic prompting

The agent will sometimes apologize and produce another wrong answer. Don't accept the apology as a fix; verify the new output.

Marathon recovery

If you've spent 30+ minutes trying to correct one issue, you're in a sunk-cost trap. Step back, do it yourself, save the agent for the next task.

Trusting the explanation

The agent's narrative ("I think the bug is here because...") is a hypothesis, not a fact. Verify the hypothesis with code/tests.

Re-prompting the same context that failed

If the failure was caused by missing context (mode 5), re-prompting in the same session reproduces the same failure. Restart.

When the agent is consistently wrong

If the same agent + same task type fails repeatedly:

Audit the CLAUDE.md

Is the scope right? Are conventions clearly stated? Are common pitfalls listed?

Often: tightening CLAUDE.md fixes 70% of recurring failures.

Audit the prompt

Is the task clearly specified? Are file paths and pattern references included?

Often: writing a more specific prompt fixes the next 20%.

Try a different model

If you're using a smaller/cheaper model and quality is consistently below bar, try the frontier model for this task type.

Often: this fixes the remaining 10% — but check if the workflow can be redesigned to fit smaller models first.

Question the workflow itself

If a task is consistently hard for the agent, is it a good fit for AI assistance? Some tasks are genuinely better done by hand.

File-manager setup for debugging

mq-dir's panes for agent debugging:

  • Pane 1: the file the agent is changing (so you see edits in real-time).
  • Pane 2: the test file (so you can verify).
  • Pane 3: cmux session running the agent.
  • Pane 4: notes / corrective prompt drafts.

You see the change, the test, the conversation, and your draft response — all simultaneously.

What this protocol gives you

After a few weeks of consistent application:

  1. Faster recovery from agent failures (3 minutes vs 30).
  2. Better instinct for when to take over (avoiding the sunk-cost trap).
  3. Improved prompts and CLAUDE.md (each failure teaches something).
  4. More accurate trust calibration (you know when to verify).

The compounding effect is that your overall AI workflow quality improves, not just individual session debugging.

Verdict

AI agents fail in patterns. Recognizing the pattern saves time:

  1. Hallucinated API → cite real API.
  2. Wrong pattern → cite real pattern.
  3. Scope creep → revert + re-scope.
  4. Confidently wrong → demand verification.
  5. Drift → restart with summary.

Verify everything the agent claims. Don't trust apologies. Take over after 2-3 failed iterations.

The protocol is small but the daily impact is real. Most heavy AI users develop something like it implicitly; documenting it makes the discipline transferable.

mq-dir's quad-pane is well-suited to debugging — diff, tests, agent terminal, notes all visible. Free, MIT.

Open source

mq-dir is fully open source.

MIT licensed, zero telemetry. Read the source, file an issue, send a PR.

★ Star on GitHub →

Frequently asked questions

How often do agents actually fail?

Less than they used to (model quality has improved), but failures are real. For typical agentic work, expect ~5-10% of agent outputs to need revision. The protocol catches these efficiently rather than letting them slip into production.


Ready to try mq-dir?

A native quad-pane file manager built for AI multi-tasking on macOS. Free, MIT licensed, zero telemetry.

v0.1.0-beta.12 · MIT · macOS 14.0+ · github