Error-Driven Development: Learning from Failures

CI is red. The stack trace points at payment.ts:212, you’ve already burned twenty minutes guessing which of the last six commits broke it, and the “fix” you just pushed turned one failing test into three. Staring harder at the error isn’t working.

The error message is the most precise specification you have of what’s wrong, and it’s exactly the input an AI assistant is best at consuming. Error-Driven Development (EDD) leans into that: instead of aiming for a perfect first draft, you run a tight error → fix → re-run loop and let each failure steer the next change. Done deliberately, it converges fast even on gnarly cascades.

What you’ll walk away with

A repeatable error → fix → re-run loop you can run in Cursor, Claude Code, or Codex
Copy-paste prompts for a production stack trace, a compiler cascade, and a failing-test loop
The per-tool mechanics: who runs the tests, who rolls back a bad fix, who iterates unattended
An MCP shortcut that pulls the Sentry issue for you instead of copy-pasting stack traces
The failure modes of the loop itself: chasing the wrong error, fixing symptoms, looping on flaky tests

The loop, by tool

The cycle is the same everywhere: surface an error, hand it to the AI with enough context, apply the fix, and re-run the exact thing that failed. What differs is who runs the command and how you back out a bad fix.
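The cycle can be sketched as a bounded loop. This is only an illustration, not how any of the three tools is implemented: `runCheck` and `applyFix` are hypothetical callbacks standing in for "re-run the exact thing that failed" and "hand the error to the AI and apply its change", and they are synchronous here to keep the sketch short.

```typescript
// Result of re-running the exact command that failed (a test, tsc, a repro script).
type CheckResult = { ok: true } | { ok: false; errorOutput: string };

// Hypothetical callbacks: the real work (shelling out, calling an assistant)
// is deliberately left outside this sketch.
type LoopDeps = {
  runCheck: () => CheckResult;
  applyFix: (errorOutput: string, attempt: number) => void;
};

type LoopOutcome =
  | { status: "green"; attempts: number }
  | { status: "gave-up"; attempts: number; lastError: string };

function errorDrivenLoop(deps: LoopDeps, maxAttempts: number): LoopOutcome {
  let result = deps.runCheck(); // surface the error
  let attempts = 0;
  while (!result.ok && attempts < maxAttempts) {
    attempts += 1;
    deps.applyFix(result.errorOutput, attempts); // hand over the error, apply the fix
    result = deps.runCheck(); // re-run the same check, not a different one
  }
  return result.ok
    ? { status: "green", attempts }
    : { status: "gave-up", attempts, lastError: result.errorOutput };
}
```

The `maxAttempts` cap is the important design choice: a loop that cannot give up will keep "fixing" indefinitely, so the sketch reports the last error instead.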

In agent mode, Cursor runs the tests or build itself, reads the terminal output, and iterates without you copy-pasting. The safety net is checkpoints: every agent edit is a restore point, so when a “fix” makes things worse you roll back to the last green state in one click instead of untangling it.

Best when you want to watch the loop happen and intervene the moment it goes sideways.

Claude Code runs the compiler and tests through the Bash tool and feeds the output back to itself, so the loop closes inside one session. In headless mode (claude -p) you can let it iterate against a failing command with a turn cap so it doesn’t spin forever.

Best when the loop should run from the terminal or in CI, scripted and bounded.

Codex runs commands inside a sandbox. For an interactive EDD loop, use codex --sandbox workspace-write -c approval_policy=on-request. The sandbox permits routine edits and test commands; on-request is a separate escalation policy, not a failure hook, so ask Codex in the prompt to stop and report after a bounded number of failed attempts.

Best when you want it to grind through a long error list mostly unattended but stop at genuine failures.

Scenario 1: a production bug from a stack trace

A user hit a crash, and your error tracker captured the exception. The fastest path is to give the AI the trace plus the files it implicates.

Grab the trace. Copy the full exception from Sentry (the runtime error and the whole stack), not just the top line.
Hand it over with the suspect files. Name the files so the AI doesn’t have to grep blind.

Copy-paste prompt for a production stack trace:

Production bug. Here is the exception and stack from Sentry:

TypeError: Cannot read properties of undefined (reading 'total') at applyDiscount (src/services/payment.ts:212:18) at checkout (src/controllers/checkout.ts:64:22)

The relevant files are @src/services/payment.ts and @src/controllers/checkout.ts. Find the root cause (where the value became undefined), not just where it’s read. Tell me whether the fix is a guard at line 212 or an upstream initialization bug, then make the change and add a regression test.

Apply and re-run. The fix should address where total became undefined (an upstream cart with no items, say), not just bolt on ?. at line 212. Re-run the failing path; if a new error surfaces, feed it back and repeat.
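To make the guard-versus-root-cause distinction concrete, here is a minimal sketch. The real payment.ts is not shown in this guide, so the `Cart` shape, `loadCart`, and both `applyDiscount` variants are hypothetical, and the "cart was never initialized" cause is only the example guess from the step above.

```typescript
type Cart = { items: { price: number }[]; total: number };

// Symptom-level fix: optional chaining stops the crash, but a missing cart
// quietly turns into a charge of 0 and the real bug stays hidden.
function applyDiscountGuardOnly(cart: Cart | undefined, rate: number): number {
  return (cart?.total ?? 0) * (1 - rate);
}

// Root-cause fix (assumed cause): the loader always returns a well-formed
// cart, even when nothing was stored, so `total` is never undefined downstream.
function loadCart(stored: { items?: { price: number }[] } | undefined): Cart {
  const items = stored?.items ?? [];
  const total = items.reduce((sum, item) => sum + item.price, 0);
  return { items, total };
}

// With the loader fixed, the discount code can keep its strict signature.
function applyDiscount(cart: Cart, rate: number): number {
  return cart.total * (1 - rate);
}

// Regression test for the production case: a session with no stored cart
// must produce a zero charge instead of throwing.
function regressionEmptyCart(): boolean {
  return applyDiscount(loadCart(undefined), 0.1) === 0;
}
```

Both versions make the TypeError disappear. Only the second one explains why the value was missing, and its strict `Cart` parameter lets the type-checker flag any other caller that passes a cart typed as possibly undefined.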

Scenario 2: a compiler cascade after a refactor

You changed the signature of a core function and the compiler lit up with thirty errors across the codebase. This is EDD’s sweet spot: the errors are an exact, machine-generated worklist.

Make the change and run the type-checker. Don’t fix anything by hand yet; let the full error list materialize.
Delegate the whole list. Give the AI the real compiler output and let it work down it.

Copy-paste prompt for a compiler cascade:

I changed calculate(order: Order) to calculate(order: Order, currency: Currency). Here is the resulting tsc output:

src/billing/invoice.ts:48:21 - error TS2554: Expected 2 arguments, but got 1. src/api/checkout.ts:90:14 - error TS2345: Argument of type 'Order' is not assignable to parameter of type 'Currency'. …(28 more)

Fix every call site. Default currency to the order’s own order.currency where one exists; do not invent a hard-coded currency. After each batch, re-run tsc and continue until it’s clean. Show me the diff before applying.

Iterate until clean. The agent fixes call sites, re-runs the type-checker, and repeats. A task that’s an hour of tedium by hand finishes in a few cycles; the AI never gets bored at call site twenty-seven.
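If you want to see the worklist yourself before delegating it, the compiler output can be turned into structured data. The sketch below assumes plain-text output with one error per line, in the `path:line:col - error TSxxxx: message` shape used in the prompt above; `parseTscOutput` and `groupByFile` are illustrative names, and tsc's non-pretty format (`path(line,col): error ...`) would need a different pattern.

```typescript
type TscError = {
  file: string;
  line: number;
  column: number;
  code: string;
  message: string;
};

// Matches lines shaped like: src/a.ts:48:21 - error TS2554: Expected 2 arguments, but got 1.
const TSC_LINE = /^(.+?):(\d+):(\d+) - error (TS\d+): (.+)$/;

function parseTscOutput(output: string): TscError[] {
  const errors: TscError[] = [];
  for (const raw of output.split("\n")) {
    const match = TSC_LINE.exec(raw.trim());
    if (!match) continue; // skip summary lines, code frames, and blank lines
    const [, file, line, column, code, message] = match;
    errors.push({ file, line: Number(line), column: Number(column), code, message });
  }
  return errors;
}

// Groups errors per file so fixes can be batched file by file.
function groupByFile(errors: TscError[]): Map<string, TscError[]> {
  const byFile = new Map<string, TscError[]>();
  for (const error of errors) {
    const bucket = byFile.get(error.file) ?? [];
    bucket.push(error);
    byFile.set(error.file, bucket);
  }
  return byFile;
}
```

Re-parsing after each batch gives a simple progress signal: the error count should shrink on every cycle, and a count that grows is a cue to stop and review the last change.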

Scenario 3: a failing-test loop

The strongest form of EDD is to write the failing test first, then let the AI drive itself green against it. The test is an unambiguous oracle, so the loop terminates on its own.

End any test-driven prompt with an explicit “do not modify the test” instruction. It matters: without it, an over-eager agent will sometimes “fix” the failure by loosening the assertion. Pin the test as the spec.
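A minimal sketch of "the test as the oracle" follows, written without a test framework so it stays self-contained. Everything in it is hypothetical: `parsePriceToCents` is an invented function under test, and `SPEC_CASES` plays the role of the pinned test file the agent is told not to edit.

```typescript
type PriceParser = (input: string) => number;

// The pinned spec: fixed inputs and expected outputs. The loop is allowed to
// change the implementation only, never these cases.
const SPEC_CASES: ReadonlyArray<[string, number]> = [
  ["12.50", 1250],
  ["0.99", 99],
  ["7", 700],
];

// Returns one message per failing case; an empty list means green.
function runSpec(parse: PriceParser): string[] {
  const failures: string[] = [];
  for (const [input, expected] of SPEC_CASES) {
    const actual = parse(input);
    if (actual !== expected) {
      failures.push(`parse("${input}") returned ${actual}, expected ${expected}`);
    }
  }
  return failures;
}

// Starting state: a stub, so every case fails and the loop has errors to consume.
const stub: PriceParser = () => NaN;

// The kind of implementation the loop should converge on.
const parsePriceToCents: PriceParser = (input) => {
  const [whole, fraction = ""] = input.split(".");
  // Pad or trim the fraction to exactly two digits before converting.
  return Number(whole) * 100 + Number((fraction + "00").slice(0, 2));
};
```

The failure messages from `runSpec(stub)` are what gets fed to the AI, and the stop condition is simply `runSpec(parsePriceToCents).length === 0`. Because the cases are fixed data, weakening them shows up as a diff to the spec rather than as a quiet green run.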

When this breaks

Chasing the wrong error in a cascade. The first error often causes the rest; fixing error #14 is wasted work if #1 is the trigger. Tell the AI to fix the earliest/root error first and re-run before touching the others.
Fixing the symptom, not the cause. Wrapping undefined access in ?. makes the crash stop without explaining why the value was missing. Always ask “where did this become undefined?”, not just “stop the throw.”
Looping forever on a flaky test. If a test fails non-deterministically, the AI will “fix,” re-run, see green, and declare victory; then the test flakes again in CI. Quarantine the flake first; don’t run the EDD loop against it.
Edit-the-test escape hatch. Agents sometimes make a failing assertion pass by weakening it. Add “do not modify the test” to any test-driven prompt.
Context blindness. Pasting only the top stack frame hides the real culprit deeper in the trace. Give the full trace and name the suspect files.
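One cheap guard against the flaky-test trap is to re-run the failing test a few times before starting the loop. The sketch below is an assumption-laden illustration: `runOnce` is a hypothetical callback that runs the single test and returns true on a pass, and a handful of identical results suggests determinism but does not prove it.

```typescript
type Verdict = "passes" | "fails-consistently" | "flaky";

// Runs the same test several times and classifies the outcome.
// Only "fails-consistently" is a good target for the error-driven loop.
function classifyTest(runOnce: () => boolean, runs: number): Verdict {
  if (runs < 1) throw new Error("runs must be at least 1");
  let passes = 0;
  for (let i = 0; i < runs; i += 1) {
    if (runOnce()) passes += 1;
  }
  if (passes === runs) return "passes";
  if (passes === 0) return "fails-consistently";
  return "flaky"; // mixed results: quarantine it instead of "fixing" it
}
```

The same check is useful after a fix: a single green run from a test that was flaky to begin with says little about whether the change helped.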

What’s next

Test-Driven Development: write the failing test on purpose, then drive it green
Error Injection Testing: manufacture the failures before production does
Monitoring & Observability: wire up the Sentry signal that feeds this loop