Error-Driven Development: Learning from Failures
CI is red. The stack trace points at payment.ts:212, you’ve already burned twenty minutes guessing which of the last six commits broke it, and the “fix” you just pushed turned one failing test into three. Staring harder at the error isn’t working.
The error message is the most precise specification you have of what’s wrong, and it’s exactly the input an AI assistant is best at consuming. Error-Driven Development (EDD) leans into that: instead of aiming for a perfect first draft, you run a tight error → fix → re-run loop and let each failure steer the next change. Done deliberately, it converges fast even on gnarly cascades.
What you’ll walk away with
- A repeatable error → fix → re-run loop you can run in any of the three tools
- Copy-paste prompts for a production stack trace, a compiler cascade, and a failing-test loop
- The per-tool mechanics: who runs the tests, who rolls back a bad fix, who iterates unattended
- An MCP shortcut that pulls the Sentry issue for you instead of copy-pasting stack traces
- The failure modes of the loop itself: chasing the wrong error, fixing symptoms, looping on flaky tests
The loop, by tool
The cycle is the same everywhere: surface an error, hand it to the AI with enough context, apply the fix, and re-run the exact thing that failed. What differs is who runs the command and how you back out a bad fix.
In agent mode, Cursor runs the tests or build itself, reads the terminal output, and iterates without you copy-pasting. The safety net is checkpoints: every agent edit is a restore point, so when a “fix” makes things worse you roll back to the last green state in one click instead of untangling it.
Best when you want to watch the loop happen and intervene the moment it goes sideways.
Claude Code runs the compiler and tests through the Bash tool and feeds the output back to itself, so the loop closes inside one session. In headless mode (claude -p) you can let it iterate against a failing command with a turn cap so it doesn’t spin forever.
Best when the loop should run from the terminal or in CI, scripted and bounded.
Codex runs commands in a sandbox. Set --sandbox workspace-write so it can edit and run, and --ask-for-approval on-failure so it proceeds on green but pauses for you when a command fails, which is the natural EDD checkpoint. (--full-auto bundles workspace-write and on-request if you want less friction.)
Best when you want it to grind through a long error list mostly unattended but stop at genuine failures.
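Whichever tool drives it, the loop has the same shape. The sketch below is a tool-agnostic illustration, not any product's API: `runCheck`, `proposeFix`, and `errorDrivenLoop` are hypothetical names, and `maxTurns` stands in for the turn cap that keeps an unattended run from spinning forever.

```typescript
// Hypothetical types for illustration; no real tool exposes these names.
type RunResult = { ok: boolean; output: string };
type RunCheck = () => RunResult; // re-runs the exact command that failed
type ProposeFix = (errorOutput: string) => void; // hands the error to the assistant and applies its edit

interface LoopOutcome {
  green: boolean;
  turns: number;
  lastOutput: string;
}

// Run error -> fix -> re-run until the check passes or the turn cap is hit.
function errorDrivenLoop(runCheck: RunCheck, proposeFix: ProposeFix, maxTurns: number): LoopOutcome {
  let result = runCheck();
  let turns = 0;
  while (!result.ok && turns < maxTurns) {
    proposeFix(result.output); // the failure output steers the next change
    turns += 1;
    result = runCheck(); // always re-run the same check, never a different one
  }
  return { green: result.ok, turns, lastOutput: result.output };
}

// Toy stand-ins: each "fix" removes one of two pretend bugs.
let remainingBugs = 2;
const outcome = errorDrivenLoop(
  () =>
    remainingBugs === 0
      ? { ok: true, output: "all tests passed" }
      : { ok: false, output: `${remainingBugs} failing` },
  () => {
    remainingBugs -= 1;
  },
  5,
);
console.log(outcome);
```

The cap matters as much as the loop: when `green` comes back false, the last output is what a human should read next.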
Scenario 1: a production bug from a stack trace
A user hit a crash, and your error tracker captured the exception. The fastest path is to give the AI the trace plus the files it implicates.
- Grab the trace. Copy the full exception from Sentry: the runtime error and the stack, not just the top line.
- Hand it over with the suspect files. Name the files so the AI doesn’t have to grep blind.
- Apply and re-run. The fix should address where total became undefined (an upstream cart with no items, say), not just bolt a ?. onto line 212. Re-run the failing path; if a new error surfaces, feed it back and repeat.
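To make the symptom-versus-cause distinction concrete, here is a minimal sketch. The source only says that total was undefined at payment.ts:212; the `Cart` shape and the function names below are assumptions invented for illustration.

```typescript
// Hypothetical shapes; the real payment.ts is not shown in this guide.
interface CartItem {
  priceCents: number;
  quantity: number;
}
interface Cart {
  items: CartItem[];
}

// Symptom patch: optional chaining silences the crash but still
// lets a meaningless "0.00" charge flow downstream.
function chargeAmountSymptomPatch(total: number | undefined): string {
  return total?.toFixed(2) ?? "0.00";
}

// Root-cause fix, part 1: the total is always a number.
function computeTotalCents(cart: Cart): number {
  return cart.items.reduce((sum, item) => sum + item.priceCents * item.quantity, 0);
}

// Root-cause fix, part 2: an empty cart is rejected before payment runs.
function prepareCharge(cart: Cart): { amountCents: number } {
  if (cart.items.length === 0) {
    throw new Error("Cannot charge an empty cart");
  }
  return { amountCents: computeTotalCents(cart) };
}
```

If the assistant proposes something shaped like the first function, push back and ask where the undefined value originates.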
Scenario 2: a compiler cascade after a refactor
You changed the signature of a core function and the compiler lit up with thirty errors across the codebase. This is EDD’s sweet spot: the errors are an exact, machine-generated worklist.
- Make the change and run the type-checker. Don’t fix anything by hand yet; let the full error list materialize.
- Delegate the whole list. Give the AI the real compiler output and let it work down it.
- Iterate until clean. The agent fixes call sites, re-runs the type-checker, and repeats. A task that’s an hour of tedium by hand finishes in a few cycles; the AI never gets bored at call site twenty-seven.
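If you want to see the worklist before delegating it, the compiler output can be grouped by file. The sketch below assumes the plain (non-pretty) tsc line format of `file(line,col): error TSxxxx: message`; the sample output and file names are invented, and `parseTscOutput` and `groupByFile` are hypothetical helpers, not part of any tool.

```typescript
interface CompilerError {
  file: string;
  line: number;
  column: number;
  code: string;
  message: string;
}

// Matches lines like: src/a.ts(14,22): error TS2554: Expected 2 arguments, but got 1.
const TSC_LINE = /^(.+?)\((\d+),(\d+)\): error (TS\d+): (.*)$/;

// Keep only lines that look like compiler errors; summaries are skipped.
function parseTscOutput(output: string): CompilerError[] {
  const errors: CompilerError[] = [];
  for (const rawLine of output.split("\n")) {
    const match = TSC_LINE.exec(rawLine.trim());
    if (match === null) continue;
    errors.push({
      file: match[1],
      line: Number(match[2]),
      column: Number(match[3]),
      code: match[4],
      message: match[5],
    });
  }
  return errors;
}

// Group errors per file so each file can be handed over as one unit of work.
function groupByFile(errors: CompilerError[]): Map<string, CompilerError[]> {
  const byFile = new Map<string, CompilerError[]>();
  for (const error of errors) {
    const bucket = byFile.get(error.file) ?? [];
    bucket.push(error);
    byFile.set(error.file, bucket);
  }
  return byFile;
}

// Invented sample output for illustration only.
const sampleOutput = [
  "src/checkout.ts(14,22): error TS2554: Expected 2 arguments, but got 1.",
  "src/checkout.ts(31,9): error TS2554: Expected 2 arguments, but got 1.",
  "src/invoice.ts(8,17): error TS2345: Argument of type 'string' is not assignable to parameter of type 'number'.",
  "Found 3 errors in 2 files.",
].join("\n");

const worklist = groupByFile(parseTscOutput(sampleOutput));
for (const [file, errors] of worklist) {
  console.log(`${file}: ${errors.length} error(s)`);
}
```

A shrinking count per cycle is the signal that the loop is converging; a count that grows after a fix is the cue to roll back.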
Scenario 3: a failing-test loop
The strongest form of EDD is to write the failing test first, then let the AI drive itself green against it. The test is an unambiguous oracle, so the loop terminates on its own.
When you hand the test over, tell the agent explicitly not to modify it. That instruction matters: without it, an over-eager agent will sometimes “fix” the failure by loosening the assertion. Pin the test as the spec.
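A small sketch of what treating the test as the spec can look like. Everything here is hypothetical: the discount function, the cases, and the `failingCases` helper are invented for illustration, and a real project would express the same cases in its own test runner.

```typescript
type ApplyDiscount = (totalCents: number, percent: number) => number;

interface DiscountCase {
  totalCents: number;
  percent: number;
  expectedCents: number;
}

// The spec, written before the fix. The agent may change the
// implementation, never these cases.
const discountSpec: ReadonlyArray<DiscountCase> = [
  { totalCents: 1000, percent: 10, expectedCents: 900 },
  { totalCents: 999, percent: 50, expectedCents: 500 }, // rounds to the nearest cent
  { totalCents: 1000, percent: 0, expectedCents: 1000 },
];

// The oracle: an empty result means green, anything else is the next error to feed back.
function failingCases(impl: ApplyDiscount): string[] {
  const failures: string[] = [];
  for (const c of discountSpec) {
    const actual = impl(c.totalCents, c.percent);
    if (actual !== c.expectedCents) {
      failures.push(`applyDiscount(${c.totalCents}, ${c.percent}) returned ${actual}, expected ${c.expectedCents}`);
    }
  }
  return failures;
}

// Starting point: a buggy draft that subtracts the percentage as if it were cents.
const buggyApplyDiscount: ApplyDiscount = (totalCents, percent) => totalCents - percent;

// Where the loop should end up: the implementation changes, the spec does not.
const fixedApplyDiscount: ApplyDiscount = (totalCents, percent) =>
  Math.round((totalCents * (100 - percent)) / 100);

console.log(failingCases(buggyApplyDiscount));
console.log(failingCases(fixedApplyDiscount));
```

Because the cases live in a read-only array that the implementation never touches, the only way to reach an empty failure list is to fix the function.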
When this breaks
What’s next
- Test-Driven Development: write the failing test on purpose, then drive it green
- Error Injection Testing: manufacture the failures before production does
- Monitoring & Observability: wire up the Sentry signal that feeds this loop