Error-Driven Development: Learning from Failures

CI is red. The stack trace points at payment.ts:212, you’ve already burned twenty minutes guessing which of the last six commits broke it, and the “fix” you just pushed turned one failing test into three. Staring harder at the error isn’t working.

The error message is the most precise specification you have of what’s wrong, and it’s exactly the input an AI assistant is best at consuming. Error-Driven Development (EDD) leans into that: instead of aiming for a perfect first draft, you run a tight error → fix → re-run loop and let each failure steer the next change. Done deliberately, it converges fast even on gnarly cascades.

  • A repeatable error → fix → re-run loop you can run in any of the three tools
  • Copy-paste prompts for a production stack trace, a compiler cascade, and a failing-test loop
  • The per-tool mechanics: who runs the tests, who rolls back a bad fix, who iterates unattended
  • An MCP shortcut that pulls the Sentry issue for you instead of copy-pasting stack traces
  • The failure modes of the loop itself: chasing the wrong error, fixing symptoms, looping on flaky tests

The cycle is the same everywhere: surface an error, hand it to the AI with enough context, apply the fix, and re-run the exact thing that failed. What differs is who runs the command and how you back out a bad fix.
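The cycle can be sketched as a small driver function. This is a minimal illustration rather than any tool's actual implementation: `run`, `proposeFix`, and `maxCycles` are hypothetical names, with `run` standing in for re-executing the failing command and `proposeFix` standing in for handing the error to the assistant and applying its change.

```typescript
// Result of re-running the exact command that failed.
type RunResult = { ok: true } | { ok: false; error: string };

// Hypothetical hooks; in practice these would shell out to a test runner
// and call whichever assistant you use.
interface LoopHooks {
  run: () => RunResult;
  proposeFix: (error: string) => void;
}

interface LoopOutcome {
  green: boolean;       // did the last run pass?
  cycles: number;       // how many fixes were applied
  errorsSeen: string[]; // every error that steered a fix, in order
}

function errorDrivenLoop(hooks: LoopHooks, maxCycles = 5): LoopOutcome {
  const errorsSeen: string[] = [];
  let cycles = 0;
  let result: RunResult = hooks.run();

  // Each failure steers the next change; the cap keeps the loop from
  // spinning forever on an error the fixes never resolve.
  while (!result.ok && cycles < maxCycles) {
    errorsSeen.push(result.error);
    hooks.proposeFix(result.error);
    cycles++;
    result = hooks.run(); // re-run the same command, not a different one
  }

  return { green: result.ok, cycles, errorsSeen };
}
```

The cycle cap is a design choice for this sketch: when the loop stops short of green, the recorded `errorsSeen` list shows whether it was converging or circling.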

In agent mode, Cursor runs the tests or build itself, reads the terminal output, and iterates without you copy-pasting. The safety net is checkpoints: every agent edit is a restore point, so when a “fix” makes things worse you roll back to the last green state in one click instead of untangling it.

Best when you want to watch the loop happen and intervene the moment it goes sideways.

Scenario 1: a production bug from a stack trace

A user hit a crash, and your error tracker captured the exception. The fastest path is to give the AI the trace plus the files it implicates.

  1. Grab the trace. Copy the full exception from Sentry (the runtime error and the stack), not just the top line.

  2. Hand it over with the suspect files. Name the files so the AI doesn’t have to grep blind.

  3. Apply and re-run. The fix should address where total became undefined (an upstream cart with no items, say), not just bolt on ?. at line 212. Re-run the failing path; if a new error surfaces, feed it back and repeat.
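To make the symptom-versus-cause distinction concrete, here is a hedged sketch of what a root-cause fix might look like. The `Cart` and `LineItem` types and both function names are invented for illustration; the real payment.ts is not shown in this guide, so treat the shapes as assumptions.

```typescript
// Hypothetical shapes; the real cart model may differ.
interface LineItem {
  priceCents: number;
  quantity: number;
}

interface Cart {
  items: LineItem[];
}

// Symptom fix (avoid): `total?.toFixed(2)` at the crash site hides the
// exception but still lets an undefined total flow into the charge.

// Root-cause fix, part 1: the total is always a number. Passing an initial
// value to reduce means an empty cart yields 0 rather than undefined.
function cartTotalCents(cart: Cart): number {
  return cart.items.reduce(
    (sum, item) => sum + item.priceCents * item.quantity,
    0,
  );
}

// Root-cause fix, part 2: reject the invalid state upstream, with an error
// that names the real problem instead of a TypeError deep in formatting.
function chargeAmount(cart: Cart): string {
  if (cart.items.length === 0) {
    throw new Error("Cannot charge an empty cart");
  }
  return (cartTotalCents(cart) / 100).toFixed(2);
}
```

Whether an empty cart should throw or be silently skipped is a product decision; the point of the sketch is that the decision is made where the bad value originates.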

Scenario 2: a compiler cascade after a refactor

You changed the signature of a core function and the compiler lit up with thirty errors across the codebase. This is EDD’s sweet spot: the errors are an exact, machine-generated worklist.

  1. Make the change and run the type-checker. Don’t fix anything by hand yet; let the full error list materialize.

  2. Delegate the whole list. Give the AI the real compiler output and let it work down it.

  3. Iterate until clean. The agent fixes call sites, re-runs the type-checker, and repeats. A task that’s an hour of tedium by hand finishes in a few cycles; the AI never gets bored at call site twenty-seven.
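If you want to see the worklist before delegating it, the compiler output is easy to group by file. The sketch below assumes the TypeScript compiler's plain (non-pretty) diagnostic format, `file(line,col): error TSxxxx: message`, as produced by `tsc --noEmit --pretty false`; `buildWorklist` and `CompilerError` are illustrative names, not part of any tool.

```typescript
// One parsed diagnostic from plain tsc output.
interface CompilerError {
  file: string;
  line: number;
  column: number;
  code: string;
  message: string;
}

// Matches lines such as:
//   src/cart.ts(12,5): error TS2554: Expected 2 arguments, but got 1.
const TSC_ERROR = /^(.+?)\((\d+),(\d+)\): error (TS\d+): (.*)$/;

// Group diagnostics by file so each file can be fixed in one pass.
function buildWorklist(tscOutput: string): Map<string, CompilerError[]> {
  const byFile = new Map<string, CompilerError[]>();

  for (const raw of tscOutput.split(/\r?\n/)) {
    const match = TSC_ERROR.exec(raw.trim());
    if (!match) continue; // skip blank lines, continuations, and the summary

    const [, file = "", line = "0", column = "0", code = "", message = ""] = match;
    const entry: CompilerError = {
      file,
      line: Number(line),
      column: Number(column),
      code,
      message,
    };

    const bucket = byFile.get(file) ?? [];
    bucket.push(entry);
    byFile.set(file, bucket);
  }

  return byFile;
}
```

A shrinking `buildWorklist(...).size` between cycles is a rough signal that the loop is converging; a count that grows after a fix suggests the change introduced new breakage.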

The strongest form of EDD is to write the failing test first, then let the AI drive itself green against it. The test is an unambiguous oracle, so the loop terminates on its own.

That last sentence matters: without it, an over-eager agent will sometimes “fix” the failure by loosening the assertion. Pin the test as the spec.
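One way to enforce the pin mechanically, sketched under assumptions: snapshot the test files before the agent runs and compare afterward. `Snapshot` and `changedTestFiles` are hypothetical names, and the snapshot is modeled as a plain path-to-contents record; a real setup might read the files from disk or lean on version control instead.

```typescript
// Path -> file contents for the tests that define the spec.
type Snapshot = Record<string, string>;

// Return the pinned test files that were edited or deleted between the two
// snapshots. New test files are allowed; only the pinned ones must stay
// byte-for-byte identical.
function changedTestFiles(before: Snapshot, after: Snapshot): string[] {
  const changed: string[] = [];
  for (const path of Object.keys(before)) {
    if (after[path] !== before[path]) {
      changed.push(path); // edited, or missing from `after`
    }
  }
  return changed.sort();
}
```

If the returned list is non-empty after a run that ended green, the result is suspect: review the test diff before trusting it.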