
Test-Driven Development in Cursor

You are building a markdown-to-HTML converter for your CMS. You know from experience that no AI can consistently one-shot this — there are too many edge cases with nested lists, code blocks inside blockquotes, and HTML entities. The traditional approach: write the code, test it by hand, find a bug, fix it, find another bug, fix that, repeat until you stop finding bugs (which does not mean there are none left).

There is a better way. Write the tests first. Give Cursor the tests and ask it to implement code until all tests pass. Then add more edge case tests and let it iterate again. This is TDD on autopilot — you define what “correct” means, the AI figures out how to get there. Steve Sewell at Builder.io describes this as “a thousand times better” than the traditional AI workflow, and he is right. The AI stops being a code generator that needs QA and becomes a code generator that QAs itself.

What you'll take away:

  • A TDD workflow where Cursor writes tests first, implements second, and iterates until green
  • A prompt template for generating edge-case-rich test suites that catch the bugs AI typically introduces
  • A technique for extending existing test suites iteratively by pasting production failures into new test cases
  • A pre-PR validation prompt that runs your build, lint, and test suite and fixes failures automatically
  • Understanding of when TDD with AI works brilliantly and when it falls apart

Step 1: Define the contract with tests first


Start by having Agent write the test suite. The key insight: tests written before the implementation are a specification, not just verification. They define exactly what the code should do.

Agent will create a comprehensive test file with 30-50 test cases and a stub implementation. Run the tests — they should all fail (except perhaps tests for empty or whitespace-only input, which even an empty stub can pass).

Step 2: Implement until the tests pass

Now hand the tests back to Agent and ask it to implement the code. With auto-run enabled, Agent will write code, run tests, see failures, and fix them in a loop.

This is where auto-run (or what was previously called “YOLO mode”) pays off. Agent will run vitest, see 40 failures, fix the most common parsing issues, run again, see 15 failures, fix those, and so on. You monitor the progress but do not need to intervene unless it gets stuck in a loop.

Step 3: Add edge cases from real-world failures


Once all initial tests pass, add more tests based on the kind of markdown you actually encounter in production. This is where you paste real content that broke previous implementations.

@src/lib/markdown/__tests__/markdown-to-html.test.ts
Add these additional test cases based on real-world markdown we've seen break parsers:
1. A code block containing markdown syntax (should NOT be parsed)
2. A link inside a heading: ## [Click here](url)
3. Consecutive blockquotes with different content
4. A list item that spans multiple lines with a continuation indent
5. An image inside a link: [![alt](img-src)](link-url)
6. Mixed HTML and markdown: <div>**bold inside div**</div>
7. Table syntax with alignment indicators (|:---|:---:|---:|)
Run the tests. If any new tests fail, fix the implementation and iterate until all pass.

This is the workflow that ties testing into your daily development cycle. Before opening any PR, have Agent run your full validation suite and fix what it finds.
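A sketch of what that prompt can look like — the script names typecheck, lint, and test are assumptions; substitute your project's own:

Run the full pre-PR validation, fastest checks first:
1. npm run typecheck — fix any type errors
2. npm run lint — fix any lint errors
3. npm run test — fix failing tests by changing the implementation, never the test expectations
If the same check still fails after two fix attempts, stop and report what's blocking.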

This single prompt replaces a manual pre-PR checklist. It runs fast checks first (TypeScript compilation, lint) before slower ones (test suite, coverage), and fixes issues as it finds them. Some developers run this at the end of every coding session.

When a bug report comes in, turn it into a failing test before investigating the fix. This guarantees the bug is fixed and stays fixed.

Bug report: When a user pastes markdown with Windows-style line endings (\r\n),
the converter produces double-spaced HTML output.
1. Write a failing test that reproduces this exact behavior
2. Run it to confirm it fails
3. Fix the implementation
4. Run ALL tests to confirm the fix doesn't break anything else
5. The test should remain in the suite permanently as a regression guard

TDD works for React components too, though the feedback loop is slightly different.

@src/components/data-table.tsx
Write tests for the DataTable component using vitest and @testing-library/react:
1. Renders the correct number of rows from the data prop
2. Clicking a column header calls onSort with the column name
3. Shows the empty state when data is an empty array
4. Shows skeleton rows when isLoading is true
5. Applies the correct aria-sort attribute to the sorted column
6. Keyboard navigation: Tab focuses the first sortable header, Enter triggers sort
Run the tests against the existing component. If any fail, tell me whether
the test expectation is wrong or the component needs fixing.

The TDD-with-AI approach works brilliantly for:

  • Pure functions with clear inputs and outputs (parsers, formatters, validators, calculators)
  • API endpoint handlers where you can define request/response contracts as tests
  • Data transformation logic where edge cases are well-defined
  • Bug fixes where the bug can be expressed as a failing test

It works less well for:

  • UI layout and visual design (tests cannot verify that something “looks right”)
  • Complex integration logic where test setup is more code than the implementation
  • Performance optimization where correctness tests pass but the code is too slow

For the weak spots, combine TDD with other techniques: visual review for UI, benchmarks for performance, integration tests with real services for complex orchestration.

Agent modifies tests to make them pass. This is the most dangerous failure mode. The prompt says “fix the implementation, not the tests,” but under pressure (many failing tests), the AI sometimes takes shortcuts. Always diff the test file after an implementation round. If tests changed, reject and re-run.

Tests are too coupled to implementation details. If your tests check that a specific internal function was called or that a particular intermediate data structure was created, they become brittle. Test the public interface — inputs and outputs — not the implementation path. Add a rule to .cursor/rules: “Tests should only assert on public function return values and side effects, never on internal implementation details.”

The AI writes tests that pass trivially. Watch for tests like expect(result).toBeDefined() or expect(typeof result).toBe('string') — these pass for almost any implementation. Tests should assert specific values: expect(result).toBe('<h1>Hello</h1>').

Coverage report says 95% but bugs still slip through. Line coverage is necessary but not sufficient. Add branch coverage requirements and mutation testing. Agent can set up Stryker mutation testing to verify that your tests actually catch bugs, not just execute code paths.

The TDD loop gets stuck. If Agent is failing to make progress after 3-4 iterations (test count stays the same), the remaining tests likely expose a fundamental design issue. Stop the loop, use Ask mode to analyze the failing tests, and consider whether the implementation approach needs to change rather than just patching individual test failures.