Test-Driven Development with AI Assistance

You ask the AI to build a rate limiter. It generates something that looks right. You deploy it. Two days later, your API goes down because the rate limiter does not handle concurrent requests correctly — a race condition the AI never thought to test because you never told it what “working” actually means.

TDD flips this dynamic. When you write the tests first, the AI has a precise, machine-verifiable definition of success. It does not have to guess what you want. It runs the tests, sees the failures, and iterates until they pass. This is the single highest-leverage technique for getting reliable output from AI coding assistants.

What You’ll Walk Away With

A workflow for writing test specs first and letting the AI implement against them
Prompts for generating comprehensive test cases from requirements
Strategies for handling the AI’s tendency to make tests pass by weakening assertions
A clear understanding of when TDD with AI saves time versus when it adds overhead

The TDD-AI Feedback Loop

Traditional TDD follows the red-green-refactor cycle. With AI, this cycle becomes dramatically faster because the AI handles the green and refactor phases while you focus on writing meaningful red tests.

You write a failing test that captures the expected behavior
The AI writes code to make the test pass
The AI refactors the implementation while keeping tests green
You review the implementation and write the next test

The critical insight: the test is your specification. A well-written test communicates intent far more precisely than any natural language prompt.

Writing Tests First

Start by defining the behavior you want, not the implementation. You can write the test yourself or collaborate with the AI to generate test cases from requirements — but you must review and approve the tests before asking for implementation.

Open Agent mode. Ask Cursor to help you generate test cases, then review them before requesting implementation:

I need to build a rate limiter middleware for our Express API.
Requirements:
- 100 requests per minute per API key
- Returns 429 with a Retry-After header when exceeded
- Tracks limits in Redis
- Handles concurrent requests correctly

Write the test file first at src/middleware/__tests__/rateLimiter.test.ts.
Use our existing test patterns from @src/middleware/__tests__/auth.test.ts.
Do NOT write the implementation yet.

Review the tests. Add edge cases the AI missed. Then:

Good tests. Now implement src/middleware/rateLimiter.ts to make
all tests pass. Run the tests after implementation and fix any
failures.

Cursor will run the tests in the integrated terminal, see failures, and iterate until they pass.

Use Plan Mode to generate the test specification, then switch to Normal Mode for implementation:

I need a rate limiter middleware. Requirements:
- 100 requests/minute per API key
- 429 response with Retry-After header when exceeded
- Redis-backed tracking
- Thread-safe under concurrent requests

Write comprehensive tests at src/middleware/__tests__/rateLimiter.test.ts.
Follow existing test patterns in the codebase. Do not implement yet.

After reviewing and approving the tests:

Implement src/middleware/rateLimiter.ts to pass all the tests.
Run npm test -- --testPathPattern=rateLimiter after each change
until all tests pass. Address root causes, not symptoms.

Claude Code will run the tests, read the output, fix issues, and re-run automatically. This agentic loop is where TDD with AI truly shines.

Ask Codex to generate the test file first. Codex verifies its own work by running tests in its sandbox:

Write tests for a rate limiter middleware at
src/middleware/__tests__/rateLimiter.test.ts.

Requirements:
- 100 requests/minute per API key
- 429 + Retry-After header when exceeded
- Redis-backed
- Handles concurrent requests safely

Follow existing test patterns in the codebase. Do not implement yet.

After reviewing, start a new thread for implementation:

Implement src/middleware/rateLimiter.ts to pass all tests in
src/middleware/__tests__/rateLimiter.test.ts.
Run the test suite and iterate until all tests pass.

Using separate threads keeps context clean. The implementation thread focuses only on making tests green, without the test-generation conversation cluttering context.

I need to implement [FEATURE]. Here are the requirements:
[paste requirements]

Write comprehensive tests FIRST at [test file path]. Cover:
1. Happy path for each requirement
2. Edge cases (empty input, boundary values, concurrent access)
3. Error cases (invalid input, missing dependencies, timeouts)
4. Integration with existing code (follow patterns in @[existing test file])

Do NOT write the implementation. I will review the tests before we proceed.

Guiding the Implementation Phase

Once your tests are in place, the implementation phase is where the AI delivers the most value. The tests act as a continuous feedback signal that keeps the AI on track.

Reference both the test file and any relevant existing code:

Implement the rate limiter to pass @src/middleware/__tests__/rateLimiter.test.ts.
Reference @src/middleware/auth.ts for our middleware patterns.
Run the tests after each significant change.
If a test fails, read the error carefully and fix the root cause.

Use Cursor’s checkpoints to snapshot known-good states. If the AI breaks something that was passing, rewind to the checkpoint instead of debugging forward.

Give Claude a clear implementation directive with verification:

Implement rateLimiter.ts to pass all tests. After implementation:
1. Run npm test -- --testPathPattern=rateLimiter
2. If any test fails, read the full error output
3. Fix the root cause (don't modify the tests)
4. Re-run until all tests pass
5. Then run the full test suite to check for regressions

If Claude gets stuck in a loop of failing tests, use /clear and start a fresh session with a more targeted prompt that describes the specific failure.

Codex’s sandbox automatically runs verification:

Implement src/middleware/rateLimiter.ts to pass all existing tests.
Follow the middleware pattern from src/middleware/auth.ts.
Run the test suite after implementation. Do not modify test files.
If tests fail, fix the implementation, not the tests.

In the Codex App, you can watch the implementation happen in real-time and see test results as Codex iterates.

Implement [file path] to make all tests in [test file path] pass.

Rules:
- Do NOT modify the test files
- Run the tests after each significant change
- If a test fails, fix the implementation, not the test
- After all tests pass, run the full test suite to check for regressions
- Follow existing patterns in @[reference implementation file]

Catching the “Test Weakening” Anti-Pattern

The most dangerous failure mode with AI-assisted TDD is when the AI modifies your tests to make them pass instead of fixing the implementation. Watch for these signs:

Assertions become less specific. expect(result).toBe(429) becomes expect(result).toBeDefined().
Tests get deleted. The AI removes “flaky” tests instead of fixing the code.
Mocks replace real behavior. The AI mocks out the exact thing you wanted to test.

The implementation passes all current tests. Now let's strengthen coverage.

Review @[implementation file] and write additional tests for:
1. Race conditions (parallel calls with the same input)
2. Boundary values (exactly at the limit, one over, one under)
3. Failure recovery (what happens when Redis goes down mid-request?)
4. Memory/resource cleanup (are connections properly released?)

Add these tests to @[test file]. Then run them -- some SHOULD fail.
We'll fix the implementation next.

When This Breaks

The AI generates trivial tests. If your test prompts are vague (“write tests for this function”), the AI will write tests that verify the function exists and returns something. Be specific about the behaviors you want tested. Include concrete input-output examples.

Tests are too tightly coupled to implementation. If your tests check internal implementation details (private method calls, specific data structures), the AI cannot refactor freely. Write tests against the public API and expected behaviors, not internal mechanics.

The test suite is too slow. AI-assisted TDD works best when tests run in seconds. If your full test suite takes 10 minutes, the AI will not iterate effectively. Use --testPathPattern or --grep to run only the relevant tests during development.

You are writing too many tests upfront. Start with 3-5 tests covering the core behavior. Let the implementation reveal which edge cases matter, then add more tests iteratively. Trying to write 30 tests before any implementation leads to analysis paralysis.

What’s Next

Error-Driven Development When tests reveal unexpected failures, use them as a learning signal instead of fighting them.

PRD to Plan to Todo Combine TDD with structured planning for maximum reliability.

Agent vs Ask Mode Know when to let the AI run tests autonomously vs when to step through manually.