AI-Powered Testing and Quality Assurance

Your test suite passes locally but flakes in CI. Coverage has been stuck at 60% for a month because nobody wants to write the boring cases. And the one integration test that would have caught last week’s regression was never written, because it was a hassle to set up the browser. Testing is exactly the kind of high-volume, pattern-heavy work where Cursor’s agent earns its keep — if you drive it with the right prompts and keep yourself in the loop on what actually gets asserted.

  • A Cursor Auto-Run configuration that lets the agent run tests and iterate without babysitting
  • A copy-paste TDD prompt that makes the agent write the test first, watch it fail, then implement
  • Prompts for the cases developers skip: async retries, error paths, and the gaps a coverage report flags
  • A Playwright MCP setup for browser-driven E2E tests the agent can run end to end
  • A debugging workflow for the worst kind of failure — the intermittent one
  • A clear list of where AI-generated tests go wrong, and how to catch it

Configure Auto-Run So the Agent Can Iterate

The TDD loop only works if the agent can run your tests itself, read the failures, and try again without stopping to ask permission for every npm test. In current Cursor that lives under Cursor Settings -> Agents -> Auto-Run (this is the feature that used to be called “YOLO Mode”).

Set Auto-Run Mode to Run Everything if you trust the agent in this repo, or keep it on Ask Every Time and add the safe commands to the Command Allowlist so test and build commands run without prompts but destructive commands still pause.

With Auto-Run configured this way, the agent can run the suite, fix failing tests by updating the source, and re-run until green — the core of the TDD loop below.
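To make the allowlist idea concrete, here is a minimal sketch of the kind of policy it expresses. It is not Cursor's matching logic: the `shouldAutoRun` function, the prefix-matching rule, and the example commands are all assumptions chosen for illustration.

```typescript
// Hypothetical illustration of an allowlist policy; Cursor's real matching rules may differ.
const allowlist: string[] = ["npm test", "npm run build", "npx vitest"];

// Shell operators that could chain an unsafe command onto a safe prefix.
const chainingPattern = /[;&|]|\$\(/;

function shouldAutoRun(command: string, allowed: string[] = allowlist): boolean {
  const trimmed = command.trim();
  // Anything that chains or substitutes commands always pauses for approval.
  if (chainingPattern.test(trimmed)) return false;
  // Otherwise the command must be an allowed entry, optionally followed by arguments.
  return allowed.some(
    (prefix) => trimmed === prefix || trimmed.startsWith(prefix + " ")
  );
}
```

Under this sketch, `npm test -- --watch` would run without a prompt, while `rm -rf node_modules` and `npm test && git push` would both pause for approval.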

The Red -> Green -> Refactor cycle maps cleanly onto an agent loop: it writes a failing test, runs it to confirm it fails for the right reason, implements until green, then refactors with the tests as a safety net. The discipline that matters is making the agent write the test first — otherwise it writes code and tests together, and the tests just rubber-stamp whatever it produced.

A typical run looks like this: the agent writes the spec, runs it, sees the failures, and iterates.

describe('markdownToHtml', () => {
  it('converts headings', () => {
    expect(markdownToHtml('# Title')).toBe('<h1>Title</h1>');
    expect(markdownToHtml('## Subtitle')).toBe('<h2>Subtitle</h2>');
  });

  it('escapes HTML special characters in text', () => {
    expect(markdownToHtml('5 < 6 & 7 > 2')).toBe('<p>5 &lt; 6 &amp; 7 &gt; 2</p>');
  });

  it('handles unclosed bold markers gracefully', () => {
    expect(markdownToHtml('**bold')).toBe('<p>**bold</p>');
  });
});

The escaping and unclosed-marker cases are the ones a human usually forgets — and the ones that turn into XSS bugs or broken output in production. Naming them explicitly in the prompt is what gets the agent to cover them.
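For reference, the green step for a spec like this might look roughly like the following. It is a minimal sketch, not a full Markdown parser: the `escapeHtml` and `renderInline` helpers are hypothetical names, and only headings, paragraphs, and balanced bold markers are handled.

```typescript
// Escape the three characters that matter for HTML text content.
// The ampersand must be replaced first so later entities are not double-escaped.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Only balanced **...** pairs become <strong>; an unclosed marker stays literal.
function renderInline(text: string): string {
  return escapeHtml(text).replace(/\*\*(.+?)\*\*/g, "<strong>$1</strong>");
}

function markdownToHtml(markdown: string): string {
  return markdown
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => {
      // One to six leading '#' characters followed by whitespace marks a heading.
      const heading = /^(#{1,6})\s+(.*)$/.exec(line);
      if (heading) {
        const level = heading[1].length;
        return `<h${level}>${renderInline(heading[2])}</h${level}>`;
      }
      return `<p>${renderInline(line)}</p>`;
    })
    .join("\n");
}
```

Escaping happens before inline formatting, so any tags the function emits are its own and user-supplied angle brackets never reach the output unescaped.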

The agent is strongest at unit tests because the patterns are well-established and it can run them in a tight loop. Point it at the existing tests in your repo so it matches your conventions instead of inventing new ones.

Write Vitest + React Testing Library tests for this UserProfile
component. Cover: renders name and avatar from props, falls back to
initials when avatar is missing, calls onEdit when the edit button is
clicked, and disables the edit button while saving is true. Follow the
query and matcher conventions in src/components/__tests__/Card.test.tsx.

describe('UserProfile', () => {
  it('renders user name and avatar', () => {
    render(<UserProfile user={{ name: 'Ada', avatar: '/a.png' }} />);
    expect(screen.getByText('Ada')).toBeInTheDocument();
    expect(screen.getByRole('img')).toHaveAttribute('src', '/a.png');
  });

  it('falls back to initials when avatar is missing', () => {
    render(<UserProfile user={{ name: 'Ada Lovelace' }} />);
    expect(screen.getByText('AL')).toBeInTheDocument();
  });

  it('calls onEdit when the edit button is clicked', async () => {
    const onEdit = vi.fn();
    render(<UserProfile user={{ name: 'Ada' }} onEdit={onEdit} />);
    await userEvent.click(screen.getByRole('button', { name: /edit/i }));
    expect(onEdit).toHaveBeenCalledOnce();
  });
});
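The initials fallback in the second test is the sort of logic that is easier to cover when it lives in a pure function. A possible shape, assuming a hypothetical `getInitials` helper that the component would call (the two-letter cap is also an assumption):

```typescript
// Hypothetical helper: derive up to two uppercase initials from a display name.
function getInitials(name: string): string {
  return name
    .trim()
    .split(/\s+/)
    .filter((part) => part.length > 0)
    .slice(0, 2)
    .map((part) => part[0].toUpperCase())
    .join("");
}
```

With the logic isolated like this, the component test only needs to confirm that the fallback renders, and edge cases such as extra whitespace or a single-word name can be covered by fast unit tests on the helper.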

Async operations are where coverage usually lies

A green “happy path” async test hides the cases that actually break in production: the retry, the timeout, the rejected promise. Make the agent test those explicitly.

// Replace the global fetch with a fresh mock before each test.
beforeEach(() => {
  vi.stubGlobal('fetch', vi.fn());
});

// Restore the real fetch so the mock does not leak into other suites.
afterEach(() => {
  vi.unstubAllGlobals();
});

describe('fetchUserData', () => {
  it('retries once on network failure, then succeeds', async () => {
    fetch
      .mockRejectedValueOnce(new Error('Network error'))
      .mockResolvedValueOnce({ ok: true, json: async () => ({ id: 1 }) });
    const result = await fetchUserData(1);
    expect(fetch).toHaveBeenCalledTimes(2);
    expect(result).toEqual({ id: 1 });
  });

  it('does not retry on a 4xx response', async () => {
    fetch.mockResolvedValueOnce({ ok: false, status: 404 });
    await expect(fetchUserData(1)).rejects.toThrow();
    expect(fetch).toHaveBeenCalledOnce();
  });
});
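These tests pin down a retry contract, and an implementation that satisfies it could look roughly like this. It is a sketch under stated assumptions: the `/api/users/` path, the `HttpError` class, and the `maxRetries` parameter are hypothetical, and `fetchImpl` is passed in as an argument (rather than using the global `fetch` the tests mock) purely to keep the example self-contained. It also treats every non-OK response as non-retryable, which goes slightly beyond the 4xx case the tests name.

```typescript
interface ResponseLike {
  ok: boolean;
  status: number;
  json(): Promise<unknown>;
}

type FetchLike = (url: string) => Promise<ResponseLike>;

// Hypothetical error type for non-OK HTTP responses.
class HttpError extends Error {
  status: number;
  constructor(status: number) {
    super(`Request failed with status ${status}`);
    this.status = status;
  }
}

async function fetchUserData(
  id: number,
  fetchImpl: FetchLike,
  maxRetries = 1
): Promise<unknown> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    let response: ResponseLike;
    try {
      response = await fetchImpl(`/api/users/${id}`);
    } catch (error) {
      // The request itself rejected (network failure): remember the error and retry.
      lastError = error;
      continue;
    }
    // The server answered: a non-OK status is not retried.
    if (!response.ok) throw new HttpError(response.status);
    return response.json();
  }
  // Every attempt failed at the network level.
  throw lastError;
}
```

The HTTP status check sits outside the `try` block on purpose, so a 404 cannot be mistaken for a network failure and retried.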

For browser-driven end-to-end tests, give the agent the Playwright MCP server. It lets the agent drive a real browser — navigate, fill forms, click, and assert — so it can both write the test and run it against your local app to confirm it actually works.

import { test, expect } from '@playwright/test';

test('successful login redirects to dashboard', async ({ page }) => {
  await page.goto('http://localhost:3000/login');
  await page.getByLabel('Email').fill('user@example.com');
  await page.getByLabel('Password').fill('correct-password');
  await page.getByRole('button', { name: 'Sign in' }).click();
  await expect(page).toHaveURL(/\/dashboard/);
});

test('shows error for invalid credentials', async ({ page }) => {
  await page.goto('http://localhost:3000/login');
  await page.getByLabel('Email').fill('user@example.com');
  await page.getByLabel('Password').fill('wrong-password');
  await page.getByRole('button', { name: 'Sign in' }).click();
  await expect(page.getByRole('alert')).toContainText('Invalid credentials');
});

Accessible locators (getByRole, getByLabel) survive CSS refactors that would break brittle #id/.class selectors, which is why they are worth insisting on in the prompt.
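When the generated specs are run through the Playwright test runner, its `webServer` option can start the app and wait for it before any test navigates. A sketch of such a config follows; the port and the `npm run dev` script are assumptions about your project, not something the agent or Playwright can infer.

```typescript
// playwright.config.ts: a sketch; the port and the npm script are assumptions.
import { defineConfig } from '@playwright/test';

export default defineConfig({
  use: {
    // Lets tests call page.goto('/login') instead of repeating the full URL.
    baseURL: 'http://localhost:3000',
  },
  webServer: {
    // Playwright runs this command and waits until the URL responds.
    command: 'npm run dev',
    url: 'http://localhost:3000',
    // Reuse a dev server that is already running locally; start a fresh one in CI.
    reuseExistingServer: !process.env.CI,
  },
});
```

This only applies to runs of the test files. When the agent drives the browser interactively through the MCP server, the app still has to be running already.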

Debugging the Worst Failure: the Intermittent One

A test that fails one run in ten is the failure mode that wastes the most time, and it is exactly where the agent’s ability to run a test many times in a row pays off.

In practice the agent will surface the usual culprits: a previous test that did not clean up its database connections, an afterEach that was never added, or an assertion that races a debounced update. The fix is almost always isolation — a proper teardown hook or an awaited settle — not a longer timeout.
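The "run it many times" step can be pictured as a small harness that repeats one test body and counts failures. This is an illustrative sketch only: `measureFlakiness` and `FlakinessReport` are hypothetical names, and in a real repo the agent would more likely re-run the test command itself in a loop.

```typescript
type TestBody = () => void | Promise<void>;

interface FlakinessReport {
  runs: number;
  failures: number;
  firstError?: unknown;
}

// Run the same test body repeatedly and record how often it throws.
async function measureFlakiness(
  testBody: TestBody,
  runs = 25
): Promise<FlakinessReport> {
  const report: FlakinessReport = { runs, failures: 0 };
  for (let i = 0; i < runs; i++) {
    try {
      await testBody();
    } catch (error) {
      report.failures += 1;
      // Keep the first failure so there is a concrete error to investigate.
      if (report.firstError === undefined) report.firstError = error;
    }
  }
  return report;
}
```

A fix counts as deterministic only when `failures` is 0 across the whole batch. For Playwright specs, the test runner's `--repeat-each` flag serves the same purpose without a custom harness.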

Coverage percentage on its own is a vanity metric — 100% line coverage with weak assertions still ships bugs. Use the report to find which paths are untested, then ask for meaningful tests on the risky ones.

File           | % Stmts | % Branch | % Funcs | % Lines |
---------------|---------|----------|---------|---------|
auth/login.js  |   75.00 |    66.67 |  100.00 |   75.00 |
auth/reset.js  |   45.00 |    33.33 |   66.67 |   45.00 |

A report like this tells the agent where to aim: auth/reset.js has the thinnest branch coverage, so the password-reset edge cases (expired token, already-used token, unknown email) are the ones worth writing.
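The same triage can be scripted if you want a ranked list to paste into a prompt. The sketch below assumes the pipe-separated text layout shown above; `parseCoverageTable` and `rankByBranchCoverage` are hypothetical helpers, not part of any coverage tool.

```typescript
interface CoverageRow {
  file: string;
  statements: number;
  branches: number;
  functions: number;
  lines: number;
}

// Parse a pipe-separated text coverage table, skipping header and separator rows.
function parseCoverageTable(report: string): CoverageRow[] {
  return report
    .split("\n")
    .map((line) => line.split("|").map((cell) => cell.trim()))
    .filter(
      (cells) => cells.length >= 5 && !Number.isNaN(Number.parseFloat(cells[1]))
    )
    .map((cells) => ({
      file: cells[0],
      statements: Number.parseFloat(cells[1]),
      branches: Number.parseFloat(cells[2]),
      functions: Number.parseFloat(cells[3]),
      lines: Number.parseFloat(cells[4]),
    }));
}

// Files with the lowest branch coverage come first.
function rankByBranchCoverage(rows: CoverageRow[]): CoverageRow[] {
  return [...rows].sort((a, b) => a.branches - b.branches);
}
```

Ranking by branch coverage rather than line coverage is a deliberate choice here, since untested branches are where the error paths tend to hide.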

The agent edits assertions instead of fixing the code. When a test fails, the agent sometimes “fixes” it by weakening the assertion (toBe becomes toBeDefined). Always include “fix the implementation, not the test” in TDD prompts, and review the diff — if the test got easier instead of the code getting correct, reject it and re-prompt.

Tests pass but the feature is broken (false negatives). Over-mocked tests assert against the mock, not real behavior. If a test never exercises the integration it claims to cover, it gives false confidence. Prompt: “Review these tests — which ones would still pass if the implementation were deleted? Strengthen them to assert real behavior.” Keep at least one integration or E2E test that hits the real seams.
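The "would it still pass if the implementation were deleted?" question can be checked mechanically by running each check against a do-nothing stub. The sketch below is an illustration of that idea only; the `Check` shape and `findVacuousChecks` are hypothetical, not a feature of Vitest or Cursor.

```typescript
type Impl = (input: string) => string;

interface Check {
  name: string;
  // Throws when the expectation is not met, like a test body would.
  run: (impl: Impl) => void;
}

// Return the names of checks that still pass against a stub implementation.
function findVacuousChecks(checks: Check[], stub: Impl): string[] {
  const vacuous: string[] = [];
  for (const check of checks) {
    try {
      check.run(stub);
      // Passing against a stub means the check asserts nothing about real behavior.
      vacuous.push(check.name);
    } catch {
      // Failing against the stub is the desired outcome.
    }
  }
  return vacuous;
}

const exampleChecks: Check[] = [
  {
    name: "returns something",
    run: (impl) => {
      if (impl("hello") === undefined) throw new Error("expected a value");
    },
  },
  {
    name: "uppercases input",
    run: (impl) => {
      if (impl("hello") !== "HELLO") throw new Error("expected HELLO");
    },
  },
];

// A stub standing in for a deleted implementation.
const emptyStub: Impl = () => "";
```

Here `findVacuousChecks(exampleChecks, emptyStub)` flags only "returns something", the check that is the analogue of a `toBeDefined` assertion.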

Auto-Run does something destructive. “Run Everything” is powerful and indiscriminate. If the agent runs a command that drops data or pushes to a remote, switch to Ask Every Time with a Command Allowlist (see the policy above) so only safe commands auto-run and everything else pauses for approval.

The Playwright MCP can’t reach your app. The agent’s browser tests fail immediately with connection errors when the dev server isn’t running or is on a different port. Start the app first (npm run dev), confirm the URL in the prompt matches, and tell the agent to wait for the server to be ready before navigating.

Flaky tests “fixed” with longer timeouts. If the agent’s flaky-test fix is just a bigger timeout, it has masked the race, not solved it. Re-prompt for the root cause (leaked state, unawaited promise) and insist on a deterministic fix verified by 25 consecutive passing runs.