Test Generation and Execution with Codex

Your code coverage report shows 34% and the tech lead just mandated 80% before the next release. You have 200 uncovered functions across 40 files. Writing those tests by hand would take two weeks. With Codex, you can spin up worktree threads that generate tests for different modules in parallel, run them in isolation, and sync the passing suites back to your branch in an afternoon.

This guide covers:

  • A parallel test generation workflow using worktrees to cover multiple modules simultaneously
  • Prompts that generate tests matching your existing test conventions, not generic boilerplate
  • A technique for using the CLI's /review command to audit test quality before merging
  • An automation recipe for nightly test coverage monitoring

Step 1: Map Your Coverage Gaps

Start in the CLI to get a quick picture of what needs testing. A prompt along these lines (the script name assumes an npm coverage script) produces a prioritized list:
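
Run npm run test:coverage and summarize the output. List the 20 least-covered files that contain real business logic, ranked by how risky an untested regression would be. Skip generated code, config, and type declaration files. Group related files into modules that can be handed to parallel worktree threads.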

Step 2: Generate Tests in Parallel Worktrees

Take the prioritized list and split it into parallel workstreams. In the Codex App, create a worktree thread for each module or group of related files. Each worktree gets its own isolated copy of the repo, so the agents can run tests without interfering with each other.

Worktree Thread 1: Service Layer Tests
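
This thread's prompt follows the same shape as the one below; a sketch, with service and path names as placeholders for your own modules:

Generate unit tests for the untested services in src/services/:
- createOrder.ts (order creation and discount calculation)
- cancelOrder.ts (cancellation rules and state transitions)
Requirements:
- Follow the patterns in tests/unit/ (read existing tests first)
- Mock external dependencies (database, payment gateway) the same way existing tests do
- Cover the happy path plus at least one failure path per exported function
Run the tests after writing them. Fix any failures.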

Worktree Thread 2: API Route Tests

Generate integration tests for the order API routes:
- POST /api/orders (create order)
- DELETE /api/orders/:id (cancel order)
- POST /api/orders/:id/refund (refund order)
Requirements:
- Follow patterns in tests/integration/ (read existing tests first)
- Use supertest for HTTP assertions
- Set up and tear down test database state in beforeEach/afterEach
- Test authentication (valid token, missing token, wrong user)
- Test validation (missing fields, invalid types, boundary values)
- Test error responses (404 for missing orders, 409 for already cancelled)
Run tests after writing them. Fix any failures.
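
For reference, a generated test that meets those requirements might look like this sketch, assuming Jest and supertest; the app entry point and database helpers are placeholders for whatever your repo provides:

import request from 'supertest';
import { app } from '../../src/app';                            // hypothetical Express app export
import { seedTestDb, resetTestDb, makeToken } from './helpers'; // hypothetical test helpers

describe('POST /api/orders', () => {
  beforeEach(async () => {
    await seedTestDb(); // set up known database state before each test
  });

  afterEach(async () => {
    await resetTestDb(); // tear down state so tests stay independent
  });

  it('creates an order for an authenticated user', async () => {
    const res = await request(app)
      .post('/api/orders')
      .set('Authorization', `Bearer ${makeToken('user-1')}`)
      .send({ items: [{ sku: 'ABC-1', qty: 2 }] });

    expect(res.status).toBe(201);
    expect(res.body).toMatchObject({ status: 'pending' });
  });

  it('rejects requests with a missing token', async () => {
    const res = await request(app).post('/api/orders').send({ items: [] });
    expect(res.status).toBe(401);
  });
});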

Worktree Thread 3: Edge Case and Error Path Tests

Audit the existing tests in tests/ and identify missing edge case coverage for:
- Error handling paths that are never tested
- Boundary conditions (empty arrays, null values, maximum lengths)
- Concurrent access scenarios
- Timeout and retry behavior
Write tests for the top 15 missing edge cases you find. Follow existing test conventions. Run the full suite after adding them.
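
As a concrete instance of the boundary-condition category, a test in this shape would count toward the 15 (calculateOrderTotal is a hypothetical function under test):

import { calculateOrderTotal } from '../src/services/createOrder'; // hypothetical import

it('returns zero for an order with no items', () => {
  // Boundary condition: empty array instead of a populated order
  expect(calculateOrderTotal([])).toBe(0);
});

it('rejects a null item list with a deliberate error', () => {
  // Error path: invalid input should fail loudly, not surface as a TypeError deep in the code
  expect(() => calculateOrderTotal(null as unknown as [])).toThrow();
});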

Step 3: Review Generated Tests with /review

Before merging any generated tests, use the /review command to have Codex audit the test quality. In the CLI, open a session and run:

/review Focus on test quality: Are the assertions meaningful or just checking truthy values? Are the mocks realistic? Do the tests actually exercise the code paths they claim to cover? Flag any tests that would pass even if the implementation was wrong.

Or in the App, open the review pane for each worktree and leave inline comments on tests that look suspicious. Then send a follow-up: “Address the inline comments and improve the flagged tests.”

Step 4: Sync and Verify Coverage

As each worktree thread completes, sync its results back to your local checkout using Sync with local (with the Apply method). After syncing each batch:

Terminal window
npm run test:coverage

Check that coverage actually increased. Codex sometimes generates tests that look comprehensive but do not exercise the right code paths. If coverage for a specific file did not improve, go back to that worktree thread and be more specific:

The tests you generated for createOrder.ts did not cover the discount calculation branch (lines 45-62). Write additional tests that exercise:
1. Orders with no discount
2. Orders with a percentage discount
3. Orders with a fixed amount discount that exceeds the order total
4. Orders with an expired discount code
Run coverage for just this file and confirm those lines are now covered.
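
To spot-check a single file yourself, and assuming Jest is your test runner, coverage collection can be scoped from the command line (the paths here are illustrative):

Terminal window
npx jest tests/unit/createOrder.test.ts --coverage --collectCoverageFrom='src/createOrder.ts'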

Step 5: Automate Coverage Monitoring

Set up an automation in the Codex App to monitor coverage and alert you when it drops:

Run npm run test:coverage and compare the results to the last known coverage baseline in coverage-baseline.json. If any file's coverage dropped by more than 5 percentage points, report which files declined and suggest which new code paths need tests. If coverage improved or stayed stable, report that and update the baseline file.
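
If you would rather keep the comparison logic in the repo for the automation to call, a small script along these lines is one way to structure it. It assumes Jest's json-summary coverage reporter, which writes coverage/coverage-summary.json; the script path is a placeholder:

// scripts/check-coverage.ts (hypothetical helper)
import { readFileSync, writeFileSync } from 'node:fs';

type FileCoverage = { lines: { pct: number } };
type Summary = Record<string, FileCoverage>;

const current: Summary = JSON.parse(readFileSync('coverage/coverage-summary.json', 'utf8'));
const baseline: Summary = JSON.parse(readFileSync('coverage-baseline.json', 'utf8'));

// Flag any file whose line coverage dropped more than 5 percentage points.
const regressions = Object.entries(current).filter(([file, cov]) =>
  file !== 'total' && baseline[file] && baseline[file].lines.pct - cov.lines.pct > 5
);

if (regressions.length > 0) {
  for (const [file, cov] of regressions) {
    console.error(`${file}: ${baseline[file].lines.pct}% -> ${cov.lines.pct}% lines covered`);
  }
  process.exit(1); // non-zero exit so the automation reports the regression
} else {
  // Coverage stable or improved: promote the current numbers to the new baseline.
  writeFileSync('coverage-baseline.json', JSON.stringify(current, null, 2));
  console.log('Coverage stable or improved; baseline updated.');
}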

Schedule this to run daily. Codex creates a worktree for each run, checks coverage, and adds findings to your inbox in the automations pane.

Troubleshooting

Generated tests pass but do not test the right thing. The most common failure mode is tests that assert implementation details (a function was called with these exact arguments) rather than behavior (given this input, the output matches this contract). Tell Codex explicitly: “Test observable behavior, not implementation details. Do not assert on internal function calls unless testing side effects like database writes.”
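
The difference is easiest to see side by side; in this sketch, processPayment and its gateway dependency are hypothetical:

import { processPayment } from '../src/payments'; // hypothetical module under test

it('charges the full order total', async () => {
  // Brittle: the commented-out assertion below passes as long as the call
  // happens, even if processPayment returns the wrong receipt.
  //   expect(paymentGateway.charge).toHaveBeenCalledWith('user-1', 4999);

  // Better: assert observable behavior. This fails if the implementation
  // computes the wrong result, no matter which internal calls it makes.
  const receipt = await processPayment({ userId: 'user-1', totalCents: 4999 });
  expect(receipt.amountCharged).toBe(4999);
  expect(receipt.status).toBe('paid');
});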

Worktree tests pass but fail after syncing to local. This happens when the worktree was created from a branch that has since diverged. The fix: create worktrees from the latest commit on your feature branch, not from a stale starting point. Or use Sync from local before running tests in the worktree to pull in recent changes.

Mocks do not match the real implementation. If the code under test changed after the mocks were written, the tests pass against outdated behavior. Include “read the current implementation of each dependency before writing mocks” in your prompt.

Test suite becomes slow. When generating many tests in parallel, you may end up with integration tests that each spin up a database. Add “use shared test database setup and teardown, not per-test database creation” to your constraints.
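
With Jest, one way to express that constraint is a globalSetup/globalTeardown pair so every integration suite shares one database; the helper paths are placeholders:

// jest.config.ts (hypothetical config)
import type { Config } from 'jest';

const config: Config = {
  // Create and migrate the shared test database once, before any suite runs.
  globalSetup: '<rootDir>/tests/global-setup.ts',
  // Drop it once, after every suite has finished.
  globalTeardown: '<rootDir>/tests/global-teardown.ts',
};

export default config;

Individual test files then reset tables in beforeEach instead of creating their own database.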