Test Generation and Execution with Codex

Your code coverage report shows 34% and the tech lead just mandated 80% before the next release. You have 200 uncovered functions across 40 files. Writing those tests by hand would take two weeks. With Codex, you can spin up worktree threads that generate tests for different modules in parallel, run them in isolation, and sync the passing suites back to your branch in an afternoon.

This guide covers:

  • A parallel test generation workflow using worktrees to cover multiple modules simultaneously
  • Prompts that generate tests matching your existing test conventions, not generic boilerplate
  • A technique for using the CLI's /review command to audit test quality before merging
  • An automation recipe for nightly test coverage monitoring

Step 1: Map Your Coverage Gaps

Start in the CLI to get a quick picture of what needs testing. A prompt along these lines (the script name assumes an npm coverage script) produces a prioritized list:
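
Run npm run test:coverage and summarize the output. List the 20 least-covered files that contain real business logic, ranked by how risky an untested regression would be. Skip generated code, config, and type declaration files. Group related files into modules that can be handed to parallel worktree threads.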

Step 2: Generate Tests in Parallel Worktrees

Take the prioritized list and split it into parallel workstreams. In the Codex App, create a worktree thread for each module or group of related files. Each worktree gets its own isolated copy of the repo, so the agents can run tests without interfering with each other.

Worktree Thread 1: Service Layer Tests
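
This thread's prompt follows the same shape as the one below; a sketch, with service and path names as placeholders for your own modules:

Generate unit tests for the untested services in src/services/:
- createOrder.ts (order creation and discount calculation)
- cancelOrder.ts (cancellation rules and state transitions)
Requirements:
- Follow the patterns in tests/unit/ (read existing tests first)
- Mock external dependencies (database, payment gateway) the same way existing tests do
- Cover the happy path plus at least one failure path per exported function
Run the tests after writing them. Fix any failures.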

Worktree Thread 2: API Route Tests

Generate integration tests for the order API routes:
- POST /api/orders (create order)
- DELETE /api/orders/:id (cancel order)
- POST /api/orders/:id/refund (refund order)
Requirements:
- Follow patterns in tests/integration/ (read existing tests first)
- Use supertest for HTTP assertions
- Set up and tear down test database state in beforeEach/afterEach
- Test authentication (valid token, missing token, wrong user)
- Test validation (missing fields, invalid types, boundary values)
- Test error responses (404 for missing orders, 409 for already cancelled)
Run tests after writing them. Fix any failures.
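
For reference, a generated test that meets those requirements might look like this sketch, assuming Jest and supertest; the app entry point and database helpers are placeholders for whatever your repo provides:

import request from 'supertest';
import { app } from '../../src/app';                            // hypothetical Express app export
import { seedTestDb, resetTestDb, makeToken } from './helpers'; // hypothetical test helpers

describe('POST /api/orders', () => {
  beforeEach(async () => {
    await seedTestDb(); // set up known database state before each test
  });

  afterEach(async () => {
    await resetTestDb(); // tear down state so tests stay independent
  });

  it('creates an order for an authenticated user', async () => {
    const res = await request(app)
      .post('/api/orders')
      .set('Authorization', `Bearer ${makeToken('user-1')}`)
      .send({ items: [{ sku: 'ABC-1', qty: 2 }] });

    expect(res.status).toBe(201);
    expect(res.body).toMatchObject({ status: 'pending' });
  });

  it('rejects requests with a missing token', async () => {
    const res = await request(app).post('/api/orders').send({ items: [] });
    expect(res.status).toBe(401);
  });
});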

Worktree Thread 3: Edge Case and Error Path Tests

Audit the existing tests in tests/ and identify missing edge case coverage for:
- Error handling paths that are never tested
- Boundary conditions (empty arrays, null values, maximum lengths)
- Concurrent access scenarios
- Timeout and retry behavior
Write tests for the top 15 missing edge cases you find. Follow existing test conventions. Run the full suite after adding them.
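
As a concrete instance of the boundary-condition category, a test in this shape would count toward the 15 (calculateOrderTotal is a hypothetical function under test):

import { calculateOrderTotal } from '../src/services/createOrder'; // hypothetical import

it('returns zero for an order with no items', () => {
  // Boundary condition: empty array instead of a populated order
  expect(calculateOrderTotal([])).toBe(0);
});

it('rejects a null item list with a deliberate error', () => {
  // Error path: invalid input should fail loudly, not surface as a TypeError deep in the code
  expect(() => calculateOrderTotal(null as unknown as [])).toThrow();
});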

Step 3: Review Generated Tests with /review

Before merging any generated tests, use the /review command to have Codex audit the test quality. In the CLI, open a session and run:

/review Focus on test quality: Are the assertions meaningful or just checking truthy values? Are the mocks realistic? Do the tests actually exercise the code paths they claim to cover? Flag any tests that would pass even if the implementation was wrong.

Or in the App, open the review pane for each worktree and leave inline comments on tests that look suspicious. Then send a follow-up: “Address the inline comments and improve the flagged tests.”

Step 4: Sync and Verify Coverage

As each worktree thread completes, sync its results back to your local checkout using Sync with local (with the Apply method). After syncing each batch:

Terminal window
npm run test:coverage

Check that coverage actually increased. Codex sometimes generates tests that look comprehensive but do not exercise the right code paths. If coverage for a specific file did not improve, go back to that worktree thread and be more specific:

The tests you generated for createOrder.ts did not cover the discount calculation branch (lines 45-62). Write additional tests that exercise:
1. Orders with no discount
2. Orders with a percentage discount
3. Orders with a fixed amount discount that exceeds the order total
4. Orders with an expired discount code
Run coverage for just this file and confirm those lines are now covered.
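
To spot-check a single file yourself, and assuming Jest is your test runner, coverage collection can be scoped from the command line (the paths here are illustrative):

Terminal window
npx jest tests/unit/createOrder.test.ts --coverage --collectCoverageFrom='src/createOrder.ts'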

Step 5: Automate Coverage Monitoring

Set up an automation in the Codex App to monitor coverage and alert you when it drops:

Run npm run test:coverage and compare the results to the last known coverage baseline in coverage-baseline.json. If any file's coverage dropped by more than 5 percentage points, report which files declined and suggest which new code paths need tests. If coverage improved or stayed stable, report that and update the baseline file.
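
If you would rather keep the comparison logic in the repo for the automation to call, a small script along these lines is one way to structure it. It assumes Jest's json-summary coverage reporter, which writes coverage/coverage-summary.json; the script path is a placeholder:

// scripts/check-coverage.ts (hypothetical helper)
import { readFileSync, writeFileSync } from 'node:fs';

type FileCoverage = { lines: { pct: number } };
type Summary = Record<string, FileCoverage>;

const current: Summary = JSON.parse(readFileSync('coverage/coverage-summary.json', 'utf8'));
const baseline: Summary = JSON.parse(readFileSync('coverage-baseline.json', 'utf8'));

// Flag any file whose line coverage dropped more than 5 percentage points.
const regressions = Object.entries(current).filter(([file, cov]) =>
  file !== 'total' && baseline[file] && baseline[file].lines.pct - cov.lines.pct > 5
);

if (regressions.length > 0) {
  for (const [file, cov] of regressions) {
    console.error(`${file}: ${baseline[file].lines.pct}% -> ${cov.lines.pct}% lines covered`);
  }
  process.exit(1); // non-zero exit so the automation reports the regression
} else {
  // Coverage stable or improved: promote the current numbers to the new baseline.
  writeFileSync('coverage-baseline.json', JSON.stringify(current, null, 2));
  console.log('Coverage stable or improved; baseline updated.');
}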

Schedule this to run daily. Codex creates a worktree for each run, checks coverage, and adds findings to your inbox in the automations pane.

Troubleshooting

Generated tests pass but do not test the right thing. The most common failure mode is tests that assert implementation details (a function was called with these exact arguments) rather than behavior (given this input, the output matches this contract). Tell Codex explicitly: “Test observable behavior, not implementation details. Do not assert on internal function calls unless testing side effects like database writes.”
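
The difference is easiest to see side by side; in this sketch, processPayment and its gateway dependency are hypothetical:

import { processPayment } from '../src/payments'; // hypothetical module under test

it('charges the full order total', async () => {
  // Brittle: the commented-out assertion below passes as long as the call
  // happens, even if processPayment returns the wrong receipt.
  //   expect(paymentGateway.charge).toHaveBeenCalledWith('user-1', 4999);

  // Better: assert observable behavior. This fails if the implementation
  // computes the wrong result, no matter which internal calls it makes.
  const receipt = await processPayment({ userId: 'user-1', totalCents: 4999 });
  expect(receipt.amountCharged).toBe(4999);
  expect(receipt.status).toBe('paid');
});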

Worktree tests pass but fail after syncing to local. This happens when the worktree was created from a branch that has since diverged. The fix: create worktrees from the latest commit on your feature branch, not from a stale starting point. Or use Sync from local before running tests in the worktree to pull in recent changes.

Mocks do not match the real implementation. If the code under test changed after the mocks were written, the tests pass against outdated behavior. Include “read the current implementation of each dependency before writing mocks” in your prompt.

Test suite becomes slow. When generating many tests in parallel, you may end up with integration tests that each spin up a database. Add “use shared test database setup and teardown, not per-test database creation” to your constraints.
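
With Jest, one way to express that constraint is a globalSetup/globalTeardown pair so every integration suite shares one database; the helper paths are placeholders:

// jest.config.ts (hypothetical config)
import type { Config } from 'jest';

const config: Config = {
  // Create and migrate the shared test database once, before any suite runs.
  globalSetup: '<rootDir>/tests/global-setup.ts',
  // Drop it once, after every suite has finished.
  globalTeardown: '<rootDir>/tests/global-teardown.ts',
};

export default config;

Individual test files then reset tables in beforeEach instead of creating their own database.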