Debugging with Codex Across Surfaces
The on-call pager fires at 2 PM. Users report that saving their profile settings “works” — they see the success toast — but the changes do not persist after a page refresh. The logs show 200 responses. The database shows stale data. Somewhere between the frontend optimistic update and the backend commit, something is silently failing. You need to find it, fix it, and verify the fix without breaking anything else. Codex gives you multiple surfaces to attack this from simultaneously.
What You’ll Walk Away With
- A reproduction-first debugging workflow that works across all Codex surfaces
- Prompts for CLI debugging with tight feedback loops, App debugging with persistent context, and GitHub delegation for CI failures
- Techniques for using cloud tasks to reproduce environment-specific bugs
- The @codex GitHub pattern for delegating bug fixes directly from issues and PRs
The Workflow
Choose Your Debugging Surface
Different bugs call for different surfaces:
- CLI (launched with the codex command): Best when you can reproduce the bug locally and want a fast investigate-fix-verify cycle. The CLI lets you pipe error output directly into Codex, run commands inline, and iterate rapidly.
- App: Best for complex bugs that require sustained investigation across multiple files. The App maintains conversation history, lets you run parallel investigation threads in worktrees, and has the review pane for inspecting changes.
- GitHub: Best for CI failures and bugs reported in issues. Comment "@codex fix the CI failures" on a PR and Codex creates a cloud task that investigates and proposes a fix. No local setup required.
- Cloud: Best for bugs that only reproduce in specific environments, such as a different OS, a different Node version, or missing dependencies. Cloud tasks run in a controlled container where you can configure the exact environment.
Step 1: Reproduce First
The single most important debugging step is getting a reliable reproduction. Give Codex explicit reproduction steps, not just a bug description.
In the CLI, Codex will run the dev server, attempt the reproduction, read the relevant source files, and trace the issue. The tight feedback loop — command, output, reasoning, next command — is where the CLI shines for debugging.
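Explicit reproduction steps can also be encoded as a small check that is run before and after the fix. The sketch below is a minimal, hypothetical version: ProfileClient, reproduceStaleSave, and makeBuggyClient are illustrative names, not part of Codex or any library, and the fake backend simply mimics the reported symptom of a 200 response with no write.

```typescript
// Hypothetical client shape; swap in whatever actually calls your API.
interface ProfileClient {
  putProfile(userId: string, settings: Record<string, string>): Promise<number>; // HTTP status
  getProfile(userId: string): Promise<Record<string, string>>;
}

interface ReproResult {
  status: number;
  persisted: boolean;
}

// Encodes the bug report as steps: save settings, re-read them (the
// "page refresh"), and compare what was sent with what was stored.
async function reproduceStaleSave(
  client: ProfileClient,
  userId: string,
  settings: Record<string, string>,
): Promise<ReproResult> {
  const status = await client.putProfile(userId, settings);
  const stored = await client.getProfile(userId);
  const persisted = Object.keys(settings).every((key) => stored[key] === settings[key]);
  return { status, persisted };
}

// Fake backend that mimics the symptom: it answers 200 but never writes.
function makeBuggyClient(): ProfileClient {
  const rows: Record<string, Record<string, string>> = { u1: { theme: "light" } };
  return {
    async putProfile() {
      return 200; // success status, but nothing is written
    },
    async getProfile(userId) {
      return { ...(rows[userId] ?? {}) };
    },
  };
}

reproduceStaleSave(makeBuggyClient(), "u1", { theme: "dark" }).then((result) => {
  // A 200 status together with persisted === false means the bug reproduced.
  console.log(result); // { status: 200, persisted: false }
});
```

Pointing the same check at the real API gives a pass/fail signal that stays meaningful after the fix: the bug is gone only when the status is 200 and persisted is true.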
Step 2: Investigate with Context
Once you have a reproduction, the investigation phase benefits from the Codex App’s persistent context. Open the project in the App, create a Local thread, and describe what you have found so far:
The App keeps the full conversation, so you can ask follow-up questions without re-explaining context. If Codex identifies the issue in the handler, you can leave an inline comment on the specific line in the review pane, then ask it to fix just that part.
Step 3: Fix in a Worktree
Never fix bugs directly in your working directory if you have other uncommitted work. Switch to Worktree mode and base it on the branch with the bug:
Fix the profile settings persistence bug. The root cause is that the Drizzle update call in src/routes/profile.ts chains .set() with a .where() filter that does not match on the user ID, so the update matches zero rows and returns silently.
Fix the query, add a check that the update affected exactly one row (throw 404 if zero), and write a regression test that:
1. Creates a user
2. Updates their profile via PUT
3. Reads the profile via GET
4. Asserts the updated values are returned
Run the full test suite after the fix.
Step 4: Delegate CI Failures from GitHub
When a CI pipeline fails on a pull request, you do not need to check out the branch locally. Comment directly on the PR:
@codex fix the CI failures
Codex creates a cloud task, reads the PR diff and CI logs, identifies the failure, and proposes a fix. It posts the results back on the PR as a comment with a link to the task. If the fix involves code changes, you can open a PR from the cloud task.
For more targeted investigation, be specific:
@codex The integration tests for the payment webhook handler are failing with "connection refused" on the Redis mock. Investigate and fix.
Step 5: Debug Environment-Specific Issues in Cloud
Some bugs only reproduce in specific environments. Cloud tasks run in the codex-universal container where you can pin Node.js versions, install system dependencies, and configure environment variables.
From the CLI:
codex cloud exec --env production-mirror "The cron job that processes expired subscriptions is silently skipping records when run with Node 20. Reproduce the issue using the test data in tests/fixtures/expired-subscriptions.json, identify the root cause, and propose a fix."
Use --attempts 3 for best-of-N when the bug is intermittent:
codex cloud exec --env production-mirror --attempts 3 "Reproduce the race condition in the WebSocket reconnection handler. It happens approximately 1 in 5 times when the server restarts during an active connection."
When This Breaks
Codex cannot reproduce the bug. If the reproduction requires state that does not exist in the dev environment (specific user data, third-party API responses, production traffic patterns), the bug will not reproduce locally. Provide Codex with the exact error messages, stack traces, and relevant log lines instead of reproduction steps. Use cloud environments with seeded test data for closer-to-production reproduction.
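Intermittent bugs make "cannot reproduce" ambiguous, because one clean run proves little. For a bug that appears about 1 in 5 times, like the WebSocket race in Step 5, a single run misses it 80% of the time, and three runs still miss it about 51% of the time ((4/5)^3 = 0.512). The sketch below, assuming runs are independent and that you can estimate the per-run failure probability, computes how many runs are needed for a given confidence and loops a reproduction until the bug appears. runsNeeded and reproduceIntermittent are hypothetical helpers, not Codex features; asking Codex to loop the reproduction this way is one option when a single attempt comes back clean.

```typescript
// If each run fails independently with probability p, the chance that n runs
// all pass is (1 - p)^n. Solve (1 - p)^n <= 1 - confidence for the smallest n.
function runsNeeded(p: number, confidence: number): number {
  if (p <= 0 || p >= 1) throw new RangeError("p must be in (0, 1)");
  if (confidence <= 0 || confidence >= 1) throw new RangeError("confidence must be in (0, 1)");
  return Math.ceil(Math.log(1 - confidence) / Math.log(1 - p));
}

// Runs a reproduction repeatedly and stops at the first run that shows the
// bug. Returns the 1-based run number, or null if every run passed.
async function reproduceIntermittent(
  attempt: () => Promise<boolean>, // resolves true when the bug reproduced
  maxRuns: number,
): Promise<number | null> {
  for (let run = 1; run <= maxRuns; run++) {
    if (await attempt()) {
      return run;
    }
  }
  return null;
}

console.log(runsNeeded(0.2, 0.95)); // 14
console.log(runsNeeded(0.2, 0.99)); // 21
```

The arithmetic only holds if each run starts from fresh state (for example, restarting the server between runs); runs that share state can hide or exaggerate the real failure rate.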
The fix breaks other tests. This is why you always include “Run the full test suite after the fix” in your prompt. If Codex runs only the test it wrote and not the existing suite, you will discover regressions after merging. Be explicit: “Run npm test (the full suite), not just the new test.”
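The guard requested in Step 3 changes the handler's contract: a PUT that matches zero rows now fails with 404 instead of returning 200, which is the kind of change that can break an existing test written against the old behavior. The sketch below models the fix and the four-step regression test without a database. It rests on assumptions: an in-memory array stands in for the table, updateWhereId mimics an UPDATE ... WHERE id = ? that reports matched rows, HttpError, putProfile, and getProfile are hypothetical names (not Drizzle's API or the project's real code), and the 500 for more than one matched row is an illustrative choice.

```typescript
// In-memory stand-in for the profiles table (hypothetical; not Drizzle).
interface Profile {
  id: string;
  displayName: string;
  theme: string;
}
type ProfileChanges = Partial<Omit<Profile, "id">>;

class HttpError extends Error {
  readonly status: number;
  constructor(status: number, message: string) {
    super(message);
    this.status = status;
  }
}

// Mimics UPDATE profiles SET ... WHERE id = ?; returns the matched row count.
function updateWhereId(table: Profile[], id: string, changes: ProfileChanges): number {
  let matched = 0;
  table.forEach((row, index) => {
    if (row.id === id) {
      table[index] = { ...row, ...changes };
      matched++;
    }
  });
  return matched;
}

// PUT logic after the fix: scope the update to the user and refuse to
// report success unless exactly one row was updated.
function putProfile(table: Profile[], userId: string, changes: ProfileChanges): number {
  const matched = updateWhereId(table, userId, changes);
  if (matched === 0) throw new HttpError(404, `no profile for user ${userId}`);
  if (matched > 1) throw new HttpError(500, `expected 1 row, matched ${matched}`);
  return 200;
}

function getProfile(table: Profile[], userId: string): Profile {
  const row = table.find((r) => r.id === userId);
  if (!row) throw new HttpError(404, `no profile for user ${userId}`);
  return { ...row };
}

// Regression test following the four steps in the Step 3 prompt.
function profileUpdatePersists(): void {
  const table: Profile[] = [{ id: "u1", displayName: "Ada", theme: "light" }]; // 1. create a user
  putProfile(table, "u1", { theme: "dark" }); // 2. update via PUT
  const saved = getProfile(table, "u1"); // 3. read via GET
  if (saved.theme !== "dark" || saved.displayName !== "Ada") {
    throw new Error("profile update did not persist"); // 4. assert updated values
  }
}

profileUpdatePersists();
```

Running only profileUpdatePersists would pass; any older test that expected a 200 for an unknown user would fail only when the full suite runs.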
@codex on GitHub does not respond. Codex must be enabled for code review and cloud tasks on your repository. Check your Codex settings at chatgpt.com/codex/settings. Also ensure the comment uses @codex (lowercase) — casing matters.
Cloud task produces a fix that works in the container but fails locally. Environment differences between the universal container and your local machine can cause this. Pin versions in your cloud environment settings to match your local setup, or add the specific versions to your setup script.
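One way to catch such a mismatch early is a guard in your test setup that compares the running Node.js version with the one pinned in the cloud environment. This is a minimal sketch: nodeMajor and assertNodeMajor are hypothetical helper names, Node 20 is an assumed pin, and in Node the runtime version comes from process.version, a string such as "v20.11.1".

```typescript
// Extracts the major version from strings like "v20.11.1" or "20.11.1".
function nodeMajor(version: string): number {
  const match = /^v?(\d+)\.\d+\.\d+/.exec(version);
  if (!match) throw new Error(`unrecognized Node.js version string: ${version}`);
  return Number(match[1]);
}

// Throws when the runtime's major version differs from the pinned one, so a
// container/local mismatch fails loudly instead of changing behavior quietly.
function assertNodeMajor(actualVersion: string, pinnedMajor: number): void {
  const actual = nodeMajor(actualVersion);
  if (actual !== pinnedMajor) {
    throw new Error(`expected Node.js ${pinnedMajor}.x (pinned), but running ${actualVersion}`);
  }
}

// Usage in a test setup file (20 is an assumed pin; match your environment):
// assertNodeMajor(process.version, 20);
assertNodeMajor("v20.11.1", 20); // passes
```

Comparing only the major version is a deliberate simplification; compare the full version string instead if a minor release matters for the bug.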