
Error Recovery

Codex just ran npm install and it failed because your project uses pnpm. It tried to import a module that does not exist. It went down a rabbit hole of increasingly wrong fixes for a type error. Or it simply stopped responding mid-task. These are not edge cases — they are normal parts of working with an AI agent, and how you recover determines whether Codex stays productive or wastes your afternoon.

  • A mental model for the five most common Codex failure categories
  • Recovery techniques for each failure type across all surfaces
  • The undo system and Git checkpoint strategy for safe rollbacks
  • Patterns for breaking out of error loops
  • Prompt templates for getting Codex back on track quickly

Most Codex failures fall into one of five buckets:

| Category | What happened | Typical symptom |
|---|---|---|
| Sandbox/Permission | Codex tried to access something outside its allowed scope | “Permission denied,” “Sandboxed operation blocked” |
| Command Execution | A shell command failed | npm errors, test failures, compilation errors |
| Wrong Direction | Codex implemented the wrong approach or misunderstood the task | Code that compiles but does not match requirements |
| Context Overflow | The conversation grew too long and Codex lost track of earlier instructions | Repeating work, contradicting earlier decisions |
| Network/Connection | MCP server crashed, API timed out, or auth token expired | “Connection reset,” “Server not responding,” stalled progress |

Codex operates in a sandbox that restricts filesystem and network access. With workspace-write mode (the default), Codex can only write to your project directory. If your task requires writing elsewhere or making network calls, the sandbox blocks it.

If the task legitimately needs broader access, adjust the sandbox in config.toml:

# Allow full filesystem and network access (use with caution)
sandbox_mode = "danger-full-access"

Or override for a single session:

Terminal window
codex --sandbox danger-full-access

Codex runs a command and it fails. Maybe npm install failed because the project uses pnpm. Maybe python manage.py test failed because the database is not running. Maybe a compile error cascaded into test failures.

  1. Read the error output. Codex shows the full stderr. Often the error message tells you exactly what is wrong.

  2. Let Codex try to fix it. In many cases, Codex automatically reads the error, diagnoses the cause, and suggests a fix. Let it iterate once before intervening.

  3. Provide the missing context. If Codex guessed the wrong package manager or test runner, tell it explicitly:

    This project uses pnpm, not npm. The test command is "pnpm vitest", not "npm test".
    Update your approach and try again.
  4. Update AGENTS.md. If this is a recurring mistake, add the correct commands to your AGENTS.md so Codex gets it right next time.
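Before writing commands into AGENTS.md, it can help to confirm which package manager the repository actually uses. The sketch below guesses it from the lockfile present in a directory. The function name detect_pm is hypothetical and not part of Codex, and the check assumes a JavaScript project with a conventional lockfile.

```shell
# Hypothetical helper (not part of Codex): guess the package manager
# from the lockfile found in a directory. Defaults to the current directory.
detect_pm() {
  dir="${1:-.}"
  if [ -f "$dir/pnpm-lock.yaml" ]; then
    echo "pnpm"
  elif [ -f "$dir/yarn.lock" ]; then
    echo "yarn"
  elif [ -f "$dir/package-lock.json" ]; then
    echo "npm"
  else
    # No known lockfile: do not guess.
    echo "unknown"
  fi
}
```

Running detect_pm in the project root and recording the result in AGENTS.md, together with the matching test command, gives Codex the answer up front instead of letting it guess.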

This is the trickiest failure. Codex did not error out — the code compiles, maybe even passes tests — but it implemented the wrong thing. It used Express instead of Fastify. It added a REST endpoint when you wanted GraphQL. It restructured code in a way that works but does not match your architecture.

The Codex App’s review pane lets you revert at any granularity:

  • Revert all: Discard everything and start over.
  • Revert file: Keep good files, discard bad ones.
  • Revert hunk: Keep good changes within a file, discard specific hunks.

After reverting, send a follow-up message with clearer constraints.

The key to wrong-direction recovery is catching it early. Review the first few file edits before Codex finishes the entire task. If it is heading the wrong way, interrupt (Esc in the CLI, stop in the App) and redirect immediately rather than waiting for a completed but wrong implementation.

After a long conversation, Codex starts forgetting earlier instructions, repeating work it already did, or contradicting decisions from earlier in the thread. The context window has filled up and older messages are being compressed or dropped.

  1. Save state to AGENTS.md or a summary file:

    Summarize everything we have accomplished in this session: decisions made,
    files changed, patterns established. Save this summary to docs/session-state.md.
  2. Start a new thread (App) or start a fresh conversation (CLI):

    /new
  3. Resume with the saved context:

    Read docs/session-state.md for context from our previous session. Continue
    from where we left off -- the next task is implementing the payment webhook handler.

If an MCP server stops responding:

Terminal window
# Check server status
codex mcp list
# The server may need a restart -- close and reopen Codex
# Or disable the crashed server and continue without it

In the App, go to Settings > MCP and toggle the server off and back on.

If Codex itself stops responding (the model API timed out):

  • CLI: Press Esc to cancel the current request, then resend your prompt.
  • App: Click the stop button on the thread, wait a moment, then send a follow-up.
  • IDE Extension: Cancel the request and try again.

If timeouts persist, check your network connection. Codex requires a stable internet connection to communicate with OpenAI’s API.

If you get authentication errors mid-session:

Terminal window
# CLI: Re-authenticate
codex login
# App: The App should prompt you to re-authenticate automatically

The most reliable safety net is frequent Git commits. Before any risky Codex task:

Terminal window
git add -A && git commit -m "checkpoint: before codex refactoring"

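This gives you a clean rollback point. If Codex goes sideways, you can discard its uncommitted changes to tracked files (files Codex created that are still untracked survive the reset; remove them with git clean -fd):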

Terminal window
git reset --hard HEAD
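The checkpoint and rollback commands can be wrapped in two small shell functions. This is a minimal sketch: the names codex_checkpoint and codex_rollback are hypothetical, and it assumes Codex has not made commits of its own since the checkpoint (if it has, HEAD is no longer the checkpoint).

```shell
# Hypothetical helpers (not part of Codex).

# Commit the whole working tree, including untracked files.
# --allow-empty records a checkpoint even when nothing has changed.
codex_checkpoint() {
  git add -A && git commit --allow-empty -q -m "checkpoint: ${1:-before codex task}"
}

# Return to the last commit: restore tracked files, then delete
# untracked files and directories. Ignored files are left alone.
codex_rollback() {
  git reset --hard -q HEAD && git clean -fd -q
}
```

Note that codex_rollback is destructive: anything not committed at the checkpoint is lost, so run codex_checkpoint first.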

In the Codex App, use Worktree mode to isolate changes automatically. Worktrees create a separate Git checkout, so your main working directory stays clean regardless of what Codex does.
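Outside the App, similar isolation is available with plain Git worktrees. The sketch below is not how the App implements Worktree mode; it only shows the underlying Git command. The function name, branch name, and directory are placeholders.

```shell
# Hypothetical helper (not part of Codex): create an isolated checkout
# on a new branch. Usage: make_scratch_worktree <branch> <directory>
make_scratch_worktree() {
  branch="$1"
  wt_dir="$2"
  # "git worktree add -b" creates the branch and checks it out in wt_dir,
  # leaving the current working directory untouched.
  git worktree add -b "$branch" "$wt_dir" >/dev/null 2>&1
}
```

For example, make_scratch_worktree codex/scratch ../scratch creates the checkout; run Codex from ../scratch, and when finished remove it from the main checkout with git worktree remove ../scratch.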

The most frustrating failure: Codex tries to fix an error, the fix creates a new error, it tries to fix that, which creates another error, and the cycle continues. Here is how to break out:

  1. Stop the loop. Press Esc (CLI), click Stop (App), or cancel (IDE).

  2. Undo all the loop iterations:

    /undo

    Or revert in the review pane. Get back to the state before the loop started.

  3. Diagnose the root cause yourself. Read the original error that started the loop.

  4. Give Codex explicit constraints:

    The original error was [paste error]. You tried [approach] which created a
    cascade of new errors. Instead, take this approach: [your idea]. Do not modify
    any files outside of src/auth/. Run the tests after each change to catch
    regressions early.
  5. Break the task into smaller steps. Instead of one big task, give Codex three small ones.

Undo does not work: The undo feature relies on per-turn Git snapshots. If you disabled the undo feature flag, or the project is not a Git repository, undo is unavailable. Use git checkout -- . as a fallback; it discards unstaged changes to tracked files but leaves untracked files in place.
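A quick way to rule out the "not a Git repository" case is to ask Git directly. This is a small sketch; the function name in_git_repo is hypothetical.

```shell
# Hypothetical helper (not part of Codex): succeed only when the given
# directory (default: current directory) is inside a Git working tree.
in_git_repo() {
  git -C "${1:-.}" rev-parse --is-inside-work-tree >/dev/null 2>&1
}
```

For example, in_git_repo || git init turns the project into a repository so that Git-based rollback becomes possible for later turns.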

Codex ignores your correction: Clear the context and start a new thread. In a long conversation, a correction in the middle may not override earlier (incorrect) instructions that carry more weight.

Recovery prompts get ignored: The model may be in a state where it keeps trying the same approach. Start a completely new session (new thread in the App, new codex invocation in the CLI) with a clean prompt that includes all the constraints upfront.

Cloud tasks cannot be stopped: Cloud tasks run asynchronously. If a cloud task is going sideways, you cannot interrupt it mid-execution. You can close the thread and start a new one. The cloud task may still finish and produce a PR, which you can simply close.

You have completed the entire Codex quick-start section. You can install, authenticate, configure, write instructions, connect tools, integrate with GitHub, run tasks, review changes, and recover from failures. Here is where to go from here: