Error Recovery
Codex just ran npm install and it failed because your project uses pnpm. It tried to import a module that does not exist. It went down a rabbit hole of increasingly wrong fixes for a type error. Or it simply stopped responding mid-task. These are not edge cases — they are normal parts of working with an AI agent, and how you recover determines whether Codex stays productive or wastes your afternoon.
What You’ll Walk Away With
- A mental model for the five most common Codex failure categories
- Recovery techniques for each failure type across all surfaces
- The undo system and Git checkpoint strategy for safe rollbacks
- Patterns for breaking out of error loops
- Prompt templates for getting Codex back on track quickly
The Five Failure Categories
Every Codex failure falls into one of these buckets:
| Category | What happened | Typical symptom |
|---|---|---|
| Sandbox/Permission | Codex tried to access something outside its allowed scope | “Permission denied,” “Sandboxed operation blocked” |
| Command Execution | A shell command failed | npm errors, test failures, compilation errors |
| Wrong Direction | Codex implemented the wrong approach or misunderstood the task | Code that compiles but does not match requirements |
| Context Overflow | The conversation grew too long and Codex lost track of earlier instructions | Repeating work, contradicting earlier decisions |
| Network/Connection | MCP server crashed, API timed out, or auth token expired | “Connection reset,” “Server not responding,” stalled progress |
Sandbox and Permission Recovery
The Problem
Codex operates in a sandbox that restricts filesystem and network access. With workspace-write mode (the default), Codex can only write to your project directory. If your task requires writing elsewhere or making network calls, the sandbox blocks it.
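Before loosening the sandbox, it can help to confirm that the blocked path really is outside the project. The sketch below only illustrates that check; it is not how Codex implements its sandbox. The function name `is_inside_workspace` is hypothetical, and the sketch assumes the parent directory of the target already exists.

```shell
# Hypothetical helper: succeed if TARGET resolves to a location inside WORKSPACE.
# Illustrative only; Codex's real sandbox rules are not reproduced here.
is_inside_workspace() {
  local workspace parent target
  # Resolve symlinks so the comparison uses physical paths.
  workspace=$(cd "$1" 2>/dev/null && pwd -P) || return 2
  # Returns non-zero when the target's parent directory cannot be resolved.
  parent=$(cd "$(dirname "$2")" 2>/dev/null && pwd -P) || return 1
  target="$parent/$(basename "$2")"
  case "$target/" in
    "$workspace"/*) return 0 ;;  # target is the workspace or something below it
    *) return 1 ;;
  esac
}
```

If `is_inside_workspace . /usr/local/bin/tool` fails, the write targets a location outside the project, so a block under workspace-write is expected rather than a bug.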
Recovery
If the task legitimately needs broader access, adjust the sandbox in config.toml:

    # Allow full filesystem and network access (use with caution)
    sandbox_mode = "danger-full-access"

Or override for a single session:

    codex --sandbox-mode danger-full-access

With on-request approval mode, Codex pauses before any restricted action. Review each request and approve or deny:
- Approve once: Allow this specific action, ask again next time.
- Approve for session: Allow this type of action for the rest of the session.
This gives you fine-grained control without opening the sandbox wide.
Sometimes the sandbox is correct and the approach is wrong. Ask Codex to find an alternative:

    The sandbox blocked writing to /usr/local/. Instead of modifying system files,
    create a local configuration in the project directory that achieves the same result.

Command Execution Recovery
The Problem
Codex runs a command and it fails. Maybe npm install failed because the project uses pnpm. Maybe python manage.py test failed because the database is not running. Maybe a compile error cascaded into test failures.
Recovery
- Read the error output. Codex shows the full stderr. Often the error message tells you exactly what is wrong.
- Let Codex try to fix it. In many cases, Codex automatically reads the error, diagnoses the cause, and suggests a fix. Let it iterate once before intervening.
- Provide the missing context. If Codex guessed the wrong package manager or test runner, tell it explicitly:

      This project uses pnpm, not npm. The test command is "pnpm vitest", not "npm test".
      Update your approach and try again.

- Update AGENTS.md. If this is a recurring mistake, add the correct commands to your AGENTS.md so Codex gets it right next time.
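When the wrong package manager is the recurring mistake, the lockfile in the repository usually settles the question. A minimal sketch, assuming the conventional lockfile names (pnpm-lock.yaml, yarn.lock, package-lock.json); the function name `detect_package_manager` is hypothetical and not part of Codex:

```shell
# Hypothetical helper: print the package manager implied by the lockfile in DIR
# (defaults to the current directory). Prints "unknown" and fails if none is found.
detect_package_manager() {
  local dir="${1:-.}"
  if [ -f "$dir/pnpm-lock.yaml" ]; then
    echo pnpm
  elif [ -f "$dir/yarn.lock" ]; then
    echo yarn
  elif [ -f "$dir/package-lock.json" ]; then
    echo npm
  else
    echo unknown
    return 1
  fi
}
```

Run it once at the project root and copy the answer into the corrective prompt or into AGENTS.md, so the fact is written down rather than rediscovered after each failure.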
Wrong-Direction Recovery
The Problem
This is the trickiest failure. Codex did not error out: the code compiles, and may even pass tests, but it implemented the wrong thing. It used Express instead of Fastify. It added a REST endpoint when you wanted GraphQL. It restructured code in a way that works but does not match your architecture.
Recovery
The Codex App’s review pane lets you revert at any granularity:
- Revert all: Discard everything and start over.
- Revert file: Keep good files, discard bad ones.
- Revert hunk: Keep good changes within a file, discard specific hunks.
After reverting, send a follow-up message with clearer constraints.
The CLI supports undo via per-turn Git ghost snapshots:

    /undo

This rolls back to the state before the last turn. You can undo multiple turns.
Alternatively, use Git directly:

    git checkout -- .

This discards unstaged edits to tracked files; files Codex newly created stay in place until you delete them or run git clean.
Sometimes you do not need to undo; you need to redirect. Tell Codex what went wrong and what you want instead:

    Stop. You used Express but this project uses Fastify. Revert the Express-specific
    changes and re-implement using Fastify's route handler pattern. Check AGENTS.md
    for the project's framework before making changes.

The key to wrong-direction recovery is catching it early. Review the first few file edits before Codex finishes the entire task. If it is heading the wrong way, interrupt (Esc in the CLI, stop in the App) and redirect immediately rather than waiting for a completed but wrong implementation.
Context Overflow Recovery
The Problem
After a long conversation, Codex starts forgetting earlier instructions, repeating work it already did, or contradicting decisions from earlier in the thread. The context window has filled up and older messages are being compressed or dropped.
Recovery
- Save state to AGENTS.md or a summary file:

      Summarize everything we have accomplished in this session: decisions made,
      files changed, patterns established. Save this summary to docs/session-state.md.

- Start a new thread (App) or start a fresh conversation (CLI):

      /new

- Resume with the saved context:

      Read docs/session-state.md for context from our previous session. Continue
      from where we left off -- the next task is implementing the payment webhook handler.

Network and Connection Recovery
MCP Server Crashes
If an MCP server stops responding:

    # Check server status
    codex mcp list

    # The server may need a restart -- close and reopen Codex
    # Or disable the crashed server and continue without it

In the App, go to Settings > MCP and toggle the server off and back on.
API Timeouts
If Codex itself stops responding (the model API timed out):
- CLI: Press Esc to cancel the current request, then resend your prompt.
- App: Click the stop button on the thread, wait a moment, then send a follow-up.
- IDE Extension: Cancel the request and try again.
If timeouts persist, check your network connection. Codex requires a stable internet connection to communicate with OpenAI’s API.
Auth Token Expiry
If you get authentication errors mid-session:

    # CLI: Re-authenticate
    codex login

    # App: The App should prompt you to re-authenticate automatically

The Git Checkpoint Strategy
The most reliable safety net is frequent Git commits. Before any risky Codex task:

    git add -A && git commit -m "checkpoint: before codex refactoring"

This gives you a clean rollback point. If Codex goes sideways, you can always:

    git reset --hard HEAD

In the Codex App, use Worktree mode to isolate changes automatically. Worktrees create a separate Git checkout, so your main working directory stays clean regardless of what Codex does.
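The checkpoint and rollback commands can be wrapped into two small helpers. This is a sketch under stated assumptions, not a Codex feature: the names `codex_checkpoint` and `codex_rollback` are hypothetical, and the rollback also runs git clean so that files created after the checkpoint are removed, which a plain git reset --hard leaves behind. Recording the commit hash means the rollback still works if further commits were made after the checkpoint.

```shell
# Hypothetical helper: commit everything in the working tree and print the
# checkpoint's commit hash. --allow-empty lets it succeed on a clean tree.
codex_checkpoint() {
  git add -A &&
    git commit --quiet --allow-empty -m "checkpoint: ${1:-before codex task}" &&
    git rev-parse HEAD
}

# Hypothetical helper: restore the working tree to the given checkpoint.
# DESTRUCTIVE: discards all later changes and deletes untracked files and
# directories (ignored files are left alone).
codex_rollback() {
  git reset --quiet --hard "$1" &&
    git clean --quiet -fd
}
```

A typical use is `cp=$(codex_checkpoint "before auth refactor")` before the task and `codex_rollback "$cp"` if the result is not worth keeping.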
Breaking Out of Error Loops
The most frustrating failure: Codex tries to fix an error, the fix creates a new error, it tries to fix that, which creates another error, and the cycle continues. Here is how to break out:
- Stop the loop. Press Esc (CLI), click Stop (App), or cancel (IDE).
- Undo all the loop iterations:

      /undo

  Or revert in the review pane. Get back to the state before the loop started.
- Diagnose the root cause yourself. Read the original error that started the loop.
- Give Codex explicit constraints:

      The original error was [paste error]. You tried [approach] which created a
      cascade of new errors. Instead, take this approach: [your idea]. Do not modify
      any files outside of src/auth/. Run the tests after each change to catch
      regressions early.

- Break the task into smaller steps. Instead of one big task, give Codex three small ones.
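A constraint such as "do not modify any files outside of src/auth/" is easy to verify after each turn. The following is one possible sketch; the function name `paths_outside_scope` is hypothetical, and it simply filters a list of changed paths against an allowed prefix:

```shell
# Hypothetical helper: read file paths on stdin and print the ones that are
# NOT under the allowed directory prefix given as the first argument.
paths_outside_scope() {
  local allowed="$1" path
  while IFS= read -r path; do
    case "$path" in
      "$allowed"*) ;;                   # inside the allowed directory: skip
      *) printf '%s\n' "$path" ;;       # outside: report it
    esac
  done
}

# Example (run inside a Git repository; lists tracked files changed since HEAD):
#   git diff --name-only HEAD | paths_outside_scope "src/auth/"
```

Empty output means Codex stayed within bounds. Anything printed is a file to revert or to ask about before the next step. Note that git diff does not list untracked files, so new files need a separate look with git status.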
Recovery Prompt Templates
When This Breaks
Undo does not work: The undo feature relies on per-turn Git snapshots. If you disabled the undo feature flag, or the project is not a Git repository, undo is unavailable. Use git checkout -- . as a fallback.
Codex ignores your correction: Clear the context and start a new thread. In a long conversation, a correction in the middle may not override earlier (incorrect) instructions that carry more weight.
Recovery prompts get ignored: The model may be in a state where it keeps trying the same approach. Start a completely new session (new thread in the App, new codex invocation in the CLI) with a clean prompt that includes all the constraints upfront.
Cloud tasks cannot be stopped: Cloud tasks run asynchronously. If a cloud task is going sideways, you cannot interrupt it mid-execution. You can close the thread and start a new one. The cloud task may still finish and produce a PR, which you can simply close.
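For the "undo does not work" case, one of the two causes is quick to rule out: whether the project is a Git repository at all. A minimal sketch using git rev-parse; the function name `in_git_repo` is hypothetical, and it says nothing about the undo feature flag, which has to be checked in your Codex configuration:

```shell
# Hypothetical helper: succeed only if the current directory is inside a
# Git working tree. Output is suppressed; only the exit status matters.
in_git_repo() {
  git rev-parse --is-inside-work-tree >/dev/null 2>&1
}
```

Running `in_git_repo || echo "not a Git repository: run git init before relying on undo"` from the project root gives a yes-or-no answer before you spend time debugging anything else.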
What’s Next
You have completed the entire Codex quick-start section. You can install, authenticate, configure, write instructions, connect tools, integrate with GitHub, run tasks, review changes, and recover from failures. Here is where to go from here: