Error Recovery

Codex just ran npm install and it failed because your project uses pnpm. It tried to import a module that does not exist. It went down a rabbit hole of increasingly wrong fixes for a type error. Or it simply stopped responding mid-task. These are not edge cases — they are normal parts of working with an AI agent, and how you recover determines whether Codex stays productive or wastes your afternoon.

What You’ll Walk Away With

A mental model for the five most common Codex failure categories
Recovery techniques for each failure type across all surfaces
The undo system and Git checkpoint strategy for safe rollbacks
Patterns for breaking out of error loops
Prompt templates for getting Codex back on track quickly

The Five Failure Categories

Nearly every Codex failure falls into one of these buckets:

Category	What happened	Typical symptom
Sandbox/Permission	Codex tried to access something outside its allowed scope	“Permission denied,” “Sandboxed operation blocked”
Command Execution	A shell command failed	npm errors, test failures, compilation errors
Wrong Direction	Codex implemented the wrong approach or misunderstood the task	Code that compiles but does not match requirements
Context Overflow	The conversation grew too long and Codex lost track of earlier instructions	Repeating work, contradicting earlier decisions
Network/Connection	MCP server crashed, API timed out, or auth token expired	“Connection reset,” “Server not responding,” stalled progress

Sandbox and Permission Recovery

The Problem

Codex operates in a sandbox that restricts filesystem and network access. With workspace-write mode (the default), Codex can only write to your project directory. If your task requires writing elsewhere or making network calls, the sandbox blocks it.

Recovery

If the task legitimately needs broader access, adjust the sandbox in config.toml:

# Allow full filesystem and network access (use with caution)
sandbox_mode = "danger-full-access"

Or override for a single session:

codex --sandbox danger-full-access

With on-request approval mode, Codex pauses before any restricted action. Review each request and approve or deny:

Approve once: Allow this specific action, ask again next time.
Approve for session: Allow this type of action for the rest of the session.

This gives you fine-grained control without opening the sandbox wide.

Sometimes the sandbox is correct and the approach is wrong. Ask Codex to find an alternative:

The sandbox blocked writing to /usr/local/. Instead of modifying system files,
create a local configuration in the project directory that achieves the same result.

Prompt to diagnose sandbox issues:

What permissions and sandbox restrictions are currently active? List what you can
and cannot do in terms of filesystem access and network calls.

This helps you understand what Codex sees before deciding whether to widen the sandbox.

Command Execution Recovery

The Problem

Codex runs a command and it fails. Maybe npm install failed because the project uses pnpm. Maybe python manage.py test failed because the database is not running. Maybe a compile error cascaded into test failures.

Recovery

Read the error output. Codex shows the full stderr. Often the error message tells you exactly what is wrong.
Let Codex try to fix it. In many cases, Codex automatically reads the error, diagnoses the cause, and suggests a fix. Let it iterate once before intervening.

Provide the missing context. If Codex guessed the wrong package manager or test runner, tell it explicitly:

This project uses pnpm, not npm. The test command is "pnpm vitest", not "npm test".
Update your approach and try again.

Update AGENTS.md. If this is a recurring mistake, add the correct commands to your AGENTS.md so Codex gets it right next time.
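Before writing the commands into AGENTS.md, it can help to confirm which package manager the repository actually uses. The following is a minimal sketch that guesses from the lockfile; detect_package_manager is a hypothetical helper (not part of Codex), and it assumes a JavaScript project that uses one of three common lockfile names.

```shell
# Sketch: guess the package manager from the lockfile in a project directory.
# detect_package_manager is a hypothetical helper name, not a Codex command.
detect_package_manager() {
  pm_dir="${1:-.}"
  if [ -f "$pm_dir/pnpm-lock.yaml" ]; then
    echo "pnpm"
  elif [ -f "$pm_dir/yarn.lock" ]; then
    echo "yarn"
  elif [ -f "$pm_dir/package-lock.json" ]; then
    echo "npm"
  else
    # No recognized lockfile: report it and signal failure to the caller.
    echo "unknown"
    return 1
  fi
}
```

Run it from the project root (detect_package_manager .) and record the result, along with the matching install and test commands, in AGENTS.md.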

Prompt after a command failure:

The last command failed. Read the error output, identify the root cause, fix the
underlying issue (not just the symptom), and retry. If the fix requires changing
your approach, explain why before making changes.

Wrong-Direction Recovery

The Problem

This is the trickiest failure. Codex did not error out — the code compiles, maybe even passes tests — but it implemented the wrong thing. It used Express instead of Fastify. It added a REST endpoint when you wanted GraphQL. It restructured code in a way that works but does not match your architecture.

Recovery

The Codex App’s review pane lets you revert at any granularity:

Revert all: Discard everything and start over.
Revert file: Keep good files, discard bad ones.
Revert hunk: Keep good changes within a file, discard specific hunks.

After reverting, send a follow-up message with clearer constraints.

The CLI supports undo via per-turn Git ghost snapshots:

/undo

This rolls back to the state before the last turn. You can undo multiple turns.

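Alternatively, use Git directly. This discards uncommitted changes to tracked files only; files Codex newly created stay in place until you delete them or run git clean: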

git checkout -- .
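If you would rather set the wrong-direction changes aside than throw them away, git stash is one option. This is a sketch under the assumption that Codex has not committed its changes yet; shelve_codex_changes and the default label are hypothetical names.

```shell
# Sketch: shelve everything Codex changed instead of discarding it.
# shelve_codex_changes is a hypothetical helper name.
shelve_codex_changes() {
  stash_label="${1:-codex-wrong-direction}"
  # --include-untracked also captures files Codex created that Git does not track yet.
  git stash push --include-untracked -m "$stash_label"
}
```

Afterwards, git stash list shows the shelved entry, git stash pop brings the changes back, and git stash drop deletes them for good.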

Sometimes you do not need to undo — you need to redirect. Tell Codex what went wrong and what you want instead:

Stop. You used Express but this project uses Fastify. Revert the Express-specific
changes and re-implement using Fastify's route handler pattern. Check AGENTS.md
for the project's framework before making changes.

The key to wrong-direction recovery is catching it early. Review the first few file edits before Codex finishes the entire task. If it is heading the wrong way, interrupt (Esc in the CLI, stop in the App) and redirect immediately rather than waiting for a completed but wrong implementation.

Prompt to prevent wrong direction before it happens:

Before writing any code, outline your implementation plan in 3-5 bullet points.
Include which framework, libraries, and patterns you will use. Wait for my
approval before implementing.

This forces Codex to check in before committing to an approach.

Context Overflow Recovery

The Problem

After a long conversation, Codex starts forgetting earlier instructions, repeating work it already did, or contradicting decisions from earlier in the thread. The context window has filled up and older messages are being compressed or dropped.

Recovery

Save state to AGENTS.md or a summary file:

Summarize everything we have accomplished in this session: decisions made,
files changed, patterns established. Save this summary to docs/session-state.md.

Start a new thread (App) or start a fresh conversation (CLI):
```
/new
```

Resume with the saved context:

Read docs/session-state.md for context from our previous session. Continue
from where we left off -- the next task is implementing the payment webhook handler.
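The summary of decisions has to come from Codex or from you, but the Git facts can be captured mechanically so the new thread starts from verifiable state. A minimal sketch, assuming the project is a Git repository with at least one commit; append_git_snapshot is a hypothetical helper, and the default path simply reuses docs/session-state.md from the prompt above.

```shell
# Sketch: append branch, recent commits, and uncommitted files to the summary file.
# append_git_snapshot is a hypothetical helper name.
append_git_snapshot() {
  snapshot_file="${1:-docs/session-state.md}"
  mkdir -p "$(dirname "$snapshot_file")"
  {
    echo ""
    echo "Git snapshot"
    echo "Branch: $(git rev-parse --abbrev-ref HEAD)"
    echo "Recent commits:"
    git log --oneline -n 5
    echo "Uncommitted files:"
    git status --short
  } >> "$snapshot_file"
}
```

Because the function appends rather than overwrites, it can run after Codex has written its own summary to the same file.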

Network and Connection Recovery

MCP Server Crashes

If an MCP server stops responding:

# Check server status
codex mcp list

# The server may need a restart -- close and reopen Codex
# Or disable the crashed server and continue without it

In the App, go to Settings > MCP and toggle the server off and back on.

API Timeouts

If Codex itself stops responding (the model API timed out):

CLI: Press Esc to cancel the current request, then resend your prompt.
App: Click the stop button on the thread, wait a moment, then send a follow-up.
IDE Extension: Cancel the request and try again.

If timeouts persist, check your network connection. Codex requires a stable internet connection to communicate with OpenAI’s API.
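When a timeout looks transient, retrying with an increasing delay is a common pattern for any scripted command. The sketch below is a generic shell helper, not a Codex feature; retry_with_backoff and its argument order (maximum attempts, initial delay in seconds, then the command) are assumptions made for illustration.

```shell
# Sketch: run a command up to N times, doubling the wait between attempts.
# retry_with_backoff is a hypothetical helper name.
retry_with_backoff() {
  max_attempts="$1"
  delay="$2"
  shift 2
  attempt=1
  while true; do
    # Stop as soon as the wrapped command succeeds.
    "$@" && return 0
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after $attempt attempts" >&2
      return 1
    fi
    sleep "$delay"
    delay=$((delay * 2))
    attempt=$((attempt + 1))
  done
}
```

For example, retry_with_backoff 3 5 curl -sS -o /dev/null https://api.openai.com probes whether the API host is reachable, waiting 5 and then 10 seconds between attempts.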

Auth Token Expiry

If you get authentication errors mid-session:

# CLI: Re-authenticate
codex login

# App: The App should prompt you to re-authenticate automatically

The Git Checkpoint Strategy

The most reliable safety net is frequent Git commits. Before any risky Codex task:

git add -A && git commit -m "checkpoint: before codex refactoring"

This gives you a clean rollback point. If Codex goes sideways, you can always discard its uncommitted edits (add git clean -fd to also remove files it created):

git reset --hard HEAD
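If Codex commits during the task, HEAD no longer points at the checkpoint, so a named marker makes the rollback target explicit. A minimal sketch, assuming you are inside a Git repository; codex_checkpoint, codex_rollback, and the codex-checkpoint tag name are hypothetical, and the rollback is destructive by design.

```shell
# Sketch: tag a checkpoint before a Codex task and roll back to it later.
# Function and tag names are hypothetical.
codex_checkpoint() {
  git add -A
  # --allow-empty keeps the checkpoint usable even when the tree is already clean.
  git commit -q --allow-empty -m "checkpoint: before codex task"
  # Move (or create) the tag so the rollback target survives later commits.
  git tag -f codex-checkpoint >/dev/null
}

codex_rollback() {
  # Discard commits and tracked-file edits made after the checkpoint.
  git reset -q --hard codex-checkpoint
  # Remove untracked files and directories created after the checkpoint
  # (ignored files are left alone).
  git clean -q -fd
}
```

Call codex_checkpoint before handing the task to Codex and codex_rollback only when you are sure nothing created since then is worth keeping.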

In the Codex App, use Worktree mode to isolate changes automatically. Worktrees create a separate Git checkout, so your main working directory stays clean regardless of what Codex does.
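Outside the App, plain Git worktrees give a similar kind of isolation. The sketch below shows the manual Git equivalent, not how the App implements Worktree mode; codex_worktree, the codex/ branch prefix, and the sibling-directory layout are assumptions.

```shell
# Sketch: create a sibling checkout on a new branch for Codex to work in.
# codex_worktree and the naming scheme are hypothetical.
codex_worktree() {
  wt_name="${1:-codex-task}"
  repo_root="$(git rev-parse --show-toplevel)" || return 1
  wt_target="$(dirname "$repo_root")/$(basename "$repo_root")-$wt_name"
  # Create the branch and check it out in the sibling directory;
  # Git's progress messages go to stderr so stdout carries only the path.
  git worktree add -b "codex/$wt_name" "$wt_target" >&2 && echo "$wt_target"
}
```

Start Codex from the printed directory, and remove the checkout with git worktree remove followed by that path when the task is done.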

Prompt to have Codex create its own checkpoints:

Before making changes, create a Git commit with the message "checkpoint: pre-codex".
After completing the task, create another commit with a descriptive message. This way
I can easily diff or revert your changes.

Breaking Out of Error Loops

The most frustrating failure: Codex tries to fix an error, the fix creates a new error, it tries to fix that, which creates another error, and the cycle continues. Here is how to break out:

Stop the loop. Press Esc (CLI), click Stop (App), or cancel (IDE).
Undo all the loop iterations:
```
/undo
```
Or revert in the review pane. Get back to the state before the loop started.
Diagnose the root cause yourself. Read the original error that started the loop.

Give Codex explicit constraints:

The original error was [paste error]. You tried [approach] which created a
cascade of new errors. Instead, take this approach: [your idea]. Do not modify
any files outside of src/auth/. Run the tests after each change to catch
regressions early.

Break the task into smaller steps. Instead of one big task, give Codex three small ones.
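A constraint such as "do not modify any files outside of src/auth/" is easy to verify after each step. The following is a minimal sketch, assuming it runs from the repository root and that changes are still uncommitted; files_outside_scope is a hypothetical helper.

```shell
# Sketch: list changed or new files that fall outside an allowed directory prefix.
# files_outside_scope is a hypothetical helper name.
files_outside_scope() {
  scope_prefix="$1"   # for example: src/auth/
  {
    # Tracked files changed since the last commit.
    git diff --name-only HEAD
    # New files Git does not track yet (ignored files excluded).
    git ls-files --others --exclude-standard
  } | sort -u | awk -v p="$scope_prefix" 'index($0, p) != 1'
}
```

An empty result from files_outside_scope src/auth/ means Codex stayed inside the boundary; any path it prints is a candidate for reverting before the next step.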

Recovery Prompt Templates

After a command failure:

The command failed with: [paste error]. Do not retry the same command. Read the
error, identify the root cause, fix the underlying issue, then retry with the
corrected approach.

After wrong-direction implementation:

Stop and undo your last changes. You misunderstood the requirement. Here is what
I actually need: [clear requirement]. Before implementing, outline your new
approach and wait for my approval.

After context overflow symptoms:

We have been working for a while and I think you may have lost some context.
Read AGENTS.md and docs/session-state.md to refresh your understanding of this
project and our current progress. Summarize what you understand before continuing.

When This Breaks

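Undo does not work: The undo feature relies on per-turn Git snapshots. If you disabled the undo feature flag, or the project is not a Git repository, undo is unavailable. In a Git repository, use git checkout -- . as a fallback (it restores tracked files only). Without Git there is no automatic rollback, so initialize a repository and commit before the next risky task.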

Codex ignores your correction: Clear the context and start a new thread. In a long conversation, a correction added late may not override earlier, incorrect instructions that the model still weights heavily.

Recovery prompts get ignored: The model may be in a state where it keeps trying the same approach. Start a completely new session (new thread in the App, new codex invocation in the CLI) with a clean prompt that includes all the constraints upfront.

Cloud tasks cannot be stopped: Cloud tasks run asynchronously. If a cloud task is going sideways, you cannot interrupt it mid-execution. You can close the thread and start a new one. The cloud task may still finish and produce a PR, which you can simply close.

What’s Next

You have completed the entire Codex quick-start section. You can install, authenticate, configure, write instructions, connect tools, integrate with GitHub, run tasks, review changes, and recover from failures. Here is where to go next:

Codex Overview: Return to the Codex learning path for advanced topics.

Shared Workflows: Explore cross-tool workflows for testing, debugging, and deployment.