Context Management Across Codex Surfaces

Your Codex session started sharp. After 20 turns of back-and-forth, the agent starts losing track of earlier decisions, repeating work, or forgetting constraints you established at the beginning. This is context window exhaustion, and it is one of the most common productivity killers in long-running Codex sessions. Understanding how context works across surfaces lets you keep the agent effective over much longer stretches of work.

What You’ll Walk Away With

A clear model of how Codex manages context across threads, surfaces, and sessions
Techniques for structuring AGENTS.md to minimize context overhead
Compaction strategies that preserve critical information when the window fills up
Cross-surface context patterns for passing work between App, CLI, and Cloud

How Codex Context Works

Everything Codex sends to the model for a thread must fit within the model’s context window. That includes:

System instructions — Built-in Codex behavior
AGENTS.md content — Your global and project-level guidance
MCP tool definitions — Every configured MCP server adds tool schemas
Skills metadata — Names and descriptions of available skills
Conversation history — Every prompt, response, tool call, and result
File contents — Files the agent has read during the session

Codex monitors remaining space and reports it. When the window gets tight, Codex automatically compacts the context by summarizing relevant information and dropping less relevant details.

Context Budget Planning

Think of your context window as a budget. Here is a rough allocation for a typical GPT-5.3-Codex session:

Component	Approximate Tokens	Can You Control It?
System instructions	2,000-3,000	No
AGENTS.md (all levels)	500-5,000	Yes — keep it concise
MCP tool definitions	500-3,000 per server	Yes — disable unused servers
Skills metadata	200-500	Yes — disable unused skills
Conversation history	Remainder	Yes — compaction and fresh threads
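
To make the budget concrete, here is a small arithmetic sketch in Python. The per-component figures are the rough ranges from the table; the window size of 200,000 tokens and the MCP server counts are assumptions chosen for illustration, not official numbers, so substitute the values for your own model and setup.

```python
# Hypothetical window size in tokens: an assumption, not an official figure.
CONTEXT_WINDOW = 200_000


def history_budget(window, system, agents_md, mcp_per_server, mcp_servers, skills):
    """Return the tokens left for conversation history after fixed overhead."""
    overhead = system + agents_md + mcp_per_server * mcp_servers + skills
    return window - overhead


# Lean setup: low end of each range from the table, one MCP server.
lean = history_budget(CONTEXT_WINDOW, 2_000, 500, 500, 1, 200)

# Heavy setup: high end of each range, four MCP servers (assumed count).
heavy = history_budget(CONTEXT_WINDOW, 3_000, 5_000, 3_000, 4, 500)

print(f"lean setup leaves  {lean:,} tokens for history")
print(f"heavy setup leaves {heavy:,} tokens for history")
print(f"difference: {lean - heavy:,} tokens")
```

Under these assumptions the heavy setup gives up 17,300 tokens before the first prompt is sent, and most of that comes from MCP tool definitions.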

AGENTS.md Context Optimization

The Layering Strategy

Instead of one massive AGENTS.md, split by specificity:

~/.codex/AGENTS.md                    # 20 lines: universal preferences
repo-root/AGENTS.md                   # 40 lines: project conventions
repo-root/services/api/AGENTS.md      # 30 lines: API-specific rules

When you work in services/api/, Codex loads ~90 lines of guidance. When you work in the root, it loads ~60. This keeps context proportional to the task scope.
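
The line counts above can be reproduced with a simplified model of the layering. The sketch below assumes Codex picks up the global file plus one AGENTS.md per directory on the path from the repository root down to the working directory; it ignores AGENTS.override.md and any size limits, and the function names are hypothetical helpers, not part of Codex.

```python
import tempfile
from pathlib import Path


def guidance_files(home_codex: Path, repo_root: Path, cwd: Path) -> list[Path]:
    """List AGENTS.md files from most general to most specific.

    Simplified model: the global file first, then one file per directory
    on the path from the repo root down to the working directory.
    """
    candidates = [home_codex / "AGENTS.md", repo_root / "AGENTS.md"]
    current = repo_root
    for part in cwd.relative_to(repo_root).parts:
        current = current / part
        candidates.append(current / "AGENTS.md")
    # Directories without an AGENTS.md simply contribute nothing.
    return [p for p in candidates if p.is_file()]


def guidance_lines(files: list[Path]) -> int:
    """Count the total lines of guidance across the given files."""
    return sum(len(p.read_text(encoding="utf-8").splitlines()) for p in files)


# Recreate the example layout in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    home_codex = base / "home" / ".codex"
    repo = base / "repo-root"
    api = repo / "services" / "api"
    for directory, lines in ((home_codex, 20), (repo, 40), (api, 30)):
        directory.mkdir(parents=True)
        (directory / "AGENTS.md").write_text("- rule\n" * lines, encoding="utf-8")

    api_lines = guidance_lines(guidance_files(home_codex, repo, api))
    root_lines = guidance_lines(guidance_files(home_codex, repo, repo))

print(api_lines)
print(root_lines)
```

Running the same count against your real repository is a quick way to see how much guidance each working directory pulls in.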

The Override Pattern

Use AGENTS.override.md for temporary context:

# TEMPORARY: Remove after the v2 migration is complete

- All new endpoints must use the v2 router in src/routes/v2/
- Do NOT modify any v1 routes
- Migration tracking doc: docs/v2-migration.md

When the migration is done, delete the override to restore normal guidance.

Compaction Strategies

When a thread gets long, Codex compacts automatically. You can also trigger it manually, for example with the /compact command in the CLI. Here is how to make compaction work well:

Front-Load Critical Context

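State the most important constraints at the beginning of your session so that they frame everything that follows. Compaction tends to preserve recent and important information, while earlier turns are more likely to be summarized, so restate a constraint if the agent starts drifting from it.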

Use Fresh Threads Aggressively

Instead of one 50-turn thread, use five 10-turn threads:

Thread 1: “Analyze the codebase and propose a migration plan”
Thread 2: “Implement phase 1 of the migration plan: [paste summary from Thread 1]”
Thread 3: “Implement phase 2: [paste summary]”

Each thread starts with a full context window.
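
A rough calculation shows why this helps. The sketch below assumes every turn adds a fixed number of tokens to the history and that each turn re-reads the whole history; it ignores compaction, caching, fixed overhead such as AGENTS.md, and the summaries you paste between threads. The figure of 2,000 tokens per turn is hypothetical.

```python
def tokens_processed(turns: int, tokens_per_turn: int) -> int:
    """Total tokens read across a thread where every turn re-reads all history.

    Turn k sees k * tokens_per_turn tokens, so the total is the
    arithmetic series tokens_per_turn * turns * (turns + 1) / 2.
    """
    return tokens_per_turn * turns * (turns + 1) // 2


TOKENS_PER_TURN = 2_000  # hypothetical average growth per turn

one_long_thread = tokens_processed(50, TOKENS_PER_TURN)
five_short_threads = 5 * tokens_processed(10, TOKENS_PER_TURN)

# Peak history size: the last turn of the longest thread in each plan.
peak_long = 50 * TOKENS_PER_TURN
peak_short = 10 * TOKENS_PER_TURN

print(f"one 50-turn thread:    {one_long_thread:,} tokens processed, peak {peak_long:,}")
print(f"five 10-turn threads:  {five_short_threads:,} tokens processed, peak {peak_short:,}")
```

Under these assumptions the single long thread processes 2,550,000 tokens against 550,000 for the five short ones, and its history peaks at 100,000 tokens instead of 20,000.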

Resume for Continuity

When you need Thread 2 to know what Thread 1 did, use session resumption:

codex resume --last "Now implement the changes you proposed"

This carries over the entire transcript from the previous session, so the resumed session starts with that history already occupying part of the window.

Cross-Surface Context Flow

App to CLI

The App and CLI share the same config, AGENTS.md, and skills. But they do not share thread history. To pass context between them:

Use the integrated terminal in the App to run CLI commands
Copy the relevant summary from an App thread into a CLI prompt
Or use codex resume to continue an App thread from the CLI (if the session was logged)

Local to Cloud

Cloud tasks do not have access to your local AGENTS.md or MCP servers. Instead, they use AGENTS.md committed to the repository. Make sure your repo-level AGENTS.md contains the critical guidance that cloud tasks need.

IDE to App

When the IDE Extension and App are synced, they share thread visibility and auto-context (open files). This is the smoothest cross-surface flow — use it by default when both are available.

When This Breaks

Agent forgets earlier instructions: The context was compacted and your instructions were summarized away. Repeat the critical constraint in your current prompt.
Slow responses: A full context window means the model processes more tokens per turn. Start a fresh thread for the next phase of work.
AGENTS.md not loading: Check that the file is not empty and that project_doc_max_bytes has not been exceeded (default: 32KB). Run codex --ask-for-approval never "Summarize the current instructions" to verify.
Cloud task ignores conventions: Ensure your AGENTS.md is committed to the repository, not just on your local machine.
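
The empty-file and size checks for AGENTS.md can be scripted. This is a minimal sketch: check_guidance_size is a hypothetical helper, and it assumes the project_doc_max_bytes limit applies to the combined size of the files you pass in, which may not match how Codex applies it. The demo writes throwaway files to a temporary directory; point it at your real AGENTS.md paths instead.

```python
import tempfile
from pathlib import Path

DEFAULT_MAX_BYTES = 32 * 1024  # the 32KB default mentioned above


def check_guidance_size(files, max_bytes=DEFAULT_MAX_BYTES):
    """Return (total_bytes, problems) for a list of AGENTS.md paths."""
    total = 0
    problems = []
    for path in map(Path, files):
        if not path.is_file():
            problems.append(f"{path}: missing")
            continue
        size = path.stat().st_size
        if size == 0:
            problems.append(f"{path}: empty")
        total += size
    # Assumption: the limit is compared against the combined size.
    if total > max_bytes:
        problems.append(f"combined size {total} bytes exceeds {max_bytes}")
    return total, problems


# Demo: one empty file and one oversized file.
with tempfile.TemporaryDirectory() as tmp:
    empty = Path(tmp) / "empty" / "AGENTS.md"
    large = Path(tmp) / "large" / "AGENTS.md"
    for path, text in ((empty, ""), (large, "x" * 40_000)):
        path.parent.mkdir()
        path.write_text(text, encoding="utf-8")
    total, problems = check_guidance_size([empty, large])

for problem in problems:
    print(problem)
```

An empty problems list means none of the conditions above were hit; it does not prove Codex loaded the files, so the summarize command is still the definitive check.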

What’s Next

Prompt Engineering — Write prompts that work within your context budget
Efficiency Hacks — Quick tips for extending context life
Multi-Agent Workflows — Splitting work across threads also splits context pressure