Context Management Across Codex Surfaces
Your Codex session started sharp. Twenty or so turns in, the agent begins losing track of earlier decisions, repeating work, or forgetting constraints you set at the start. This is context window exhaustion, and it is one of the most common productivity killers in long-running Codex sessions. Understanding how context works across surfaces lets you keep the agent effective through much longer stretches of work.
What You’ll Walk Away With
- A clear model of how Codex manages context across threads, surfaces, and sessions
- Techniques for structuring AGENTS.md to minimize context overhead
- Compaction strategies that preserve critical information when the window fills up
- Cross-surface context patterns for passing work between App, CLI, and Cloud
How Codex Context Works
Every message in a thread must fit within the model’s context window. The context includes:
- System instructions — Built-in Codex behavior
- AGENTS.md content — Your global and project-level guidance
- MCP tool definitions — Every configured MCP server adds tool schemas
- Skills metadata — Names and descriptions of available skills
- Conversation history — Every prompt, response, tool call, and result
- File contents — Files the agent has read during the session
Codex monitors remaining space and reports it. When the window gets tight, Codex automatically compacts the context by summarizing relevant information and dropping less relevant details.
Context Budget Planning
Think of your context window as a budget. Here is a rough allocation for a typical GPT-5.3-Codex session:
| Component | Approximate Tokens | Can You Control It? |
|---|---|---|
| System instructions | 2,000-3,000 | No |
| AGENTS.md (all levels) | 500-5,000 | Yes — keep it concise |
| MCP tool definitions | 500-3,000 per server | Yes — disable unused servers |
| Skills metadata | 200-500 | Yes — disable unused skills |
| Conversation history | Remainder | Yes — compaction and fresh threads |
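As a rough worked example of the table above, the sketch below subtracts the fixed overhead from a context window to estimate what remains for conversation history. The function name `history_budget` and the 200,000-token window used in the note that follows are illustrative assumptions, not figures reported by Codex; the component values are taken from the upper ends of the ranges in the table.

```shell
# Estimate the tokens left for conversation history.
# Usage: history_budget WINDOW SYSTEM AGENTS MCP_PER_SERVER MCP_SERVERS SKILLS
# Every input is a token count you supply; nothing here is read from Codex.
history_budget() {
  window=$1; system=$2; agents=$3; mcp_each=$4; servers=$5; skills=$6
  # Fixed overhead: everything loaded before the conversation starts.
  overhead=$(( system + agents + mcp_each * servers + skills ))
  echo $(( window - overhead ))
}
```

With a hypothetical 200,000-token window, worst-case values from the table, and four MCP servers, `history_budget 200000 3000 5000 3000 4 500` prints 179500. Cutting down to one server raises that to 188500, which is why disabling unused servers is usually the cheapest saving.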
AGENTS.md Context Optimization
The Layering Strategy
Instead of one massive AGENTS.md, split by specificity:

    ~/.codex/AGENTS.md                  # 20 lines: universal preferences
    repo-root/AGENTS.md                 # 40 lines: project conventions
    repo-root/services/api/AGENTS.md    # 30 lines: API-specific rules

When you work in services/api/, Codex loads ~90 lines of guidance. When you work in the root, it loads ~60. This keeps context proportional to the task scope.
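To see how much guidance a given directory would pull in, a small helper can walk from that directory up to the repository root and count the lines in each AGENTS.md it meets. This is a minimal sketch that only approximates the layering described above: the function name `agents_lines` is hypothetical, it does not call Codex, and it leaves out the global ~/.codex/AGENTS.md.

```shell
# Print the line count of each AGENTS.md between DIR and REPO_ROOT
# (walking upward), then the combined total.
# Usage: agents_lines REPO_ROOT DIR   (DIR must be REPO_ROOT or inside it)
agents_lines() {
  root=$(cd "$1" && pwd) || return 1
  dir=$(cd "$2" && pwd) || return 1
  total=0
  while :; do
    if [ -f "$dir/AGENTS.md" ]; then
      # The arithmetic expansion strips padding that some wc versions add.
      n=$(( $(wc -l < "$dir/AGENTS.md") ))
      echo "$n $dir/AGENTS.md"
      total=$(( total + n ))
    fi
    # Stop at the repository root, or at / as a safety net.
    if [ "$dir" = "$root" ] || [ "$dir" = "/" ]; then break; fi
    dir=$(dirname "$dir")
  done
  echo "total $total"
}
```

Running `agents_lines . services/api` from the repository root lists the files closest to the working directory first. Because `wc -l` counts newline characters, a file without a trailing newline is reported one line short.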
The Override Pattern
Use AGENTS.override.md for temporary context:

    # TEMPORARY: Remove after the v2 migration is complete
    - All new endpoints must use the v2 router in src/routes/v2/
    - Do NOT modify any v1 routes
    - Migration tracking doc: docs/v2-migration.md

When the migration is done, delete the override to restore normal guidance.
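Temporary overrides are easy to forget once the work they supported is finished. The sketch below is one possible reminder check: it lists any AGENTS.override.md files still present under a directory. The function name `find_overrides` is hypothetical, and wiring it into a hook or CI step is left as an assumption about your own setup.

```shell
# List every AGENTS.override.md under DIR (default: current directory),
# skipping the .git directory. Returns 1 when at least one override is
# found and 0 when none are, so it can act as a simple reminder check.
find_overrides() {
  found=$(find "${1:-.}" -name .git -prune -o -type f -name AGENTS.override.md -print)
  if [ -z "$found" ]; then
    return 0
  fi
  echo "$found"
  return 1
}
```

Running `find_overrides` from the repository root after the migration ships should print nothing.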
Compaction Strategies
When a thread gets long, Codex compacts automatically. You can also trigger it manually. Here is how to make compaction work well:
Front-Load Critical Context
Put the most important constraints at the beginning of your session, stated explicitly so they read as important. Compaction preserves recent and important information, but earlier turns are more likely to be summarized, so repeat a critical constraint later in the thread if the agent starts to drift.
Use Fresh Threads Aggressively
Instead of one 50-turn thread, use five 10-turn threads:
- Thread 1: “Analyze the codebase and propose a migration plan”
- Thread 2: “Implement phase 1 of the migration plan: [paste summary from Thread 1]”
- Thread 3: “Implement phase 2: [paste summary]”
Each thread starts with a full context window.
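The handoff between threads can be scripted so that the pasted summary is always framed the same way. This is a minimal sketch, assuming you have saved the previous thread's summary to a file yourself; the function name `handoff_prompt` and the file name used in the note that follows are hypothetical.

```shell
# Build the opening prompt for the next thread from a saved summary.
# Usage: handoff_prompt PHASE SUMMARY_FILE
handoff_prompt() {
  # State the task first, then attach the summary from the earlier thread.
  printf 'Implement phase %s of the migration plan.\n\nSummary of the previous thread:\n' "$1"
  cat "$2"
}
```

For example, `codex "$(handoff_prompt 1 plan-summary.md)"` would open a new thread with the phase 1 instruction followed by the contents of plan-summary.md.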
Resume for Continuity
When you need Thread 2 to know what Thread 1 did, use session resumption:

    codex resume --last "Now implement the changes you proposed"

This carries over the entire transcript from the previous session.
Cross-Surface Context Flow
App to CLI
The App and CLI share the same config, AGENTS.md, and skills. But they do not share thread history. To pass context between them:
- Use the integrated terminal in the App to run CLI commands
- Copy the relevant summary from an App thread into a CLI prompt
- Or use `codex resume` to continue an App thread from the CLI (if the session was logged)
Local to Cloud
Cloud tasks do not have access to your local AGENTS.md or MCP servers. Instead, they use AGENTS.md committed to the repository. Make sure your repo-level AGENTS.md contains the critical guidance that cloud tasks need.
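Before starting a cloud task, it can help to confirm that the guidance file is actually in the latest commit rather than only in your working tree. The sketch below uses `git cat-file -e` for that check; the function name `agents_committed` is hypothetical, and the check says nothing about whether the commit has been pushed to the remote that cloud tasks read from.

```shell
# Succeed only if PATH_IN_REPO (default: AGENTS.md) exists in the latest
# commit of the repository in the current directory. A file that is only
# staged or untracked does not count.
# Usage: agents_committed [PATH_IN_REPO]
agents_committed() {
  git cat-file -e "HEAD:${1:-AGENTS.md}" 2>/dev/null
}
```

A typical use is `agents_committed || echo "AGENTS.md is not committed yet"`, run from the repository root.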
IDE to App
When the IDE Extension and App are synced, they share thread visibility and auto-context (open files). This is the smoothest cross-surface flow — use it by default when both are available.
When This Breaks
- Agent forgets earlier instructions: The context was compacted and your instructions were summarized away. Repeat the critical constraint in your current prompt.
- Slow responses: A full context window means the model processes more tokens per turn. Start a fresh thread for the next phase of work.
- AGENTS.md not loading: Check that the file is not empty and that `project_doc_max_bytes` has not been exceeded (default: 32KB). Run `codex --ask-for-approval never "Summarize the current instructions"` to verify.
- Cloud task ignores conventions: Ensure your AGENTS.md is committed to the repository, not just on your local machine.
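The empty-file and size checks from the list above can be done without starting a session. This is a minimal sketch, assuming the "32KB" default means 32768 bytes and checking a single file at a time; the function name `check_doc_size` is hypothetical, and if your configuration sets a different `project_doc_max_bytes`, pass that value as the second argument.

```shell
# Report whether FILE is non-empty and fits within LIMIT bytes.
# LIMIT defaults to 32768 (an assumed reading of the 32KB default).
# Usage: check_doc_size FILE [LIMIT]
check_doc_size() {
  file=$1; limit=${2:-32768}
  # -s is false for files that are missing or zero bytes long.
  if [ ! -s "$file" ]; then
    echo "missing or empty: $file"
    return 1
  fi
  size=$(( $(wc -c < "$file") ))
  if [ "$size" -gt "$limit" ]; then
    echo "too large: $size > $limit bytes"
    return 1
  fi
  echo "ok: $size of $limit bytes"
}
```

Running `check_doc_size AGENTS.md` in each directory that has its own guidance file gives a quick first pass before asking Codex to summarize its instructions.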
What’s Next
- Prompt Engineering: Write prompts that work within your context budget
- Efficiency Hacks: Quick tips for extending context life
- Multi-Agent Workflows: Splitting work across threads also splits context pressure