
Context Management Across Codex Surfaces

Your Codex session started sharp. After 20 turns of back-and-forth, the agent starts losing track of earlier decisions, repeating work, or forgetting constraints you established at the beginning. This is context window exhaustion, one of the most common productivity killers in long-running Codex sessions. Understanding how context works across surfaces lets you keep the agent effective through long stretches of work.

  • A clear model of how Codex manages context across threads, surfaces, and sessions
  • Techniques for structuring AGENTS.md to minimize context overhead
  • Compaction strategies that preserve critical information when the window fills up
  • Cross-surface context patterns for passing work between App, CLI, and Cloud

Everything the agent sees in a thread must fit within the model’s context window at once. The context includes:

  1. System instructions — Built-in Codex behavior
  2. AGENTS.md content — Your global and project-level guidance
  3. MCP tool definitions — Every configured MCP server adds tool schemas
  4. Skills metadata — Names and descriptions of available skills
  5. Conversation history — Every prompt, response, tool call, and result
  6. File contents — Files the agent has read during the session

Codex monitors remaining space and reports it. When the window gets tight, Codex automatically compacts the context by summarizing relevant information and dropping less relevant details.

Think of your context window as a budget. Here is a rough allocation for a typical GPT-5.3-Codex session:

| Component | Approximate Tokens | Can You Control It? |
| --- | --- | --- |
| System instructions | 2,000-3,000 | No |
| AGENTS.md (all levels) | 500-5,000 | Yes: keep it concise |
| MCP tool definitions | 500-3,000 per server | Yes: disable unused servers |
| Skills metadata | 200-500 | Yes: disable unused skills |
| Conversation history | Remainder | Yes: compaction and fresh threads |
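To make the budget concrete, here is a minimal sketch of the arithmetic using the ranges from the table above. The window size of 200,000 tokens is a placeholder assumption, not a documented figure, and `history_budget` is a hypothetical helper name; substitute the window size of the model you actually use. Note that the remainder has to cover file contents the agent reads as well as the conversation itself.

```python
# Rough token budget for one session, using the (low, high) ranges from the table.
CONTEXT_WINDOW = 200_000  # ASSUMPTION: placeholder window size, not a documented figure

FIXED_COSTS = {
    "system_instructions": (2_000, 3_000),
    "agents_md": (500, 5_000),
    "skills_metadata": (200, 500),
}
MCP_PER_SERVER = (500, 3_000)


def history_budget(mcp_servers: int, window: int = CONTEXT_WINDOW) -> tuple[int, int]:
    """Return (worst_case, best_case) tokens left for history and file contents."""
    low = sum(lo for lo, _ in FIXED_COSTS.values()) + mcp_servers * MCP_PER_SERVER[0]
    high = sum(hi for _, hi in FIXED_COSTS.values()) + mcp_servers * MCP_PER_SERVER[1]
    # The most expensive overhead leaves the least room, and vice versa.
    return window - high, window - low


if __name__ == "__main__":
    for servers in (0, 2, 5):
        worst, best = history_budget(servers)
        print(f"{servers} MCP servers: {worst:,} to {best:,} tokens remaining")
```

Under these assumptions, going from zero to five MCP servers costs up to 15,000 tokens before the first prompt is sent, which is why disabling unused servers is worth the effort.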

Instead of one massive AGENTS.md, split by specificity:

~/.codex/AGENTS.md # 20 lines: universal preferences
repo-root/AGENTS.md # 40 lines: project conventions
repo-root/services/api/AGENTS.md # 30 lines: API-specific rules

When you work in services/api/, Codex loads ~90 lines of guidance. When you work in the root, it loads ~60. This keeps context proportional to the task scope.
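The layering can be sketched as a walk from the repository root down to the working directory. This is an illustration of the layout described here, not Codex's actual loader: the function names are hypothetical, and the ordering (global file first, then each directory from root to working directory) is an assumption.

```python
from pathlib import Path


def guidance_files(workdir: Path, repo_root: Path, global_dir: Path) -> list[Path]:
    """List AGENTS.md files from most general to most specific.

    ASSUMPTION: global file first, then every directory from repo_root down
    to workdir. This mirrors the layout in the text, not Codex internals.
    """
    workdir, repo_root = workdir.resolve(), repo_root.resolve()
    # Raises ValueError if workdir is not inside repo_root.
    relative = workdir.relative_to(repo_root)
    dirs = [repo_root]
    for part in relative.parts:
        dirs.append(dirs[-1] / part)
    candidates = [global_dir / "AGENTS.md"] + [d / "AGENTS.md" for d in dirs]
    # Directories without an AGENTS.md simply contribute nothing.
    return [p for p in candidates if p.is_file()]


def guidance_lines(workdir: Path, repo_root: Path, global_dir: Path) -> int:
    """Total line count of all guidance files that apply to workdir."""
    return sum(
        len(p.read_text(encoding="utf-8").splitlines())
        for p in guidance_files(workdir, repo_root, global_dir)
    )
```

Running `guidance_lines` for a few working directories in your own repository gives a quick estimate of how much guidance each scope pulls in.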

Use AGENTS.override.md for temporary context:

services/api/AGENTS.override.md
# TEMPORARY: Remove after the v2 migration is complete
- All new endpoints must use the v2 router in src/routes/v2/
- Do NOT modify any v1 routes
- Migration tracking doc: docs/v2-migration.md

When the migration is done, delete the override to restore normal guidance.

When a thread gets long, Codex compacts automatically. You can also trigger it manually (in the CLI, with the /compact slash command). Here is how to make compaction work well:

Put the most important constraints at the beginning of your session. Compaction preserves recent and important information, but earlier turns are more likely to be summarized.

Instead of one 50-turn thread, use five 10-turn threads:

  1. Thread 1: “Analyze the codebase and propose a migration plan”
  2. Thread 2: “Implement phase 1 of the migration plan: [paste summary from Thread 1]”
  3. Thread 3: “Implement phase 2: [paste summary]”

Each thread starts with a full context window.
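One way to keep handoffs consistent is to build the opening prompt of each new thread from the same three pieces: standing constraints, the task, and the pasted summary. The sketch below is a hypothetical helper, not part of Codex; it places constraints first, following the advice to state them at the beginning of a session.

```python
def handoff_prompt(task: str, summary: str, constraints: list[str]) -> str:
    """Build the opening prompt for a fresh thread.

    Constraints go first so they sit at the very start of the new session.
    """
    lines = []
    if constraints:
        lines.append("Constraints (apply to every step):")
        lines.extend(f"- {c}" for c in constraints)
        lines.append("")
    lines.append(f"Task: {task}")
    lines.append("")
    lines.append("Summary of the previous thread:")
    lines.append(summary.strip())
    return "\n".join(lines)


if __name__ == "__main__":
    print(handoff_prompt(
        task="Implement phase 2 of the migration plan",
        summary="Phase 1 moved the auth endpoints to the v2 router.",  # example text
        constraints=["Do NOT modify any v1 routes"],
    ))
```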

When you need Thread 2 to know what Thread 1 did, use session resumption:

codex resume --last "Now implement the changes you proposed"

This carries over the entire transcript from the previous session, so the resumed thread begins with that session's context usage rather than a fresh window.

The App and CLI share the same config, AGENTS.md, and skills. But they do not share thread history. To pass context between them:

  • Use the integrated terminal in the App to run CLI commands
  • Copy the relevant summary from an App thread into a CLI prompt
  • Or use codex resume to continue an App thread from the CLI (if the session was logged)

Cloud tasks do not have access to your local AGENTS.md or MCP servers. Instead, they use AGENTS.md committed to the repository. Make sure your repo-level AGENTS.md contains the critical guidance that cloud tasks need.
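A quick local check can catch guidance files that exist in your working tree but are unknown to Git. The sketch below is a hypothetical helper that compares AGENTS.md files on disk with the output of `git ls-files`. It assumes `git` is on your PATH and that paths contain only ASCII characters, and it only confirms that a file is tracked, not that the latest version has been committed and pushed.

```python
import subprocess
from pathlib import Path


def untracked_guidance(on_disk: list[str], tracked: set[str]) -> list[str]:
    """Return AGENTS.md paths that exist locally but are not tracked by Git."""
    return sorted(
        p for p in on_disk if Path(p).name == "AGENTS.md" and p not in tracked
    )


def scan_repo(repo_root: str) -> list[str]:
    """Compare AGENTS.md files in the working tree against `git ls-files`."""
    root = Path(repo_root)
    on_disk = [
        p.relative_to(root).as_posix()
        for p in root.rglob("AGENTS.md")
        if ".git" not in p.parts  # skip anything inside the Git metadata directory
    ]
    result = subprocess.run(
        ["git", "-C", repo_root, "ls-files"],
        capture_output=True, text=True, check=True,
    )
    return untracked_guidance(on_disk, set(result.stdout.splitlines()))


if __name__ == "__main__":
    for path in scan_repo("."):
        print(f"not tracked by Git: {path}")
```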

When the IDE Extension and App are synced, they share thread visibility and auto-context (open files). This is the smoothest cross-surface flow — use it by default when both are available.

  • Agent forgets earlier instructions: The context was compacted and your instructions were summarized away. Repeat the critical constraint in your current prompt.
  • Slow responses: A full context window means the model processes more tokens per turn. Start a fresh thread for the next phase of work.
  • AGENTS.md not loading: Check that the file is not empty and that project_doc_max_bytes has not been exceeded (default: 32KB). Run codex --ask-for-approval never "Summarize the current instructions" to verify.
  • Cloud task ignores conventions: Ensure your AGENTS.md is committed to the repository, not just on your local machine.
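The "AGENTS.md not loading" check can be scripted. This is a minimal sketch with a hypothetical `check_guidance` helper: it treats the 32KB default as 32 × 1024 bytes and assumes the limit applies to the combined size of the files you pass in, neither of which is confirmed here, so adjust `max_bytes` to match your configuration.

```python
from pathlib import Path

# ASSUMPTION: the 32KB default for project_doc_max_bytes is read as 32 KiB.
MAX_BYTES = 32 * 1024


def check_guidance(paths: list[Path], max_bytes: int = MAX_BYTES) -> list[str]:
    """Return human-readable problems; an empty list means nothing obvious is wrong."""
    problems = []
    total = 0
    for path in paths:
        if not path.is_file():
            problems.append(f"{path}: missing")
            continue
        size = path.stat().st_size
        if size == 0:
            problems.append(f"{path}: empty")
        total += size
    # ASSUMPTION: the limit is compared against the combined size of all files.
    if total > max_bytes:
        problems.append(f"combined size {total} bytes exceeds limit of {max_bytes}")
    return problems


if __name__ == "__main__":
    files = [Path("AGENTS.md"), Path("services/api/AGENTS.md")]
    for problem in check_guidance(files):
        print(problem)
```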