Managing Large Codebases
You ask the agent to “rename the User.role field to User.accountType across the app.” It confidently edits three files in packages/web, declares victory, and you ship. Two hours later the billing-worker service throws in production because it read role from the same shared Drizzle schema the agent never opened. In a 500k-LOC monorepo, the model isn’t wrong because it’s dumb. It’s wrong because it never saw the file that mattered.
The fix is not a bigger context window. It’s a workflow: surface the right files before the model commits, index the repo so semantic search works, and decompose big changes so every step is reviewable.
What You’ll Walk Away With
- A context-discovery prompt that maps the blast radius of a change before any code is written
- A decomposition prompt that turns “refactor the auth system” into a reviewable checklist
- Per-tool setup for indexing and ignore files in Cursor, Claude Code, and Codex
- A recovery playbook for stale indexes, context-window overflow, and monorepo scale limits
Step 1: Discover Context Before You Touch Code
The single highest-leverage move in a large repo is to make the agent find and report the affected files before it edits anything. This catches the cross-package dependency the agent would otherwise miss, and it’s cheap: a read-only pass costs a fraction of a botched refactor.
If the report misses a service you know exists, your index is incomplete or its ignore rules are too aggressive. Fix that (next step) before proceeding. The report is also your review artifact: keep it open and check off files as the change lands.
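To sanity-check the agent's report independently, a plain text scan grouped by package is often enough. The following is a minimal sketch, not part of any tool: the function name `map_blast_radius`, the skip list, and the assumption that packages live two directories deep (for example `packages/web` or `services/billing-worker`) are all hypothetical and should be adapted to your layout.

```python
import os
import re
from collections import defaultdict

# Assumption: these directories are never worth scanning in this repo.
SKIP_DIRS = {"node_modules", ".git", "dist", "build"}


def map_blast_radius(repo_root, symbol, extensions=(".ts", ".tsx")):
    """Return {package: sorted relative paths} for files that mention `symbol`."""
    # Whole-word match, so "role" does not match "roles".
    pattern = re.compile(r"\b" + re.escape(symbol) + r"\b")
    hits = defaultdict(list)
    for dirpath, dirnames, filenames in os.walk(repo_root):
        # Prune in place so os.walk never descends into skipped directories.
        dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]
        for name in filenames:
            if not name.endswith(extensions):
                continue
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8", errors="ignore") as fh:
                    text = fh.read()
            except OSError:
                continue  # unreadable file: skip rather than abort the scan
            if not pattern.search(text):
                continue
            parts = os.path.relpath(path, repo_root).split(os.sep)
            # Assumption: <group>/<package>/... identifies the package.
            if len(parts) >= 3:
                package = "/".join(parts[:2])
            elif len(parts) == 2:
                package = parts[0]
            else:
                package = "(root)"
            hits[package].append("/".join(parts))
    return {package: sorted(files) for package, files in hits.items()}
```

Calling `map_blast_radius(".", "role")` from the repo root gives a per-package list to compare against the agent's report. A package that appears in the scan but not in the report is the one to ask about. A short name like `role` will produce false positives, so treat the output as a checklist to skim, not as ground truth.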
Step 2: Index the Repo and Tune Ignore Files
Semantic indexing builds vector embeddings of your code so the agent can answer “where is payment processing handled?” without the file being open. In a large repo this is the difference between the agent reasoning about the whole system and the agent guessing from the three tabs you happened to have open. Each tool indexes differently, and the ignore files that keep the index fast and accurate are tool-specific.
Cursor indexes the workspace automatically and computes embeddings for semantic search. Keep the index lean with ignore files:
- .cursorignore blocks files from indexing and from agent access (use for secrets, node_modules/, build output).
- .cursorindexingignore excludes files from the index only; they stay reachable via explicit @-mention. Use it for large generated files (lockfiles, dist/, snapshots) that pollute search results.

dist/
**/*.snap
pnpm-lock.yaml

Then scope context with symbols, not whole files: @accountType (a symbol) gives a tighter, less noisy context than @user-service.ts (a 2,000-line file).
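Deciding what belongs in the indexing ignore file is easier with a list of the largest files in the repo. The sketch below is one hypothetical way to produce that list; it does not read or write any Cursor configuration, and the name `find_index_noise` and the 2,000-line default threshold are assumptions chosen for illustration.

```python
import os

# Assumption: these directories are already excluded and not worth counting.
SKIP_DIRS = {".git", "node_modules"}


def find_index_noise(repo_root, min_lines=2000):
    """Return [(relative_path, line_count)] for large files, largest first."""
    candidates = []
    for dirpath, dirnames, filenames in os.walk(repo_root):
        # Prune in place so os.walk never descends into skipped directories.
        dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # Binary mode avoids decode errors on non-text files.
                with open(path, "rb") as fh:
                    line_count = sum(1 for _ in fh)
            except OSError:
                continue  # unreadable file: skip rather than abort the scan
            if line_count >= min_lines:
                rel = os.path.relpath(path, repo_root).replace(os.sep, "/")
                candidates.append((rel, line_count))
    # Largest first; ties broken by path so the order is stable.
    return sorted(candidates, key=lambda item: (-item[1], item[0]))
```

Review the output by hand before ignoring anything: a large hand-written module is still source of truth and should stay indexed, while lockfiles, snapshots, and generated clients are the usual candidates.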
Claude Code doesn’t pre-index; it explores on demand with Grep/Glob and reads files as needed. Your job is to give it a durable map and watch the window:
- Run /init once to generate a CLAUDE.md that records the monorepo layout, package boundaries, and build/test commands. This is the context it can’t infer from a cold start.
- Run /context to see what’s consuming the window before a big task: if shared schema and three services already fill it, you’re about to overflow.
- For a sprawling change, delegate the discovery pass to a sub-agent (.claude/agents/) so the file-by-file grep runs in an isolated context and only the summary returns to your main thread.
Codex reads an AGENTS.md file at the repo root (and per-package) as its persistent project context — document the workspace layout and the “always check the shared schema” rule there so it survives every new thread.
Inside the TUI, /init scaffolds an AGENTS.md from your codebase. For changes that touch many packages, run the work in a worktree (a separate checkout per thread) so a large refactor is isolated from your main working tree and easy to discard if the blast radius turns out bigger than expected.
Step 3: Decompose, Then Execute One Step at a Time
Never hand a large repo an open-ended task like “refactor the entire authentication system.” The model fans out, loses the thread halfway, and you get a 40-file diff you can’t review. Instead, make it emit a checklist, then execute one item per turn so each step is small enough to verify.
Once the plan is approved, drive it incrementally and verify after every step. Reference the exact symbol (for example, User.accountType) so the agent stays on target instead of re-deriving scope.
After each step, run the relevant tests. This incremental loop makes a large migration reviewable and reversible, where a single “big bang” change is neither. For the full planning discipline, see PRD to Plan to Todo.
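The "one item per turn, verify before moving on" loop can be made mechanical. The following is a minimal sketch under stated assumptions: `Step` and `verify_next_step` are hypothetical names, the checklist is kept in memory, and each step carries the test command that must pass before the next step is attempted. It does not call any agent; it only gates progress on test results.

```python
import subprocess
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Step:
    description: str          # what the agent is asked to change in this turn
    test_command: List[str]   # command that must exit 0 before moving on
    done: bool = False


def verify_next_step(steps: List[Step]) -> Optional[Step]:
    """Run the tests for the first unfinished step and return that step.

    The step is marked done only if its test command exits with code 0.
    Returns None when every step is already done.
    """
    for step in steps:
        if step.done:
            continue
        result = subprocess.run(step.test_command)
        step.done = result.returncode == 0
        # Stop here either way: later steps wait until this one passes.
        return step
    return None
```

After the agent finishes a checklist item, call `verify_next_step(steps)`. If the returned step is not done, the loop stays on it, which keeps a failing change from being buried under later edits.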
Step 4: Keep Context Fresh Between Tasks
In a long session the conversation history itself becomes stale context: the agent keeps “remembering” the bug you fixed an hour ago. Reset deliberately when you switch tasks. The mechanics differ per tool:
Start a new chat for each distinct task (new feature, new bug). Cursor keeps each chat’s context separate, so a fresh chat means the agent reasons only about the job in front of it. Use checkpoints to roll back a chat if an exploratory edit goes sideways.
In Claude Code, run /clear when moving between unrelated tasks to wipe the context window entirely. When you want to keep the thread but trim noise, use /compact <instructions> (for example, /compact Focus on the JWT migration, drop the earlier CSS work). If you’ve corrected the model twice on the same issue, /clear and restart with a sharper prompt; a clean session almost always beats a long, cluttered one.
Use /new inside the TUI to start a fresh thread (Codex uses /new, not /clear). Because worktrees isolate each thread’s working directory, starting a new thread for a new task also keeps your file changes cleanly separated.
When This Breaks
- Stale index after a big merge. Right after a large branch merge or a generated-code regen, semantic search returns paths that moved or no longer exist. Re-trigger indexing. Cursor re-indexes on change, but force a resync from Settings if results look wrong. Claude Code and Codex read live, so the culprit is usually a stale CLAUDE.md/AGENTS.md; re-run /init.
- Context-window overflow on huge files. A single 8,000-line generated client or migration file can blow the window on its own. Don’t @-mention the whole file; mention the symbol, or ask the agent to grep for the relevant function and read only that range. Add the file to .cursorindexingignore so it stops dominating search.
- Monorepo indexing limits. Very large monorepos can exceed indexing caps or index so much that search precision drops. Scope aggressively with ignore files, keep generated output out of the index, and prefer per-package AGENTS.md/CLAUDE.md so context stays local to the package you’re working in.
- Agent edits the wrong layer. If the model keeps editing a re-exported wrapper instead of the source, it lost the dependency direction. Re-run the Step 1 discovery prompt and explicitly name the source-of-truth file.
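The "grep for the function and read only that range" advice for huge files amounts to slicing a window of lines around a match. Here is a minimal sketch of that idea; `read_symbol_range` and its `context` parameter are hypothetical names, and the match is a plain substring search on the first occurrence, not a parser-aware lookup.

```python
def read_symbol_range(path, symbol, context=15):
    """Return (start_line, end_line, text) around the first mention of `symbol`.

    Line numbers are 1-indexed and inclusive. Returns None if the symbol
    does not appear in the file.
    """
    with open(path, encoding="utf-8", errors="ignore") as fh:
        lines = fh.readlines()
    for index, line in enumerate(lines):
        if symbol in line:
            # Clamp the window to the file boundaries.
            start = max(0, index - context)
            end = min(len(lines), index + context + 1)
            return start + 1, end, "".join(lines[start:end])
    return None
```

With the default `context=15`, an 8,000-line generated client contributes at most 31 lines to the conversation instead of the whole file. Widen the window if the function body is longer than the slice.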
What’s Next
- Codebase Indexing: how embeddings and semantic search actually work
- Context Windows: sizing context and avoiding overflow
- PRD to Plan to Todo: the planning discipline behind Step 3
- Memory Patterns: making CLAUDE.md/AGENTS.md carry the load