Managing Large Codebases

You ask the agent to “rename the User.role field to User.accountType across the app.” It confidently edits three files in packages/web, declares victory, and you ship. Two hours later the billing-worker service throws in production because it read role from the same shared Drizzle schema the agent never opened. In a 500k-LOC monorepo, the model isn’t wrong because it’s dumb. It’s wrong because it never saw the file that mattered.

The fix is not a bigger context window. It’s a workflow: surface the right files before the model commits, index the repo so semantic search works, and decompose big changes so every step is reviewable.

What You’ll Walk Away With

A context-discovery prompt that maps the blast radius of a change before any code is written
A decomposition prompt that turns “refactor the auth system” into a reviewable checklist
Per-tool setup for indexing and ignore files in Cursor, Claude Code, and Codex
A recovery playbook for stale indexes, context-window overflow, and monorepo scale limits

Step 1: Discover Context Before You Touch Code

The single highest-leverage move in a large repo is to make the agent find and report the affected files before it edits anything. This catches the cross-package dependency the agent would otherwise miss, and it’s cheap: a read-only pass costs a fraction of a botched refactor.

Context-discovery prompt (paste before any large change):

We're renaming User.role to User.accountType. Before writing any code,
search the whole repo and produce a blast-radius report:
1. Every file that reads or writes `role` on a user object (path + line).
2. Which packages/services own them (group by package.json / workspace).
3. Any shared schema, types, or DB migrations involved.
4. The 5 files I should review first, ranked by risk.
Do NOT edit anything yet. Output the report as a markdown table.

If the report misses a service you know exists, your index is incomplete or its ignore rules are too aggressive. Fix that (next step) before proceeding. The report is also your review artifact: keep it open and check off files as the change lands.
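One way to sanity-check the agent's blast-radius report is an independent, deliberately dumb scan. The sketch below is a hedged illustration, not a replacement for the report: it assumes user objects are accessed as `<something>.role`, that packages are marked by a package.json, and that the skipped directory names match your repo. The function names are hypothetical.

```python
import re
from collections import defaultdict
from pathlib import Path

# Assumed layout: these directories never belong in a blast-radius scan.
SKIP_DIRS = {"node_modules", ".git", "dist", "build", ".next"}
# Assumption: the field is read or written as `<object>.role`.
ROLE_PATTERN = re.compile(r"\.role\b")
SOURCE_SUFFIXES = {".ts", ".tsx", ".js", ".jsx"}


def owning_package(path: Path, root: Path) -> str:
    """Return the nearest ancestor directory (inside root) that has a package.json."""
    for parent in path.parents:
        if (parent / "package.json").exists():
            return parent.relative_to(root).as_posix()
        if parent == root:
            break
    return "(no package)"


def find_role_references(root: Path) -> dict[str, list[tuple[str, int]]]:
    """Map each owning package to the (file, line number) pairs that mention `.role`."""
    hits: dict[str, list[tuple[str, int]]] = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.suffix not in SOURCE_SUFFIXES or not path.is_file():
            continue
        relative = path.relative_to(root)
        if SKIP_DIRS.intersection(relative.parts):
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if ROLE_PATTERN.search(line):
                hits[owning_package(path, root)].append((relative.as_posix(), lineno))
    return dict(hits)
```

Calling `find_role_references(Path("."))` from the repo root gives a per-package list to compare against the agent's table. A package that shows up here but not in the report is the billing-worker scenario from the introduction.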

Step 2: Index the Repo and Tune Ignore Files

Semantic indexing builds vector embeddings of your code so the agent can answer “where is payment processing handled?” without the file being open. In a large repo this is the difference between the agent reasoning about the whole system and the agent guessing from the three tabs you happened to have open. Each tool indexes differently, and the ignore files that keep the index fast and accurate are tool-specific.

Cursor indexes the workspace automatically and computes embeddings for semantic search. Keep the index lean with ignore files:

.cursorignore blocks files from indexing and from agent access (use for secrets, node_modules/, build output).
.cursorindexingignore excludes files from the index only — they stay reachable via explicit @-mention. Use it for large generated files (lockfiles, dist/, snapshots) that pollute search results.

Example .cursorindexingignore:

dist/
**/*.snap
pnpm-lock.yaml
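To decide what else belongs in an indexing ignore file, it can help to list the largest text files in the repo. The following is a minimal sketch under stated assumptions: the 2,000-line threshold and the skipped directory names are arbitrary choices, and `largest_text_files` is a hypothetical helper, not part of any tool.

```python
from pathlib import Path

# Assumption: these are already excluded from the index by other means.
SKIP_DIRS = {".git", "node_modules"}


def largest_text_files(root: Path, min_lines: int = 2000, limit: int = 20) -> list[tuple[int, str]]:
    """Return (line_count, relative_path) pairs at or above min_lines, largest first."""
    results = []
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        relative = path.relative_to(root)
        if SKIP_DIRS.intersection(relative.parts):
            continue
        try:
            line_count = len(path.read_text(encoding="utf-8").splitlines())
        except (UnicodeDecodeError, OSError):
            continue  # binary or unreadable, so not a semantic-search candidate
        if line_count >= min_lines:
            results.append((line_count, relative.as_posix()))
    # Sort by size descending, then by path so ties are stable.
    return sorted(results, key=lambda item: (-item[0], item[1]))[:limit]
```

Generated clients, snapshots, and lockfiles tend to dominate the output. Those are the usual candidates for the ignore file, while large hand-written files are better handled by referencing symbols.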

Then scope context with symbols, not whole files: @accountType (a symbol) gives a tighter, less noisy context than @user-service.ts (a 2,000-line file).

Claude Code doesn’t pre-index; it explores on demand with Grep/Glob and reads files as needed. Your job is to give it a durable map and watch the window:

Run /init once to generate a CLAUDE.md that records the monorepo layout, package boundaries, and build/test commands. This is the context it can’t infer from a cold start.
Run /context to see what’s consuming the window before a big task. If the shared schema and three services already fill it, you’re about to overflow.
For a sprawling change, delegate the discovery pass to a sub-agent (.claude/agents/) so the file-by-file grep runs in an isolated context and only the summary returns to your main thread.

Codex reads an AGENTS.md file at the repo root (and per-package) as its persistent project context — document the workspace layout and the “always check the shared schema” rule there so it survives every new thread.

Inside the TUI, /init scaffolds an AGENTS.md from your codebase. For changes that touch many packages, run the work in a worktree (a separate checkout per thread) so a large refactor is isolated from your main working tree and easy to discard if the blast radius turns out bigger than expected.
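If you rely on per-package context files, a quick audit shows which packages still lack one. This is a hedged sketch: it assumes every package is marked by a package.json, and `packages_missing_context` is a hypothetical name.

```python
from pathlib import Path

# Assumption: package.json files under these directories are dependencies or build output.
SKIP_DIRS = {".git", "node_modules", "dist"}


def packages_missing_context(root: Path, context_file: str = "AGENTS.md") -> list[str]:
    """List package directories (relative to root) that have no context_file beside package.json."""
    missing = []
    for manifest in sorted(root.rglob("package.json")):
        rel_dir = manifest.parent.relative_to(root)
        if SKIP_DIRS.intersection(rel_dir.parts):
            continue
        if not (manifest.parent / context_file).exists():
            missing.append(rel_dir.as_posix())
    return missing
```

Pass `context_file="CLAUDE.md"` to run the same audit for Claude Code. The repo root itself is reported as `.` when it has a package.json but no context file.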

Step 3: Decompose, Then Execute One Step at a Time

Never hand a large repo an open-ended task like “refactor the entire authentication system.” The model fans out, loses the thread halfway, and you get a 40-file diff you can’t review. Instead, make it emit a checklist, then execute one item per turn so each step is small enough to verify.

Decomposition prompt (emit a checklist, write no code):

We're migrating auth from session cookies to JWT across this Next.js
monorepo. Produce an ordered, dependency-aware checklist of every change,
one file or unit per line, grouped by phase (schema -> middleware ->
routes -> client -> tests). For each item note what must land before it.
Write NO code. I'll approve the plan, then we execute one item at a time.

Once the plan is approved, drive it incrementally and verify after every step. Reference the exact symbol so the agent stays on-target instead of re-deriving scope:

Per-step execution prompt (reference a specific symbol):

From the approved checklist, do item 3 only: update `requireAuth` in
packages/api/src/middleware/auth.ts to validate the JWT instead of the
session cookie. Keep the existing error shape. Show me the diff and the
test command to run before moving to item 4.

After each step, run the relevant tests. This incremental loop makes a large migration reviewable and reversible, where a single “big bang” change is neither. For the full planning discipline, see PRD to Plan to Todo.
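The "what must land before it" notes in the checklist form a dependency graph, so the execution order can be checked mechanically. The sketch below uses Python's standard `graphlib` module. The checklist items are hypothetical placeholders loosely following the phases in the prompt above, not output from any real plan.

```python
from graphlib import TopologicalSorter

# Hypothetical checklist: each item maps to the items that must land before it.
checklist = {
    "schema: add token columns": set(),
    "middleware: requireAuth validates JWT": {"schema: add token columns"},
    "routes: issue JWT on login": {"middleware: requireAuth validates JWT"},
    "client: send Authorization header": {"routes: issue JWT on login"},
    "tests: update auth integration tests": {
        "routes: issue JWT on login",
        "client: send Authorization header",
    },
}


def execution_order(items: dict[str, set[str]]) -> list[str]:
    """Order items so every prerequisite precedes its dependents.

    Raises graphlib.CycleError if the plan contains a circular dependency.
    """
    return list(TopologicalSorter(items).static_order())


if __name__ == "__main__":
    for step, item in enumerate(execution_order(checklist), start=1):
        print(f"{step}. {item}")
```

A `CycleError` here means the plan itself is inconsistent, which is worth catching before the first per-step prompt rather than halfway through the migration.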

Step 4: Keep Context Fresh Between Tasks

In a long session the conversation history itself becomes stale context — the agent keeps “remembering” the bug you fixed an hour ago. Reset deliberately when you switch tasks. The mechanics differ per tool:

In Cursor, start a new chat for each distinct task (new feature, new bug). Each chat keeps its own context, so a fresh chat means the agent reasons only about the job in front of it. Use checkpoints to roll back a chat if an exploratory edit goes sideways.

In Claude Code, run /clear when moving between unrelated tasks to wipe the context window entirely. When you want to keep the thread but trim noise, use /compact <instructions>, for example: /compact Focus on the JWT migration, drop the earlier CSS work. If you’ve corrected the model twice on the same issue, run /clear and restart with a sharper prompt; a clean session almost always beats a long, cluttered one.

In Codex, use /new (not /clear) inside the TUI to start a fresh thread. A new thread resets conversational context but continues in the current checkout; create a separate git worktree when the new task also needs filesystem isolation.

When This Breaks

Stale index after a big merge. Right after a large branch merge or a generated-code regen, semantic search returns paths that moved or no longer exist. In Cursor, the index updates as files change, but force a resync from Settings if results look wrong. Claude Code and Codex read files live, so stale answers usually trace back to an outdated CLAUDE.md or AGENTS.md; re-run /init to refresh it.
Context-window overflow on huge files. A single 8,000-line generated client or migration file can blow the window on its own. Don’t @-mention the whole file — mention the symbol, or ask the agent to grep for the relevant function and read only that range. Add the file to .cursorindexingignore so it stops dominating search.
Monorepo indexing limits. Very large monorepos can exceed indexing caps or index so much that search precision drops. Scope aggressively with ignore files, keep generated output out of the index, and prefer per-package AGENTS.md/CLAUDE.md so context stays local to the package you’re working in.
Agent edits the wrong layer. If the model keeps editing a re-exported wrapper instead of the source, it has lost track of the dependency direction. Re-run the Step 1 discovery prompt and explicitly name the source-of-truth file.
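For the stale CLAUDE.md/AGENTS.md case, a rough check is to confirm that the paths the file mentions still exist. This is a heuristic sketch only: it assumes paths are written as inline code containing a slash, it skips scoped package names, URLs, and glob patterns, and `stale_paths` is a hypothetical helper.

```python
import re
from pathlib import Path

# Assumption: paths appear as backticked spans containing a slash,
# e.g. `packages/api/src/middleware/auth.ts`.
PATH_PATTERN = re.compile(r"`([^`\s]+/[^`\s]*)`")


def looks_like_repo_path(candidate: str) -> bool:
    """Filter out spans that contain a slash but are not repo paths."""
    return not (candidate.startswith("@") or "://" in candidate or "*" in candidate)


def stale_paths(context_file: Path, root: Path) -> list[str]:
    """Return backticked paths mentioned in context_file that no longer exist under root."""
    text = context_file.read_text(encoding="utf-8")
    mentioned = sorted(set(PATH_PATTERN.findall(text)))
    return [p for p in mentioned if looks_like_repo_path(p) and not (root / p).exists()]
```

A non-empty result after a big merge suggests the map has drifted from the repo and is worth regenerating before the next large task.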

What’s Next

Codebase Indexing — how embeddings and semantic search actually work
Context Windows — sizing context and avoiding overflow
PRD to Plan to Todo — the planning discipline behind Step 3
Memory Patterns — making CLAUDE.md/AGENTS.md carry the load