Managing Large Codebases

You ask the agent to “rename the User.role field to User.accountType across the app.” It confidently edits three files in packages/web, declares victory, and you ship. Two hours later the billing-worker service throws in production because it read role from the same shared Drizzle schema the agent never opened. In a 500k-LOC monorepo, the model isn’t wrong because it’s dumb. It’s wrong because it never saw the file that mattered.

The fix is not a bigger context window. It’s a workflow: surface the right files before the model commits, index the repo so semantic search works, and decompose big changes so every step is reviewable.

  • A context-discovery prompt that maps the blast radius of a change before any code is written
  • A decomposition prompt that turns “refactor the auth system” into a reviewable checklist
  • Per-tool setup for indexing and ignore files in Cursor, Claude Code, and Codex
  • A recovery playbook for stale indexes, context-window overflow, and monorepo scale limits

Step 1: Discover Context Before You Touch Code

The single highest-leverage move in a large repo is to make the agent find and report the affected files before it edits anything. This catches the cross-package dependency the agent would otherwise miss, and it’s cheap: a read-only pass costs a fraction of a botched refactor.

If the report misses a service you know exists, your index is incomplete or its ignore rules are too aggressive. Fix that (next step) before proceeding. The report is also your review artifact: keep it open and check off files as the change lands.
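
A plain-text scan is a useful cross-check on the agent's report. The following is a minimal sketch, not part of any tool's API: the function name `blast_radius`, the skipped directories, and the assumption that the first two path components identify a package (as in packages/web) are all illustrative.

```python
import re
from collections import defaultdict
from pathlib import Path

# Assumed vendor/build directories; adjust for your repo.
SKIP_DIRS = {"node_modules", "dist", ".git"}


def blast_radius(repo_root, symbol, exts=(".ts", ".tsx")):
    """Group files that mention `symbol` as a whole word by package directory."""
    pattern = re.compile(rf"\b{re.escape(symbol)}\b")
    root = Path(repo_root)
    hits = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix not in exts:
            continue
        rel = path.relative_to(root)
        if SKIP_DIRS & set(rel.parts):
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        if pattern.search(text):
            parts = rel.parts
            # Assumption: the first two components name the package,
            # e.g. packages/web or services/billing-worker.
            package = "/".join(parts[:2]) if len(parts) > 2 else parts[0]
            hits[package].append(rel.as_posix())
    return dict(hits)
```

Calling `blast_radius(".", "role")` from the repo root returns a mapping from package to matching files. A package that appears here but not in the agent's report is worth investigating before any edit lands. A word match over-reports (any `role` counts, not only `User.role`), which is acceptable for a cross-check.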

Step 2: Index the Repo and Tune Ignore Files

Semantic indexing builds vector embeddings of your code so the agent can answer “where is payment processing handled?” without the file being open. In a large repo this is the difference between the agent reasoning about the whole system and the agent guessing from the three tabs you happened to have open. Each tool indexes differently, and the ignore files that keep the index fast and accurate are tool-specific.
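
To make the idea concrete, here is a toy sketch of similarity search over an index. It is not how Cursor or any other tool computes embeddings: real indexes use learned vector models, while this stand-in uses plain word counts. The names `embed`, `cosine`, and `search` are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy 'embedding': word counts, splitting camelCase and snake_case."""
    words = re.findall(r"[A-Za-z][a-z]*", text)
    return Counter(w.lower() for w in words)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def search(query, index, top_k=3):
    """Return up to top_k indexed paths with non-zero similarity, best first."""
    q = embed(query)
    scored = sorted(((cosine(q, vec), path) for path, vec in index.items()), reverse=True)
    return [path for score, path in scored[:top_k] if score > 0]
```

The index is built once (`index = {path: embed(source) for path, source in files.items()}`) and queried many times, which is why a file does not need to be open to be found. It is also why generated files hurt: every indexed file competes for the top results.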

Cursor indexes the workspace automatically and computes embeddings for semantic search. Keep the index lean with ignore files:

  • .cursorignore blocks files from indexing and from agent access (use for secrets, node_modules/, build output).
  • .cursorindexingignore excludes files from the index only — they stay reachable via explicit @-mention. Use it for large generated files (lockfiles, dist/, snapshots) that pollute search results.

Example .cursorindexingignore:

    dist/
    **/*.snap
    pnpm-lock.yaml

Then scope context with symbols, not whole files: @accountType (a symbol) gives a tighter, less noisy context than @user-service.ts (a 2,000-line file).

Step 3: Decompose, Then Execute One Step at a Time

Never hand a large repo an open-ended task like “refactor the entire authentication system.” The model fans out, loses the thread halfway, and you get a 40-file diff you can’t review. Instead, make it emit a checklist, then execute one item per turn so each step is small enough to verify.

Once the plan is approved, drive it incrementally and verify after every step. Reference the exact symbol so the agent stays on-target instead of re-deriving scope:

After each step, run the relevant tests. This incremental loop makes a large migration reviewable and reversible, where a single “big bang” change is neither. For the full planning discipline, see PRD to Plan to Todo.
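
The loop can be sketched as a small driver that applies one checklist item, runs its tests, and stops on the first failure. This is a sketch under assumptions: `Step`, `run_next_step`, and the `apply_step` callback (the place where you would hand a single item to the agent) are hypothetical names, and the test command is whatever your repo already uses.

```python
import subprocess
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Step:
    description: str      # one checklist item from the approved plan
    test_cmd: List[str]   # command that verifies this step
    done: bool = False


def run_next_step(plan: List[Step], apply_step: Callable[[Step], None]) -> Optional[Step]:
    """Apply the first unfinished step, then run its tests.

    Returns the completed step, or None when the plan is finished.
    Raises RuntimeError and leaves the step unfinished if the tests fail.
    """
    for step in plan:
        if step.done:
            continue
        apply_step(step)  # hypothetical hook: execute exactly one item
        result = subprocess.run(step.test_cmd)
        if result.returncode != 0:
            raise RuntimeError(f"tests failed after: {step.description}")
        step.done = True
        return step
    return None
```

Because only one step runs per call, each diff stays small, and a failing test points at the step that caused it rather than at a 40-file change.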

In a long session the conversation history itself becomes stale context — the agent keeps “remembering” the bug you fixed an hour ago. Reset deliberately when you switch tasks. The mechanics differ per tool:

Start a new chat for each distinct task (new feature, new bug). Cursor keeps each chat’s context separate, so a fresh chat means the agent reasons only about the job in front of it. Use checkpoints to roll back a chat if an exploratory edit goes sideways.

  • Stale index after a big merge. Right after a large branch merge or a generated-code regen, semantic search returns paths that moved or no longer exist. In Cursor, the index updates on change, but force a resync from Settings if results still look wrong. Claude Code and Codex read files live, so the usual culprit there is a stale CLAUDE.md or AGENTS.md; re-run /init to regenerate it.
  • Context-window overflow on huge files. A single 8,000-line generated client or migration file can blow the window on its own. Don’t @-mention the whole file — mention the symbol, or ask the agent to grep for the relevant function and read only that range. Add the file to .cursorindexingignore so it stops dominating search.
  • Monorepo indexing limits. Very large monorepos can exceed indexing caps or index so much that search precision drops. Scope aggressively with ignore files, keep generated output out of the index, and prefer per-package AGENTS.md/CLAUDE.md so context stays local to the package you’re working in.
  • Agent edits the wrong layer. If the model keeps editing a re-exported wrapper instead of the source, it lost the dependency direction. Re-run the Step 1 discovery prompt and explicitly name the source-of-truth file.
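
To find the oversized files mentioned above before they overflow the window, a quick scan helps. This is a minimal sketch: the function name `oversized_files`, the 2,000-line threshold, and the skipped directories are assumptions to adjust for your repo.

```python
from pathlib import Path


def oversized_files(repo_root, max_lines=2000, skip_dirs=("node_modules", ".git")):
    """Return (line_count, path) pairs for files over max_lines, largest first."""
    root = Path(repo_root)
    skip = set(skip_dirs)
    found = []
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        rel = path.relative_to(root)
        if skip & set(rel.parts):
            continue
        try:
            # Count lines in binary mode so odd encodings cannot raise.
            with path.open("rb") as fh:
                lines = sum(1 for _ in fh)
        except OSError:
            continue  # unreadable file: skip it rather than abort the scan
        if lines > max_lines:
            found.append((lines, rel.as_posix()))
    return sorted(found, reverse=True)
```

Generated files in the output are candidates for .cursorindexingignore. Hand-written files in the output are candidates for symbol-level mentions instead of whole-file mentions.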