Million+ LOC Strategies

You have been asked to add a feature to a payment processing module. The codebase is 1.8 million lines across 12,000 files. You open Cursor, paste your requirements, and the AI hallucinates an import path that does not exist, references a deprecated internal API, and misses the three other services that need to change in lockstep. Context windows have limits, and large codebases exceed them by orders of magnitude.

  • Strategies for chunking large codebases into AI-digestible segments
  • Prompt patterns that give AI tools enough context without overwhelming them
  • Techniques for maintaining architectural coherence across multi-file changes
  • Workflows for safe, incremental modifications in mission-critical systems
  • Methods for building persistent context that survives across sessions

The Core Problem: Context Window vs. Codebase Size

Even with models supporting 200K+ token context windows, a million-line codebase cannot fit. The solution is not a bigger context window; it is smarter context selection.
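To make the gap concrete, here is a back-of-the-envelope calculation. The figure of 10 tokens per line is an assumed average for illustration, not a measured value; real ratios vary by language and tokenizer.

```python
# Rough estimate of how far a codebase exceeds a context window.
# TOKENS_PER_LINE is an assumed average, not a measured value.
TOKENS_PER_LINE = 10


def context_overflow_ratio(
    lines_of_code: int,
    context_window_tokens: int,
    tokens_per_line: int = TOKENS_PER_LINE,
) -> float:
    """Return how many times larger the codebase is than the context window."""
    return (lines_of_code * tokens_per_line) / context_window_tokens


# The 1.8M-line example codebase against a 200K-token window.
ratio = context_overflow_ratio(1_800_000, 200_000)
print(f"Codebase is roughly {ratio:.0f}x the context window")
```

Under that assumption the codebase is about 90 times larger than the window, so even a tenfold increase in context size would not remove the need to select.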

Think of context as a pyramid with four layers:

  1. Architecture layer (always present): High-level documentation, dependency maps, module boundaries
  2. Domain layer (task-specific): The subsystem you are working in, its interfaces, and contracts
  3. Implementation layer (file-specific): The actual files you are modifying
  4. Reference layer (on-demand): Examples of similar patterns elsewhere in the codebase
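One way to picture the pyramid is as a priority order for a fixed token budget. The sketch below is only an illustration of that idea, not how Cursor selects context internally; `ContextItem`, `select_context`, the token estimates, and the example file paths are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ContextItem:
    path: str
    layer: int   # 1 = architecture, 2 = domain, 3 = implementation, 4 = reference
    tokens: int  # estimated token cost of including this file


def select_context(items: list[ContextItem], budget: int) -> list[ContextItem]:
    """Fill the budget layer by layer, lowest layer number first.

    Items that do not fit are skipped, so a large reference example
    is only considered after architecture and domain context.
    """
    selected: list[ContextItem] = []
    used = 0
    for item in sorted(items, key=lambda i: (i.layer, i.tokens)):
        if used + item.tokens <= budget:
            selected.append(item)
            used += item.tokens
    return selected


# Hypothetical candidates for a refund feature.
candidates = [
    ContextItem("src/accounts/ledger-example.ts", layer=4, tokens=6000),
    ContextItem("src/payments/refund.ts", layer=3, tokens=4000),
    ContextItem("src/payments/index.ts", layer=2, tokens=1500),
    ContextItem(".cursor/architecture.md", layer=1, tokens=2000),
]
chosen = select_context(candidates, budget=8000)
print([item.path for item in chosen])
```

With an 8,000-token budget the first three layers fit (7,500 tokens) and the reference example is left out, which matches the "on-demand" role of the reference layer.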

Cursor’s codebase indexing covers much of the architecture layer automatically. Supplement it with a rules file:

.cursor/rules
This is a large-scale payment processing platform.
Key modules:
- /src/payments/ - Payment processing (Stripe, PayPal, internal ledger)
- /src/accounts/ - User account management and KYC
- /src/notifications/ - Event-driven notification system
- /src/shared/ - Shared types, utilities, and base classes
When modifying any module, always check:
1. The module's public API in its index.ts barrel export
2. Integration tests in /tests/integration/{module-name}/
3. The event contracts in /src/shared/events/

Use @file and @folder mentions to pull specific context into conversations. For cross-cutting changes, pull in the shared types first: reference @src/shared/types/payment.ts before modifying any payment-related module.

Before modifying anything, map the blast radius of your change.
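A blast radius can be approximated as the set of modules that depend, directly or transitively, on the one you are changing. The following is a minimal sketch that assumes you already have an import graph as a dictionary; the edges shown are hypothetical and would normally come from a dependency analysis tool.

```python
from collections import deque


def blast_radius(imports: dict[str, set[str]], changed: str) -> set[str]:
    """Return every module that depends on `changed`, directly or transitively.

    `imports` maps each module to the set of modules it imports.
    """
    # Invert the graph: module -> modules that import it.
    dependents: dict[str, set[str]] = {}
    for module, deps in imports.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(module)

    # Breadth-first walk over the inverted edges.
    affected: set[str] = set()
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for dependent in dependents.get(current, ()):
            if dependent not in affected:
                affected.add(dependent)
                queue.append(dependent)

    affected.discard(changed)  # only relevant when the graph has import cycles
    return affected


# Hypothetical module-level import graph.
imports = {
    "payments": {"shared"},
    "accounts": {"shared"},
    "notifications": {"payments", "shared"},
    "shared": set(),
}
print(sorted(blast_radius(imports, "payments")))  # ['notifications']
print(sorted(blast_radius(imports, "shared")))    # ['accounts', 'notifications', 'payments']
```

In this example a change to the shared module touches everything, while a change to payments only reaches notifications. That difference is what tells you how much context and how many test suites a task needs.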

Create living architecture documents that AI tools can reference.

Use Cursor’s agent mode to generate and maintain architecture summaries:

Scan the /src directory and create a concise architectural summary.
For each top-level module, document:
- Purpose (one sentence)
- Public API surface (exported functions/classes)
- Dependencies on other modules
- Database tables it owns
Save this to .cursor/architecture.md

Reference this file in future conversations with @.cursor/architecture.md.
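A generated summary goes stale as modules are added. One possible guard, sketched here under the assumption that the summary refers to each module by its path (for example `src/payments`), is a small check that lists top-level modules the document never mentions. The function names are hypothetical.

```python
import re
from pathlib import Path


def undocumented_modules(
    module_names: list[str], doc_text: str, src_prefix: str = "src"
) -> list[str]:
    """Return module names that the architecture doc never references by path."""
    missing = []
    for name in sorted(module_names):
        # \b prevents "src/pay" from matching "src/payments".
        pattern = rf"{re.escape(src_prefix)}/{re.escape(name)}\b"
        if not re.search(pattern, doc_text):
            missing.append(name)
    return missing


def top_level_modules(src_dir: Path) -> list[str]:
    """List the immediate subdirectories of src_dir."""
    return [p.name for p in src_dir.iterdir() if p.is_dir()]


# Example with an inline document instead of a real file.
doc = "## src/payments\nPurpose: payment processing\n## src/shared\nPurpose: shared types\n"
print(undocumented_modules(["payments", "accounts", "shared"], doc))  # ['accounts']
```

Against a real repository you could call `undocumented_modules(top_level_modules(Path("src")), Path(".cursor/architecture.md").read_text())` and ask the agent to regenerate the summary whenever the result is non-empty.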

Strategy 3: Incremental Modification with Verification

Never attempt a large change in a single pass. Break it into verifiable steps.

  1. Identify all files that need to change

    Ask the AI to list every file affected before writing a single line of code. Review this list against your own understanding.

  2. Modify shared interfaces first

    Start with type definitions, interfaces, and contracts. These changes propagate errors that reveal hidden dependencies.

  3. Update implementations one module at a time

    Modify each consuming module independently. Run that module’s tests before moving to the next.

  4. Run integration tests after each module

    Do not wait until all modules are updated. Catch integration failures early.

  5. Final cross-cutting verification

    Run the full test suite, check for type errors across the entire codebase, and review the complete diff before committing.
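The steps above can be wired into a small runner that stops at the first failing check, so a broken module is caught before the next one is touched. This is a sketch only: the step names and commands are hypothetical placeholders for your project's own scripts, and the `runner` parameter exists so the logic can be exercised without launching real processes.

```python
import subprocess
from typing import Callable, Optional, Sequence

Runner = Callable[[Sequence[str]], int]


def run_command(cmd: Sequence[str]) -> int:
    """Run a command and return its exit code."""
    return subprocess.run(cmd).returncode


def verify_incrementally(
    steps: Sequence[tuple[str, Sequence[str]]],
    runner: Runner = run_command,
) -> Optional[str]:
    """Run each verification step in order and stop at the first failure.

    Returns the name of the failing step, or None if every step passed.
    """
    for name, cmd in steps:
        if runner(cmd) != 0:
            return name
    return None


# Hypothetical commands; substitute your project's own test scripts.
STEPS = [
    ("type-check shared interfaces", ["npx", "tsc", "--noEmit"]),
    ("payments unit tests", ["npm", "test", "--", "src/payments"]),
    ("payments integration tests", ["npm", "test", "--", "tests/integration/payments"]),
]
```

Calling `verify_incrementally(STEPS)` after each module update returns the first step that failed, which is the point at which to stop and fix before continuing.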

Large codebases inevitably contain legacy code. AI tools can help navigate it, but you need specific strategies.

Find the most recent, well-written module that follows current conventions. Use it as the reference pattern for AI to follow when modifying legacy code.

“The AI keeps hallucinating file paths and import names.” Your context layer is too thin. Add explicit file listings to your rules file, or ask the AI to search for the correct paths before generating code: “First, find where UserService is actually defined in this codebase, then write the import.”
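Hallucinated paths can also be caught mechanically before you run anything. The sketch below checks the relative imports in a generated TypeScript snippet against a set of files known to exist. It assumes a simplified resolution rule (`.ts`, `.tsx`, or a directory `index.ts`) and ignores path aliases and package imports; the file names in the example are hypothetical.

```python
import posixpath
import re

# Matches relative specifiers such as: from "./x" or from '../y/z'
IMPORT_RE = re.compile(r"""from\s+['"](\.{1,2}/[^'"]+)['"]""")
# Simplified resolution candidates; real TypeScript resolution has more cases.
EXTENSIONS = (".ts", ".tsx", "/index.ts")


def unresolved_imports(source: str, file_path: str, known_files: set[str]) -> list[str]:
    """Return relative import specifiers in `source` that match no known file."""
    base_dir = posixpath.dirname(file_path)
    missing = []
    for spec in IMPORT_RE.findall(source):
        target = posixpath.normpath(posixpath.join(base_dir, spec))
        if not any(target + ext in known_files for ext in EXTENSIONS):
            missing.append(spec)
    return missing


known = {"src/shared/types/payment.ts", "src/payments/index.ts"}
generated = (
    'import { Payment } from "../shared/types/payment";\n'
    'import { UserService } from "../accounts/user-service";\n'
)
print(unresolved_imports(generated, "src/payments/refund.ts", known))
# ['../accounts/user-service']
```

Any specifier in the result is a candidate hallucination: feed it back to the AI with a request to search for where the symbol is actually defined.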

“Changes work in isolation but break integration.” You skipped the dependency mapping step. Always map the blast radius before starting. Use the incremental modification strategy with verification between each phase.

“The AI suggests patterns that conflict with our architecture.” Your architecture layer documentation is missing or stale. Invest time in maintaining CLAUDE.md or .cursor/rules files that encode architectural decisions and constraints.

“Context window fills up before I finish my task.” Break the task into sub-tasks. Each sub-task should be completable within a single conversation. Use architecture summaries to quickly re-establish context in new sessions.