Million+ LOC Strategies

You have been asked to add a feature to a payment processing module. The codebase is 1.8 million lines across 12,000 files. You open Cursor, paste your requirements, and the AI hallucinates an import path that does not exist, references a deprecated internal API, and misses the three other services that need to change in lockstep. Context windows have limits, and large codebases exceed them by orders of magnitude.

What You’ll Walk Away With

Strategies for chunking large codebases into AI-digestible segments
Prompt patterns that give AI tools enough context without overwhelming them
Techniques for maintaining architectural coherence across multi-file changes
Workflows for safe, incremental modifications in mission-critical systems
Methods for building persistent context that survives across sessions

The Core Problem: Context Window vs. Codebase Size

Even with models supporting 200K+ token context windows, a million-line codebase cannot fit. The solution is not a bigger context window; it is smarter context selection.
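A back-of-the-envelope estimate makes the gap concrete. The sketch below assumes roughly 10 tokens per line of code, which is an illustrative figure rather than a measurement; real ratios vary by language and tokenizer.

```typescript
// Rough estimate of how far a codebase exceeds a context window.
// ASSUMPTION: ~10 tokens per line of code (illustrative, not measured).
const TOKENS_PER_LINE = 10;

function contextOverflowFactor(linesOfCode: number, contextWindowTokens: number): number {
  const estimatedTokens = linesOfCode * TOKENS_PER_LINE;
  return estimatedTokens / contextWindowTokens;
}

// 1.8M lines of code against a 200K-token window.
const overflowFactor = contextOverflowFactor(1800000, 200000);
console.log(`Codebase is roughly ${overflowFactor}x larger than the window`);
```

Under that assumption the codebase is about 90 times larger than the window, so selection is needed even if the real ratio is off by a wide margin.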

The Context Pyramid

Think of context as a pyramid with four layers:

Architecture layer (always present): High-level documentation, dependency maps, module boundaries
Domain layer (task-specific): The subsystem you are working in, its interfaces, and contracts
Implementation layer (file-specific): The actual files you are modifying
Reference layer (on-demand): Examples of similar patterns elsewhere in the codebase
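One way to picture the pyramid is as a priority order for spending a token budget. The sketch below is an illustration under stated assumptions, not how any of the tools select context internally: the file paths and token counts are hypothetical, and filling layers strictly top-down is just one possible policy.

```typescript
// The four layers of the context pyramid, in the order they are filled.
type Layer = "architecture" | "domain" | "implementation" | "reference";

interface ContextItem {
  layer: Layer;
  path: string;   // hypothetical file path
  tokens: number; // estimated token cost of including this file
}

const LAYER_PRIORITY: Layer[] = ["architecture", "domain", "implementation", "reference"];

// Greedily pick items layer by layer, skipping anything that no longer fits.
function selectContext(items: ContextItem[], budget: number): ContextItem[] {
  const selected: ContextItem[] = [];
  let remaining = budget;
  for (const layer of LAYER_PRIORITY) {
    for (const item of items.filter((i) => i.layer === layer)) {
      if (item.tokens <= remaining) {
        selected.push(item);
        remaining -= item.tokens;
      }
    }
  }
  return selected;
}

const candidates: ContextItem[] = [
  { layer: "reference", path: "src/payments/RefundProcessor.ts", tokens: 5000 },
  { layer: "architecture", path: "docs/architecture.md", tokens: 2000 },
  { layer: "implementation", path: "src/payments/PaymentProcessor.ts", tokens: 4000 },
  { layer: "domain", path: "src/payments/index.ts", tokens: 3000 },
];
const chosen = selectContext(candidates, 10000);
console.log(chosen.map((c) => c.path));
```

With a 10,000-token budget, the reference example is the item that gets dropped, which matches its "on-demand" role.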

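Cursor’s codebase indexing handles much of the architecture layer automatically. Supplement it with a project rules file (for example, under .cursor/rules) that spells out the module layout and the checks to run: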

This is a large-scale payment processing platform.
Key modules:
- /src/payments/ - Payment processing (Stripe, PayPal, internal ledger)
- /src/accounts/ - User account management and KYC
- /src/notifications/ - Event-driven notification system
- /src/shared/ - Shared types, utilities, and base classes

When modifying any module, always check:
1. The module's public API in its index.ts barrel export
2. Integration tests in /tests/integration/{module-name}/
3. The event contracts in /src/shared/events/

Use @file and @folder mentions to pull specific context into conversations. For cross-cutting changes, pull in the shared types that modules depend on before modifying any payment-related module, for example @src/shared/types/payment.ts.

Claude Code’s file system access makes it naturally suited for large codebases. Structure your CLAUDE.md hierarchy:

# /CLAUDE.md (root - architecture layer)
Monorepo with 1.8M LOC. Key architectural decisions:
- Event-driven architecture using RabbitMQ
- Each service owns its database schema
- Shared types live in /packages/shared-types/
- All inter-service communication goes through /packages/event-bus/

# /packages/payments/CLAUDE.md (domain layer)
Payment service handles Stripe and PayPal integrations.
Never modify PaymentProcessor directly - extend via strategy pattern.
All new payment methods must implement IPaymentStrategy interface.

Claude Code reads these automatically, building layered context as it navigates the codebase.
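The layering idea can be sketched as a walk from the repository root down to the file being edited, collecting every CLAUDE.md on the way. This is only an illustration of the concept, not Claude Code's actual lookup logic; the function name and the StripeStrategy.ts path are hypothetical.

```typescript
// Return the CLAUDE.md files that apply to a file, from root to most specific.
// ASSUMPTION: illustrates the layering idea only; not Claude Code's real lookup.
function applicableContextFiles(filePath: string, knownContextFiles: Set<string>): string[] {
  const segments = filePath.split("/").filter((s) => s.length > 0);
  segments.pop(); // drop the file name, keep only directories
  const result: string[] = [];
  let dir = "";
  // Check the root first, then each nested directory in turn.
  for (let i = 0; i <= segments.length; i++) {
    const candidate = `${dir}/CLAUDE.md`;
    if (knownContextFiles.has(candidate)) result.push(candidate);
    if (i < segments.length) dir = `${dir}/${segments[i]}`;
  }
  return result;
}

const contextFiles = new Set(["/CLAUDE.md", "/packages/payments/CLAUDE.md"]);
const layers = applicableContextFiles("/packages/payments/src/StripeStrategy.ts", contextFiles);
console.log(layers);
```

For a file under /packages/payments/, both the root and the payments-level files apply, so general architecture rules and payment-specific constraints arrive together.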

Codex cloud tasks operate in sandboxed environments with full repository access. For large codebases:

# codex.md or AGENTS.md
Large monorepo navigation rules:
- Always run `find . -name "*.ts" -path "*/payments/*" | head -20` to orient before modifying payment code
- Check /docs/architecture/ for system design documents before cross-service changes
- Use git log --oneline -20 on target files to understand recent change patterns

When making changes that span multiple packages:
1. List all affected packages first
2. Check each package's README for modification guidelines
3. Run the package's test suite after each change

Codex worktrees enable parallel exploration of different subsystems without conflicts.

Strategies for Navigating Large Codebases

Strategy 1: Dependency-First Exploration

Before modifying anything, map the blast radius of your change.

Copy-paste prompt for dependency mapping:

Before making any changes, I need you to map the dependency graph for the PaymentProcessor class:
1. Find where PaymentProcessor is defined
2. List every file that imports or references it
3. Identify which of those files are in the critical payment path vs. auxiliary (logging, analytics)
4. Check for any event listeners or message queue consumers that react to payment events
5. Show me the dependency tree as a simple text diagram

Do not modify any files yet. I need to understand the blast radius first.
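If you already have an import graph (from a build tool or a script), the blast radius is the set of files that depend on the target directly or transitively. The sketch below computes it with a breadth-first walk over reversed edges; the graph contents and file names are hypothetical.

```typescript
// Map from a file to the files it imports (hypothetical paths for illustration).
type ImportGraph = Record<string, string[]>;

// Invert the graph: for each file, which files import it?
function invertGraph(graph: ImportGraph): Map<string, string[]> {
  const dependents = new Map<string, string[]>();
  for (const file of Object.keys(graph)) {
    for (const imported of graph[file]) {
      const list = dependents.get(imported) || [];
      list.push(file);
      dependents.set(imported, list);
    }
  }
  return dependents;
}

// Breadth-first walk over dependents: every file that directly or
// transitively depends on the target.
function blastRadius(graph: ImportGraph, target: string): string[] {
  const dependents = invertGraph(graph);
  const seen = new Set<string>([target]);
  const queue: string[] = [target];
  const affected: string[] = [];
  while (queue.length > 0) {
    const current = queue.shift()!;
    for (const dependent of dependents.get(current) || []) {
      if (!seen.has(dependent)) {
        seen.add(dependent);
        affected.push(dependent);
        queue.push(dependent);
      }
    }
  }
  return affected;
}

const importGraph: ImportGraph = {
  "payments/PaymentProcessor.ts": ["shared/types/payment.ts"],
  "payments/StripeAdapter.ts": ["payments/PaymentProcessor.ts"],
  "notifications/ReceiptSender.ts": ["payments/StripeAdapter.ts"],
  "accounts/Profile.ts": ["shared/types/payment.ts"],
};
const affectedFiles = blastRadius(importGraph, "payments/PaymentProcessor.ts");
console.log(affectedFiles);
```

Note that a static import graph will not show event listeners or queue consumers, which is why the prompt asks about them separately.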

Strategy 2: Architectural Summarization

Create living architecture documents that AI tools can reference.

Use Cursor’s agent mode to generate and maintain architecture summaries:

Scan the /src directory and create a concise architectural summary.
For each top-level module, document:
- Purpose (one sentence)
- Public API surface (exported functions/classes)
- Dependencies on other modules
- Database tables it owns
Save this to .cursor/architecture.md

Reference this file in future conversations with @.cursor/architecture.md.
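If you prefer a consistent layout for the summary, one option is to treat each module entry as structured data and render it. The sketch below is a hypothetical shape that mirrors the four fields requested in the prompt; the example module and its values are invented for illustration.

```typescript
// One entry per top-level module, mirroring the fields requested in the prompt.
interface ModuleSummary {
  name: string;
  purpose: string;
  publicApi: string[];
  dependsOn: string[];
  ownedTables: string[];
}

// Render summaries as a compact text document, one section per module.
function renderSummary(modules: ModuleSummary[]): string {
  return modules
    .map((m) =>
      [
        `## ${m.name}`,
        `Purpose: ${m.purpose}`,
        `Public API: ${m.publicApi.join(", ") || "(none)"}`,
        `Depends on: ${m.dependsOn.join(", ") || "(none)"}`,
        `Owns tables: ${m.ownedTables.join(", ") || "(none)"}`,
      ].join("\n"),
    )
    .join("\n\n");
}

// Hypothetical example entry.
const summaryText = renderSummary([
  {
    name: "payments",
    purpose: "Processes payments through external providers and the internal ledger.",
    publicApi: ["PaymentProcessor", "IPaymentStrategy"],
    dependsOn: [],
    ownedTables: ["payments", "ledger_entries"],
  },
]);
console.log(summaryText);
```

A fixed shape keeps the document short and makes stale entries easier to spot when the summary is regenerated.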

Generate architecture docs that become part of the CLAUDE.md hierarchy:

claude "Analyze the /packages directory structure and generate
an architecture summary. For each package in /packages/:
- What it does (one line)
- Its public exports
- Which other packages it depends on
- Its test coverage status
Save to /docs/architecture-summary.md"

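Claude Code does not pick this file up on its own. Reference it from the root CLAUDE.md, for example with an @docs/architecture-summary.md import line, so that it is loaded in future sessions.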

Codex cloud tasks can perform deep architectural analysis:

Analyze this repository's architecture. Create /docs/architecture-map.md with:
- Module dependency graph (text-based)
- Data flow diagram for the main user journeys
- List of shared interfaces and where they're implemented
- Database schema ownership by module
This will be used as a reference for future development tasks.

Strategy 3: Incremental Modification with Verification

Never attempt a large change in a single pass. Break it into verifiable steps.

1. Identify all files that need to change. Ask the AI to list every file affected before writing a single line of code. Review this list against your own understanding.
2. Modify shared interfaces first. Start with type definitions, interfaces, and contracts. These changes propagate errors that reveal hidden dependencies.
3. Update implementations one module at a time. Modify each consuming module independently. Run that module’s tests before moving to the next.
4. Run integration tests after each module. Do not wait until all modules are updated. Catch integration failures early.
5. Final cross-cutting verification. Run the full test suite, check for type errors across the entire codebase, and review the complete diff before committing.
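The same discipline can be expressed as a small gate: apply one step, verify it, and stop at the first failure so later steps never build on a broken state. This is a minimal sketch; the Step shape and the step names are hypothetical, and in practice verify would run a module's test suite.

```typescript
// One step of an incremental change: apply an edit, then verify it.
interface Step {
  name: string;
  apply: () => void;     // e.g. update one module (hypothetical)
  verify: () => boolean; // e.g. run that module's tests (hypothetical)
}

interface RunResult {
  completed: string[];
  failedAt: string | null;
}

// Run steps in order and stop at the first failed verification.
function runIncrementally(steps: Step[]): RunResult {
  const completed: string[] = [];
  for (const step of steps) {
    step.apply();
    if (!step.verify()) {
      return { completed, failedAt: step.name };
    }
    completed.push(step.name);
  }
  return { completed, failedAt: null };
}

const runResult = runIncrementally([
  { name: "shared interfaces", apply: () => {}, verify: () => true },
  { name: "payments module", apply: () => {}, verify: () => false },
  { name: "notifications module", apply: () => {}, verify: () => true },
]);
console.log(runResult);
```

Here the run stops at the payments module, so the notifications module is never touched until the failure is understood.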

Copy-paste prompt for safe incremental changes:

I need to add a `currency` field to the PaymentRequest type. This is a large codebase change.

Phase 1 - Analysis (do not modify files):
- Find the PaymentRequest type definition
- List every file that uses PaymentRequest
- Categorize files by: must change immediately vs. can use default value

Phase 2 - Interface change:
- Add currency: string to PaymentRequest with a default of "USD"
- Update the validation schema

Phase 3 - Critical path updates:
- Update PaymentProcessor to use the currency field
- Update the Stripe and PayPal adapters

After each phase, stop and let me verify before continuing.
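For reference, Phase 2 might produce something like the sketch below. Only the currency field and its "USD" default come from the prompt; the other fields, the type names, and the three-letter validation rule are assumptions made for illustration.

```typescript
// Hypothetical shape of the request as callers send it.
interface PaymentRequestInput {
  amountCents: number;
  customerId: string;
  currency?: string; // optional at the boundary so existing callers keep compiling
}

// Shape used internally after normalization.
interface PaymentRequest {
  amountCents: number;
  customerId: string;
  currency: string; // always present after normalization
}

const DEFAULT_CURRENCY = "USD";

// Fill in the default and enforce a three-letter uppercase code.
// ASSUMPTION: the validation rule is illustrative, not the codebase's schema.
function normalizePaymentRequest(input: PaymentRequestInput): PaymentRequest {
  const raw = input.currency === undefined ? DEFAULT_CURRENCY : input.currency;
  const currency = raw.toUpperCase();
  if (!/^[A-Z]{3}$/.test(currency)) {
    throw new Error(`Invalid currency code: ${currency}`);
  }
  return { amountCents: input.amountCents, customerId: input.customerId, currency };
}

const legacyCall = normalizePaymentRequest({ amountCents: 500, customerId: "c1" });
console.log(legacyCall.currency);
```

Keeping the field optional at the boundary and required internally is one way to separate "must change immediately" files from those that can rely on the default.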

Working with Legacy Code

Large codebases inevitably contain legacy code. AI tools can help navigate it, but you need specific strategies.

Pattern: The Rosetta Stone Approach

Find the most recent, well-written module that follows current conventions. Use it as the reference pattern for AI to follow when modifying legacy code.

Copy-paste prompt for legacy code modernization:

I need to add retry logic to the legacy OrderService (/src/legacy/OrderService.java).
Use /src/services/modern/PaymentService.java as the reference for our current patterns:
- Same error handling approach
- Same logging conventions
- Same retry strategy (exponential backoff with jitter)
- Same test structure

Do not rewrite OrderService entirely. Add the retry logic following modern conventions
while keeping the existing functionality intact. Minimize the diff.
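For readers unfamiliar with the retry strategy named in the prompt, here is a minimal sketch of exponential backoff with jitter. It uses the "full jitter" variant (a random delay between zero and the capped exponential value), and it is written in TypeScript for consistency with the other examples rather than in the Java of the prompt. The function names and default values are hypothetical.

```typescript
// Delay before retry number `attempt` (0-based): exponential growth, capped,
// then scaled by a random factor in [0, 1) ("full jitter").
// `random` is injectable so the function can be tested deterministically.
function backoffDelayMs(
  attempt: number,
  baseMs: number,
  capMs: number,
  random: () => number = Math.random,
): number {
  const exponential = Math.min(capMs, baseMs * Math.pow(2, attempt));
  return Math.floor(random() * exponential);
}

// Retry an async operation up to maxAttempts times, sleeping between failures.
async function withRetry<T>(
  operation: () => Promise<T>,
  maxAttempts: number = 3,
  baseMs: number = 100,
  capMs: number = 5000,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      // Do not sleep after the final attempt.
      if (attempt < maxAttempts - 1) {
        const delay = backoffDelayMs(attempt, baseMs, capMs);
        await new Promise<void>((resolve) => setTimeout(resolve, delay));
      }
    }
  }
  throw lastError;
}
```

Jitter matters in payment systems because it keeps many clients from retrying at the same instant after a shared outage.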

When This Breaks

“The AI keeps hallucinating file paths and import names.” Your context layer is too thin. Add explicit file listings to your rules file, or ask the AI to search for the correct paths before generating code: “First, find where UserService is actually defined in this codebase, then write the import.”
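A cheap guard is to check generated imports against a listing of files that actually exist before accepting the code. The sketch below assumes imports are written relative to the repository root without extensions, which is a simplification; the paths and the function name are hypothetical.

```typescript
// Return the import specifiers that do not resolve to any known file.
// ASSUMPTION: imports are repo-root-relative and omit the file extension.
function findUnresolvedImports(importPaths: string[], knownFiles: Set<string>): string[] {
  const suffixes = [".ts", ".tsx", "/index.ts"];
  return importPaths.filter(
    (importPath) => !suffixes.some((suffix) => knownFiles.has(importPath + suffix)),
  );
}

// Hypothetical file listing and AI-generated imports.
const repoFiles = new Set(["src/accounts/UserService.ts", "src/shared/index.ts"]);
const unresolved = findUnresolvedImports(
  ["src/accounts/UserService", "src/shared", "src/users/UserService"],
  repoFiles,
);
console.log(unresolved);
```

Any path that comes back unresolved is a candidate hallucination to send back to the AI with the real location.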

“Changes work in isolation but break integration.” You skipped the dependency mapping step. Always map the blast radius before starting. Use the incremental modification strategy with verification between each phase.

“The AI suggests patterns that conflict with our architecture.” Your architecture layer documentation is missing or stale. Invest time in maintaining CLAUDE.md or .cursor/rules files that encode architectural decisions and constraints.

“Context window fills up before I finish my task.” Break the task into sub-tasks. Each sub-task should be completable within a single conversation. Use architecture summaries to quickly re-establish context in new sessions.

What’s Next

Monorepo Management: AI workflows specific to monorepo architectures with cross-package dependencies.

Microservices with AI: Coordinate AI-assisted changes across distributed service boundaries.

Context Management: Deep dive into context window optimization and memory patterns.