Million+ LOC Strategies

Navigating million-line codebases feels like exploring a vast city without a map. Every change ripples through countless dependencies, and understanding the full impact requires superhuman memory. This guide shows how AI coding assistants transform this complexity into manageable workflows.

When your codebase crosses the million-line threshold, traditional development approaches break down. You’re dealing with:

  • Cognitive Overload: No single developer can hold the entire system architecture in their head
  • Hidden Coupling: Dependencies buried deep across module boundaries
  • Legacy Archaeology: Code written by developers who left years ago
  • Performance Bottlenecks: IDEs and tools that choke on the sheer volume
  • Context Fragmentation: Different teams with different conventions and patterns

The solution isn’t working harder—it’s leveraging AI assistants that can process and understand code at machine scale while you focus on architectural decisions.

Unlimited Working Memory

While humans struggle with 7±2 items in working memory, AI models can analyze hundreds of files simultaneously, tracking dependencies you’d never spot manually.

Semantic Understanding

AI doesn’t just grep for strings—it understands code intent, finding conceptually related functions across different naming conventions and implementations.

Pattern Detection at Scale

Identifies anti-patterns, duplicated logic, and optimization opportunities that would take months of manual code review to discover.

Fearless Refactoring

Make sweeping changes across thousands of files with confidence, as AI tracks all impacts and suggests necessary adjustments.

Before diving into strategies, you need the right tools. These MCP servers transform how AI assistants understand and navigate massive codebases:

When dealing with millions of lines, traditional text search fails. You need semantic understanding.

Installation for Claude Code:

claude mcp add code-context -e OPENAI_API_KEY=your-api-key -e MILVUS_TOKEN=your-zilliz-key -- npx @zilliz/code-context-mcp@latest

Installation for Cursor: Add to ~/.cursor/mcp.json:

{
  "mcpServers": {
    "code-context": {
      "command": "npx",
      "args": ["-y", "@zilliz/code-context-mcp@latest"],
      "env": {
        "EMBEDDING_PROVIDER": "OpenAI",
        "OPENAI_API_KEY": "your-api-key",
        "MILVUS_TOKEN": "your-zilliz-key"
      }
    }
  }
}

This server uses vector embeddings to understand code semantically. Ask “find all authentication flows” and it understands the concept across different implementations—whether it’s OAuth, JWT, or session-based.
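Under the hood, this kind of search embeds both your query and code chunks, then ranks by vector similarity rather than exact text. Here is a minimal, illustrative sketch of the idea in Node.js, assuming the official openai package and an OPENAI_API_KEY in the environment (the MCP server handles chunking, indexing, and vector storage for you):

import OpenAI from 'openai';

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

// Cosine similarity between two embedding vectors
function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

async function embed(texts) {
  const res = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: texts
  });
  return res.data.map((d) => d.embedding);
}

// `chunks` would come from walking the repo and splitting files into snippets
async function searchCode(query, chunks) {
  const [queryVec, ...chunkVecs] = await embed([query, ...chunks.map((c) => c.text)]);
  return chunks
    .map((chunk, i) => ({ ...chunk, score: cosine(queryVec, chunkVecs[i]) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, 5); // top matches, regardless of naming conventions
}

This is why "find all authentication flows" surfaces OAuth, JWT, and session code together: they sit near each other in embedding space even when they share no keywords.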

High-Performance Pattern Search: Ripgrep MCP

For blazing-fast regex and pattern matching across massive codebases:

Installation for Claude Code:

claude mcp add-json "ripgrep" '{"command":"npx","args":["-y","mcp-ripgrep@latest"]}'

Sample Usage:

"Use ripgrep to find all TODO comments with high priority across the codebase"
"Search for all SQL queries that might be vulnerable to injection"
"Find all API endpoints that don't have rate limiting"

For sensitive enterprise codebases that can’t use cloud services:

Luoto Local Code Search uses ChromaDB for on-premise vector search:

# Configure environment
PROJECTS_ROOT=~/enterprise/code
FOLDERS_TO_INDEX=core-services,payment-engine,user-platform
# Add to Claude Code
claude mcp add-json "workspace-code-search" '{"url":"http://localhost:8978/sse"}'

This keeps your code local while providing semantic search capabilities—crucial for financial services, healthcare, or defense contractors.

You’ve inherited a 3-million line monolith. Where do you even start? Here’s a battle-tested approach:

Start with broad architectural understanding:

"Analyze this codebase and create a mental model of the system architecture.
Focus on:
1. Core business domains
2. Service boundaries
3. Data flow patterns
4. External dependencies
Present as a high-level overview suitable for a new senior engineer."

Then drill into specific areas:

"Using code-context search, find all payment processing flows.
I need to understand:
- Entry points for payment requests
- State management during processing
- Integration with external payment providers
- Error handling and retry logic"

Understanding how modules interconnect is crucial for safe refactoring:

"Create a dependency graph for the UserService module:
1. What services does it depend on?
2. What services depend on it?
3. Are there any circular dependencies?
4. Which dependencies look problematic or tightly coupled?"

Follow up with specific investigations:

"The UserService depends on 47 other services.
Help me identify which dependencies are truly necessary
vs. which could be refactored to use events or interfaces."
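To sanity-check the AI's answer about cycles, a small script over a module-to-imports map (for example, exported by a tool like madge or your build system) is enough. A minimal sketch:

// Depth-first search for cycles in a dependency map (illustrative sketch)
function findCycles(graph) {
  const visiting = new Set();
  const visited = new Set();
  const cycles = [];

  function visit(node, path) {
    if (visiting.has(node)) {
      cycles.push([...path.slice(path.indexOf(node)), node]); // found a cycle
      return;
    }
    if (visited.has(node)) return;
    visiting.add(node);
    for (const dep of graph[node] || []) visit(dep, [...path, node]);
    visiting.delete(node);
    visited.add(node);
  }

  for (const node of Object.keys(graph)) visit(node, []);
  return cycles;
}

// Hypothetical example: UserService and OrderService import each other
const graph = {
  UserService: ['OrderService', 'EmailService'],
  OrderService: ['UserService'],
  EmailService: []
};
console.log(findCycles(graph)); // [[ 'UserService', 'OrderService', 'UserService' ]]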

The biggest mistake developers make with large codebases? Trying to load everything at once. Your AI assistant doesn’t need to see all 3 million lines—it needs the right context at the right time.

Think of context like zooming on a map. Start with continent view, then country, then city, then street:

  1. Domain Level (10,000 ft view)

    "What are the main bounded contexts in this system?"
    "How do the payment, user, and inventory domains interact?"
  2. Service Level (1,000 ft view)

    "Within the payment domain, explain the service architecture"
    "What are the main APIs exposed by payment services?"
  3. Component Level (100 ft view)

    "Show me how PaymentProcessor handles credit card transactions"
    "What's the retry strategy for failed payments?"
  4. Implementation Level (ground level)

    "In PaymentProcessor.processCard(), why is there a 30-second timeout?"
    "Should we refactor this synchronized block?"

Pattern 1: Hierarchical CLAUDE.md Files

/CLAUDE.md                        # System-wide conventions
/services/CLAUDE.md               # Service layer patterns
/services/payment/CLAUDE.md       # Payment-specific rules
/services/payment/core/CLAUDE.md  # Core payment logic rules

Each level inherits from its parent, creating focused context:

# In /services/payment/CLAUDE.md
This service handles all payment processing.
Key principles:
- All amounts in cents to avoid floating point
- Idempotency keys required for all transactions
- PCI compliance: never log full card numbers
Common patterns in this service:
- Repository pattern for data access
- Command pattern for payment operations
- Event sourcing for transaction history
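Those conventions map directly onto code. A small, hypothetical illustration of what "amounts in cents" and "idempotency keys required" look like at a call site (all names invented for the sketch):

import { randomUUID } from 'node:crypto';

// Amounts are integer cents; never floats
function toCents(dollars) {
  return Math.round(dollars * 100);
}

// Every payment operation carries an idempotency key so retries are safe
function createChargeCommand({ customerId, amountCents, idempotencyKey }) {
  if (!Number.isInteger(amountCents)) {
    throw new Error('Amounts must be integer cents');
  }
  return {
    type: 'CHARGE',
    customerId,
    amountCents,
    idempotencyKey: idempotencyKey ?? randomUUID()
    // PCI: card data is referenced by token elsewhere, never logged here
  };
}

const cmd = createChargeCommand({ customerId: 'c-42', amountCents: toCents(19.99) });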

Pattern 2: Context Switching Commands

# Clear context between unrelated tasks
/clear
# Work on payment system
/add services/payment
"Analyze the payment processing flow"
# Switch to user system
/clear
/add services/users
"Review the authentication implementation"

The 80/20 Rule for Context
80% of your questions need only 20% of the codebase. Don’t pollute context with rarely-used utilities:

# Bad: Loading everything
"Load all code and find performance issues"
# Good: Targeted loading
"In the order processing pipeline (services/orders/pipeline/),
identify bottlenecks in Order.process() through Order.ship()"

Progressive Context Expansion
Start narrow and expand only when needed:

# Step 1: Understand the problem
@OrderService.java "Why are orders taking 5+ seconds to process?"
# Step 2: Expand to related services
@orders/ @inventory/ "Is the delay from inventory checks?"
# Step 3: Include infrastructure
@orders/ @database/migrations/ @config/
"Could this be a database indexing issue?"

Refactoring a million-line codebase is like renovating a hospital while surgery is ongoing. You can’t shut everything down for a rewrite. Here’s how to make systematic improvements safely:

Case Study: Migrating from Callbacks to Async/Await

Your Node.js codebase has 50,000 callback functions. Manual migration would take years. Here’s the AI-assisted approach:

Phase 1: Discovery and Planning

"Using ripgrep, find all callback patterns in the codebase.
Categorize them by:
1. Simple callbacks (single async operation)
2. Callback chains (multiple sequential operations)
3. Parallel callbacks (multiple concurrent operations)
4. Error-first callbacks with complex error handling
5. Callbacks with shared state or closures
Generate a migration complexity report."

Sample response:

Found 47,832 callback patterns:
- Simple callbacks: 31,245 (65%) - Low risk
- Callback chains: 9,823 (21%) - Medium risk
- Parallel callbacks: 4,234 (9%) - High risk
- Complex error handling: 1,830 (4%) - High risk
- Shared state callbacks: 700 (1%) - Very high risk
Recommended migration order: Start with simple callbacks
in utility functions, then move to service layer...

Phase 2: Creating Migration Patterns

"For each callback pattern type, create a safe migration template.
Include:
1. The transformation pattern
2. Edge cases to watch for
3. Testing strategy
4. Rollback plan"

The AI generates reusable patterns:

// Before
function loadUser(id, callback) {
  db.query('SELECT * FROM users WHERE id = ?', [id], (err, result) => {
    if (err) return callback(err);
    callback(null, result[0]);
  });
}

// After (with backward compatibility)
async function loadUser(id, callback) {
  // Support both callback and promise style
  if (callback) {
    try {
      const result = await db.query('SELECT * FROM users WHERE id = ?', [id]);
      callback(null, result[0]);
    } catch (err) {
      callback(err);
    }
  } else {
    const result = await db.query('SELECT * FROM users WHERE id = ?', [id]);
    return result[0];
  }
}
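Both calling styles keep working during the migration window, which is what makes the rollout safe. For example:

// Legacy call sites keep working unchanged
loadUser(42, (err, user) => {
  if (err) return console.error(err);
  console.log(user.name);
});

// Migrated call sites simply await the promise
const user = await loadUser(42);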

Phase 3: Automated Migration

"Using the migration patterns, transform all simple callbacks
in the utils/ directory. For each file:
1. Apply the transformation
2. Preserve backward compatibility
3. Add deprecation comments
4. Update or create tests
5. Track migration status"

For massive refactoring efforts, coordinate multiple AI instances:

  1. Partition the Codebase

    "Analyze module dependencies and suggest how to partition
    the codebase for parallel refactoring by 4 developers.
    Minimize inter-team conflicts."
  2. Create Feature Branches

    git checkout -b refactor/team1-user-services
    git checkout -b refactor/team2-payment-services
    git checkout -b refactor/team3-inventory-services
    git checkout -b refactor/team4-shared-utils
  3. Synchronize Progress

    "Review the changes in all refactor/* branches.
    Identify potential conflicts or breaking changes
    between teams' work."
  4. Integration Testing

    "Generate integration tests that verify the refactored
    modules work correctly together. Focus on boundary
    interactions between team territories."

In million-line codebases, performance problems hide in unexpected places. An innocent-looking function called in a tight loop can bring down your entire system. Here’s how to systematically hunt and fix performance issues:

Step 1: Algorithmic Complexity Analysis

"Using code analysis, find all potential O(n²) or worse algorithms.
For each one, determine:
1. How often it's called
2. Typical input size
3. Whether it's on a critical path
4. Suggested optimization approach"

Sample findings:

Found 47 potential quadratic algorithms:
CRITICAL - UserMatcher.findDuplicates()
- Called on every user registration
- Processes 2.3M users
- Current time: ~18 seconds
- Fix: Use hash-based approach, reducing to O(n)
HIGH - ReportGenerator.crossTabulate()
- Called in nightly batch jobs
- Processes 100K x 100K matrix
- Current time: ~4 hours
- Fix: Use sparse matrix representation
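The hash-based fix mentioned for UserMatcher.findDuplicates() is the standard trade of a nested scan for a single pass over a map keyed on whatever defines a duplicate. A sketch, assuming duplicates are defined by email (field names are illustrative):

// O(n²): every user compared against every other user
function findDuplicatesQuadratic(users) {
  const dupes = [];
  for (let i = 0; i < users.length; i++) {
    for (let j = i + 1; j < users.length; j++) {
      if (users[i].email === users[j].email) dupes.push([users[i], users[j]]);
    }
  }
  return dupes;
}

// O(n): one pass, grouping by the field that defines a duplicate
function findDuplicatesLinear(users) {
  const byEmail = new Map();
  for (const user of users) {
    const key = user.email.toLowerCase();
    if (!byEmail.has(key)) byEmail.set(key, []);
    byEmail.get(key).push(user);
  }
  return [...byEmail.values()].filter((group) => group.length > 1);
}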

The #1 performance killer in large applications? Database queries. Here’s a systematic approach:

"Analyze the codebase for database performance anti-patterns:
1. N+1 queries in ORM usage
2. Missing indexes on foreign keys
3. Queries without pagination
4. Unnecessary eager loading
5. Queries inside loops
For each issue found, provide:
- Location in code
- Estimated performance impact
- Specific fix with code example"

The AI identifies:

// Problem: N+1 queries in OrderService
async function getOrdersWithItems(userId) {
  const orders = await Order.findAll({ where: { userId } });
  // This creates N+1 queries!
  for (const order of orders) {
    order.items = await OrderItem.findAll({
      where: { orderId: order.id }
    });
  }
  return orders;
}

Suggested fix:

// Solution: Use eager loading
async function getOrdersWithItems(userId) {
  const orders = await Order.findAll({
    where: { userId },
    include: [{
      model: OrderItem,
      as: 'items'
    }]
  });
  return orders;
}

// Or use raw SQL for complex cases
async function getOrdersWithItemsOptimized(userId) {
  const query = `
    SELECT o.*,
           JSON_AGG(oi.*) as items
    FROM orders o
    LEFT JOIN order_items oi ON oi.order_id = o.id
    WHERE o.user_id = $1
    GROUP BY o.id
  `;
  return await db.query(query, [userId]);
}

Memory leaks in large applications are insidious. They build up slowly until your servers start crashing:

"Search for common memory leak patterns:
1. Event listeners without cleanup
2. Closures holding large objects
3. Circular references
4. Growing caches without limits
5. Timers that never clear
Focus on long-running services and background workers."

The AI finds issues like:

// Memory leak in WebSocketManager
class WebSocketManager {
  constructor() {
    this.connections = new Map();
  }
  addConnection(userId, socket) {
    // LEAK: Old connections never removed!
    this.connections.set(userId, socket);
    socket.on('message', (data) => {
      // LEAK: Closure holds reference to entire manager
      this.handleMessage(userId, data);
    });
  }
}

// Fix with proper cleanup
class WebSocketManager {
  constructor() {
    this.connections = new Map();
  }
  addConnection(userId, socket) {
    // Remove any existing connection
    this.removeConnection(userId);
    const messageHandler = (data) => {
      this.handleMessage(userId, data);
    };
    socket.on('message', messageHandler);
    socket.on('close', () => this.removeConnection(userId));
    this.connections.set(userId, {
      socket,
      handlers: { message: messageHandler }
    });
  }
  removeConnection(userId) {
    const conn = this.connections.get(userId);
    if (conn) {
      conn.socket.removeAllListeners();
      this.connections.delete(userId);
    }
  }
}

Every large codebase has its archaeological layers—code written in different eras, by different teams, with different philosophies. Some of it predates version control. Here’s how to excavate and modernize safely:

You’ve discovered a 15,000-line stored procedure that runs the entire billing system. No one understands it, but it processes $10M daily. Here’s the approach:

The Archaeological Method

"Analyze this stored procedure (billing_mega_proc.sql) and:
1. Create a high-level flow diagram
2. Extract the core business rules
3. Identify the main data transformations
4. Find any hidden side effects
5. Suggest how to decompose it into services
Treat this like reverse-engineering an alien artifact.
Be extremely careful about subtle business logic."

The AI responds with structured analysis:

Analysis of billing_mega_proc.sql:
CORE BUSINESS RULES DISCOVERED:
1. Tiered pricing with 47 special cases
2. Legacy customer discounts (some hardcoded by ID!)
3. Complex tax calculation with regional variations
4. Audit trail generation (writes to 6 different tables)
HIDDEN BEHAVIORS:
- Sends emails directly from the database (!)
- Updates inventory as side effect
- Creates accounting entries
- Has undocumented "debug mode" when user_id = 1
DECOMPOSITION STRATEGY:
1. Extract pricing engine (pure functions)
2. Separate tax calculator service
3. Move email sending to message queue
4. Create proper audit service

Rather than rewriting everything at once, gradually strangle the legacy system:

"Create a modern API gateway that wraps the legacy billing system.
Requirements:
- RESTful endpoints for all billing operations
- Keep calling the legacy stored procedure internally
- Add proper error handling and logging
- Return consistent JSON responses
- Add OpenAPI documentation"

This gives you:

// Modern API wrapping legacy system
@Post('/billing/calculate')
async calculateBilling(@Body() request: BillingRequest) {
  try {
    // Still calls the monster stored proc
    const result = await this.db.callProcedure(
      'billing_mega_proc',
      request.customerId,
      request.items,
      request.date
    );
    // Transform cryptic response to modern format
    return {
      success: true,
      invoice: this.transformLegacyInvoice(result),
      timestamp: new Date().toISOString()
    };
  } catch (error) {
    // Legacy proc returns errors as magic numbers
    const errorMessage = this.decodeLegacyError(error);
    throw new HttpException(errorMessage, 400);
  }
}
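The decodeLegacyError helper above is usually nothing more than a lookup table that turns the procedure’s numeric codes into readable messages. A hypothetical sketch (the codes shown are invented):

// Hypothetical mapping of legacy error codes to readable messages
const LEGACY_BILLING_ERRORS = {
  '-1': 'Customer not found',
  '-7': 'Invalid billing period',
  '-42': 'Pricing tier mismatch'
};

function decodeLegacyError(error) {
  const code = String(error.code ?? error);
  return LEGACY_BILLING_ERRORS[code] ?? `Unknown legacy billing error (code ${code})`;
}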

When documentation doesn’t exist, tests become your documentation:

"This legacy module has no tests or documentation.
Create a comprehensive test suite that:
1. Documents current behavior through test names
2. Captures all edge cases you can find
3. Tests error conditions and their magic numbers
4. Verifies side effects
The tests should serve as living documentation."

The AI generates characterization tests:

describe('Legacy OrderProcessor - Current Behavior', () => {
  describe('Happy Path', () => {
    it('processes standard order with single item', async () => {
      // Documents current behavior
      const result = await processOrder({
        customerId: 123,
        items: [{ sku: 'WIDGET-1', quantity: 1 }]
      });
      expect(result.status).toBe(1); // Magic number: 1 = success
      expect(result.orderId).toMatch(/^ORD-\d{8}$/); // Format: ORD-YYYYMMDD
    });
  });
  describe('Edge Cases - Discovered Through Testing', () => {
    it('applies secret 50% discount for customer ID 777', async () => {
      // Undocumented behavior found through exploration!
      const result = await processOrder({
        customerId: 777, // Magic customer
        items: [{ sku: 'ANY-ITEM', quantity: 1, price: 100 }]
      });
      expect(result.totalPrice).toBe(50);
    });
    it('fails with error code -99 when inventory is negative', async () => {
      // More magic numbers documented
      const result = await processOrder({
        customerId: 123,
        items: [{ sku: 'OUT-OF-STOCK', quantity: 1 }]
      });
      expect(result.status).toBe(-99); // -99 = inventory error
    });
  });
});

In million-line codebases, different teams own different territories. The challenge? Making changes that span multiple domains without stepping on each other’s toes.

When Team A needs to understand Team B’s code:

The API Contract Generator

"Analyze the payment service (owned by Team FinTech) and generate:
1. OpenAPI specification for all endpoints
2. Event schemas for all published events
3. Database schemas for shared tables
4. Example requests/responses
5. Common error scenarios and handling
Format this for the mobile team who need to integrate."

This creates clear boundaries:

# Generated payment-service-api.yaml
openapi: 3.0.0
info:
  title: Payment Service API
  version: 2.3.0
  contact:
    team: FinTech
    slack: "#team-fintech"
paths:
  /payments/process:
    post:
      summary: Process a payment
      description: |
        Handles credit card, PayPal, and ACH payments.
        Idempotent using X-Idempotency-Key header.
      x-rate-limit: 100 requests per minute
      x-sla: 99.9% uptime, <500ms p99 latency

Before making breaking changes:

"I need to refactor the UserService.authenticate() method.
Analyze:
1. All services that call this method
2. The parameters they pass
3. How they handle the response
4. What errors they expect
5. Any indirect dependencies
Create a migration plan that won't break other teams."

The AI provides a detailed impact report:

Direct consumers of UserService.authenticate():
1. APIGateway - 1,247 calls
- Uses: username/password auth
- Expects: {userId, token, expiresAt}
- Error handling: Catches AuthError only
2. MobileBackend - 892 calls
- Uses: biometric token auth
- Expects: {userId, token, refreshToken}
- Error handling: Has retry logic
3. AdminPortal - 234 calls
- Uses: SSO token auth
- Expects: {userId, permissions[], token}
- Error handling: Falls back to legacy auth
MIGRATION PLAN:
1. Add new method authenticateV2() with new signature
2. Update authenticate() to call V2 internally
3. Notify teams with 2-week migration window
4. Monitor usage, help teams migrate
5. Deprecate old method after 30 days
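Step 2 of that plan, keeping the old signature while delegating to the new one, is simple to express. A sketch with the session handling stubbed out (this.sessionStore is a hypothetical dependency; names follow the plan above):

class UserService {
  // New contract: credentials object in, richer response shape out
  async authenticateV2(credentials) {
    const session = await this.sessionStore.create(credentials); // hypothetical store
    return {
      userId: session.userId,
      token: session.token,
      refreshToken: session.refreshToken,
      permissions: session.permissions,
      expiresAt: session.expiresAt
    };
  }

  // Old signature kept for the migration window; delegates to V2 internally.
  // Deprecated: remove once APIGateway, MobileBackend, and AdminPortal migrate.
  async authenticate(username, password) {
    const result = await this.authenticateV2({ type: 'password', username, password });
    // Existing callers rely only on these three fields
    return { userId: result.userId, token: result.token, expiresAt: result.expiresAt };
  }
}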

Keep documentation in sync with code:

"Generate a C4 model diagram showing:
- Context: How payment system fits in the ecosystem
- Container: Services within payment domain
- Component: Internal structure of payment-processor
- Code: Key classes and their relationships
Use PlantUML syntax so it's version controlled."

The AI generates:

@startuml Payment System - Container Diagram
!include https://raw.githubusercontent.com/plantuml-stdlib/C4-PlantUML/master/C4_Container.puml

Person(customer, "Customer", "Makes purchases")
System_Ext(bank, "Banking System", "Processes transactions")

System_Boundary(payment, "Payment Domain") {
  Container(api, "Payment API", "Node.js", "REST API for payment ops")
  Container(processor, "Payment Processor", "Java", "Core payment logic")
  Container(fraud, "Fraud Detection", "Python", "ML-based fraud checks")
  ContainerDb(db, "Payment DB", "PostgreSQL", "Transaction records")
  Container(events, "Event Bus", "Kafka", "Payment events")
}

Rel(customer, api, "Makes payment", "HTTPS")
Rel(api, processor, "Process payment", "gRPC")
Rel(processor, fraud, "Check fraud", "HTTP")
Rel(processor, bank, "Charge card", "ISO 8583")
Rel(processor, db, "Store transaction", "SQL")
Rel(processor, events, "Publish events", "Kafka")
@enduml

Let’s tackle some actual challenges you’ll face in million-line codebases:

Your notification logic is scattered across 47 different services. Time to extract it:

The Extraction Process

"Phase 1 - Discovery:
Using semantic search, find ALL notification-related code:
- Email sending
- SMS dispatch
- Push notifications
- In-app alerts
- Webhook deliveries
Include: scheduling, templating, retry logic, and preferences."

Followed by:

"Phase 2 - Untangling:
For each notification code fragment found:
1. What data does it need?
2. What services does it call?
3. What triggers it?
4. How does it handle failures?
5. What would break if we moved it?
Create a dependency graph."

Then systematically extract:

"Phase 3 - Service Design:
Design a notification service that:
1. Handles all current notification types
2. Provides a unified API
3. Maintains backward compatibility
4. Scales independently
5. Includes migration strategy
Generate the API specification and migration plan."
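One plausible shape for the resulting service is a single notify() entry point that routes to channel adapters. The sketch below is illustrative, not a finished design; channel names and payload fields are assumptions:

class NotificationService {
  constructor(channels) {
    // channels: { email, sms, push, inApp, webhook } adapters, each exposing send()
    this.channels = channels;
  }

  async notify({ userId, channel, template, data, idempotencyKey }) {
    const adapter = this.channels[channel];
    if (!adapter) throw new Error(`Unknown notification channel: ${channel}`);
    // Preferences, templating, retries, and dead-lettering would hook in here
    return adapter.send({ userId, template, data, idempotencyKey });
  }
}

// Callers across the 47 services collapse to one call, e.g.:
// await notifications.notify({
//   userId: 'u-123',
//   channel: 'email',
//   template: 'order-shipped',
//   data: { orderId: 'ORD-20240101' }
// });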

Scenario: Zero-Downtime Database Migration

Your user table has 500 million rows and needs sharding:

"Analyze our user table access patterns:
1. What queries hit this table?
2. What's the read/write ratio?
3. How are JOINs performed?
4. What's the distribution of queries by user_id?
5. Are there any global queries?
Recommend sharding strategy based on actual usage."

AI discovers:

Analysis Results:
- 89% of queries filter by user_id
- 7% are analytics queries (full table scans)
- 4% are admin queries (by email, phone)
Recommended Strategy:
- Shard by user_id (hash-based, 64 shards)
- Replicate lookup tables for email/phone
- Move analytics to read replica
- Use routing service for shard location
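The routing piece of that recommendation can start as a stable hash of user_id modulo the shard count. A minimal sketch with the suggested 64 shards (pool names are hypothetical):

import { createHash } from 'node:crypto';

const SHARD_COUNT = 64;

// Stable hash routing: the same user_id always lands on the same shard
function shardFor(userId) {
  const digest = createHash('md5').update(String(userId)).digest();
  return digest.readUInt32BE(0) % SHARD_COUNT;
}

// One connection pool per shard, e.g. users_00 ... users_63
function connectionFor(userId, pools) {
  return pools[shardFor(userId)];
}

console.log(shardFor(1234567)); // deterministic shard index in [0, 63]

Plain modulo ties you to the shard count, so a production rollout would also plan for resharding, for example with consistent hashing or a shard-map lookup table.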

Production is slow. You have 2 hours to find and fix the issue:

"URGENT: Production performance degraded 70% in last hour.
Systematically investigate:
1. Recent deployments
2. Database slow queries
3. External service latencies
4. Resource exhaustion
5. Possible memory leaks
Focus on changes in the last 24 hours."

The AI provides a structured investigation:

Investigation Results:
1. RECENT DEPLOYMENTS (last 24h):
- 14:30: Deployed order-service v2.3.1
- 14:35: Performance degradation started
- Correlation: HIGH
2. CODE CHANGES in v2.3.1:
Found in OrderProcessor.java:
- Added new fraud check calling external API
- No timeout configured!
- External API responding in 30-60 seconds
3. IMMEDIATE FIX:
Add timeout to fraud check:
// In OrderProcessor.java line 234
FraudCheckResult result = fraudClient
    .checkFraud(order)
    .timeout(Duration.ofMillis(500))        // Add this
    .onErrorReturn(FraudCheckResult.ALLOW); // Fail open
4. LONG-TERM FIX:
- Make fraud check async
- Add circuit breaker
- Cache fraud check results
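The circuit-breaker idea generalizes beyond this incident. A sketch of timeout-plus-breaker with a fail-open fallback, shown in Node.js for brevity (the incident code itself is Java):

// Minimal circuit breaker: timeout, failure counting, cool-down, fail-open fallback
class CircuitBreaker {
  constructor({ timeoutMs = 500, failureThreshold = 5, coolDownMs = 30000, fallback }) {
    Object.assign(this, { timeoutMs, failureThreshold, coolDownMs, fallback });
    this.failures = 0;
    this.openUntil = 0;
  }

  async call(fn) {
    if (Date.now() < this.openUntil) return this.fallback(); // breaker open: fail fast
    try {
      const result = await Promise.race([
        fn(),
        new Promise((_, reject) => setTimeout(() => reject(new Error('timeout')), this.timeoutMs))
      ]);
      this.failures = 0;
      return result;
    } catch (err) {
      if (++this.failures >= this.failureThreshold) {
        this.openUntil = Date.now() + this.coolDownMs; // trip the breaker
      }
      return this.fallback(); // fail open, like onErrorReturn(ALLOW)
    }
  }
}

// Usage sketch:
// const breaker = new CircuitBreaker({ fallback: () => ({ allow: true }) });
// const verdict = await breaker.call(() => fraudClient.checkFraud(order));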
Continuous Improvement Workflows

Weekly Codebase Health Checks

"Run weekly codebase health analysis:
1. New code smells introduced
2. Test coverage changes
3. Performance regression risks
4. Security vulnerabilities
5. Technical debt accumulation
Compare with last week and highlight trends."

Proactive Dependency Management

"Analyze all dependencies for:
1. Security vulnerabilities (CVEs)
2. Deprecated versions
3. Breaking changes in new versions
4. License compliance issues
5. Unmaintained packages
Create prioritized update plan with risk assessment."

Key Takeaways for Million-Line Success

Right Tool, Right Job

Use semantic search (Zilliz) for understanding, ripgrep for patterns, local indexing for sensitive code. Don’t try to load millions of lines into context.

Incremental Everything

Never attempt big-bang refactoring. Use feature flags, dual writes, and gradual rollouts. Let AI help you plan safe, incremental changes.

Test as Documentation

In legacy systems, comprehensive tests become your documentation. Use AI to generate characterization tests that capture current behavior.

Human + AI Partnership

AI handles the mechanical work—finding patterns, generating boilerplate, tracking dependencies. Humans provide domain knowledge and architectural vision.

Working with million-line codebases doesn’t have to be overwhelming. With the right AI tools and workflows, you can navigate, understand, and safely modify even the most complex systems. The key is thinking systematically and letting AI handle the scale while you focus on the strategy.