Performance Optimization

Performance in Claude Code isn’t just about speed – it’s about maximizing value from every token, minimizing context pollution, and orchestrating workflows that scale efficiently. This guide reveals optimization strategies that transform Claude from a helpful but expensive assistant into a cost-effective development powerhouse.

Every interaction with Claude Code consumes tokens – the fundamental unit of AI computation. Understanding token economics is crucial for sustainable AI-assisted development:

Input Tokens

Every file, prompt, and context consumes input tokens. With Claude 3.5 Sonnet at $3 per million input tokens, a single large codebase scan can cost dollars.

Output Tokens

Every response, edit, and generation costs output tokens at $15 per million. Verbose explanations and large file generations add up quickly.

Context Window

The 200K-token context window forces strategic context management. You can’t just throw everything at Claude – you need to be selective.

Caching Benefits

90% discount on cached content transforms economics. Strategic caching can reduce costs by an order of magnitude.
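The arithmetic behind these numbers can be sketched in a few lines. The rates are the ones quoted above; `estimate_cost` is a hypothetical helper, not part of any tool:

```python
# Cost model for the pricing quoted above (Claude 3.5 Sonnet):
# $3 per million input tokens, $15 per million output tokens,
# cached input billed at a 90% discount.
INPUT_PER_M = 3.00
OUTPUT_PER_M = 15.00
CACHE_DISCOUNT = 0.90

def estimate_cost(input_tokens: int, output_tokens: int,
                  cached_tokens: int = 0) -> float:
    """Return the estimated USD cost for one request."""
    fresh = input_tokens * INPUT_PER_M / 1_000_000
    out = output_tokens * OUTPUT_PER_M / 1_000_000
    cached = cached_tokens * INPUT_PER_M * (1 - CACHE_DISCOUNT) / 1_000_000
    return fresh + out + cached

# A 150K-token codebase scan with a 5K-token response:
print(round(estimate_cost(150_000, 5_000), 3))          # 0.525
# The same scan with 140K of those tokens served from cache:
print(round(estimate_cost(10_000, 5_000, 140_000), 3))  # 0.147
```

The second call shows the order-of-magnitude effect: the same request drops from roughly 53 cents to 15 when most of the context hits the cache.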

Start minimal, expand as needed:

# BAD: Kitchen sink approach
claude "refactor all authentication" auth/ middleware/ utils/ tests/
# GOOD: Progressive expansion
claude "analyze auth patterns" auth/core.ts
# Then: "now check middleware integration" middleware/auth.ts
# Then: "update related tests" tests/auth/
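To see why progressive expansion pays off, you can roughly estimate token counts before sending context. This sketch uses the common ~4-characters-per-token rule of thumb, not the model's actual tokenizer:

```python
# Rough token estimator: ~4 characters per token is a common
# heuristic, not the model's exact tokenizer.
def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)

# One focused file vs. a whole-directory dump (illustrative strings):
focused = "export function login(user) { /* ... */ }"
kitchen_sink = focused * 500  # roughly a directory's worth of code
print(estimate_tokens(focused))
print(estimate_tokens(kitchen_sink))
```

Running this on real files before a session makes the kitchen-sink cost visible up front.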

Use .claudeignore patterns effectively:

# .claudeignore - Exclude noise, keep signal
node_modules/
dist/
build/
coverage/
# Include tests and specs only when needed
*.test.js
*.spec.ts
# Never include minified files or source maps - they waste tokens
*.min.js
*.map
package-lock.json
yarn.lock
pnpm-lock.yaml
# Large generated files
*.generated.ts
*.pb.go
*_gen.go
# Documentation (include only when relevant)
docs/**/*.md
# Keep the root README (gitignore-style negation)
!README.md
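How such patterns prune a file list can be sketched with `fnmatch`. This is a simplification for illustration; the real .claudeignore matching semantics (nested directories, negation) may differ:

```python
from fnmatch import fnmatch

# Simplified .claudeignore-style matcher (illustrative only).
IGNORE_PATTERNS = [
    "node_modules/*", "dist/*", "build/*", "coverage/*",
    "*.min.js", "*.map", "package-lock.json",
    "*.generated.ts", "*.pb.go", "*_gen.go",
]

def is_ignored(path: str) -> bool:
    return any(fnmatch(path, pat) for pat in IGNORE_PATTERNS)

files = [
    "src/auth/core.ts",
    "dist/bundle.min.js",
    "node_modules/lodash/index.js",
    "src/user.generated.ts",
]
kept = [f for f in files if not is_ignored(f)]
print(kept)  # ['src/auth/core.ts']
```

Three of the four files are noise; only the hand-written source survives the filter.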

Reuse expensive context across multiple operations:

# Load context once, use multiple times
claude-code --repl "analyze user service architecture" services/user/
# In REPL mode:
> "identify performance bottlenecks"
> "suggest caching strategies"
> "generate optimization plan"
> "implement top 3 optimizations"

Be Specific

Vague prompts lead to verbose responses. Specific prompts get specific answers.

Skip Explanations

When you know what you want, skip the tutorial: “implement X (no explanation needed)”

Use Templates

Reusable prompt templates reduce input tokens and improve consistency.

Request Formats

“Reply with code only” or “JSON response only” reduces output tokens significantly.

# INEFFICIENT: Verbose request
"Could you please help me refactor this authentication
middleware to use the new JWT library? I'd like to
understand the changes and make sure it's backwards
compatible..."
# EFFICIENT: Direct and specific
"Refactor auth middleware: old-jwt → new-jwt library.
Maintain API compatibility. Code only."
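Templates like the efficient prompt above can be generated programmatically so every request keeps the same compact shape. A minimal sketch (the template wording is illustrative):

```python
# Illustrative prompt template: encodes the "direct and specific"
# style above so every refactor request has the same compact shape.
REFACTOR_TEMPLATE = (
    "Refactor {target}: {old} -> {new}. "
    "Maintain API compatibility. Code only."
)

def refactor_prompt(target: str, old: str, new: str) -> str:
    return REFACTOR_TEMPLATE.format(target=target, old=old, new=new)

print(refactor_prompt("auth middleware", "old-jwt", "new-jwt"))
# Refactor auth middleware: old-jwt -> new-jwt. Maintain API compatibility. Code only.
```

A small library of such templates keeps input tokens predictable and responses terse.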

Claude Code’s caching can reduce costs dramatically when used strategically:

Keep frequently-used context in the cache:

# First run: Full cost
claude "analyze codebase structure" --include-types --include-interfaces
# Subsequent runs: 90% cheaper if context unchanged
claude "implement new user features" # Reuses cached type analysis
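Under the 90% cache discount quoted in this guide, the savings compound with every reuse. A rough model, assuming the full context is cached after the first run:

```python
# Input-token cost of reusing one context across several runs,
# using the rates in this guide ($3/M input, 90% cache discount).
def session_input_cost(context_tokens: int, runs: int, cached: bool) -> float:
    per_token = 3.00 / 1_000_000
    if not cached:
        return context_tokens * per_token * runs
    # First run pays full price; later runs pay 10% of the input rate.
    return context_tokens * per_token * (1 + 0.1 * (runs - 1))

ctx = 100_000  # e.g. a 100K-token type analysis
print(round(session_input_cost(ctx, 5, cached=False), 2))  # 1.5
print(round(session_input_cost(ctx, 5, cached=True), 2))   # 0.42
```

Five runs over the same 100K-token analysis cost $1.50 uncached but $0.42 with caching, and the gap widens with every additional run.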

Structure operations to maximize cache hits:

  1. Initial Analysis Pass

    claude "analyze all services, identify patterns" services/
  2. Implementation Passes (using cached analysis)

    claude "add logging to user service based on patterns"
    claude "add logging to order service based on patterns"
    claude "add logging to payment service based on patterns"
  3. Verification Pass (still using cache)

    claude "verify logging consistency across all services"

Create cached templates for common operations:

# Cache the template context
claude "analyze our API patterns" templates/api-template.ts
# Reuse for multiple endpoints (90% cheaper each)
claude "create user endpoint following template"
claude "create order endpoint following template"
claude "create payment endpoint following template"

Batch related changes into a single coordinated operation instead of paying for shared context repeatedly:

# INEFFICIENT: Multiple separate operations
claude "update user model" models/user.ts
claude "update user service" services/user.ts
claude "update user controller" controllers/user.ts
claude "update user tests" tests/user.test.ts
# EFFICIENT: Single coordinated operation
claude "add email verification to user system" \
  models/user.ts \
  services/user.ts \
  controllers/user.ts \
  tests/user.test.ts \
  --plan "1. Add emailVerified field 2. Add verification service 3. Add endpoints 4. Update tests"

Manage context window strategically:

// Custom context management script
// (estimateTokens and readFile are assumed helper functions)
const contextManager = {
  maxTokens: 150000, // Leave buffer below the 200K limit
  currentTokens: 0,
  files: [],
  addFile(path: string, priority: number) {
    const tokens = estimateTokens(readFile(path));
    if (this.currentTokens + tokens > this.maxTokens) {
      this.evictLowPriority();
    }
    this.files.push({ path, tokens, priority });
    this.currentTokens += tokens;
  },
  evictLowPriority() {
    // Drop lowest-priority files until usage falls below 80% of the cap
    this.files.sort((a, b) => b.priority - a.priority);
    while (this.currentTokens > this.maxTokens * 0.8 && this.files.length > 0) {
      const removed = this.files.pop(); // lowest priority is last after sort
      this.currentTokens -= removed.tokens;
    }
  }
};

Create a token tracking system:

token-tracker.sh
#!/bin/bash
# Wrap claude-code to track usage
claude_tracked() {
  local start_time=$(date +%s)
  local temp_log=$(mktemp)
  # Run with logging
  claude "$@" 2>&1 | tee "$temp_log"
  # Extract token usage (assumes claude-code reports usage in its output)
  local input_tokens=$(grep "Input tokens:" "$temp_log" | awk '{print $3}')
  local output_tokens=$(grep "Output tokens:" "$temp_log" | awk '{print $3}')
  local cached_tokens=$(grep "Cached tokens:" "$temp_log" | awk '{print $3}')
  # Calculate costs ($3/M input, $15/M output, cached at 10% of the input rate)
  local input_cost=$(echo "$input_tokens * 0.003 / 1000" | bc -l)
  local output_cost=$(echo "$output_tokens * 0.015 / 1000" | bc -l)
  local cached_cost=$(echo "$cached_tokens * 0.0003 / 1000" | bc -l)
  local total_cost=$(echo "$input_cost + $output_cost + $cached_cost" | bc -l)
  # Log to tracking file
  echo "$(date +%Y-%m-%d\ %H:%M:%S),\"$*\",$input_tokens,$output_tokens,$cached_tokens,$total_cost" >> ~/.claude-usage.csv
  # Clean up
  rm "$temp_log"
}
alias claude-code="claude_tracked"
analyze-claude-usage.py
import pandas as pd

def analyze_usage():
    df = pd.read_csv('~/.claude-usage.csv', names=[
        'timestamp', 'command', 'input_tokens',
        'output_tokens', 'cached_tokens', 'cost'
    ])
    df['timestamp'] = pd.to_datetime(df['timestamp'])
    df['date'] = df['timestamp'].dt.date
    # Daily cost trend
    daily_cost = df.groupby('date')['cost'].sum()
    # Token efficiency (cached vs fresh)
    token_efficiency = df.groupby('date').agg({
        'input_tokens': 'sum',
        'cached_tokens': 'sum'
    })
    token_efficiency['cache_rate'] = (
        token_efficiency['cached_tokens'] /
        (token_efficiency['input_tokens'] + token_efficiency['cached_tokens'])
    )
    # Most expensive operations
    expensive_ops = df.nlargest(10, 'cost')[['timestamp', 'command', 'cost']]
    return daily_cost, token_efficiency, expensive_ops

Use the right model for the task:

# Complex refactoring: Use Sonnet
CLAUDE_MODEL=claude-3-5-sonnet-20241022 claude "refactor authentication system"
# Simple fixes: Could use Haiku (when available in CLI)
CLAUDE_MODEL=claude-3-haiku claude "fix typo in README"
# Code review: Opus for deepest analysis
CLAUDE_MODEL=claude-3-opus claude "security audit auth system"
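The cost spread between tiers is what makes routing worthwhile. In this rough comparison, the Sonnet rates come from this guide; the Haiku and Opus rates are illustrative assumptions, so check current pricing before relying on them:

```python
# Rough per-task cost comparison across model tiers.
PRICES_PER_M = {  # (input, output) in USD per million tokens
    "haiku":  (0.25, 1.25),    # assumed rate, for illustration
    "sonnet": (3.00, 15.00),   # as quoted in this guide
    "opus":   (15.00, 75.00),  # assumed rate, for illustration
}

def task_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    inp, out = PRICES_PER_M[model]
    return (input_tokens * inp + output_tokens * out) / 1_000_000

# A small fix (5K in / 1K out) vs. a deep audit (100K in / 10K out):
for model in PRICES_PER_M:
    print(model,
          round(task_cost(model, 5_000, 1_000), 4),
          round(task_cost(model, 100_000, 10_000), 2))
```

Even with approximate rates, the pattern holds: routine fixes on the cheapest capable model, deep analysis reserved for the expensive one.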

Break large tasks into token-efficient chunks:

// Incremental migration script (claudeCode and glob are assumed helpers)
async function migrateCodebase() {
  const files = await glob('src/**/*.js');
  const batchSize = 10; // Files per batch
  for (let i = 0; i < files.length; i += batchSize) {
    const batch = files.slice(i, i + batchSize);
    // Process batch with focused context
    await claudeCode([
      'migrate to TypeScript',
      ...batch,
      '--context-limit', '50000', // Smaller context per batch
      '--cache-strategy', 'aggressive'
    ]);
    // Clear cache every 50 files, skipping the first batch
    if (i > 0 && i % 50 === 0) {
      await claudeCode(['--clear-cache']);
    }
  }
}

“We reduced a 2-million token refactoring job to 400K tokens (80% reduction) using these techniques:”

  1. Pre-Analysis Phase

    • Ran static analysis to identify actual dependencies
    • Created a dependency graph to optimize context inclusion
    • Cost: 50K tokens
  2. Chunked Refactoring

    • Split into 20 independent chunks based on dependency analysis
    • Each chunk used only relevant context
    • Cost: 300K tokens (vs 1.5M with full context)
  3. Verification Pass

    • Single pass with cached refactoring patterns
    • Verified consistency across all changes
    • Cost: 50K tokens (90% cached)

An optimized daily workflow that reduced costs by 75%:

# Morning standup prep: 100K tokens
claude "summarize all yesterday's changes" .
# Feature development: 500K tokens
claude "implement user dashboard" src/
# Bug fixes: 300K tokens
claude "fix all linting errors" .
# Total: 900K tokens/day ($3.60)

Daily Optimization Checklist

  • Review token usage from previous day
  • Update .claudeignore for new generated files
  • Clear stale cache entries
  • Batch similar operations together
  • Use REPL mode for multi-step operations
  • Prefer specific file targeting over directory scanning
  • Request code-only responses when appropriate
  • Monitor cache hit rates
  • Review and optimize frequent prompts
  • Check for context window warnings
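For the "monitor cache hit rates" item, the `~/.claude-usage.csv` log written by the tracking script earlier can be summarized directly. A sketch assuming the same column order (timestamp, command, input, output, cached, cost):

```python
import csv
import io

# Cache hit rate = cached tokens / (fresh + cached) across all logged runs.
def cache_hit_rate(csv_text: str) -> float:
    reader = csv.reader(io.StringIO(csv_text))
    fresh = cached = 0
    for _, _, inp, _, cch, _ in reader:
        fresh += int(inp)
        cached += int(cch)
    total = fresh + cached
    return cached / total if total else 0.0

# Illustrative log entries in the tracker's format:
sample = (
    '2025-01-06 09:00:00,"analyze codebase",100000,5000,0,0.375\n'
    '2025-01-06 09:10:00,"implement feature",10000,8000,90000,0.42\n'
)
print(round(cache_hit_rate(sample), 2))  # 0.45
```

A hit rate trending toward zero is a signal that context is churning and the workflow ordering above needs revisiting.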

Performance optimization in Claude Code is about working smarter, not harder. By understanding token economics, leveraging caching, and structuring workflows efficiently, you can reduce costs by 80% or more while actually improving development velocity. The key is treating tokens as a valuable resource and optimizing their use just as you would optimize code performance.