Performance in Claude Code isn’t just about speed – it’s about maximizing value from every token, minimizing context pollution, and orchestrating workflows that scale efficiently. This guide reveals optimization strategies that transform Claude from a helpful but expensive assistant into a cost-effective development powerhouse.
Every interaction with Claude Code consumes tokens – the fundamental unit of AI computation. Understanding token economics is crucial for sustainable AI-assisted development:
Input Tokens
Every file, prompt, and context consumes input tokens. With Claude 3.5 Sonnet at $3 per million input tokens, a single large codebase scan can cost dollars.
Output Tokens
Every response, edit, and generation costs output tokens at $15 per million. Verbose explanations and large file generations add up quickly.
Context Window
200K token limit means strategic context management. You can’t just throw everything at Claude – you need to be selective.
Caching Benefits
90% discount on cached content transforms economics. Strategic caching can reduce costs by an order of magnitude.
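To make these numbers concrete, here is a small sketch of the arithmetic, using the per-million-token rates quoted above and applying the 90% discount to cached input:

```python
# Per-million-token rates from the figures above (USD)
INPUT_RATE = 3.00
OUTPUT_RATE = 15.00
CACHED_RATE = INPUT_RATE * 0.10  # 90% discount on cached input

def request_cost(input_tokens, output_tokens, cached_tokens=0):
    """Estimated dollar cost of a single request."""
    return (input_tokens * INPUT_RATE
            + output_tokens * OUTPUT_RATE
            + cached_tokens * CACHED_RATE) / 1_000_000

# A 100K-token codebase scan with a 2K-token response:
fresh = request_cost(100_000, 2_000)    # $0.33 when nothing is cached
warm = request_cost(0, 2_000, 100_000)  # $0.06 when the scan is cached
```

The same scan drops from $0.33 to $0.06 once the context is cached, which is why the rest of this guide keeps steering work toward cache reuse.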
Start minimal, expand as needed:
```bash
# BAD: Kitchen sink approach
claude "refactor all authentication" auth/ middleware/ utils/ tests/

# GOOD: Progressive expansion
claude "analyze auth patterns" auth/core.ts
# Then: "now check middleware integration" middleware/auth.ts
# Then: "update related tests" tests/auth/
```
Use `.claudeignore` patterns effectively:
```
# .claudeignore - Exclude noise, keep signal
node_modules/
dist/
build/
coverage/
*.test.js          # Include tests only when needed
*.spec.ts          # Ditto for specs
*.min.js           # Never include minified files
*.map              # Source maps waste tokens
package-lock.json
yarn.lock
pnpm-lock.yaml

# Large generated files
*.generated.ts
*.pb.go
*_gen.go

# Documentation (include only when relevant)
docs/**/*.md
!README.md         # Keep root README
```
Reuse expensive context across multiple operations:
```bash
# Load context once, use multiple times
claude-code --repl "analyze user service architecture" services/user/

# In REPL mode:
> "identify performance bottlenecks"
> "suggest caching strategies"
> "generate optimization plan"
> "implement top 3 optimizations"
```
Be Specific
Vague prompts lead to verbose responses. Specific prompts get specific answers.
Skip Explanations
When you know what you want, skip the tutorial: “implement X (no explanation needed)”
Use Templates
Reusable prompt templates reduce input tokens and improve consistency.
Request Formats
“Reply with code only” or “JSON response only” reduces output tokens significantly.
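The "Use Templates" point can be as simple as a format string: the fixed wording is identical on every call, so prompts stay short and consistent. A minimal sketch (the template text here is hypothetical):

```python
# Hypothetical reusable prompt template; only the variable parts change per call
REFACTOR_TEMPLATE = (
    "Refactor {file}: {old_dep} -> {new_dep}. "
    "Maintain API compatibility. Code only."
)

prompt = REFACTOR_TEMPLATE.format(
    file="auth/middleware.ts", old_dep="old-jwt", new_dep="new-jwt"
)
print(prompt)
# -> Refactor auth/middleware.ts: old-jwt -> new-jwt. Maintain API compatibility. Code only.
```

Because the fixed portion never varies, templated prompts also cache well and make responses easier to compare across runs.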
```bash
# INEFFICIENT: Verbose request
"Could you please help me refactor this authentication
middleware to use the new JWT library? I'd like to
understand the changes and make sure it's backwards
compatible..."

# EFFICIENT: Direct and specific
"Refactor auth middleware: old-jwt → new-jwt library.
Maintain API compatibility. Code only."
```
```bash
# INEFFICIENT: Story time
"I'm getting this weird error when users log in on
mobile devices. Sometimes it works, sometimes not.
The error says something about tokens..."

# EFFICIENT: Structured report
"Bug: Intermittent auth failure on mobile
Error: 'Invalid token format'
Context: JWT validation in mobile-auth.ts:42
Fix required. Show only the corrected function."
```
```bash
# INEFFICIENT: Open-ended
"Implement a caching system for our API"

# EFFICIENT: Specific requirements
"Implement Redis caching:
- getThing() → check cache → miss? fetch & store
- 5min TTL, LRU eviction
- TypeScript, error handling included
Code only, no explanation"
```
Claude Code’s caching can reduce costs dramatically when used strategically:
Keep frequently-used context in the cache:
```bash
# First run: Full cost
claude "analyze codebase structure" --include-types --include-interfaces

# Subsequent runs: 90% cheaper if context unchanged
claude "implement new user features"  # Reuses cached type analysis
```
Structure operations to maximize cache hits:
Initial Analysis Pass
```bash
claude "analyze all services, identify patterns" services/
```
Implementation Passes (using cached analysis)
```bash
claude "add logging to user service based on patterns"
claude "add logging to order service based on patterns"
claude "add logging to payment service based on patterns"
```
Verification Pass (still using cache)
```bash
claude "verify logging consistency across all services"
```
Create cached templates for common operations:
```bash
# Cache the template context
claude "analyze our API patterns" templates/api-template.ts

# Reuse for multiple endpoints (90% cheaper each)
claude "create user endpoint following template"
claude "create order endpoint following template"
claude "create payment endpoint following template"
```
```bash
# INEFFICIENT: Multiple separate operations
claude "update user model" models/user.ts
claude "update user service" services/user.ts
claude "update user controller" controllers/user.ts
claude "update user tests" tests/user.test.ts

# EFFICIENT: Single coordinated operation
claude "add email verification to user system" \
  models/user.ts \
  services/user.ts \
  controllers/user.ts \
  tests/user.test.ts \
  --plan "1. Add emailVerified field 2. Add verification service 3. Add endpoints 4. Update tests"
```
Manage context window strategically:
```typescript
// Custom context management script
import { readFileSync } from 'fs';

// Rough heuristic: ~4 characters per token
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

interface TrackedFile { path: string; tokens: number; priority: number; }

const contextManager = {
  maxTokens: 150000, // Leave buffer below the 200K window
  currentTokens: 0,
  files: [] as TrackedFile[],

  addFile(path: string, priority: number) {
    const tokens = estimateTokens(readFileSync(path, 'utf8'));
    if (this.currentTokens + tokens > this.maxTokens) {
      this.evictLowPriority();
    }
    this.files.push({ path, tokens, priority });
    this.currentTokens += tokens;
  },

  evictLowPriority() {
    // Remove lowest-priority files until usage is back under 80% of the limit
    this.files.sort((a, b) => b.priority - a.priority);
    while (this.currentTokens > this.maxTokens * 0.8 && this.files.length > 0) {
      const removed = this.files.pop()!;
      this.currentTokens -= removed.tokens;
    }
  }
};
```
Create a token tracking system:
```bash
#!/bin/bash
# Wrap claude-code to track usage
claude_tracked() {
  local start_time=$(date +%s)
  local temp_log=$(mktemp)

  # Run with logging
  claude "$@" 2>&1 | tee "$temp_log"

  # Extract token usage (assumes claude-code outputs usage)
  local input_tokens=$(grep "Input tokens:" "$temp_log" | awk '{print $3}')
  local output_tokens=$(grep "Output tokens:" "$temp_log" | awk '{print $3}')
  local cached_tokens=$(grep "Cached tokens:" "$temp_log" | awk '{print $3}')

  # Calculate costs
  local input_cost=$(echo "$input_tokens * 0.003 / 1000" | bc -l)
  local output_cost=$(echo "$output_tokens * 0.015 / 1000" | bc -l)
  local cached_cost=$(echo "$cached_tokens * 0.0003 / 1000" | bc -l)
  local total_cost=$(echo "$input_cost + $output_cost + $cached_cost" | bc -l)

  # Log to tracking file
  echo "$(date +%Y-%m-%d\ %H:%M:%S),\"$*\",$input_tokens,$output_tokens,$cached_tokens,$total_cost" >> ~/.claude-usage.csv

  # Clean up
  rm "$temp_log"
}

alias claude-code="claude_tracked"
```
```python
import pandas as pd
import matplotlib.pyplot as plt
from datetime import datetime, timedelta

def analyze_usage():
    df = pd.read_csv('~/.claude-usage.csv', names=[
        'timestamp', 'command', 'input_tokens',
        'output_tokens', 'cached_tokens', 'cost'
    ])

    df['timestamp'] = pd.to_datetime(df['timestamp'])
    df['date'] = df['timestamp'].dt.date

    # Daily cost trend
    daily_cost = df.groupby('date')['cost'].sum()

    # Token efficiency (cached vs fresh)
    token_efficiency = df.groupby('date').agg({
        'input_tokens': 'sum',
        'cached_tokens': 'sum'
    })
    token_efficiency['cache_rate'] = (
        token_efficiency['cached_tokens'] /
        (token_efficiency['input_tokens'] + token_efficiency['cached_tokens'])
    )

    # Most expensive operations
    expensive_ops = df.nlargest(10, 'cost')[['timestamp', 'command', 'cost']]

    return daily_cost, token_efficiency, expensive_ops
```
Use the right model for the task:
```bash
# Complex refactoring: Use Sonnet
CLAUDE_MODEL=claude-3-5-sonnet-20241022 claude "refactor authentication system"

# Simple fixes: Could use Haiku (when available in CLI)
CLAUDE_MODEL=claude-3-haiku claude "fix typo in README"

# Code review: Opus for deepest analysis
CLAUDE_MODEL=claude-3-opus claude "security audit auth system"
```
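This routing can be automated with a small helper that maps task keywords to a model before invoking the CLI. A sketch, where the keyword lists and the `run_task` wrapper are illustrative and the model names are those used above:

```python
import os
import subprocess

# Model names as used in the examples above
MODELS = {
    "simple": "claude-3-haiku",
    "standard": "claude-3-5-sonnet-20241022",
    "deep": "claude-3-opus",
}

def pick_model(task: str) -> str:
    """Route cheap tasks to a small model, deep analysis to a large one."""
    t = task.lower()
    if any(word in t for word in ("typo", "rename", "comment")):
        return MODELS["simple"]
    if any(word in t for word in ("audit", "security", "architecture")):
        return MODELS["deep"]
    return MODELS["standard"]

def run_task(task: str, *paths: str) -> None:
    # Set CLAUDE_MODEL for this invocation only
    env = {**os.environ, "CLAUDE_MODEL": pick_model(task)}
    subprocess.run(["claude", task, *paths], env=env, check=True)
```

Even a crude heuristic like this keeps routine fixes off the expensive models without anyone having to think about pricing per request.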
Break large tasks into token-efficient chunks:
```javascript
// Incremental migration script
// claudeCode: project-specific wrapper that shells out to the CLI
import { glob } from 'glob';
import { claudeCode } from './claude-wrapper.js';

async function migrateCodebase() {
  const files = await glob('src/**/*.js');
  const batchSize = 10; // Files per batch

  for (let i = 0; i < files.length; i += batchSize) {
    const batch = files.slice(i, i + batchSize);

    // Process batch with focused context
    await claudeCode([
      'migrate to TypeScript',
      ...batch,
      '--context-limit', '50000', // Smaller context per batch
      '--cache-strategy', 'aggressive'
    ]);

    // Clear cache between batches if needed (skip the first iteration)
    if (i > 0 && i % 50 === 0) {
      await claudeCode(['--clear-cache']);
    }
  }
}
```
“We reduced a 2-million-token refactoring job to 400K tokens (an 80% reduction) using these techniques:”
Pre-Analysis Phase
Chunked Refactoring
Verification Pass
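The three phases above can be sketched as a plan builder: one full-price analysis pass, then many cache-friendly passes (the prompt wording here is illustrative):

```python
def plan_refactor(services):
    """Build the command list for a three-phase, cache-friendly refactor."""
    commands = [
        # 1. Pre-analysis: pay full price once to prime the cache
        ["claude", "analyze all services, identify patterns", "services/"],
    ]
    # 2. Chunked refactoring: each pass reuses the cached analysis
    commands += [
        ["claude", f"refactor {svc} using identified patterns"]
        for svc in services
    ]
    # 3. Verification: still served largely from cache
    commands.append(["claude", "verify consistency across all services"])
    return commands

plan = plan_refactor(["user", "order", "payment"])
# 1 analysis + 3 refactors + 1 verification = 5 invocations
```

Only the first command pays full price for the shared context; every later invocation is structured to hit the cache.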
Here is a daily workflow before and after optimization; the optimized version cuts costs by roughly 75%:
```bash
# BEFORE: unoptimized workflow

# Morning standup prep: 100K tokens
claude "summarize all yesterday's changes" .

# Feature development: 500K tokens
claude "implement user dashboard" src/

# Bug fixes: 300K tokens
claude "fix all linting errors" .

# Total: 900K tokens/day ($3.60)
```
```bash
# AFTER: optimized workflow

# Morning standup prep: 20K tokens
claude "summarize changes" --since yesterday --format brief

# Feature development: 150K tokens
claude "implement user dashboard" \
  src/components/Dashboard.tsx \
  src/api/dashboard.ts \
  --reuse-context

# Bug fixes: 50K tokens
claude "fix critical lints only" --severity error

# Total: 220K tokens/day ($0.88)
```
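As a sanity check on those totals, assuming a blended rate of roughly $4 per million tokens across input and output (an assumption that reproduces the dollar figures above):

```python
BLENDED_RATE = 4.0  # assumed blended $/M tokens across input and output

before = 900_000 * BLENDED_RATE / 1_000_000  # $3.60/day
after = 220_000 * BLENDED_RATE / 1_000_000   # $0.88/day
reduction = 1 - after / before               # ~0.756, i.e. the ~75% claimed
```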
Daily Optimization Checklist
- Update `.claudeignore` for new generated files

Performance optimization in Claude Code is about working smarter, not harder. By understanding token economics, leveraging caching, and structuring workflows efficiently, you can reduce costs by 80% or more while actually improving development velocity. The key is treating tokens as a valuable resource and optimizing their use just as you would optimize code performance.