Performance Tuning & Optimization

You kick off a large refactor in a session that already has 40 files loaded and a half-hour of conversation behind it. Responses crawl, every turn re-reads the whole window, and your spend climbs while the actual edits stall. The model is not slow — your context is bloated, your model choice is wrong for the task, and nothing is scoped. Performance tuning in Claude Code is mostly about feeding the model exactly what a task needs and nothing more.

What You’ll Walk Away With

A repeatable way to keep the context window lean so responses stay fast and cheap
Concrete rules for picking between Haiku, Sonnet, and Opus per task — and how to switch mid-session
Copy-paste prompts for focused analysis, batch refactors, and guided compaction
Real telemetry: how to actually measure tokens, cost, and duration instead of guessing
A recovery playbook for when a session bogs down or hits the context limit

Why Context Is the Bottleneck

Every turn you send re-processes the entire active context window. A session carrying dozens of files, long tool output, and a sprawling conversation pays that cost on every single response. The fix is not a faster model — it is a smaller, sharper window.

On the Anthropic API, Fable 5, Sonnet 5, and Opus 5 carry 1M-token windows while Haiku 4.5 has 200K. Sonnet 5 needs no [1m] suffix there. In Claude Code, Opus 1M is included on Max, Team, and Enterprise but requires usage credits on Pro; gateways can budget Sonnet 5 at 200K until you select its 1M variant. A big window is a budget, not a target.

Keep CLAUDE.md Lean

Structure your CLAUDE.md files so each one carries only what the model needs at that level. A bloated root memory file is loaded into every session whether it is relevant or not.

# Root CLAUDE.md (keep it short)
## Critical Project Info Only
- Architecture: Microservices with Node.js
- Key commands: npm run dev, npm test
- Coding standards: ESLint + Prettier

# Frontend CLAUDE.md
## Frontend Specific
- Framework: React 18 with TypeScript
- State: Zustand stores in /src/stores
- Components: /src/components follows atomic design

Scope subtree-specific guidance to a CLAUDE.md inside that subtree. Claude Code loads memory hierarchically, so backend rules do not need to live in the root file the frontend also pays for.

Load Only the Directories a Task Needs

Scope the working set with --add-dir instead of letting a session pull in the whole tree.

Unfocused
Focused

# Pulls broad context, then searches everything
claude
> Analyze the entire codebase and find all TODO comments

# Restrict the working set to what matters
claude --add-dir src/auth src/middleware
> Explain the authentication flow, then list TODO comments in src/auth/ about JWT expiration

Copy-paste prompt for a focused, low-token analysis:

Scope: only the files under src/auth/ and src/middleware/auth.ts.
Do not read anything outside that scope.
Trace the authentication flow from request to verified user, list every file
involved, and flag any TODO or FIXME comments related to token expiration.
Return a short bullet summary, not a file-by-file dump.

The explicit “do not read outside that scope” line is what keeps the model from wandering into unrelated files and inflating the window.

Compaction and Clearing

Two commands control window size mid-session:

/clear wipes the conversation and loaded context entirely. Use it when switching to unrelated work.
/compact summarizes the conversation to reclaim space while keeping a distilled memory. Use it during a long session that is still on-topic.

/compact is lossy by design, so guide what it keeps.

Copy-paste prompt for guided compaction during a long session:

/compact Keep all error traces, failing test output, file paths I have edited, and
architecture decisions made so far. Drop intermediate explanations, abandoned
attempts, and anything purely conversational.

Compacting with explicit “keep” and “drop” lists preserves the diagnostic thread while shedding the filler that was slowing every turn.

Watch What Is Consuming the Window

Run /context to see usage. It renders a colored grid showing what is filling the window — system prompt, memory files, tools, conversation, and loaded files — so you can tell at a glance whether a few large files or a long conversation is the culprit.

claude> /context

If the grid shows files dominating, /clear and reload only what you need. If conversation dominates, /compact with guidance.

Model Selection

Match the model to the task. Over-spending on Opus for a typo wastes money and latency; under-spending on Haiku for an architecture decision wastes your time on a weaker answer.

Task	Recommended model	Why
Codebase-wide migrations, hardest debugging, long-running tasks	`fable` (Fable 5)	Highest capability; 1M context, 128K output; included on Max and Team Premium at 50% of weekly limits; usage credits on Pro and Team Standard
Typos, renames, formatting, mechanical edits	`haiku` (Haiku 4.5)	Fast and cheap; no deep reasoning needed
Feature implementation, routine bug fixes	`sonnet` (Sonnet 5)	Strong everyday default, 1M context
Architecture, large refactors, gnarly debugging	`opus` (Opus 5)	Excellent agentic reasoning and top SWE-Bench scores

You can set or switch the primary model four ways. Claude Code has failure-based fallback chains, but no task-aware router that chooses a model from your prompt:

Mid-session — /model sonnet (or fable, opus, haiku, opusplan, or a full name like claude-sonnet-5)
At launch — claude --model opus
Environment variable — export ANTHROPIC_MODEL=haiku
Settings — the model field in .claude/settings.json

{
  "model": "sonnet"
}

A practical pattern: start a session in Sonnet, and when you hit a genuinely hard problem, switch up.

# Mechanical cleanup — drop to the cheap model
/model haiku
Rename every occurrence of `getUserData` to `fetchUserProfile` across src/, including imports.

# Hard architectural call — switch up
/model opus
Evaluate whether to split the monolithic OrderService into separate Order, Payment, and
Fulfillment services. Lay out the trade-offs and a migration sequence before any code.

Thinking Budget

Current adaptive models use effort rather than a fixed token budget. Set low, medium, high, xhigh, or max with /effort, the /model slider, --effort, or CLAUDE_CODE_EFFORT_LEVEL. The default is high on Fable 5, Sonnet 5, and Opus 5, and xhigh on Opus 4.7.

Only Opus/Sonnet 4.6 can return to the old fixed budget. Opt out of adaptive thinking before setting a positive token cap:

# Enable 4.6 fixed-budget compatibility, then cap it
export CLAUDE_CODE_DISABLE_ADAPTIVE_THINKING=1
export MAX_THINKING_TOKENS=8000

# On the Anthropic API, disable thinking except on Fable 5
export MAX_THINKING_TOKENS=0

Workflow Patterns That Save Tokens

Batch Similar Operations

Group repetitive work into one pass instead of paying context overhead per item.

Enumerate the targets first

claude> List every React component under src/components that is missing prop types.

Run the batch with explicit guidance

claude> For each component you listed, add prop types inferred from actual usage in the file.
Process them in groups of five and report which files you changed.

Copy-paste prompt for a controlled batch refactor:

Find every file under src/api/ that calls the deprecated `db.queryRaw()` helper.
First list them. Then, working in batches of five, replace each call with the
parameterized `db.query()` equivalent, preserving behavior. After each batch,
run `npm test -- src/api` and stop if anything fails so we can review before continuing.

Batching plus a test gate after each group keeps the context window from ballooning and catches regressions before they compound across the whole change.

Checkpoint Before Big Changes

Git is your undo for AI-driven refactors. Branch and commit before letting the model loose, so a bad run is one git reset away.

git checkout -b ai-refactor-auth
git commit -am "Checkpoint before auth refactor"

# let Claude work, then if it goes sideways:
git reset --hard HEAD~1

Reuse Analysis Across Sessions

Have the model write findings to a file once, then reference that file instead of re-deriving the analysis (and re-loading the source) every session.

claude> Analyze all API endpoints and write the results to API_ANALYSIS.md.
# later, in a fresh session:
claude> Using API_ANALYSIS.md, list every endpoint missing authentication.

Measuring Performance for Real

Do not guess at token usage — measure it. Claude Code exposes real surfaces for this:

/usage — token usage and spend patterns inside the session (consolidates the older /cost and /stats)
/context — what is filling the window right now

For team-wide or longitudinal tracking, enable OpenTelemetry metrics export. This is the only sanctioned programmatic metrics surface — there is no per-operation tokens/duration API to script against.

# Emit Claude Code metrics (tokens, cost, session counts) over OTLP
export CLAUDE_CODE_ENABLE_TELEMETRY=1
export OTEL_METRICS_EXPORTER=otlp

Or pin it in settings so every session reports:

{
  "env": {
    "CLAUDE_CODE_ENABLE_TELEMETRY": "1",
    "OTEL_METRICS_EXPORTER": "otlp"
  }
}

Point the OTLP exporter at your metrics backend and you get cost-per-session and token-throughput dashboards from real data rather than invented benchmark tables.

When This Breaks

Symptom: long delays before each reply.

Recover:

Run /context to see what is filling the window.
/clear if loaded files dominate; reload only the directories you need with --add-dir.
/compact with a keep/drop list if the conversation is the bulk.
Drop to a faster model (/model haiku or sonnet) for mechanical work.

Symptom: answers get vaguer as the session ages.

Recover:

The window is probably saturated with stale context — /compact or start fresh.
Re-state the goal explicitly after compaction.
Split the task into smaller scoped steps.
Switch up to opus for the genuinely hard sub-problem.

Symptom: the window is full and turns fail or truncate.

Recover:

/compact aggressively with explicit keep/drop guidance.
Split the operation — do not try to refactor twelve modules in one session.
/clear between unrelated tasks instead of carrying everything forward.
Trim oversized CLAUDE.md files that load into every session.

Performance Checklist

☐ CLAUDE.md files kept short and scoped to their subtree
☐ Working set limited with --add-dir instead of loading the whole repo
☐ /clear between unrelated tasks, /compact (with guidance) within a long one
☐ Model matched to task: Haiku for mechanical, Sonnet for routine, Opus for hard
☐ Batch operations for repetitive edits, with a test gate between groups
☐ Checkpoint commits before large AI-driven changes
☐ Real measurement via /usage, /context, and OTel metrics export

What’s Next

Cost Optimization

Reduce spend while keeping the performance gains from this guide

CI/CD Integration

Apply these patterns in automated, headless pipelines

Team Scaling

Performance patterns for large teams sharing a codebase