Enterprise Cost Control
Your team’s Claude Code API bill tripled this month, finance wants a hard cap by Friday, and nobody can tell you which projects or people are driving the spend. The instinct is to build a homegrown gateway that meters every request — but Claude Code already ships the visibility and guardrails you need. This guide shows the real cost surfaces and how to wire them into budgets without killing developer velocity.
What You’ll Walk Away With
Section titled “What You’ll Walk Away With”- A per-session and per-team way to see token spend using
/cost,/stats,/context, and the status line — no custom telemetry code required - An OpenTelemetry export that feeds
claude_code.cost.usageandclaude_code.token.usageinto a dashboard, broken down by team and model - Hard budget guardrails for automation:
--max-budget-usd,--max-turns, and per-subagentmodel: haiku - A model-routing policy that reserves Opus 4.8 for the work that needs it and pushes everyday tasks to Sonnet 4.6 and Haiku 4.5
- Copy-paste prompts to audit a bloated session, stand up an OTel exporter, and add a context-budget hook
Where the Money Goes
Section titled “Where the Money Goes”Claude Code costs scale with context size and conversation length. Every message re-sends the system prompt, your CLAUDE.md, MCP tool definitions, and the accumulated conversation. Anthropic’s published baseline is around $6 per developer per day, with 90% of users under $12/day — but automation, long-running sessions, and oversized context blow past that fast.
The three biggest, most-controllable drivers:
- Stale context — leftover files and conversation from a previous task you never cleared
- MCP tool definitions — every connected server adds tool schemas to context even when idle
- Model choice — running Opus 4.8 for formatting and lint fixes that Haiku 4.5 handles for a fraction of the cost
Track Spend With Built-In Surfaces
Section titled “Track Spend With Built-In Surfaces”Start here before you restrict anything. You cannot manage what you cannot see, and Claude Code exposes usage at four levels.
-
Check the current session. Run
/costin the REPL to see token usage and API cost for this session (API users), or/statsfor usage patterns on Max/Pro:/costTotal cost: $0.55Total duration (API): 6m 19.7sTotal code changes: 0 lines added, 0 lines removed -
See what’s eating context. Run
/contextto break down exactly what is consuming your window — system prompt,CLAUDE.md, MCP tool definitions, and conversation history. This is how you find the MCP server you forgot to disable. -
Keep usage visible continuously. Configure the status line to display context window usage so you see token pressure on every message instead of discovering it at invoice time.
-
Roll up across the team. For API workspaces, set workspace spend limits and read cost and usage reporting in the Anthropic Console. The “Claude Code” workspace is created automatically on first authentication and centralizes org-wide tracking.
Org-Wide Dashboards With OpenTelemetry
Section titled “Org-Wide Dashboards With OpenTelemetry”/cost is per-session. For a finance-grade view across every developer, export Claude Code’s OpenTelemetry metrics to your existing observability stack. The two metrics that matter for cost are claude_code.cost.usage (USD) and claude_code.token.usage (tokens, with an input/output type attribute).
Enable it with environment variables — no code:
# Turn on telemetry and pick your exportersexport CLAUDE_CODE_ENABLE_TELEMETRY=1export OTEL_METRICS_EXPORTER=otlp # otlp, prometheus, or consoleexport OTEL_LOGS_EXPORTER=otlp
# Point at your OTLP collectorexport OTEL_EXPORTER_OTLP_PROTOCOL=grpcexport OTEL_EXPORTER_OTLP_ENDPOINT=http://collector.company.com:4317export OTEL_EXPORTER_OTLP_HEADERS="Authorization=Bearer ${OTEL_TOKEN}"To attribute spend to teams, tag every session with a resource attribute:
export OTEL_RESOURCE_ATTRIBUTES="department=payments,team=checkout"Now claude_code.cost.usage and claude_code.token.usage arrive in your backend sliced by department, team, model, and user.account_uuid, so a single Grafana panel answers “which team spent what, on which model.”
For org-wide rollout, push these settings through the managed settings file instead of asking every developer to set env vars:
{ "env": { "CLAUDE_CODE_ENABLE_TELEMETRY": "1", "OTEL_METRICS_EXPORTER": "otlp", "OTEL_EXPORTER_OTLP_PROTOCOL": "grpc", "OTEL_EXPORTER_OTLP_ENDPOINT": "http://collector.company.com:4317" }}Hard Guardrails for Automation
Section titled “Hard Guardrails for Automation”Interactive sessions self-correct — a developer notices a runaway loop and hits Escape. Headless and CI runs do not. For any non-interactive claude -p invocation, cap the blast radius directly with print-mode flags:
# Stop spending past $5 on this run, and never exceed 8 agentic turnsclaude -p "Triage failing tests in src/ and propose fixes" \ --max-budget-usd 5.00 \ --max-turns 8 \ --model sonnet--max-budget-usd stops the run once API spend crosses the limit; --max-turns exits with an error after N agentic turns so a misbehaving loop can’t burn your budget unattended. Both are print-mode (-p) only.
Right-Size the Model
Section titled “Right-Size the Model”The single highest-leverage cost lever is not metering — it is model choice. Sonnet 4.6 handles the bulk of coding tasks at roughly three-fifths of Opus 4.8’s price; Haiku 4.5 is cheaper still — about a fifth of Opus — for mechanical work.
| Model | Rough price (input / output per Mtok) | Reach for it when |
|---|---|---|
| Claude Fable 5 | ~$10 / ~$50 | Plan-mode planning and final verification on the hardest tasks (2x Opus) |
| Claude Opus 4.8 | ~$5 / ~$25 | Architecture, multi-step reasoning, gnarly debugging |
| Claude Sonnet 4.6 | ~$3 / ~$15 | Everyday coding, refactors, test writing (your default) |
| Claude Haiku 4.5 | ~$1 / ~$5 | Formatting, lint fixes, comments, high-volume mechanical edits |
The top row is new: Claude Fable 5 (released June 9, 2026) sits a tier above Opus 4.8 at exactly twice its price. On a budget, treat it as a bracket rather than a default — escalate to Fable 5 (/model fable) for Plan-mode planning up front and the final verification pass at the end, and run the implementation in between on Sonnet 4.6 or Opus 4.8. Teams where budget matters less than velocity and quality can instead set Fable 5 as the default, since subagents still auto-run on Opus, Sonnet, and Haiku — the bulk token volume stays on cheaper tiers while the main loop gets maximum intelligence.
Switch mid-session with /model, or set a default in /config. The biggest win is in subagents: delegate verbose work to a cheaper model so the expensive context stays small. Specify the model per subagent in its configuration:
---name: test-runnerdescription: Runs the test suite and returns only failurestools: [Bash, Read]model: haiku---You run the project's test command, then summarize only failing tests and theirerror messages. Never paste full passing output.Two more high-impact levers from the same playbook:
- Tune extended thinking. It is on by default with a generous budget. For simple tasks, lower the effort level in
/modelon a model that supports it (the effort slider appears for Opus), disable thinking in/config, or cap it withMAX_THINKING_TOKENS=8000— thinking tokens bill as output. - Cut MCP overhead. Run
/context, then/mcpto disable servers you are not using. Prefer CLI tools (gh,aws,gcloud,sentry-cli) over MCP servers where possible — they add no persistent tool definitions to context.
Trim Context Before It Trims Your Budget
Section titled “Trim Context Before It Trims Your Budget”Token cost is a direct function of context size. Claude Code auto-compacts near the limit and caches the system prompt, but the cheap wins are habits:
/clearbetween unrelated tasks. Stale context is re-billed on every subsequent message. Use/renamebefore clearing so you can/resumelater.- Guide compaction.
/compact Focus on the API changes and failing teststells Claude what to keep when it summarizes. - Offload to hooks. A
PreToolUsehook can grep a 10,000-line log down to the matching errors before Claude ever reads it, turning tens of thousands of tokens into hundreds. - Move workflow instructions out of
CLAUDE.mdinto skills.CLAUDE.mdloads at session start and is billed even on unrelated work; skills load on demand. KeepCLAUDE.mdunder ~500 lines.
When This Breaks
Section titled “When This Breaks”/costshows pennies but the bill is huge. You are on a subscription where/costis not your invoice, or spend is coming from CI keys that never run/costinteractively. Reconcile against Console workspace reporting and per-key usage, not session output.- OTel dashboard is empty on Bedrock/Vertex/Foundry. Claude Code does not emit cost metrics through cloud providers. Track via the provider’s billing or a LiteLLM gateway with per-key spend; see LLM Gateway.
--max-budget-usddidn’t stop a runaway job. It only applies to print mode and only enforces against API-key billing. For subscriptions or interactive sessions there is no dollar meter — rely on workspace spend limits and--max-turns.- Telemetry lags or floods your backend. Metrics export every 60s and logs every 5s by default. Tune
OTEL_METRIC_EXPORT_INTERVAL, and use theOTEL_METRICS_INCLUDE_*cardinality controls to keep storage costs down — a per-session-ID metric explosion can cost more than the Claude usage you are tracking. - A model downgrade tanks quality. Cost cuts that produce broken code are not savings. Keep Opus 4.8 available for the hard 10% and measure rework, not just token spend.
- Leaving Fable 5 as your default after June 22, 2026. From June 9 through June 22, 2026 it is included on Pro, Max, Team, and seat-based Enterprise plans at no extra cost — past that window it bills usage credits at 2x Opus rates. Scope it back to planning and final verification unless the extra spend is deliberate.
What’s Next
Section titled “What’s Next”- Monitoring Costs for the full OpenTelemetry metric and event catalog
- LLM Gateway to enforce hard budgets and track spend by key on Bedrock/Vertex
- Enterprise Integration for company-wide deployment and managed settings
Effective cost control is about balance: maximize value, eliminate waste. Get visibility first with /cost, /stats, and OTel; then apply model routing and guardrails. Most teams cut 30-50% without touching velocity — just by clearing context, right-sizing models, and turning off MCP servers they forgot were running.