Enterprise Cost Control

Your team’s Claude Code API bill tripled this month, finance wants a hard cap by Friday, and nobody can tell you which projects or people are driving the spend. The instinct is to build a homegrown gateway that meters every request — but Claude Code already ships the visibility and guardrails you need. This guide shows the real cost surfaces and how to wire them into budgets without killing developer velocity.

What You’ll Walk Away With

A per-session and per-team way to see token spend using /cost, /stats, /context, and the status line — no custom telemetry code required
An OpenTelemetry export that feeds claude_code.cost.usage and claude_code.token.usage into a dashboard, broken down by team and model
Hard budget guardrails for automation: --max-budget-usd, --max-turns, and per-subagent model: haiku
A model-routing policy that reserves Opus 5 for the work that needs it and pushes everyday tasks to Sonnet 5 and Haiku 4.5
Copy-paste prompts to audit a bloated session, stand up an OTel exporter, and add a context-budget hook

Where the Money Goes

Claude Code costs scale with context size and conversation length. Every message re-sends the system prompt, your CLAUDE.md, MCP tool definitions, and the accumulated conversation. Anthropic’s published baseline is around $6 per developer per day, with 90% of users under $12/day — but automation, long-running sessions, and oversized context blow past that fast.

The three biggest, most-controllable drivers:

Stale context — leftover files and conversation from a previous task you never cleared
MCP tool definitions — every connected server adds tool schemas to context even when idle
Model choice — running Opus 5 for formatting and lint fixes that Haiku 4.5 handles for a fraction of the cost

Track Spend With Built-In Surfaces

Start here before you restrict anything. You cannot manage what you cannot see, and Claude Code exposes usage at four levels.

Check the current session. Run /cost in the REPL to see token usage and API cost for this session (API users), or /stats for usage patterns on Max/Pro:
```
/cost
Total cost:            $0.55
Total duration (API):  6m 19.7s
Total code changes:    0 lines added, 0 lines removed
```
See what’s eating context. Run /context to break down exactly what is consuming your window — system prompt, CLAUDE.md, MCP tool definitions, and conversation history. This is how you find the MCP server you forgot to disable.
Keep usage visible continuously. Configure the status line to display context window usage so you see token pressure on every message instead of discovering it at invoice time.
Roll up across the team. For API workspaces, set workspace spend limits and read cost and usage reporting in the Anthropic Console. The “Claude Code” workspace is created automatically on first authentication and centralizes org-wide tracking.

Org-Wide Dashboards With OpenTelemetry

/cost is per-session. For a finance-grade view across every developer, export Claude Code’s OpenTelemetry metrics to your existing observability stack. The two metrics that matter for cost are claude_code.cost.usage (USD) and claude_code.token.usage (tokens, with an input/output type attribute).

Enable it with environment variables — no code:

# Turn on telemetry and pick your exporters
export CLAUDE_CODE_ENABLE_TELEMETRY=1
export OTEL_METRICS_EXPORTER=otlp        # otlp, prometheus, or console
export OTEL_LOGS_EXPORTER=otlp

# Point at your OTLP collector
export OTEL_EXPORTER_OTLP_PROTOCOL=grpc
export OTEL_EXPORTER_OTLP_ENDPOINT=http://collector.company.com:4317
export OTEL_EXPORTER_OTLP_HEADERS="Authorization=Bearer ${OTEL_TOKEN}"

To attribute spend to teams, tag every session with a resource attribute:

export OTEL_RESOURCE_ATTRIBUTES="department=payments,team=checkout"

Now claude_code.cost.usage and claude_code.token.usage arrive in your backend sliced by department, team, model, and user.account_uuid, so a single Grafana panel answers “which team spent what, on which model.”

For org-wide rollout, push these settings through the managed settings file instead of asking every developer to set env vars:

{
  "env": {
    "CLAUDE_CODE_ENABLE_TELEMETRY": "1",
    "OTEL_METRICS_EXPORTER": "otlp",
    "OTEL_EXPORTER_OTLP_PROTOCOL": "grpc",
    "OTEL_EXPORTER_OTLP_ENDPOINT": "http://collector.company.com:4317"
  }
}

Copy-paste prompt to stand up the exporter and a budget alert:

Set up Claude Code cost telemetry for our team. In managed-settings.json, add an
env block that enables CLAUDE_CODE_ENABLE_TELEMETRY, exports metrics via OTLP/gRPC
to http://collector.company.com:4317, and tags sessions with department and team
via OTEL_RESOURCE_ATTRIBUTES. Then write a Prometheus/Grafana alert rule that fires
when the sum of claude_code.cost.usage over the last 24h, grouped by department,
exceeds a configurable threshold. Output the settings JSON and the alert YAML only.

Hard Guardrails for Automation

Interactive sessions self-correct — a developer notices a runaway loop and hits Escape. Headless and CI runs do not. For any non-interactive claude -p invocation, cap the blast radius directly with print-mode flags:

# Stop spending past $5 on this run, and never exceed 8 agentic turns
claude -p "Triage failing tests in src/ and propose fixes" \
  --max-budget-usd 5.00 \
  --max-turns 8 \
  --model sonnet

--max-budget-usd stops the run once API spend crosses the limit; --max-turns exits with an error after N agentic turns so a misbehaving loop can’t burn your budget unattended. Both are print-mode (-p) only.

Right-Size the Model

The single highest-leverage cost lever is not metering — it is model choice. During Sonnet 5’s launch pricing through August 31, it costs two-fifths of Opus 5 per token; after that it moves to three-fifths. Haiku 4.5 is cheaper still — about a fifth of Opus — for mechanical work.

Model	Rough price (input / output per Mtok)	Reach for it when
Claude Fable 5	~$10 / ~$50	Plan-mode planning and final verification on the hardest tasks (2x Opus)
Claude Opus 5	~$5 / ~$25	Architecture, multi-step reasoning, gnarly debugging
Claude Sonnet 5	$2 / $10 through Aug 31; then $3 / $15	Everyday coding, refactors, test writing
Claude Haiku 4.5	~$1 / ~$5	Formatting, lint fixes, comments, high-volume mechanical edits

The top row is new: Claude Fable 5 (released June 9, 2026) sits a tier above Opus 5 at exactly twice its price. On a budget, treat it as a bracket rather than a default — escalate to Fable 5 (/model fable) for planning and final verification, and run implementation on Sonnet 5 or Opus 5. If you set Fable as your default, explicitly pin cheaper subagent models; subagents otherwise inherit according to their configuration rather than universally auto-routing to cheaper tiers.

Switch mid-session with /model, or set a default in /config. The biggest win is in subagents: delegate verbose work to a cheaper model so the expensive context stays small. Specify the model per subagent in its configuration:

---
name: test-runner
description: Runs the test suite and returns only failures
tools: [Bash, Read]
model: haiku
---
You run the project's test command, then summarize only failing tests and their
error messages. Never paste full passing output.

Two more high-impact levers from the same playbook:

Tune adaptive reasoning. For simple tasks, lower the effort level with /effort or the /model slider. A positive MAX_THINKING_TOKENS cap works only after enabling fixed-budget mode on Opus/Sonnet 4.6; Fable 5 thinking cannot be disabled. Thinking tokens bill as output.
Cut MCP overhead. Run /context, then /mcp to disable servers you are not using. Prefer CLI tools (gh, aws, gcloud, sentry-cli) over MCP servers where possible — they add no persistent tool definitions to context.

Copy-paste prompt to write a downgrade policy your team can adopt:

Draft a short Claude Code model-selection policy for our team as a CLAUDE.md
section. Rules: default to Sonnet 5; escalate to Opus 5 only for architecture
or multi-step debugging and note why in the commit; escalate to Fable 5 (/model
fable) only for Plan-mode planning and the final verification pass — implementation
stays on Sonnet or Opus; route formatting, lint fixes, and comment generation to
Haiku 4.5; require every subagent definition to set an explicit model (haiku for
mechanical work). Add a note that teams where budget matters less than velocity
may default to Fable 5 instead, but must pin cheaper subagent models explicitly. Include
a one-line /model command for each case. Keep it under 15 lines.

Trim Context Before It Trims Your Budget

Token cost is a direct function of context size. Claude Code auto-compacts near the limit and caches the system prompt, but the cheap wins are habits:

/clear between unrelated tasks. Stale context is re-billed on every subsequent message. Use /rename before clearing so you can /resume later.
Guide compaction. /compact Focus on the API changes and failing tests tells Claude what to keep when it summarizes.
Offload to hooks. A PreToolUse hook can grep a 10,000-line log down to the matching errors before Claude ever reads it, turning tens of thousands of tokens into hundreds.
Move workflow instructions out of CLAUDE.md into skills. CLAUDE.md loads at session start and is billed even on unrelated work; skills load on demand. Keep CLAUDE.md under ~500 lines.

Copy-paste prompt to audit a session that has gotten expensive:

Run /context and /cost, then tell me exactly what is inflating this session's token
usage. Identify any MCP servers I'm not using right now, stale files from earlier
unrelated work, and conversation history that's safe to drop. Recommend whether to
/clear, /compact (with specific focus instructions), or disable an MCP server, and
give me the exact commands to run.

When This Breaks

/cost shows pennies but the bill is huge. You are on a subscription where /cost is not your invoice, or spend is coming from CI keys that never run /cost interactively. Reconcile against Console workspace reporting and per-key usage, not session output.
OTel dashboard is empty on Bedrock/Vertex/Foundry. Claude Code does not emit cost metrics through cloud providers. Track via the provider’s billing or a LiteLLM gateway with per-key spend; see LLM Gateway.
--max-budget-usd didn’t stop a runaway job. It only applies to print mode and only enforces against API-key billing. For subscriptions or interactive sessions there is no dollar meter — rely on workspace spend limits and --max-turns.
Telemetry lags or floods your backend. Metrics export every 60s and logs every 5s by default. Tune OTEL_METRIC_EXPORT_INTERVAL, and use the OTEL_METRICS_INCLUDE_* cardinality controls to keep storage costs down — a per-session-ID metric explosion can cost more than the Claude usage you are tracking.
A model downgrade tanks quality. Cost cuts that produce broken code are not savings. Keep Opus 5 available for the hard 10% and measure rework, not just token spend.
Leaving Fable 5 as your default without a usage-credit budget. Since July 20, 2026 it is permanently included on Max and Team Premium at up to 50% of weekly usage limits, but on Pro and Team Standard it bills usage credits at 2x Opus rates. Scope it back to planning and final verification unless the extra spend is deliberate.

What’s Next

Monitoring Costs for the full OpenTelemetry metric and event catalog
LLM Gateway to enforce hard budgets and track spend by key on Bedrock/Vertex
Enterprise Integration for company-wide deployment and managed settings

Effective cost control is about balance: maximize value, eliminate waste. Get visibility first with /cost, /stats, and OTel; then apply model routing and guardrails. Most teams cut 30-50% without touching velocity — just by clearing context, right-sizing models, and turning off MCP servers they forgot were running.