
Usage Monitoring and Cost Optimization

Your finance team wants to know how much Claude Code costs per developer per month. Your engineering manager wants to know which teams are getting the most value. Your security team wants audit logs. Without telemetry, you are guessing. With OpenTelemetry metrics and events, you can build dashboards that answer each of these questions from real usage data.

  • OpenTelemetry setup for metrics and event logging
  • The /cost command and status line for individual tracking
  • Team cost management with workspace limits and rate limiting
  • Token reduction strategies that cut costs without reducing effectiveness
  • A practical framework for measuring Claude Code ROI

Every developer can track the cost of their current session in real time:

/cost

Output:

Total cost: $0.55
Total duration (API): 6m 19.7s
Total duration (wall): 6h 33m 10.2s
Total code changes: 42 lines added, 18 lines removed

For continuous visibility, configure your status line to show token usage. See the status line documentation for configuration options.

Based on Anthropic’s published data:

| Metric | Value |
| --- | --- |
| Average cost per developer per day | $6 |
| 90th percentile daily cost | $12 |
| Monthly average (Sonnet) | $100-200/developer |
| Monthly average (Opus-heavy usage) | $300-500/developer |
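To turn the daily figures above into a team budget, a rough back-of-the-envelope calculation is often enough. The sketch below is illustrative only: the 21 working days per month and the team size of 25 are assumptions, not published numbers.

```python
# Daily figures come from the table above; WORKING_DAYS is an assumption.
AVG_DAILY_COST = 6.00    # USD per developer per day (average)
P90_DAILY_COST = 12.00   # USD per developer per day (90th percentile)
WORKING_DAYS = 21        # assumed working days per month


def monthly_estimate(developers: int, daily_cost: float, days: int = WORKING_DAYS) -> float:
    """Return a rough monthly spend in USD for a team of the given size."""
    return developers * daily_cost * days


if __name__ == "__main__":
    team = 25  # hypothetical team size
    print(f"typical: ${monthly_estimate(team, AVG_DAILY_COST):,.2f}")
    print(f"p90:     ${monthly_estimate(team, P90_DAILY_COST):,.2f}")
```

With these assumptions a single developer at the average rate lands at about $126 per month, which sits inside the $100-200 Sonnet range in the table.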
# Enable telemetry
export CLAUDE_CODE_ENABLE_TELEMETRY=1
# Configure OTLP exporter
export OTEL_METRICS_EXPORTER=otlp
export OTEL_LOGS_EXPORTER=otlp
export OTEL_EXPORTER_OTLP_PROTOCOL=grpc
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317
# Optional: authentication
export OTEL_EXPORTER_OTLP_HEADERS="Authorization=Bearer your-token"
# Start Claude Code
claude

Deploy via managed settings so every developer automatically reports telemetry:

{
  "env": {
    "CLAUDE_CODE_ENABLE_TELEMETRY": "1",
    "OTEL_METRICS_EXPORTER": "otlp",
    "OTEL_LOGS_EXPORTER": "otlp",
    "OTEL_EXPORTER_OTLP_PROTOCOL": "grpc",
    "OTEL_EXPORTER_OTLP_ENDPOINT": "http://collector.company.com:4317",
    "OTEL_EXPORTER_OTLP_HEADERS": "Authorization=Bearer company-token"
  }
}
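Before rolling a configuration like this out, it can help to sanity-check it. The following is a minimal sketch, not part of Claude Code: `telemetry_problems` is a hypothetical helper that inspects either the shell environment or the "env" block of a settings document, and the rules it applies are assumptions derived from the variables shown above.

```python
from typing import Mapping


def _exporters(env: Mapping[str, str], name: str) -> set[str]:
    """Split an exporter setting into names (assumed to allow comma-separated values)."""
    return {part.strip() for part in env.get(name, "").split(",") if part.strip()}


def telemetry_problems(env: Mapping[str, str]) -> list[str]:
    """Return human-readable problems found in a telemetry env mapping."""
    problems = []
    if env.get("CLAUDE_CODE_ENABLE_TELEMETRY") != "1":
        problems.append("CLAUDE_CODE_ENABLE_TELEMETRY is not set to 1")
    exporters = _exporters(env, "OTEL_METRICS_EXPORTER") | _exporters(env, "OTEL_LOGS_EXPORTER")
    if not exporters:
        problems.append("neither OTEL_METRICS_EXPORTER nor OTEL_LOGS_EXPORTER is set")
    if "otlp" in exporters and not env.get("OTEL_EXPORTER_OTLP_ENDPOINT"):
        problems.append("OTLP exporter selected but OTEL_EXPORTER_OTLP_ENDPOINT is not set")
    return problems


if __name__ == "__main__":
    import os

    # Check the current shell environment; an empty list means no problems were found.
    print(telemetry_problems(os.environ))
```

The same function accepts `settings["env"]` after loading a settings file with `json.load`, so one check covers both the per-developer and the centrally deployed setup.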

All metric and event names carry the claude_code. namespace — use the full name when building dashboard queries or your filters will not match.

| Metric | Type | What It Tracks |
| --- | --- | --- |
| claude_code.session.count | Counter | Sessions started |
| claude_code.lines_of_code.count | Counter | Lines added/removed by Claude |
| claude_code.pull_request.count | Counter | PRs created |
| claude_code.commit.count | Counter | Commits made |
| claude_code.cost.usage | Counter | Dollar cost of API calls |
| claude_code.token.usage | Counter | Input and output tokens |
| claude_code.code_edit_tool.decision | Counter | Edit tool allow/deny decisions |
| claude_code.active_time.total | Counter | Active session time in seconds |

| Event | What It Captures |
| --- | --- |
| claude_code.user_prompt | When prompts are submitted (content optional via OTEL_LOG_USER_PROMPTS=1) |
| claude_code.tool_result | Tool call results and outcomes |
| claude_code.api_request | API call details (model, tokens, latency) |
| claude_code.api_error | API errors and rate limits |
| claude_code.tool_decision | Permission decisions for tool calls |
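Once these metrics reach your backend, per-developer or per-model cost is a simple group-and-sum over claude_code.cost.usage. The sketch below works on already-decoded data points. The record shape (`name`, `value`, `attributes`) and the attribute keys `user.account_uuid` and `model` are assumptions for illustration; check what your collector actually receives. It also assumes each value is a delta rather than a cumulative total.

```python
from collections import defaultdict

COST_METRIC = "claude_code.cost.usage"


def cost_by(records: list[dict], attribute: str) -> dict[str, float]:
    """Sum cost data points grouped by one attribute (for example a user or model key)."""
    totals: dict[str, float] = defaultdict(float)
    for record in records:
        if record["name"] != COST_METRIC:
            continue  # ignore token, session, and other metrics
        key = record["attributes"].get(attribute, "unknown")
        totals[key] += record["value"]
    return dict(totals)


if __name__ == "__main__":
    # Hypothetical data points, shaped the way this sketch expects them.
    sample = [
        {"name": COST_METRIC, "value": 0.25, "attributes": {"user.account_uuid": "u1", "model": "sonnet"}},
        {"name": COST_METRIC, "value": 0.50, "attributes": {"user.account_uuid": "u2", "model": "opus"}},
        {"name": COST_METRIC, "value": 0.25, "attributes": {"user.account_uuid": "u1", "model": "opus"}},
        {"name": "claude_code.session.count", "value": 1, "attributes": {"user.account_uuid": "u1"}},
    ]
    print(cost_by(sample, "user.account_uuid"))
    print(cost_by(sample, "model"))
```

In practice this aggregation usually lives in the dashboard query language of your metrics backend; the Python version only shows the shape of the calculation.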

For API users, set workspace-level spend limits in the Anthropic Console:

  1. Go to console.anthropic.com
  2. Navigate to your Claude Code workspace (auto-created on first authentication)
  3. Set monthly spend limits per workspace

| Team Size | TPM per User | RPM per User |
| --- | --- | --- |
| 1-5 | 200k-300k | 5-7 |
| 5-20 | 100k-150k | 2.5-3.5 |
| 20-50 | 50k-75k | 1.25-1.75 |
| 50-100 | 25k-35k | 0.62-0.87 |
| 100-500 | 15k-20k | 0.37-0.47 |

The per-user TPM allocation shrinks as team size grows because a smaller fraction of users is active at the same time in larger teams.
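The per-user guidance above translates into a total limit by multiplying by head count. A minimal sketch of that arithmetic follows; the table rows are copied from the guidance, while `total_tpm_range` is a hypothetical helper. Because adjacent bands share a boundary (5, 20, 50, 100), the sketch picks the first matching row.

```python
# (min_users, max_users, (tpm_low, tpm_high)) rows copied from the guidance table.
TPM_GUIDANCE = [
    (1, 5, (200_000, 300_000)),
    (5, 20, (100_000, 150_000)),
    (20, 50, (50_000, 75_000)),
    (50, 100, (25_000, 35_000)),
    (100, 500, (15_000, 20_000)),
]


def total_tpm_range(users: int) -> tuple[int, int]:
    """Return the (low, high) total tokens-per-minute to request for a team."""
    for low, high, (tpm_low, tpm_high) in TPM_GUIDANCE:
        if low <= users <= high:  # first matching band wins on shared boundaries
            return users * tpm_low, users * tpm_high
    raise ValueError(f"no guidance row covers a team of {users} users")


if __name__ == "__main__":
    low, high = total_tpm_range(200)
    print(f"200 users: request between {low:,} and {high:,} TPM in total")
```

For a 200-person team this gives 3,000,000 to 4,000,000 TPM in total, even though each individual allocation is only 15k-20k.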

Context size directly drives cost. Every message includes the full conversation history.
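Because each request resends the history, input tokens over a session grow roughly with the square of the number of turns. The sketch below uses a deliberately simplified model: every turn adds the same number of tokens, and prompt caching and automatic compaction are ignored. Both functions and the example numbers are hypothetical.

```python
def cumulative_input_tokens(turns: int, tokens_per_turn: int) -> int:
    """Total input tokens sent if every request resends all prior turns."""
    # Request k carries k turns of history: t + 2t + ... + n*t = t * n * (n + 1) / 2
    return tokens_per_turn * turns * (turns + 1) // 2


def with_periodic_reset(turns: int, tokens_per_turn: int, reset_every: int) -> int:
    """Same total, but with the context emptied after every `reset_every` turns."""
    full_blocks, remainder = divmod(turns, reset_every)
    return (
        full_blocks * cumulative_input_tokens(reset_every, tokens_per_turn)
        + cumulative_input_tokens(remainder, tokens_per_turn)
    )


if __name__ == "__main__":
    print(cumulative_input_tokens(40, 2_000))    # one long 40-turn session
    print(with_periodic_reset(40, 2_000, 10))    # same work, context reset every 10 turns
```

Under these assumptions a 40-turn session at 2,000 tokens per turn sends 1,640,000 input tokens, while resetting every 10 turns sends 440,000. Real savings will differ, but the shape of the curve is why context hygiene matters.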

  • Clear between tasks: /clear when switching to unrelated work
  • Use targeted compaction: /compact Keep test output and code changes. Summarize discussion.
  • Add compaction instructions to CLAUDE.md:
    # Compact instructions
    When compacting, preserve test output, error traces, and file paths. Summarize discussion and reasoning.

| Task | Recommended Model | Why |
| --- | --- | --- |
| Code review | Sonnet | Good enough, significantly cheaper |
| Bug fixes | Sonnet | Most bugs do not need Opus-level reasoning |
| Architecture decisions | Opus | Complex multi-step reasoning benefits from Opus |
| Complex multi-file refactors, building from scratch | Fable 5 | Peak intelligence; use when budget matters less than velocity and quality |
| Simple file edits | Sonnet (or Haiku for subagents) | Overkill to use Opus |
| Security audits | Opus | Nuanced analysis requires deeper reasoning |

See model comparison for pricing details. Fable 5 costs $10/$50 per million tokens (input/output) — exactly 2× Opus 4.8.
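To see what the per-million-token rates mean for a single task, multiply them out. This sketch uses only the Fable 5 rates quoted above and the half-price figures implied by the 2× comparison; it ignores cache read/write pricing, and the token counts are hypothetical.

```python
# USD per million tokens, taken from the rates quoted above.
FABLE_5_INPUT = 10.00
FABLE_5_OUTPUT = 50.00


def request_cost(input_tokens: int, output_tokens: int,
                 input_price: float = FABLE_5_INPUT,
                 output_price: float = FABLE_5_OUTPUT) -> float:
    """Return the USD cost of a request, ignoring any cache discounts."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


if __name__ == "__main__":
    # Hypothetical task: 200k input tokens and 20k output tokens.
    fable = request_cost(200_000, 20_000)
    half_rate = request_cost(200_000, 20_000, FABLE_5_INPUT / 2, FABLE_5_OUTPUT / 2)
    print(f"Fable 5: ${fable:.2f}, at half the rate: ${half_rate:.2f}")
```

Output tokens cost five times as much as input tokens at these rates, so verbose responses and thinking budgets move the total faster than context size alone suggests.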

Switch models mid-session with /model or set defaults in /config.

Each MCP server adds tool definitions to your context, consuming tokens even when idle:

  • Run /context to see what consumes space
  • Disable unused servers with /mcp
  • Prefer CLI tools (gh, aws, gcloud) over MCP servers when possible
  • Set ENABLE_TOOL_SEARCH=auto:5 to trigger MCP tool search when tool definitions exceed 5% of the context window (the default trigger is 10%). Deferred tools only enter context when actually used, so a lower threshold trims idle definitions

Subagents have their own context windows. Use them for:

  • Verbose operations (reading many files, running test suites)
  • Parallel tasks that would otherwise bloat the main context
  • Repetitive operations (applying the same change across multiple files)

Configure subagents with cheaper models. Use model: haiku for trivial subagents (mechanical edits, file scans) and model: sonnet for ones that need real reasoning:

---
model: haiku
---

A few additional knobs from the current cost guidance:

  • Install code-intelligence plugins for typed languages: they give Claude precise symbol navigation instead of grep-then-read-many-files, cutting exploratory token spend on TypeScript, Go, Rust, and similar codebases.
  • Move workflow-specific instructions from CLAUDE.md into skills: CLAUDE.md loads at session start, so detailed PR-review or migration instructions cost tokens even on unrelated work. Skills load on demand only when invoked. Aim to keep CLAUDE.md under ~500 lines.
  • Tune the extended-thinking budget: thinking tokens bill as output. For simpler tasks, lower the effort level in /model, disable thinking in /config, or cap the budget with MAX_THINKING_TOKENS (for example, MAX_THINKING_TOKENS=8000).

Telemetry data not appearing: Check that CLAUDE_CODE_ENABLE_TELEMETRY=1 is set. Verify the OTLP endpoint is reachable from developer machines. The default export interval is 60 seconds for metrics — wait at least that long before debugging.
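A quick way to rule out network problems is a plain TCP connection test against the collector. The sketch below is a hypothetical helper, not a Claude Code feature. It only confirms that something accepts connections on the host and port; it does not validate authentication or the OTLP protocol. Falling back to port 4317 when the URL has none is an assumption that matches the gRPC examples in this guide.

```python
import socket
from urllib.parse import urlparse


def otlp_endpoint_reachable(endpoint: str, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to the endpoint's host and port succeeds."""
    try:
        parsed = urlparse(endpoint)
        host = parsed.hostname
        port = parsed.port or 4317  # assumed default when the URL omits a port
    except ValueError:
        return False  # malformed URL or port
    if host is None:
        return False
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False  # refused, timed out, or DNS failure


if __name__ == "__main__":
    import os

    endpoint = os.environ.get("OTEL_EXPORTER_OTLP_ENDPOINT", "http://localhost:4317")
    status = "reachable" if otlp_endpoint_reachable(endpoint) else "NOT reachable"
    print(f"{endpoint}: {status}")
```

Run it from a developer machine rather than from the collector host, since firewalls and VPN rules are the usual culprits.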

Costs higher than expected: Check /context to see what is consuming space. Large MCP server configurations or bloated auto-memory files inflate every request. Also check for sessions that were never cleared — stale context accumulates.

Rate limits hit during high-usage periods: The per-user TPM guidelines assume average concurrency. During training sessions or onboarding events, temporarily increase limits or stagger usage.

Bedrock/Vertex costs not tracked: Claude Code does not pull billing or usage metrics from your cloud provider. Use LiteLLM or your cloud provider’s own cost tracking for Bedrock/Vertex billing.