Usage Monitoring and Cost Optimization

Your finance team wants to know how much Claude Code costs per developer per month. Your engineering manager wants to know which teams are getting the most value. Your security team wants audit logs. Without telemetry, you are guessing. With OpenTelemetry, you can build dashboards that answer these questions from real usage data.

What You Will Walk Away With

OpenTelemetry setup for metrics and event logging
The /cost command and status line for individual tracking
Team cost management with workspace limits and rate limiting
Token reduction strategies that cut costs without reducing effectiveness
A practical framework for measuring Claude Code ROI

Individual Cost Tracking

The /cost Command

Every developer can track their session costs in real time:

/cost

Output:

Total cost:            $0.55
Total duration (API):  6m 19.7s
Total duration (wall): 6h 33m 10.2s
Total code changes:    42 lines added, 18 lines removed

For continuous visibility, configure your status line to show token usage. See the status line documentation for configuration options.

Typical Cost Ranges

Based on Anthropic’s published data:

Metric	Value
Average cost per developer per day	$6
90th percentile daily cost	$12
Monthly average (Sonnet)	$100-200/developer
Monthly average (Opus-heavy usage)	$300-500/developer
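As a rough sanity check, the daily and monthly figures in the table line up with each other. The sketch below projects a monthly bill from the daily figures. The 21 working days per month and the 25-person team are illustrative assumptions, not published numbers.

```python
# Assumption: 21 working days per month; adjust for your own calendar.
WORKING_DAYS_PER_MONTH = 21


def monthly_estimate(daily_cost_usd: float, developers: int = 1) -> float:
    """Project a monthly bill from an average daily cost per developer."""
    return daily_cost_usd * WORKING_DAYS_PER_MONTH * developers


average = monthly_estimate(6)                # average developer ($6/day)
heavy = monthly_estimate(12)                 # 90th-percentile developer ($12/day)
team = monthly_estimate(6, developers=25)    # hypothetical 25-person team

print(f"average developer: ${average:.0f}/month")
print(f"90th percentile:   ${heavy:.0f}/month")
print(f"25-person team:    ${team:.0f}/month")
```

Under these assumptions, the $6 daily average projects to $126 per month, which falls inside the $100-200 Sonnet range.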

OpenTelemetry Setup

Quick Start

# Enable telemetry
export CLAUDE_CODE_ENABLE_TELEMETRY=1

# Configure OTLP exporter
export OTEL_METRICS_EXPORTER=otlp
export OTEL_LOGS_EXPORTER=otlp
export OTEL_EXPORTER_OTLP_PROTOCOL=grpc
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317

# Optional: authentication
export OTEL_EXPORTER_OTLP_HEADERS="Authorization=Bearer your-token"

# Start Claude Code
claude

Organization-Wide Deployment

Deploy via managed settings so every developer automatically reports telemetry:

{
  "env": {
    "CLAUDE_CODE_ENABLE_TELEMETRY": "1",
    "OTEL_METRICS_EXPORTER": "otlp",
    "OTEL_LOGS_EXPORTER": "otlp",
    "OTEL_EXPORTER_OTLP_PROTOCOL": "grpc",
    "OTEL_EXPORTER_OTLP_ENDPOINT": "http://collector.company.com:4317",
    "OTEL_EXPORTER_OTLP_HEADERS": "Authorization=Bearer company-token"
  }
}

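Copy-paste managed settings for full telemetry deployment. Place this file in the managed settings directory for your platform. Port 4317 is the conventional OTLP gRPC port, so the protocol is set explicitly to match the examples above:

{
  "env": {
    "CLAUDE_CODE_ENABLE_TELEMETRY": "1",
    "OTEL_METRICS_EXPORTER": "otlp",
    "OTEL_LOGS_EXPORTER": "otlp",
    "OTEL_EXPORTER_OTLP_PROTOCOL": "grpc",
    "OTEL_EXPORTER_OTLP_ENDPOINT": "http://otel-collector.internal:4317"
  }
}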
  }
}
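Before rolling a settings file out to every machine, it can help to lint it. The following is a minimal sketch, not an official tool. The `check_settings` helper and its rules are hypothetical, and the "required" list only reflects the keys used in this section's examples.

```python
import json

# Keys the examples in this section set; treated here as the minimum
# needed for telemetry. This list is an assumption, not an official schema.
REQUIRED_ENV = (
    "CLAUDE_CODE_ENABLE_TELEMETRY",
    "OTEL_METRICS_EXPORTER",
    "OTEL_LOGS_EXPORTER",
    "OTEL_EXPORTER_OTLP_ENDPOINT",
)


def check_settings(text: str) -> list[str]:
    """Return a list of problems found in a managed-settings JSON string."""
    env = json.loads(text).get("env", {})
    problems = [f"missing {key}" for key in REQUIRED_ENV if key not in env]
    if env.get("CLAUDE_CODE_ENABLE_TELEMETRY") not in (None, "1"):
        problems.append('CLAUDE_CODE_ENABLE_TELEMETRY should be "1"')
    endpoint = env.get("OTEL_EXPORTER_OTLP_ENDPOINT", "")
    if endpoint and not endpoint.startswith(("http://", "https://")):
        problems.append("endpoint needs an http:// or https:// scheme")
    return problems


example = """
{
  "env": {
    "CLAUDE_CODE_ENABLE_TELEMETRY": "1",
    "OTEL_METRICS_EXPORTER": "otlp",
    "OTEL_LOGS_EXPORTER": "otlp",
    "OTEL_EXPORTER_OTLP_PROTOCOL": "grpc",
    "OTEL_EXPORTER_OTLP_ENDPOINT": "http://otel-collector.internal:4317"
  }
}
"""
print(check_settings(example) or "settings look complete")
```

A check like this catches typos in key names and missing schemes. It does not confirm that the collector is reachable.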

Available Metrics

All metric and event names carry the claude_code. namespace — use the full name when building dashboard queries or your filters will not match.

Metric	Type	What It Tracks
`claude_code.session.count`	Counter	Sessions started
`claude_code.lines_of_code.count`	Counter	Lines added/removed by Claude
`claude_code.pull_request.count`	Counter	PRs created
`claude_code.commit.count`	Counter	Commits made
`claude_code.cost.usage`	Counter	Dollar cost of API calls
`claude_code.token.usage`	Counter	Input and output tokens
`claude_code.code_edit_tool.decision`	Counter	Edit tool allow/deny decisions
`claude_code.active_time.total`	Counter	Active session time in seconds
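To answer the per-developer cost question, a dashboard query typically sums `claude_code.cost.usage` grouped by a user attribute. The sketch below shows that grouping on hand-written sample data. The record shape, the `user.account_uuid` and `model` attribute names, and the values are all assumptions for illustration, and the values are treated as per-interval deltas rather than cumulative totals. Check what your collector and backend actually export.

```python
from collections import defaultdict

# Hypothetical, already-flattened data points. Real exports depend on your
# collector and backend; attribute names and values here are assumptions.
points = [
    {"name": "claude_code.cost.usage", "value": 0.5,
     "attributes": {"user.account_uuid": "dev-a", "model": "sonnet"}},
    {"name": "claude_code.cost.usage", "value": 1.5,
     "attributes": {"user.account_uuid": "dev-a", "model": "opus"}},
    {"name": "claude_code.cost.usage", "value": 0.25,
     "attributes": {"user.account_uuid": "dev-b", "model": "sonnet"}},
    # A different metric, included to show that the filter skips it.
    {"name": "claude_code.token.usage", "value": 1200,
     "attributes": {"user.account_uuid": "dev-b", "model": "sonnet"}},
]


def cost_by(data_points: list[dict], attribute: str) -> dict[str, float]:
    """Sum claude_code.cost.usage values, grouped by one attribute."""
    totals: dict[str, float] = defaultdict(float)
    for point in data_points:
        if point["name"] != "claude_code.cost.usage":
            continue  # the full metric name must match exactly
        key = point["attributes"].get(attribute, "unknown")
        totals[key] += point["value"]
    return dict(totals)


print(cost_by(points, "user.account_uuid"))
print(cost_by(points, "model"))
```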

Available Events

Event	What It Captures
`claude_code.user_prompt`	When prompts are submitted (content optional via `OTEL_LOG_USER_PROMPTS=1`)
`claude_code.tool_result`	Tool call results and outcomes
`claude_code.api_request`	API call details (model, tokens, latency)
`claude_code.api_error`	API errors and rate limits
`claude_code.tool_decision`	Permission decisions for tool calls

Team Cost Management

Workspace Spend Limits

For API users, set workspace-level spend limits in the Anthropic Console:

Go to console.anthropic.com
Navigate to your Claude Code workspace (auto-created on first authentication)
Set monthly spend limits per workspace

Rate Limit Guidelines

Team Size	TPM per User	RPM per User
1-5	200k-300k	5-7
5-20	100k-150k	2.5-3.5
20-50	50k-75k	1.25-1.75
50-100	25k-35k	0.62-0.87
100-500	15k-20k	0.37-0.47

Per-user TPM decreases as team size grows because a smaller share of users is active at the same time in larger teams.
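To turn the table into a number you can request for your organization, multiply the per-user range by the number of users. This is a minimal sketch of that arithmetic. The `org_tpm_range` helper is hypothetical, and where two rows share a boundary (5, 20, 50, or 100 users) it assumes the smaller-team row applies.

```python
# (min_users, max_users, tpm_low, tpm_high), copied from the table above.
# Assumption: shared boundaries (5, 20, 50, 100) belong to the smaller-team row.
GUIDELINES = [
    (1, 5, 200_000, 300_000),
    (6, 20, 100_000, 150_000),
    (21, 50, 50_000, 75_000),
    (51, 100, 25_000, 35_000),
    (101, 500, 15_000, 20_000),
]


def org_tpm_range(team_size: int) -> tuple[int, int]:
    """Return the (low, high) total TPM suggested for a team of this size."""
    for min_users, max_users, tpm_low, tpm_high in GUIDELINES:
        if min_users <= team_size <= max_users:
            return team_size * tpm_low, team_size * tpm_high
    raise ValueError("team size is outside the 1-500 range the table covers")


low, high = org_tpm_range(30)  # 30 users at 50k-75k TPM each
print(f"30 users: {low:,} to {high:,} TPM in total")
```

For a 30-person team, this gives a total of 1,500,000 to 2,250,000 TPM.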

Token Reduction Strategies

Manage Context Proactively

Context size directly drives cost. Every request resends the full conversation history, so a long session pays for its early messages again on every later turn.

Clear between tasks: /clear when switching to unrelated work
Use targeted compaction: /compact Keep test output and code changes. Summarize discussion.

Add compaction instructions to CLAUDE.md:

# Compact instructions
When compacting, preserve test output, error traces, and file paths. Summarize discussion and reasoning.

Copy-paste prompt for context-aware cost management:

Before starting this task, check /cost. If we have used more than $2 in this session,
use /compact first to reduce context. Focus on the specific files involved -- do not
read entire directories when grep can find what we need.
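The effect of resending history is easy to underestimate, so here is a simplified model of it. It assumes every turn adds a fixed number of tokens and ignores prompt caching, the system prompt, and tool definitions. Real bills will differ, because cached tokens are priced differently. The 2,000-token turn size is an illustrative assumption.

```python
def total_input_tokens(turn_sizes: list[int]) -> int:
    """Sum the input tokens sent when each request resends all earlier turns."""
    total = 0
    history = 0
    for size in turn_sizes:
        history += size   # the new turn joins the conversation history
        total += history  # the request carries the whole history so far
    return total


# Hypothetical session: 20 turns that each add 2,000 tokens of context.
one_long_session = total_input_tokens([2_000] * 20)

# The same work as two unrelated 10-turn tasks with /clear in between.
two_cleared_sessions = 2 * total_input_tokens([2_000] * 10)

print(f"one long session:     {one_long_session:,} input tokens")
print(f"cleared between tasks: {two_cleared_sessions:,} input tokens")
```

In this model the uncleared session sends 420,000 input tokens and the cleared one sends 220,000, because the total grows roughly with the square of the number of turns.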

Choose the Right Model

Task	Recommended Model	Why
Code review	Sonnet	Good enough, significantly cheaper
Bug fixes	Sonnet	Most bugs do not need Opus-level reasoning
Architecture decisions	Opus	Complex multi-step reasoning benefits from Opus
Complex multi-file refactors, building from scratch	Fable 5	Peak intelligence; use when budget matters less than velocity and quality
Simple file edits	Sonnet (or Haiku for subagents)	Overkill to use Opus
Security audits	Opus	Nuanced analysis requires deeper reasoning

See model comparison for pricing details. Fable 5 costs $10/$50 per million tokens (input/output) — exactly 2× Opus 5.

Switch models mid-session with /model or set defaults in /config.
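To see what the price difference means for a single session, the sketch below applies the per-million-token prices quoted above. The Opus 5 prices are derived from the "2×" statement rather than quoted directly, the dictionary keys are illustrative labels rather than real model identifiers, and the token counts are hypothetical. Confirm the figures against the model comparison page before budgeting.

```python
# USD per million tokens as (input, output), taken from the text above.
# Opus 5 is derived from the "2x" statement; keys are illustrative labels.
PRICES = {
    "fable-5": (10.0, 50.0),
    "opus-5": (5.0, 25.0),
}


def session_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of a session at the listed per-million prices."""
    price_in, price_out = PRICES[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


# Hypothetical session: 2M input tokens and 300k output tokens.
for model in PRICES:
    cost = session_cost(model, 2_000_000, 300_000)
    print(f"{model}: ${cost:.2f}")
```

With these token counts the session costs $35.00 on Fable 5 and $17.50 on Opus 5.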

Reduce MCP Server Overhead

Each MCP server adds tool definitions to your context, consuming tokens even when idle:

Run /context to see what consumes space
Disable unused servers with /mcp
Prefer CLI tools (gh, aws, gcloud) over MCP servers when possible
Set ENABLE_TOOL_SEARCH=auto:5 to trigger MCP tool search when tool definitions exceed 5% of the context window (the default trigger is 10%). Deferred tools only enter context when actually used, so a lower threshold trims idle definitions
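The threshold in the last item is a percentage of the context window, so its effect depends on how many tokens your tool definitions occupy. The sketch below assumes a 200,000-token context window and 14,000 tokens of MCP tool definitions. Both numbers are illustrative, and `/context` reports the real ones.

```python
# Assumption: a 200,000-token context window; substitute your model's size.
CONTEXT_WINDOW = 200_000


def tool_search_triggers(tool_definition_tokens: int, threshold_percent: int) -> bool:
    """True when tool definitions exceed the given share of the context window."""
    return tool_definition_tokens * 100 > CONTEXT_WINDOW * threshold_percent


# Hypothetical setup: MCP servers contributing 14,000 tokens of definitions.
print("default (10%):", tool_search_triggers(14_000, 10))
print("auto:5 (5%):  ", tool_search_triggers(14_000, 5))
```

With these numbers the default threshold (20,000 tokens) is not reached, while the 5% threshold (10,000 tokens) is, so only the lower setting defers the idle definitions.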

Delegate to Subagents

Subagents have their own context windows. Use them for:

Verbose operations (reading many files, running test suites)
Parallel tasks that would otherwise bloat the main context
Repetitive operations (applying the same change across multiple files)

Configure subagents with cheaper models. Use model: haiku for trivial subagents (mechanical edits, file scans) and model: sonnet for ones that need real reasoning:

---
model: haiku
---

Copy-paste CLAUDE.md section for cost-conscious development:

# Cost Management Rules

- Use Sonnet for all tasks unless I specifically request Opus or Fable
- Reserve Fable 5 (/model fable) for complex multi-file refactors, greenfield builds, and final verification passes where quality matters most
- Before reading files, use grep to find relevant sections first
- When exploring a codebase, start with directory listing and README, not reading every file
- Delegate multi-file operations to subagents with model: sonnet
- Clear context between unrelated tasks

More Levers Worth Knowing

A few additional knobs from the current cost guidance:

Install code-intelligence plugins for typed languages: they give Claude precise symbol navigation instead of grep-then-read-many-files, cutting exploratory token spend on TypeScript, Go, Rust, and similar codebases.
Move workflow-specific instructions from CLAUDE.md into skills: CLAUDE.md loads at session start, so detailed PR-review or migration instructions cost tokens even on unrelated work. Skills load on demand only when invoked. Aim to keep CLAUDE.md under ~500 lines.
Tune adaptive reasoning: thinking tokens bill as output. For simpler tasks, lower effort with /effort or the /model slider. A positive MAX_THINKING_TOKENS cap applies only to fixed-budget mode on Opus/Sonnet 4.6, and Fable 5 thinking cannot be disabled.

When This Breaks

Telemetry data not appearing: Check that CLAUDE_CODE_ENABLE_TELEMETRY=1 is set. Verify the OTLP endpoint is reachable from developer machines. The default export interval is 60 seconds for metrics — wait at least that long before debugging.

Costs higher than expected: Check /context to see what is consuming space. Large MCP server configurations or bloated auto-memory files inflate every request. Also check for sessions that were never cleared — stale context accumulates.

Rate limits hit during high-usage periods: The per-user TPM guidelines assume average concurrency. During training sessions or onboarding events, temporarily increase limits or stagger usage.

Bedrock/Vertex costs not tracked: When Claude Code runs through Bedrock or Vertex, it does not send metrics from your cloud account. Use LiteLLM or your cloud provider’s own cost tracking for Bedrock/Vertex billing.

What is Next

Enterprise Integration — Organization-wide telemetry deployment
GitHub Actions — Track CI costs alongside developer usage
Performance and Cost Tips — 10 specific tips for reducing token usage