AI Usage Cost Governance

Finance pings you on the first of the month: the AI tooling line went from a rounding error to five figures, nobody can say which team drove it, and the renewal conversation is next week. You don’t have a spending problem yet - you have a visibility problem. Tokens are metered, not seat-priced, so a single engineer running a 200K-token agentic refactor on a flagship model can cost more in an afternoon than another engineer does all month. This guide shows how to get that spend under control across Cursor, Claude Code, and Codex using each tool’s real, documented controls - not a bespoke governance platform you’d have to build.

What You’ll Walk Away With

A working cost-visibility setup for each tool: Claude Code OpenTelemetry metrics, the Cursor team usage dashboard, and Codex usage reporting
A real OTEL_RESOURCE_ATTRIBUTES configuration that attributes spend to team and cost center
A model-selection policy that defaults to cheaper models and escalates only when the task justifies it
Copy-paste prompts for running a spend audit and a model-routing review with your AI assistant
Concrete failure modes (silent telemetry, managed-settings overrides, budget alerts that never fire) and how to recover

Where the Money Actually Goes

Before you instrument anything, get the mental model right. Three variables drive the bill, in order of impact:

Model choice. The gap between a flagship and a mid-tier model is roughly 1.5-5x per token. Defaulting every task to the most expensive model is the single biggest source of waste.
Context size. You pay for input tokens too. Loading an entire repo into context for a one-file change is silently expensive, and cache reads only partly offset it.
Agentic loops. Autonomous multi-step runs (large refactors, test-fix loops, deep research) multiply token usage. They’re often worth it - but they need to be a deliberate choice, not an accident.

Cost governance is mostly about making those three variables visible and putting light, non-blocking guardrails on the expensive paths. You do not need a custom MCP “cost gateway” or an invented YAML framework. Every tool ships the primitives.
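
To make the three drivers concrete, here is a back-of-the-envelope sketch. The per-million-token prices are placeholders chosen only to show a 5x gap, not real list prices, and the estimate_cost helper is hypothetical. It assumes an agentic loop re-sends the same context on every step and ignores cache discounts, so read the output as an order-of-magnitude illustration.

```shell
# Rough per-run cost in USD. Prices are placeholder USD-per-million-token
# figures, NOT real list prices; substitute your vendor's current rates.
estimate_cost() {
  # args: input_tokens output_tokens input_price output_price loop_steps
  awk -v in_tok="$1" -v out_tok="$2" -v in_price="$3" -v out_price="$4" -v steps="$5" \
    'BEGIN { printf "%.2f\n", steps * (in_tok * in_price + out_tok * out_price) / 1000000 }'
}

# Scoped one-file change, hypothetical mid-tier model, single pass.
estimate_cost 20000 2000 3 15 1

# Same change with the whole repo in context, a hypothetical flagship priced
# 5x higher, run as a 10-step agentic loop.
estimate_cost 200000 2000 15 75 10
```

With these made-up numbers the scoped single pass comes to about nine cents and the unscoped flagship loop to about $31.50. The work is the same; all three variables moved.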

Step 1: Turn On Cost Visibility

You can’t govern what you can’t see. Each tool exposes usage differently - CLI telemetry for Claude Code, an admin dashboard for Cursor, and org usage reporting for Codex. Set up all three; the workflows genuinely differ here.

Cursor is IDE-first, so its cost controls live in the team admin dashboard (cursor.com/dashboard), not in config files. As a team admin you get:

Usage by member, split into two included pools per seat: Composer + Auto (first-party models) and Third-Party API (BYO-key model usage). This immediately surfaces your heavy users.
Team-wide monthly spending limits that cap overage before it runs away.
Smart alerts on dollar thresholds, delivered to Slack or email before a billing surprise lands.
Seat-type recommendations - Cursor flags when a member’s usage fits a Standard ($32/seat/mo annual) vs Premium ($96/seat/mo annual) seat, so you’re not overpaying for light users.

There’s nothing to install: the visibility is built into the Business/Teams plan dashboard. Your job is to turn the spending limit and alerts on, then review the per-member breakdown monthly.

Claude Code is CLI-first and exports the richest cost telemetry of the three via OpenTelemetry. Enable it with environment variables:

# Enable telemetry and pick exporters
export CLAUDE_CODE_ENABLE_TELEMETRY=1
export OTEL_METRICS_EXPORTER=otlp        # otlp | prometheus | console
export OTEL_LOGS_EXPORTER=otlp

# Point at your collector (Grafana/Prometheus, Honeycomb, Datadog, etc.)
export OTEL_EXPORTER_OTLP_PROTOCOL=grpc
export OTEL_EXPORTER_OTLP_ENDPOINT=http://collector.company.com:4317
export OTEL_EXPORTER_OTLP_HEADERS="Authorization=Bearer ${OTEL_TOKEN}"

The metrics you care about for cost work are real and documented: claude_code.cost.usage (USD, tagged by model), claude_code.token.usage (tagged by type: input/output/cacheRead/cacheCreation and model), claude_code.session.count, and claude_code.active_time.total. That cost.usage metric is the whole ballgame - sum it, group by model (and by team once the Step 2 attributes are in place), and you have per-team spend without building anything. Treat the figure as an estimate; your provider's billing data remains the authoritative number.
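
As a sketch of the "sum it, group by model" step, the snippet below totals cost data points per model with awk. The two-column "model cost_usd" input is an assumed flat export used only for illustration, the model names are placeholders, and sum_cost_by_model is a hypothetical helper. In practice your metrics backend's query language would normally do this aggregation for you.

```shell
# Sum cost data points per model. Input lines are "model cost_usd"; this flat
# format is an assumption for illustration, not an exporter format.
sum_cost_by_model() {
  awk '{ total[$1] += $2 }
       END { for (m in total) printf "%s %.2f\n", m, total[m] }' | sort
}

printf '%s\n' \
  'model-a 1.25' \
  'model-b 0.40' \
  'model-a 2.75' | sum_cost_by_model
```

The trailing sort is there because awk does not guarantee the order of a for-in loop.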

For org-wide rollout, push the same vars through the managed settings file so every developer reports automatically and can’t silently disable it.
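
One way to stage that rollout is to generate the settings fragment and review it before handing it to your device-management tooling. This sketch assumes the managed settings file accepts the same env block as ordinary Claude Code settings. The scratch output path is a placeholder (the real managed-settings location depends on the OS), and the auth header is left out so no token lands in a shared file.

```shell
# Write a managed-settings sketch to a scratch path for review.
# The path below is a placeholder, not the real managed-settings location.
out="${TMPDIR:-/tmp}/managed-settings.example.json"

cat > "$out" <<'EOF'
{
  "env": {
    "CLAUDE_CODE_ENABLE_TELEMETRY": "1",
    "OTEL_METRICS_EXPORTER": "otlp",
    "OTEL_LOGS_EXPORTER": "otlp",
    "OTEL_EXPORTER_OTLP_PROTOCOL": "grpc",
    "OTEL_EXPORTER_OTLP_ENDPOINT": "http://collector.company.com:4317"
  }
}
EOF

echo "wrote $out"
```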

Step 2: Attribute Spend to Teams

Aggregate spend is a number to panic about; attributed spend is a number to act on. For Claude Code, attribution is a one-line change. The OTEL_RESOURCE_ATTRIBUTES variable tags every metric with whatever dimensions you set - and it follows the W3C Baggage spec, which means no spaces in values (a common gotcha):

# Correct: comma-separated key=value pairs, no spaces (use underscores instead)
export OTEL_RESOURCE_ATTRIBUTES="department=engineering,team.id=platform,cost_center=eng-123"

# Wrong: spaces are invalid and silently break the attribute
# export OTEL_RESOURCE_ATTRIBUTES="cost_center=Eng Platform"

Now claude_code.cost.usage is queryable by team.id and cost_center in your backend. Cursor handles this for you - the dashboard is already per-member and per-team. Codex attribution is by workspace/project, so put each team in its own ChatGPT workspace or API project if you need clean per-team numbers.
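
Because a malformed value fails silently, it can help to lint the string before it ships. The check below is a minimal sketch, not a full W3C Baggage validator: it only catches spaces, double quotes, and entries that are not key=value. The check_resource_attrs name is hypothetical.

```shell
# Print "ok" and return 0 if the value looks like comma-separated key=value
# pairs with no spaces or double quotes; otherwise print the first problem
# found and return 1. Runs in a subshell so IFS and globbing changes stay local.
check_resource_attrs() (
  attrs=$1
  case $attrs in
    '')     echo "empty value"; return 1 ;;
    *' '*)  echo "contains a space"; return 1 ;;
    *'"'*)  echo "contains a double quote"; return 1 ;;
  esac
  IFS=,
  set -f   # do not glob-expand characters inside the value
  for pair in $attrs; do
    case $pair in
      ?*=?*) ;;
      *) echo "not key=value: $pair"; return 1 ;;
    esac
  done
  echo "ok"
)

check_resource_attrs "department=engineering,team.id=platform,cost_center=eng-123"  # prints: ok
check_resource_attrs "cost_center=Eng Platform" || true                             # prints: contains a space
```

Running a check like this in the script that distributes the variable is cheaper than discovering blank team labels in a dashboard a month later.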

Step 3: Set a Model-Selection Policy

This is where most savings come from. The principle is simple: default to the cheapest model that does the job, escalate deliberately, and front-load expensive thinking where it prevents costly rework. Here’s a sane default policy for the current lineup (June 2026):

Task	Default model	Escalate to	When to escalate
Syntax fixes, renames, import cleanup	Haiku 4.5 / Auto	-	Never
Everyday feature work, code review	Sonnet 5	Opus 5	Security-sensitive or architectural change
Complex debugging (race conditions, perf)	Sonnet 5	Opus 5	Reproduction is non-obvious after one pass
Architecture design, large refactors	Opus 5	Fable 5	When complexity warrants peak intelligence and budget is secondary
Building from scratch, cross-repo refactors, long-running tasks	Fable 5	-	Use when velocity and quality matter more than token cost; subagents inherit the configured/default model unless you explicitly pin a different model, so budget that work rather than assuming an automatic downgrade

In Cursor, the model picker makes this a per-request decision - start on Auto/Sonnet 5 and bump to Opus 5 only when a task stalls. In Claude Code, set the default model in settings and switch in-session with /model. In Codex, route the GPT-5.6 tier by role and plan: Terra for balanced work, Luna for high-volume jobs, and Sol for the hardest tasks; do not run every prompt at maximum effort.
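
If you want the policy somewhere scripts and onboarding docs can share it, the table can be written down as a small lookup. This is only a sketch: the task-category keys and both function names are illustrative inventions, the model names are copied from the table above, and sending unknown work to the mid-tier default is an assumption rather than part of the policy.

```shell
# Map a task category to the policy's default model.
default_model_for() {
  case $1 in
    syntax-fix|rename|import-cleanup) echo "Haiku 4.5" ;;
    feature|code-review|debugging)    echo "Sonnet 5" ;;
    architecture|large-refactor)      echo "Opus 5" ;;
    greenfield|cross-repo-refactor)   echo "Fable 5" ;;
    *)                                echo "Sonnet 5" ;;  # assumption: unknown work starts mid-tier
  esac
}

# Map a task category to the model the policy escalates to, or "none".
escalation_for() {
  case $(default_model_for "$1") in
    "Sonnet 5") echo "Opus 5" ;;
    "Opus 5")   echo "Fable 5" ;;
    *)          echo "none" ;;
  esac
}

default_model_for rename        # prints: Haiku 4.5
escalation_for code-review      # prints: Opus 5
```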

Step 4: Put Light Guardrails on the Expensive Paths

Hard blocks breed shadow IT and resentment. Favor transparency and nudges over approval gates:

Cap the runaway cases, not the routine ones. Set Cursor’s team-wide monthly spend limit and Codex’s platform budget limit as a backstop against accidents (a forgotten loop, a misconfigured automation), not as a daily leash. Set the threshold where a genuine surprise lives - 150-200% of a normal month - so it only fires on anomalies.
Alert before you block. Wire Cursor’s smart alerts to Slack at, say, 80% of the monthly limit. For Claude Code, alert in Grafana (or whichever backend you use) when claude_code.cost.usage crosses a rolling threshold. People self-correct when they can see the meter.
Make context discipline a habit, not a rule. The cheapest token is the one you don’t send. Encourage scoping context to the files in play and using each tool’s compaction (Claude Code’s /compact, starting fresh sessions for unrelated work) rather than dragging a bloated context across tasks.
Review monthly, adjust quarterly. Pull the per-team breakdown once a month, run a model-routing review with your AI assistant, and only revise budgets and policy when the data says to.
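
The cap and alert numbers in the guardrails above are simple arithmetic on a normal month's spend. A minimal sketch, assuming whole-dollar inputs and a hypothetical budget_thresholds helper (the $4,000 baseline is a made-up example):

```shell
# Derive a backstop cap and an alert threshold from a normal month's spend.
# Inputs are whole dollars; shell integer arithmetic rounds down.
budget_thresholds() {
  normal=$1            # typical monthly spend in USD
  cap_pct=${2:-150}    # cap as a percentage of normal (150-200 per the text)
  alert_pct=${3:-80}   # alert as a percentage of the cap
  cap=$(( normal * cap_pct / 100 ))
  alert=$(( cap * alert_pct / 100 ))
  echo "cap=$cap alert=$alert"
}

budget_thresholds 4000          # prints: cap=6000 alert=4800
budget_thresholds 4000 200 80   # prints: cap=8000 alert=6400
```

Note that with a 150% cap, the 80% alert already sits above a normal month, so it should stay quiet unless something unusual is happening.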

When This Breaks

Real failure modes from rolling this out across teams - and how to recover.

Metrics never arrive in your backend. Almost always a wrong endpoint or protocol mismatch. Confirm OTEL_EXPORTER_OTLP_ENDPOINT points at a port your collector actually listens on (gRPC defaults to :4317, HTTP to :4318), and that OTEL_EXPORTER_OTLP_PROTOCOL matches (grpc vs http/protobuf). Debug locally first with export OTEL_METRICS_EXPORTER=console and OTEL_METRIC_EXPORT_INTERVAL=1000 to see metrics print to the terminal within a second.
Telemetry is silently disabled. If CLAUDE_CODE_ENABLE_TELEMETRY isn’t set (or a managed settings file overrides your shell export), no metrics flow and your dashboards stay empty while spend continues. Managed settings win over user environment variables by design - check the settings precedence if org config and local config disagree.
Attribution comes back blank or garbled. Spaces in OTEL_RESOURCE_ATTRIBUTES violate the W3C Baggage spec and break the value. Quotes don’t escape spaces - org.name="My Team" stores the literal quotes. Use underscores or camelCase.
Budget alerts that never fire. A spend limit with no alert below it is just a wall you hit at full speed. Always set an alert threshold (e.g. 80%) under any hard cap, and test it by temporarily lowering the threshold below current spend to confirm the Slack/email path actually delivers.
Model names drift in your policy doc. A routing policy pinned to last cycle’s models silently routes work to deprecated or pricier-than-necessary models. Re-verify the current lineup and pricing each quarter - the model landscape moves fast, and “default to Opus 5 / Sonnet 5” today won’t be the right string in six months.
MCP gateway auth failures. If you do front cost reporting with an MCP server, a 401/403 on its endpoint means the whole reporting path goes dark without erroring loudly in your dashboards. Treat MCP reporting servers as best-effort enrichment, never as your primary system of record - keep the tool-native telemetry as the source of truth.
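
For the first failure mode above (metrics never arriving), a quick local check might look like the sketch below. The environment variables are the ones already named in this guide; the otlp_port_hint helper is hypothetical and only compares the endpoint against the default ports, so a collector deliberately running on a custom port will trigger the hint without being wrong.

```shell
# Local debugging: print metrics to the terminal instead of shipping them.
export CLAUDE_CODE_ENABLE_TELEMETRY=1
export OTEL_METRICS_EXPORTER=console
export OTEL_METRIC_EXPORT_INTERVAL=1000   # milliseconds

# Hint when the endpoint port is not the default for the chosen protocol.
otlp_port_hint() {
  protocol=$1
  endpoint=$2
  case $protocol in
    grpc)   want=4317 ;;
    http/*) want=4318 ;;
    *)      echo "unknown protocol: $protocol"; return 1 ;;
  esac
  case $endpoint in
    *:"$want"|*:"$want"/*) echo "ok" ;;
    *) echo "endpoint does not use default port $want for $protocol" ;;
  esac
}

otlp_port_hint grpc http://collector.company.com:4317   # prints: ok
```

Once metrics print locally, switch OTEL_METRICS_EXPORTER back to otlp and fix the endpoint or protocol, whichever the hint flagged.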

What’s Next

Team Onboarding and Adoption Strategies - get a whole team productive on these tools before you optimize their spend
Security Standards and Compliance - the access controls and audit trails that pair with cost governance
Context Cost Optimization - the deeper version of the context-discipline tactics above
Introduction to Model Context Protocol - how the identical MCP config works across all three tools