
AI Usage Cost Governance

Finance pings you on the first of the month: the AI tooling line went from a rounding error to five figures, nobody can say which team drove it, and the renewal conversation is next week. You don’t have a spending problem yet - you have a visibility problem. Tokens are metered, not seat-priced, so a single engineer running an agentic refactor that re-sends a 200K-token context on every turn of a flagship model can spend more in an afternoon than another engineer does all month. This guide shows how to get that spend under control across Cursor, Claude Code, and Codex using each tool’s real, documented controls - not a bespoke governance platform you’d have to build.

By the end of this guide you will have:

  • A working cost-visibility setup for each tool: Claude Code OpenTelemetry metrics, the Cursor team usage dashboard, and Codex usage reporting
  • A real OTEL_RESOURCE_ATTRIBUTES configuration that attributes spend to team and cost center
  • A model-selection policy that defaults to cheaper models and escalates only when the task justifies it
  • Copy-paste prompts for running a spend audit and a model-routing review with your AI assistant
  • Concrete failure modes (silent telemetry, managed-settings overrides, budget alerts that never fire) and how to recover

Before you instrument anything, get the mental model right. Three variables drive the bill, in order of impact:

  • Model choice. The gap between a flagship and a mid-tier model is roughly 1.5-5x per token. Defaulting every task to the most expensive model is the single biggest source of waste.
  • Context size. You pay for input tokens too. Loading an entire repo into context for a one-file change is silently expensive, and cache reads only partly offset it.
  • Agentic loops. Autonomous multi-step runs (large refactors, test-fix loops, deep research) multiply token usage. They’re often worth it - but they need to be a deliberate choice, not an accident.
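
The three variables compound rather than add. Here is a rough sketch of the arithmetic. It assumes each turn of an agentic run re-sends the full context as input tokens (no cache discount) and a fixed 2,000 output tokens per turn, and it uses hypothetical per-million-token prices chosen only to illustrate a 5x tier gap; substitute your provider's current rates.

```shell
#!/usr/bin/env bash
# Back-of-the-envelope cost of an agentic run (a sketch, not a billing formula).
# Assumptions: every turn re-sends the whole context as input tokens, cache
# discounts are ignored, and prices are hypothetical USD per million tokens.

# estimate_run_cost TURNS CONTEXT_TOKENS OUTPUT_TOKENS_PER_TURN IN_PRICE OUT_PRICE
# Prints the estimated cost in USD with two decimals.
estimate_run_cost() {
  local turns=$1 context=$2 out_per_turn=$3 in_price=$4 out_price=$5
  # awk handles the floating-point math that bash arithmetic cannot.
  awk -v t="$turns" -v c="$context" -v o="$out_per_turn" \
      -v pi="$in_price" -v po="$out_price" \
      'BEGIN { printf "%.2f\n", (t * c * pi + t * o * po) / 1000000 }'
}

# One pass vs. a 40-turn run over a 200K-token context, on two price tiers.
single=$(estimate_run_cost 1 200000 2000 15 75)
flagship=$(estimate_run_cost 40 200000 2000 15 75)
midtier=$(estimate_run_cost 40 200000 2000 3 15)
echo "one pass: \$${single}  40 turns flagship: \$${flagship}  40 turns mid-tier: \$${midtier}"
```

With these placeholder prices, one pass costs about $3.15, the 40-turn run $126.00 on the flagship tier and $25.20 on the mid-tier one: the loop multiplies the bill, and model choice scales it. Real bills come in lower wherever cache reads apply.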

Cost governance is mostly about making those three variables visible and putting light, non-blocking guardrails on the expensive paths. You do not need a custom MCP “cost gateway” or an invented YAML framework. Every tool ships the primitives.

You can’t govern what you can’t see. Each tool exposes usage differently - CLI telemetry for Claude Code, an admin dashboard for Cursor, and org usage reporting for Codex. Set up all three; the workflows genuinely differ here.
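
For Claude Code, telemetry is off by default and is configured through standard OpenTelemetry environment variables. A minimal sketch of the shell setup follows. It assumes a collector that accepts OTLP over gRPC at localhost:4317, which is a placeholder; use your own endpoint, and add OTEL_EXPORTER_OTLP_HEADERS if your collector requires auth.

```shell
# Opt in to Claude Code telemetry (it is off unless this is set).
export CLAUDE_CODE_ENABLE_TELEMETRY=1
# Export metrics over OTLP; switch to "console" while debugging locally.
export OTEL_METRICS_EXPORTER=otlp
# Protocol and port must agree: grpc -> 4317, http/protobuf -> 4318.
export OTEL_EXPORTER_OTLP_PROTOCOL=grpc
# Placeholder collector address; replace with your own.
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317
```

The same variables can also go in the `env` block of Claude Code's settings files, including a managed settings file, which is how an org makes them stick regardless of individual shells.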

Cursor is IDE-first, so its cost controls live in the team admin dashboard (cursor.com/dashboard), not in config files. As a team admin you get:

  • Usage by member, split into two included pools per seat: Composer + Auto (first-party models) and Third-Party API (BYO-key model usage). This immediately surfaces your heavy users.
  • Team-wide monthly spending limits that cap overage before it runs away.
  • Smart alerts on dollar thresholds, delivered to Slack or email before a billing surprise lands.
  • Seat-type recommendations - Cursor flags when a member’s usage fits a Standard ($32/seat/mo annual) vs Premium ($96/seat/mo annual) seat, so you’re not overpaying for light users.

There’s nothing to install: the visibility is built into the Business/Teams plan dashboard. Your job is to turn the spending limit and alerts on, then review the per-member breakdown monthly.
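
The dashboard already shows the per-member breakdown. If you also copy those numbers into a CSV for the monthly review, a few lines of shell can flag who dominates spend. This sketch assumes a hypothetical file layout (a header row, then `member,usd` per line). The `heavy_users` helper and the 25% share threshold are illustrative, so adapt the columns to whatever you actually export.

```shell
#!/usr/bin/env bash
# Flag heavy users from a per-member spend export (sketch).
# Assumes a hypothetical CSV layout: a header row, then "member,usd" lines,
# with no commas inside member names.

# heavy_users CSV_FILE SHARE_PCT
# Prints "member usd share%" for every member whose spend is at least
# SHARE_PCT percent of the team total, highest spend first.
heavy_users() {
  local file=$1 share_pct=$2
  awk -F, -v pct="$share_pct" '
    NR == 1 { next }                    # skip the header row
    { spend[$1] += $2; total += $2 }    # accumulate per member and overall
    END {
      if (total == 0) exit
      for (m in spend) {
        share = 100 * spend[m] / total
        if (share >= pct) printf "%s %.2f %.1f%%\n", m, spend[m], share
      }
    }' "$file" | LC_ALL=C sort -k2,2nr
}

# Usage with a throwaway sample file.
demo=$(mktemp)
printf 'member,usd\nalice,1200\nbob,150\ncarol,90\ndave,60\n' > "$demo"
heavy_users "$demo" 25    # prints: alice 1200.00 80.0%
rm -f "$demo"
```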

Aggregate spend is a number to panic about; attributed spend is a number to act on. For Claude Code, attribution is a one-line change. The OTEL_RESOURCE_ATTRIBUTES variable tags every metric with whatever dimensions you set - and it follows the W3C Baggage spec, which means no spaces in values (a common gotcha):

```shell
# Correct: comma-separated key=value pairs, no spaces (use underscores or hyphens instead)
export OTEL_RESOURCE_ATTRIBUTES="department=engineering,team.id=platform,cost_center=eng-123"
# Wrong: spaces are invalid and silently break the attribute
# export OTEL_RESOURCE_ATTRIBUTES="cost_center=Eng Platform"
```

Now claude_code.cost.usage is queryable by team.id and cost_center in your backend. Cursor handles this for you - the dashboard is already per-member and per-team. Codex attribution is by workspace/project, so put each team in its own ChatGPT workspace or API project if you need clean per-team numbers.
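
Because a malformed value fails silently, it is worth checking OTEL_RESOURCE_ATTRIBUTES before launching a session. Here is a minimal sketch. The `check_resource_attributes` helper is hypothetical, and it is a partial check for the common mistakes (missing key=value shape, spaces, double quotes), not a full W3C Baggage validator.

```shell
#!/usr/bin/env bash
# Partial pre-flight check for OTEL_RESOURCE_ATTRIBUTES (sketch, not a full
# W3C Baggage validator). It catches the mistakes that silently garble
# attribution: pairs without key=value shape, spaces, and double quotes.

# check_resource_attributes STRING
# Returns 0 if every comma-separated pair passes; otherwise prints the first
# bad pair to stderr and returns 1.
check_resource_attributes() {
  local attrs=$1 pair
  local -a pairs
  # Key: no '=', whitespace, or '"'. Value: no whitespace or '"'.
  local re='^[^=[:space:]"]+=[^[:space:]"]+$'
  if [[ -z $attrs ]]; then
    echo "OTEL_RESOURCE_ATTRIBUTES is empty" >&2
    return 1
  fi
  # Split on commas only, so spaces inside a pair survive and get flagged.
  IFS=',' read -r -a pairs <<< "$attrs"
  for pair in "${pairs[@]}"; do
    if [[ ! $pair =~ $re ]]; then
      printf 'invalid attribute pair: %s\n' "$pair" >&2
      return 1
    fi
  done
  return 0
}

# Example: validate the current shell's value before launching Claude Code.
if check_resource_attributes "${OTEL_RESOURCE_ATTRIBUTES:-}"; then
  echo "attributes look OK"
else
  echo "fix OTEL_RESOURCE_ATTRIBUTES before relying on per-team numbers"
fi
```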

This is where most savings come from. The principle is simple: default to the cheapest model that does the job, escalate deliberately, and front-load expensive thinking where it prevents costly rework. Here’s a sane default policy for the current lineup (June 2026):

| Task | Default model | Escalate to | When to escalate |
|---|---|---|---|
| Syntax fixes, renames, import cleanup | Haiku 4.5 / Auto | - | Never |
| Everyday feature work, code review | Sonnet 4.6 | Opus 4.8 | Security-sensitive or architectural change |
| Complex debugging (race conditions, perf) | Sonnet 4.6 | Opus 4.8 | Reproduction is non-obvious after one pass |
| Architecture design, large refactors | Opus 4.8 | Fable 5 | When complexity warrants peak intelligence and budget is secondary |
| Building from scratch, cross-repo refactors, long-running tasks | Fable 5 | - | Use as the default when velocity and quality matter more than token cost; subagents still auto-run on lower tiers, so cost stays contained |

In Cursor, the model picker makes this a per-request decision - start on Auto/Sonnet 4.6 and bump to Opus 4.8 only when a task stalls. In Claude Code, set the default model in settings and switch in-session with /model. In Codex, GPT-5.5 powers all surfaces; reserve heavier reasoning effort for genuinely hard tasks rather than running every prompt at maximum.
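
If scripts or wrapper tooling should apply the table consistently, encode it as data rather than tribal knowledge. Here is a minimal sketch. The task-class keys (trivial, feature, review, debug, architecture, greenfield) and the `route`/`pick_model` helpers are hypothetical. The model strings are the table's display names, not the exact identifiers that /model, Cursor's picker, or any API accepts.

```shell
#!/usr/bin/env bash
# Encode the routing table as data (sketch). Task-class keys are hypothetical
# labels; model names are the table's display names, not API model IDs.

# route TASK_CLASS -> prints "default<TAB>escalation" ("-" means never escalate)
route() {
  case $1 in
    trivial)        printf '%s\t%s\n' "Haiku 4.5 / Auto" "-" ;;
    feature|review) printf '%s\t%s\n' "Sonnet 4.6" "Opus 4.8" ;;
    debug)          printf '%s\t%s\n' "Sonnet 4.6" "Opus 4.8" ;;
    architecture)   printf '%s\t%s\n' "Opus 4.8" "Fable 5" ;;
    greenfield)     printf '%s\t%s\n' "Fable 5" "-" ;;
    *) printf 'unknown task class: %s\n' "$1" >&2; return 1 ;;
  esac
}

# pick_model TASK_CLASS ESCALATE(yes|no) -> prints the model to use
pick_model() {
  local row default escalation
  row=$(route "$1") || return 1
  # Split on the tab only, so spaces inside model names are kept.
  IFS=$'\t' read -r default escalation <<< "$row"
  if [[ $2 == yes && $escalation != "-" ]]; then
    printf '%s\n' "$escalation"
  else
    printf '%s\n' "$default"
  fi
}

pick_model feature no     # Sonnet 4.6
pick_model feature yes    # Opus 4.8
```

Escalation is an explicit argument rather than automatic, matching the policy's intent that moving up a tier is a deliberate choice.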

Step 4: Put Light Guardrails on the Expensive Paths


Hard blocks breed shadow IT and resentment. Favor transparency and nudges over approval gates:

  1. Cap the runaway cases, not the routine ones. Set Cursor’s team-wide monthly spend limit and Codex’s platform budget limit as a backstop against accidents (a forgotten loop, a misconfigured automation), not as a daily leash. Set the threshold where a genuine surprise lives - 150-200% of a normal month - so it only fires on anomalies.

  2. Alert before you block. Wire Cursor’s smart alerts to Slack at, say, 80% of the monthly limit. For Claude Code, alert on claude_code.cost.usage crossing a rolling threshold in Grafana/your backend. People self-correct when they can see the meter.

  3. Make context discipline a habit, not a rule. The cheapest token is the one you don’t send. Encourage scoping context to the files in play and using each tool’s context controls (in Claude Code, /compact to summarize a long session and /clear before switching to unrelated work) rather than dragging a bloated context across tasks.

  4. Review monthly, adjust quarterly. Pull the per-team breakdown once a month, run the routing-review prompt above, and only revise budgets and policy when the data says to.
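
The thresholds in items 1 and 2 are simple percentages of a baseline month. Here is a sketch of the arithmetic. It assumes whole-dollar amounts and integer math, so fractions of a dollar are truncated. The hypothetical `budget_thresholds` helper defaults to a 150% cap and an 80% alert, mirroring the ranges suggested above; tune both.

```shell
#!/usr/bin/env bash
# Derive a backstop cap and an alert threshold from a normal month's spend
# (sketch; whole-dollar integer math, so fractions are truncated).
# Defaults: cap at 150% of a normal month, alert at 80% of the cap.

# budget_thresholds NORMAL_MONTH_USD [CAP_PCT] [ALERT_PCT]
# Prints "cap=<usd> alert=<usd>".
budget_thresholds() {
  local normal=$1 cap_pct=${2:-150} alert_pct=${3:-80}
  local cap=$(( normal * cap_pct / 100 ))
  local alert=$(( cap * alert_pct / 100 ))
  printf 'cap=%d alert=%d\n' "$cap" "$alert"
}

budget_thresholds 12000        # cap=18000 alert=14400
budget_thresholds 12000 200    # cap=24000 alert=19200
```

With the defaults, the alert fires at 120% of a normal month (80% of a 150% cap), early enough to investigate before the cap stops anyone.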

Real failure modes from rolling this out across teams - and how to recover.

  • Metrics never arrive in your backend. The cause is almost always a wrong endpoint or a protocol mismatch. Confirm OTEL_EXPORTER_OTLP_ENDPOINT points at a port your collector actually listens on (gRPC defaults to :4317, HTTP to :4318), and that OTEL_EXPORTER_OTLP_PROTOCOL matches (grpc for :4317, http/protobuf or http/json for :4318). Debug locally first: with CLAUDE_CODE_ENABLE_TELEMETRY=1 set, export OTEL_METRICS_EXPORTER=console and OTEL_METRIC_EXPORT_INTERVAL=1000 (milliseconds) so metrics print to the terminal every second.

  • Telemetry is silently disabled. If CLAUDE_CODE_ENABLE_TELEMETRY isn’t set (or a managed settings file overrides your shell export), no metrics flow and your dashboards stay empty while spend continues. Managed settings win over user environment variables by design - check the settings precedence if org config and local config disagree.

  • Attribution comes back blank or garbled. Spaces in OTEL_RESOURCE_ATTRIBUTES violate the W3C Baggage spec and break the value. Quotes don’t escape spaces - org.name="My Team" stores the literal quotes. Use underscores or camelCase.

  • Budget alerts that never fire. A spend limit with no alert below it is just a wall you hit at full speed. Always set an alert threshold (e.g. 80%) under any hard cap, and test it by temporarily lowering the threshold below current spend to confirm the Slack/email path actually delivers.

  • Model names drift in your policy doc. A routing policy pinned to last cycle’s models silently routes work to deprecated or pricier-than-necessary models. Re-verify the current lineup and pricing each quarter - the model landscape moves fast, and “default to Opus 4.8 / Sonnet 4.6” today won’t be the right string in six months.

  • MCP gateway auth failures. If you do front cost reporting with an MCP server, a 401/403 on its endpoint means the whole reporting path goes dark without erroring loudly in your dashboards. Treat MCP reporting servers as best-effort enrichment, never as your primary system of record - keep the tool-native telemetry as the source of truth.
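
Several of these failure modes can be caught from the shell before anyone opens a dashboard. Here is a minimal preflight sketch. The `telemetry_preflight` helper is hypothetical, and it inspects only the current environment, so it cannot see a managed settings file that overrides these values.

```shell
#!/usr/bin/env bash
# Shell-side telemetry preflight for Claude Code (sketch). It inspects only the
# current environment, so it cannot see a managed settings file that overrides
# these values; check that separately.

# telemetry_preflight: prints one line per problem; returns 1 if any were found.
telemetry_preflight() {
  local problems=0
  local proto=${OTEL_EXPORTER_OTLP_PROTOCOL:-}
  local endpoint=${OTEL_EXPORTER_OTLP_ENDPOINT:-}
  # Default OTLP ports: 4317 for gRPC, 4318 for HTTP.
  local grpc_port_re=':4317(/|$)' http_port_re=':4318(/|$)'

  if [[ ${CLAUDE_CODE_ENABLE_TELEMETRY:-} != 1 ]]; then
    echo "CLAUDE_CODE_ENABLE_TELEMETRY is not 1: nothing will be exported"
    problems=1
  fi
  if [[ -z ${OTEL_METRICS_EXPORTER:-} ]]; then
    echo "OTEL_METRICS_EXPORTER is unset (use otlp, or console while debugging)"
    problems=1
  fi
  # Endpoint checks only matter when exporting over OTLP.
  if [[ ${OTEL_METRICS_EXPORTER:-} == *otlp* ]]; then
    if [[ -z $endpoint ]]; then
      echo "OTEL_EXPORTER_OTLP_ENDPOINT is unset"
      problems=1
    elif [[ $proto == grpc && $endpoint =~ $http_port_re ]]; then
      echo "protocol is grpc but the endpoint uses HTTP port 4318"
      problems=1
    elif [[ $proto == http/* && $endpoint =~ $grpc_port_re ]]; then
      echo "protocol is $proto but the endpoint uses gRPC port 4317"
      problems=1
    fi
  fi
  return "$problems"
}

telemetry_preflight || echo "telemetry preflight found problems (see above)"
```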