Cloud Cost Management & FinOps

Your cloud bill jumped 40% this month. Finance wants an explanation by Friday, engineering swears nothing changed, and the only tool you have is a Cost Explorer dashboard with 200 line items and no story. You could spend two days exporting CSVs and pivoting in a spreadsheet — or you could point an AI assistant at a cost MCP server and have it tell you which five services moved, by how much, and why.

This guide shows FinOps practitioners, DevOps engineers, and platform teams how to wire Cursor, Claude Code, and Codex to real cost-management MCP servers and turn raw billing data into right-sizing plans, anomaly alerts, and forecasts you can actually defend in a budget review.

  • A working MCP setup for Vantage (multi-cloud) and the AWS Labs Cost Explorer server, using the same command, arguments, and environment variables in Cursor, Claude Code, and Codex
  • A copy-paste prompt that surfaces your top 5 cost-savings opportunities for the current month
  • A right-sizing prompt that returns a phased plan tagged with risk levels, not a flat list of instances
  • An anomaly-detection prompt that distinguishes expected growth from genuine bill shock
  • A clear sense of when this workflow breaks — API charges, rate limits, stale tags — and how to recover

These servers expose billing and usage APIs as tools your AI assistant can call directly. The command, arguments, and environment variables are the same across Cursor, Claude Code, and Codex; only the file and its syntax differ: JSON in .cursor/mcp.json (or Cursor Settings) for Cursor, JSON in .mcp.json for Claude Code, and TOML in ~/.codex/config.toml for Codex. Set up these two and you cover most multi-cloud cases.

Vantage aggregates AWS, Azure, GCP, Kubernetes, and SaaS spend behind one API. The official MCP server runs via npx and authenticates with a read-only bearer token.

{
  "mcpServers": {
    "vantage": {
      "command": "npx",
      "args": ["-y", "vantage-mcp-server"],
      "env": {
        "VANTAGE_TOKEN": "your-read-only-vantage-token"
      }
    }
  }
}

Generate the token from your Vantage account under API access and scope it read-only — the AI never needs write access to analyze spend.

The AWS Labs Cost Explorer server is a Python package distributed on PyPI and run with uvx, not a bare npm binary. It reads your existing AWS credentials via a named profile.

{
  "mcpServers": {
    "aws-cost-explorer": {
      "command": "uvx",
      "args": ["awslabs.cost-explorer-mcp-server@latest"],
      "env": {
        "FASTMCP_LOG_LEVEL": "ERROR",
        "AWS_REGION": "us-east-1",
        "AWS_PROFILE": "default"
      }
    }
  }
}

For Azure and GCP, the equivalents are @azure/mcp (npx -y @azure/mcp@latest server start) and @google-cloud/gcloud-mcp (npx -y @google-cloud/gcloud-mcp). Each reads its cloud's native credential chain. Add them only when you actually operate in those clouds.

Codex stores MCP servers in ~/.codex/config.toml rather than JSON. The same two servers look like this:

[mcp_servers.vantage]
command = "npx"
args = ["-y", "vantage-mcp-server"]
env = { VANTAGE_TOKEN = "your-read-only-vantage-token" }

[mcp_servers.aws-cost-explorer]
command = "uvx"
args = ["awslabs.cost-explorer-mcp-server@latest"]
env = { AWS_REGION = "us-east-1", AWS_PROFILE = "default" }
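
If you maintain the JSON file as the source of truth, the Codex tables can be generated from it instead of hand-copied. The following is a minimal sketch, not an official tool: the function name `to_codex_toml` is hypothetical, and it assumes server names are valid TOML bare keys (letters, digits, dashes, underscores) and that values are plain strings and lists of strings.

```python
import json


def to_codex_toml(mcp_json: str) -> str:
    """Render an mcpServers JSON document as [mcp_servers.<name>] TOML tables.

    Hypothetical helper: assumes server names are valid TOML bare keys and
    that command/args/env contain only strings.
    """
    servers = json.loads(mcp_json)["mcpServers"]
    blocks = []
    for name, spec in servers.items():
        lines = [f"[mcp_servers.{name}]"]
        # json.dumps output for strings and string lists is also valid TOML.
        lines.append(f"command = {json.dumps(spec['command'])}")
        lines.append(f"args = {json.dumps(spec.get('args', []))}")
        env = spec.get("env", {})
        if env:
            pairs = ", ".join(f"{key} = {json.dumps(value)}" for key, value in env.items())
            lines.append(f"env = {{ {pairs} }}")
        blocks.append("\n".join(lines))
    # Separate tables with a blank line and end with a trailing newline.
    return "\n\n".join(blocks) + "\n"
```

Passing the contents of .mcp.json to `to_codex_toml` should yield text you can paste into ~/.codex/config.toml; review it before saving, since it also copies any token stored in the JSON.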

The Workflow: From Bill Shock to a Defensible Plan

The pattern is the same regardless of tool: connect a cost MCP server, ask a focused question, then verify the recommendation against the actual resource before you act. The AI is fast at finding candidates; you own the decision to change production.

Open the agent panel and reference the servers by name. Cursor keeps the analysis in your editor so you can drop findings straight into a runbook or Terraform change.

@vantage @aws-cost-explorer Pull last month's spend grouped by
service and linked account. For the five services that grew the
most versus the prior month, tell me the dollar delta, the likely
driver (usage vs. price vs. new resources), and whether the growth
looks expected for a product scaling its user base. Output a table,
then a short prioritized list of what to investigate first.

Cursor returns a table you can iterate on inline — ask follow-ups like “drill into RDS” without restating context.

A typical response groups spend, ranks the movers, and flags which growth is benign. For example, it might report that RDS grew because three read replicas were added (expected for a traffic ramp) while a 45% jump in inter-region transfer has no matching deploy and warrants investigation. Treat the dollar figures as a starting point — confirm them against the provider console before you brief finance, because tag coverage and account boundaries shape what the API returns.
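
One way to confirm the ranking is to recompute it yourself from two monthly per-service totals exported from the console. The sketch below is illustrative only: `top_movers` is a hypothetical helper, and the input dictionaries stand in for whatever export you actually have.

```python
from __future__ import annotations


def top_movers(
    prior: dict[str, float], current: dict[str, float], n: int = 5
) -> list[tuple[str, float, float | None]]:
    """Return the n services with the largest dollar growth month over month.

    Each row is (service, dollar_delta, percent_change). percent_change is
    None for services with no spend in the prior month (no baseline).
    """
    rows = []
    for service in set(prior) | set(current):
        before = prior.get(service, 0.0)
        after = current.get(service, 0.0)
        delta = after - before
        pct = round(delta / before * 100, 1) if before else None
        rows.append((service, round(delta, 2), pct))
    # Largest dollar increase first; service name breaks ties for stable output.
    rows.sort(key=lambda row: (-row[1], row[0]))
    return rows[:n]
```

With made-up figures such as `prior = {"RDS": 1000.0, "DataTransfer": 400.0}` and `current = {"RDS": 1300.0, "DataTransfer": 580.0}`, the function ranks RDS first by dollars even though data transfer grew faster in percentage terms, which is why the prompt asks for both the delta and the driver.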

These are the reusable recipes. They name real services and ask for opinionated output, so they work with minimal editing — swap the provider or threshold and run.

Two follow-on jobs round out a FinOps practice: making spend traceable to teams, and turning history into a forward budget.

  1. Design a tagging and allocation strategy

    Ask the assistant to translate your org structure into an enforceable tag schema and a method for splitting shared resources (databases, load balancers, NAT gateways) that no single team owns.

    Using @vantage, design a cost-allocation strategy for an org with
    product teams, shared platform teams, and dev/staging/production
    environments. Define a required tag set, a fallback for untagged
    spend, and a defensible method to split shared-resource costs
    (by request volume, by CPU/memory share). Flag where allocation
    will be approximate so I can set expectations with finance.

  2. Roll history forward into next year’s budget

    Feed the trailing four quarters and your growth assumptions; ask for a forecast with explicit buffers rather than a single number.

    Using @vantage and @aws-cost-explorer, take our last four quarters
    of actual spend and build a next-year monthly forecast. Inputs:
    expected user growth, planned feature launches, and one new region.
    Output a baseline projection, a growth allowance per team, a buffer
    for unplanned cost, and the savings target needed to stay flat.
    Show the assumptions so I can challenge them.

  3. Verify before you commit

    Spot-check the AI’s allocation against one team’s real invoice and confirm the forecast’s growth multiplier matches your product plan. Anchoring on relative history (“last four quarters → next year”) keeps the analysis evergreen instead of pinned to a calendar year that will go stale.
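
The arithmetic behind the first two steps is simple enough to check by hand. The sketch below shows one plausible form of it, under stated assumptions: shared cost is split in proportion to a single usage metric (request volume here), and the forecast compounds a flat monthly growth rate before applying a buffer. Both function names and all parameters are hypothetical, not something the MCP servers provide.

```python
from __future__ import annotations


def split_shared_cost(shared_cost: float, usage_by_team: dict[str, float]) -> dict[str, float]:
    """Split a shared resource's cost in proportion to each team's usage."""
    total = sum(usage_by_team.values())
    if total <= 0:
        raise ValueError("no usage recorded; cannot allocate proportionally")
    shares = {
        team: round(shared_cost * used / total, 2)
        for team, used in usage_by_team.items()
    }
    # Rounding can leave a cent or two unallocated; assign the drift to the
    # largest consumer so the shares add up to the invoice amount.
    drift = round(shared_cost - sum(shares.values()), 2)
    largest = max(usage_by_team, key=usage_by_team.get)
    shares[largest] = round(shares[largest] + drift, 2)
    return shares


def forecast_month(baseline: float, monthly_growth: float, months_ahead: int, buffer: float) -> float:
    """Project one month's spend: compound growth first, then an explicit buffer."""
    projected = baseline * (1 + monthly_growth) ** months_ahead
    return round(projected * (1 + buffer), 2)
```

For example, `split_shared_cost(900.0, {"checkout": 600, "search": 300})` assigns two thirds of a NAT gateway bill to checkout. Keeping the growth rate and the buffer as separate arguments makes it easy to show finance which part of the number is assumption and which part is padding.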

Cost analysis fails in predictable ways. Recognize these early:

  • MCP auth or permission errors. The most common failure is a missing or wrong-scoped token. Vantage needs VANTAGE_TOKEN (read-only is enough); the AWS Labs server needs a valid AWS_PROFILE with ce:Get* IAM permissions. If the server starts but every query returns empty, you almost certainly have a credentials or permissions gap, not a code problem.
  • Cost Explorer API charges and rate limits. Each Cost Explorer API request costs $0.01, and the API throttles under bursty load. An agent that fans out hundreds of daily-granularity calls can run up charges and start failing with throttling errors. Scope queries to monthly granularity first and drill into daily only where it matters.
  • Stale or missing cost-allocation tags. Allocation and chargeback are only as accurate as your tags. A large “untagged” bucket silently distorts every per-team figure — treat tag coverage as a prerequisite, not a nice-to-have.
  • Right-sizing that causes throttling. Downsizing an instance that looks idle on average can starve it during traffic bursts. Always check peak (not just mean) utilization, change one tier at a time, and watch latency and error rates after each change.
  • Hallucinated server names. If a prompt references a server you never configured (a bare aws-cost-mcp binary, a @kubernetes/mcp-server scope), the tool call silently does nothing and the AI may invent plausible numbers. Use the exact server keys from your config, and for Kubernetes the real package is the unscoped kubernetes-mcp-server (npx -y kubernetes-mcp-server@latest).
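
The peak-versus-mean check from the right-sizing bullet can be made mechanical. This is a rough sketch under assumptions: `safe_to_downsize` is a hypothetical helper, the samples are CPU utilization percentages you have already exported from your monitoring system, and the 20% mean and 60% peak thresholds are placeholders to tune, not recommendations.

```python
import math


def safe_to_downsize(
    cpu_samples: list[float],
    mean_limit: float = 20.0,
    peak_limit: float = 60.0,
    peak_percentile: float = 0.99,
) -> bool:
    """Return True only if both mean and high-percentile CPU are below limits."""
    if not cpu_samples:
        raise ValueError("need at least one utilization sample")
    ordered = sorted(cpu_samples)
    mean = sum(ordered) / len(ordered)
    # Nearest-rank percentile: the smallest sample that covers the requested
    # fraction of observations.
    rank = max(0, math.ceil(peak_percentile * len(ordered)) - 1)
    peak = ordered[rank]
    return mean < mean_limit and peak < peak_limit
```

An instance that sits at 5% CPU most of the time but spikes to 90% during bursts has a low mean and still fails this check, which is exactly the case that a mean-only recommendation misses.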
  1. Wire up real MCP servers. Run Vantage via npx and the AWS Labs Cost Explorer server via uvx. The command, arguments, and environment variables are the same across Cursor, Claude Code, and Codex; only the config file format differs.
  2. Ask focused questions, not “optimize everything.” The top-5, phased right-sizing, and anomaly prompts above return decisions you can act on.
  3. Always verify before you change production. The AI finds candidates fast; you own the call. Check peak utilization, confirm dollar figures in the console, and watch the system after each change.
  4. Mind the meter. Cost Explorer requests cost $0.01 each and the API rate-limits — scope queries and keep the server out of autonomous loops.
  5. Tags gate everything. Allocation and forecasting are only as trustworthy as your tag coverage, so fix that first.
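
To make "mind the meter" concrete, a script that drives Cost Explorer queries can enforce a hard spending cap. The sketch below is one possible guard, assuming the $0.01 per-request price quoted above; `RequestBudget` and `BudgetExceeded` are hypothetical names, and the code only counts requests you report to it rather than calling AWS itself.

```python
from decimal import Decimal

# Per-request price quoted in this guide; confirm against current AWS pricing.
PRICE_PER_REQUEST = Decimal("0.01")


class BudgetExceeded(RuntimeError):
    """Raised when another request would push spend past the cap."""


class RequestBudget:
    """Track estimated Cost Explorer API spend and refuse to exceed a cap."""

    def __init__(self, max_usd: str) -> None:
        self.max_usd = Decimal(max_usd)
        self.spent = Decimal("0")

    def charge(self, requests: int = 1) -> None:
        """Record upcoming requests, or raise before they are made."""
        projected = self.spent + PRICE_PER_REQUEST * requests
        if projected > self.max_usd:
            raise BudgetExceeded(
                f"{requests} more request(s) would cost ${projected}, cap is ${self.max_usd}"
            )
        self.spent = projected
```

Calling `budget.charge()` before each API call turns a runaway loop into an immediate, visible error. `Decimal` is used instead of floats so that cents add up exactly.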