MCP server count — 3-5 well-matched, not 15 of everything

Scorecard question: How many MCP servers do you have active in your main setup? Max-score answer (3 pts): 3-5 well-matched to my workflow (the sweet spot).

Why this matters in 2026

The instinct that bit most developers in late 2025 was “MCP is free, so install everything.” It is not free. Every MCP server you connect injects its full tool catalogue into every single message your agent sends — names, descriptions, parameter schemas, the lot. A single well-documented tool definition typically runs 100–500 tokens, a server with 10 tools costs 1,500–3,000 tokens of pure schema overhead per turn, and a 4-server stack with verbose descriptions can burn 12,000–20,000 tokens before the model has read your prompt. That is context you paid for and the model never gets to use for actual reasoning. Claude’s output quality visibly degrades after about 50 exposed tools — the model starts referencing tools instead of answering your question, chases tangents into tool-list space, and confuses semantically-overlapping verbs (“read_file” from filesystem MCP vs. “get_file_content” from GitHub MCP vs. “fetch” from your custom MCP). Cursor learned this the hard way and now enforces a hard 40-tool cap. The 2026 consensus is that 3 servers is the sweet spot and 5 is the practical maximum for everyday work — not because more is technically impossible, but because every server past 5 demonstrably costs you more in degraded reasoning than it gives you in capability. The win is fit, not count.

What “max score” actually looks like

A max-score setup for this question (Q9 on the scorecard) is small, intentional, and boring. Four MCP servers, each earning its slot. A concrete example for a typical full-stack TypeScript engineer in 2026: GitHub MCP (the current canonical install is the hosted remote endpoint https://api.githubcopilot.com/mcp/, not the archived @modelcontextprotocol/server-github local package) for PR review, issue triage, and repo browsing without leaving the agent; Context7 for fetching up-to-date library docs on demand (no more hallucinated React 18 APIs that were deprecated in React 19); Playwright/Chrome DevTools MCP for end-to-end browser verification, where the agent drives a real Chromium, takes screenshots, and confirms its own changes actually rendered; and one domain-specific MCP for whatever pays the bills (Stripe MCP for a payments codebase, Linear MCP for ticket sync, Sentry MCP for production-error triage, Polar MCP for subscription work). That is it. Four servers, roughly 25–35 total tool definitions, about 5,000–8,000 tokens of overhead per turn. That is well under the degradation threshold, and every one of those tools earns its tokens because you reach for it in real work, not “in case.”

Contrast that with a typical bloated setup: GitHub MCP, GitLab MCP, generic filesystem MCP, generic git MCP (already redundant with Claude Code’s built-in Bash), three database MCPs for databases you query twice a year, Slack MCP, Notion MCP, Linear MCP, Jira MCP, two weather/time MCPs from someone’s blog post, plus four “fun” MCPs installed in a single afternoon. Seventeen servers, 80+ tools, 18,000 tokens of overhead, and a noticeably stupider agent. That is the trap.

Current landscape (web-search-verified)

The public MCP registry exploded from roughly 1,200 servers at the end of Q1 2025 to 3,400 by Q3 2025, 6,800 at year-end, and over 9,400 by mid-April 2026 — a 7.8x year-over-year expansion with month-over-month growth still at +18% in Q1 2026. The asymmetry that creates is brutal: every week brings 60–80 new “you should install this” blog posts, but the cost of every additional install is paid silently in every turn of every session for as long as the server stays connected. The market has not yet built strong feedback loops that surface “this MCP makes your agent dumber,” so the discipline has to come from the developer.

Why more isn’t better

There are three independent compounding effects, and each one alone is enough to justify the 3–5 cap. First, raw token cost. Tool definitions live in the system prompt of every turn, not just the first one. A 15-server stack typically eats 15,000–25,000 tokens of schema overhead per turn. Multiply by a 50-turn session and that is 750,000–1,250,000 cumulative tokens, the equivalent of a full Sonnet 5 context window, spent on tool definitions the model used twice. Second, reasoning quality. Models trained with tool-use RLHF were optimized on stacks of 5–20 tools, not 100+. Past about 50 tools, output quality measurably drops: the model picks the wrong tool when two have overlapping descriptions, references tools that don’t exist (hallucinating from the noise), and chases tool-shaped tangents when a plain text answer was better. The MindStudio benchmark put MCP-heavy setups at 72% reliability on hard tasks vs. 100% for narrower stacks. Third, ambiguity tax. When three different MCPs expose read_file-style verbs, every invocation requires the model to spend tokens disambiguating which one to call. That cost compounds turn over turn and is invisible: it never shows up as a single bad output, only as a slower, slightly worse agent across the whole session.
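The raw-token arithmetic for the 15-server example can be checked in a few lines. This is only a back-of-envelope sketch: the function name is made up for illustration, and it ignores prompt caching, which can lower what you are billed but not the context the definitions occupy.

```python
def session_schema_tokens(per_turn_overhead: int, turns: int) -> int:
    """Cumulative tokens spent re-sending tool definitions over a session."""
    return per_turn_overhead * turns


# Figures from the paragraph above: a 15-server stack over a 50-turn session.
low = session_schema_tokens(15_000, 50)
high = session_schema_tokens(25_000, 50)
print(f"{low:,} to {high:,} tokens of schema overhead")
# prints "750,000 to 1,250,000 tokens of schema overhead"
```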

Token overhead per MCP server

Concrete numbers, useful for back-of-envelope decisions: a minimal MCP server with 3 well-described tools costs ~500–900 tokens per turn. A typical server (GitHub MCP at ~25 tools with reasonable descriptions) costs ~3,000–4,500 tokens per turn. A bloated server (some of the older “kitchen sink” community MCPs with 60+ tools and copy-pasted long-form descriptions) costs 8,000–12,000 tokens per turn — by itself. A 4-server stack of typical MCPs ends up at roughly 12,000–18,000 tokens of overhead per turn; a 13-server stack with one or two bloated servers easily clears 30,000 tokens per turn. Atlassian’s open-source mcp-compressor proxy (April 2026) demonstrates the magnitude by compressing tool descriptions 70–97% without breaking calls — proof that most of those tokens are pure waste. Until proxies and on-demand tool loading become defaults across all agents, the cheapest optimization is “install fewer servers.”
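For a quick estimate of your own stack, the same back-of-envelope logic fits in a small helper. This is a rough sketch, not a measurement: the 250 tokens per tool is the rule-of-thumb average used elsewhere in this guide, and the server labels and tool counts below are hypothetical placeholders, not real figures for any MCP server.

```python
TOKENS_PER_TOOL = 250  # assumption: rough average; real schemas vary widely


def estimate_overhead(tool_counts: dict[str, int],
                      tokens_per_tool: int = TOKENS_PER_TOOL) -> dict[str, int]:
    """Estimate per-turn schema tokens for each server from its tool count."""
    return {name: count * tokens_per_tool for name, count in tool_counts.items()}


# Hypothetical tool counts for a lean four-server stack (30 tools in total).
lean_stack = {"code_collab": 12, "docs": 2, "browser": 10, "domain": 6}

per_server = estimate_overhead(lean_stack)
total = sum(per_server.values())
print(per_server)
print(f"total per-turn overhead: ~{total:,} tokens")
```

Swap in the tool counts your agent actually reports; if one server dominates the total, that is the first candidate for pruning or for a leaner alternative.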

How to choose which 3-5

Three rules, applied in order. (1) Does it touch your daily critical path? If you don’t open it within a typical week of real work, it does not earn a slot. Aspiration installs (“I might need a database MCP someday”) are the single biggest source of bloat. (2) Does it cover ground your agent can’t reach without it? Claude Code already has Bash, file I/O, and grep — installing a generic filesystem or git MCP duplicates capability you already have for free. The good MCPs reach outside the local repo (real APIs, real browsers, real databases, real services), not inside it. (3) Does it have a clean tool surface? Prefer MCPs that expose 5–15 well-named tools with crisp descriptions over MCPs with 40+ overlapping verbs. The first kind compresses well in your context budget; the second is a quality tax you pay every turn. Apply these three filters to your current .mcp.json and most stacks shrink to 3–5 without losing any actual capability.
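The three filters can be written down as a tiny decision function, which makes it harder to fudge the answers. This is a sketch under stated assumptions: the class and field names are invented for illustration, and the 15-tool cutoff for a "clean" surface is one reading of the 5–15 guideline above, not a hard rule.

```python
from dataclasses import dataclass


@dataclass
class ServerReview:
    name: str
    used_in_last_week: bool        # filter 1: on the daily critical path?
    reaches_beyond_builtins: bool  # filter 2: covers ground the agent lacks?
    tool_count: int                # filter 3: size of the tool surface


def verdict(review: ServerReview, max_tools: int = 15) -> str:
    """Return 'keep' only when all three filters pass."""
    clean_surface = review.tool_count <= max_tools  # assumed cutoff
    if review.used_in_last_week and review.reaches_beyond_builtins and clean_surface:
        return "keep"
    return "lean toward uninstall"


# Hypothetical reviews: one domain server in weekly use, one built-in duplicate.
print(verdict(ServerReview("sentry", True, True, 8)))
print(verdict(ServerReview("filesystem", True, False, 10)))
```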

Removing the long tail (audit pattern)

The audit pattern is simple and ruthless. Open .mcp.json (or ~/.claude/settings.json if you have a global MCP block), list every server, and for each one ask: when did I last invoke a tool from this MCP? If the answer is “in the last 7 days, on real work” — keep. If “in the last 30 days, once, exploring” — uninstall and reinstall later if real demand appears. If “never, I installed it from a blog post” — uninstall immediately. Track invocations honestly via Claude Code session logs (~/.claude/projects/<repo>/) or Codex CLI history — grep for the server’s tool names. Almost every stack you audit will reveal three or four servers that haven’t been called in weeks. Those are pure overhead. The hardest part is psychological: people resist uninstalling things they spent time configuring. Resist back — the token tax never sleeps.
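The grep step can be scripted so the answer does not depend on memory. The sketch below rests on two assumptions you should verify against your own logs: that MCP tool names appear as mcp__<server>__<tool>, and that session logs are .jsonl files under the log directory. It counts mentions of a tool name, which is a proxy for invocations, not proof of one. The helper names and the sample log lines are hypothetical.

```python
import re
from collections import Counter
from pathlib import Path
from typing import Iterable, Iterator

# Assumption: tool names are logged as mcp__<server>__<tool>.
TOOL_NAME = re.compile(r"mcp__([A-Za-z0-9_-]+?)__[A-Za-z0-9_-]+")


def count_mentions(texts: Iterable[str], servers: Iterable[str]) -> Counter:
    """Count how often each listed server's tools are mentioned in the texts."""
    counts = Counter({name: 0 for name in servers})
    for text in texts:
        for match in TOOL_NAME.finditer(text):
            server = match.group(1)
            if server in counts:
                counts[server] += 1
    return counts


def read_logs(log_dir: Path) -> Iterator[str]:
    """Yield the contents of every .jsonl file under log_dir (assumed layout)."""
    for path in sorted(log_dir.rglob("*.jsonl")):
        yield path.read_text(encoding="utf-8", errors="ignore")


# Hypothetical log lines standing in for real session logs.
sample = [
    '{"name": "mcp__github__list_issues"}',
    '{"name": "mcp__github__get_pull_request"}',
    '{"name": "mcp__sentry__get_issue"}',
]
counts = count_mentions(sample, ["github", "sentry", "weather"])
never_called = [name for name, n in counts.items() if n == 0]
print(counts)
print("uninstall candidates:", never_called)
```

To run it against real logs, pass read_logs(Path.home() / ".claude" / "projects") in place of sample, after confirming the directory layout on your machine.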

Step-by-step: auditing and pruning your MCP stack

Inventory the current stack. Open .mcp.json at your repo root (project-scoped servers) and ~/.claude/settings.json or ~/.codex/config.toml (user-scoped servers). List every MCP and the rough number of tools each exposes — claude mcp list and claude mcp get <name> print this if you don’t want to count by hand. Write the list on paper. If it exceeds 7 entries, you are very likely losing tokens and quality to overhead.
Score each server against the three filters. For every server, answer in writing: (1) did I invoke a tool from this in the last 7 days of real work? (2) does this cover ground my agent’s built-ins can’t reach? (3) is the tool surface small and crisp, rather than 40+ verbose verbs? Three “yes” answers earn a keep. Anything else earns a strong lean toward uninstall.
Identify duplicates ruthlessly. Generic filesystem MCP + Claude Code’s built-in Read/Write = duplicate, uninstall the MCP. Generic git MCP + Bash + gh CLI = duplicate, uninstall. GitHub MCP + GitLab MCP when you only use one provider = uninstall the unused one. Two database MCPs for the same database flavour = pick the better-maintained one. Every duplicate is paying token cost twice for the same capability.
Cut to four servers, plus one optional fifth. Aim for the canonical max-score shape: one code-collab MCP (GitHub or GitLab), one docs MCP (Context7), one browser MCP (Playwright or Chrome DevTools), and one domain MCP for whatever you actually work on (Stripe, Linear, Sentry, Polar, Notion — pick one). A fifth slot is fine if it genuinely sees weekly use. A sixth needs to displace one of the first five, not stack on top.
Uninstall the long tail in one pass. For Claude Code: claude mcp remove <name> per server, or edit .mcp.json directly and remove the block. For Codex CLI: edit ~/.codex/config.toml and remove the [mcp_servers.<name>] table. Restart the agent so the changes take effect. Do not skip the restart — MCP server lists are loaded at session start, not per turn.
Verify token overhead actually dropped. Start a fresh session and check the tool-list length. If you keep a rough running estimate (200–400 tokens per tool average), you should see your per-turn overhead drop from “alarming” to “negligible.” Claude Code’s /cost and /context commands also surface this; use them to confirm the win is real, not theoretical.

Copy-paste prompt for a quick post-prune sanity check that the remaining tool surface is lean:

List every tool you currently have available, grouped by which MCP server exposes it, and give me a rough token estimate for the whole tool list at ~250 tokens per tool.

Lock the stack in your repo. Commit .mcp.json so your future self and your teammates inherit the curated 3–5, not whatever ad-hoc state your laptop happens to be in. Add a one-line note explaining why each server is in the list; standard JSON has no comment syntax, so keep the notes in a README next to the file or in the commit message. This is the artefact that prevents the next bloat cycle six months from now when you forget which ones were keepers.
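The inventory step can also be automated. The sketch below assumes the project file keeps its servers under a top-level "mcpServers" key; check that against your own .mcp.json before relying on it. Apart from the GitHub endpoint quoted earlier, the server entries and commands in the example are hypothetical placeholders.

```python
import json


def inventory(mcp_json_text: str, warn_above: int = 7) -> tuple[list[str], bool]:
    """Return the configured server names and whether the count looks bloated."""
    config = json.loads(mcp_json_text)
    servers = sorted(config.get("mcpServers", {}))
    return servers, len(servers) > warn_above


# Hypothetical four-server config; commands are placeholders, not real packages.
example = """
{
  "mcpServers": {
    "github": {"type": "http", "url": "https://api.githubcopilot.com/mcp/"},
    "docs": {"command": "example-docs-server"},
    "browser": {"command": "example-browser-server"},
    "domain": {"command": "example-domain-server"}
  }
}
"""

servers, too_many = inventory(example)
print(servers)
print("over the 7-entry warning line:", too_many)
```

In practice you would read the text from the file, for example with Path(".mcp.json").read_text(), and rerun the check as part of the quarterly audit.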

Common pitfalls

Installing “just in case.” The single biggest source of MCP bloat is the aspirational install: “I might query the database from the agent someday, so I’ll add Postgres MCP now.” That MCP then sits in your stack for 14 months, taxing every turn, never getting called. Rule: install on the day you have a concrete task that needs it, not before.
Duplicate functionality. Filesystem MCP, git MCP, “shell” MCP, “everything” MCP — these duplicate capabilities your terminal agent already has natively via Bash and built-in file tools. They look harmless but each one adds 1,000–3,000 tokens of overhead for capability that was already free. Audit specifically for built-in duplicates first.
Treating count as the metric. People show off “I have 22 MCPs!” the way they used to show off VS Code extension counts. The correct brag in 2026 is the opposite: “I have 4 MCPs and they all see daily use.” Count is anti-signal past 5.
Mixing personal-curiosity MCPs with work MCPs. Weather MCP, time MCP, joke MCP, “ask the LLM about astronomy” MCP — fun to install once, but they sit in your stack and consume context for every work task forever. Either run them in a separate scratch project, or don’t install them at all. Your main work stack should look like a professional kitchen — sharp, sparse, every tool earning its drawer.
Forgetting MCP servers cost tokens on Tier 1 (cheap) models too. Some developers think “I’ll just route to Haiku for cheap tasks, the MCP overhead doesn’t matter.” It still matters — Haiku’s context window is smaller, so MCP overhead consumes a larger fraction of it. A 15-MCP stack on Haiku can chew through 25% of context with tool definitions before any work happens.
Not auditing on a cadence. A clean stack today drifts back to bloat over 6 months because every new blog post adds one more “essential” server. Schedule a quarterly MCP audit: calendar invite, 15 minutes, run the three filters from “How to choose which 3-5” above. That single recurring habit keeps Q9 at max-score forever.

How to verify you’re there

Your .mcp.json has between 3 and 5 servers — not 8, not 12, not 20.
Every server in the file was invoked on real work within the last 7 days.
No two servers expose overlapping verbs (no read_file + get_file_content + fetch collision).
Your stack covers four functional zones: code collab, docs, browser/verification, one domain-specific.
Per-turn tool-list overhead is under ~8,000 tokens (rough check: count tools × 250).
You can explain, in one sentence per server, why each one earns its slot.
You have removed at least one MCP server in the last 90 days that turned out not to pull its weight.
Your .mcp.json is committed to the repo, with a one-line note explaining each server’s role kept alongside it (in a README or the commit message, since standard JSON has no comment syntax).
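The mechanical items on this checklist (server count, name collisions, token budget) can be sketched as a single function. It is a rough aid under stated assumptions: the function name and sample tool names are hypothetical, the 250 tokens per tool is the rule of thumb from above, and it only catches identical tool names. Semantic overlap such as read_file vs. get_file_content still needs a human read.

```python
def check_stack(tools_by_server: dict[str, list[str]],
                tokens_per_tool: int = 250,
                budget: int = 8_000) -> list[str]:
    """Return a list of checklist violations; an empty list means all clear."""
    problems = []

    if not 3 <= len(tools_by_server) <= 5:
        problems.append(f"server count is {len(tools_by_server)}, expected 3 to 5")

    owners: dict[str, str] = {}  # tool name -> first server exposing it
    for server, tools in tools_by_server.items():
        for tool in tools:
            if tool in owners:
                problems.append(
                    f"tool name '{tool}' exposed by both {owners[tool]} and {server}")
            else:
                owners[tool] = server

    total_tools = sum(len(tools) for tools in tools_by_server.values())
    estimate = total_tools * tokens_per_tool
    if estimate > budget:
        problems.append(f"estimated overhead ~{estimate} tokens exceeds {budget}")

    return problems


# Hypothetical stack with made-up tool names.
stack = {
    "github": ["list_issues", "get_pull_request"],
    "docs": ["lookup_docs"],
    "browser": ["navigate", "screenshot"],
    "domain": ["list_payments"],
}
print(check_stack(stack) or "all mechanical checks pass")
```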