Building your own MCP server — when to MCP vs skill vs slash command

Q11 · Extensibility Have you built your own MCP server?

Max-score answer: “I have several and deliberately pick MCP vs skill vs slash command per use case.”

Why this matters in 2026

Building your own MCP is the moment you stop being a tool user and start being a tool integrator. In 2026, the Model Context Protocol has settled into the de-facto extension API for every serious coding agent — Claude Code, Codex, Cursor, Windsurf, Zed — and almost every SaaS that touches developers ships an official MCP server alongside its REST API. That changes the asymmetry of the work: people who only install MCPs from a marketplace can do anything an existing integration covers. People who build MCPs can wrap the one weird internal API, the legacy database, the staging tool, or the niche partner system that nobody else has wrapped — and suddenly the agent can drive their day-job systems end-to-end. The gap between installers and integrators is the single biggest determinant of how much real work an agent can do at your company. The other thing 2026 forces you to learn: not every extension should be an MCP. Skills, slash commands and subagents overlap heavily with MCP for some workloads, and shoving everything into MCP servers bloats tool context, slows the agent down, and surfaces prompts the model has no business reading. Q11 asks whether you’ve crossed the line from consumer to producer and learned to pick the right extension surface for each job.

What “max score” actually looks like

You only top out Q11 when all five of these hold:

You have shipped at least two MCP servers — not forked, not “ran the template once”, but designed, deployed, and used daily. One can be local stdio (wrapper around an internal CLI or sqlite db); at least one should be remote (Cloudflare Workers, Vercel or your own infra) so a teammate or cloud agent can hit it.
You have a written decision tree for MCP vs agent skill vs slash command vs subagent — a literal rule in dotfiles or team handbook that says “data fetch, third-party action, or stateful integration → MCP; prompt-shaped playbook with reference docs → skill; sequence of tool calls triggered by name → slash command; long-running specialised loop → subagent.”
Your MCPs have a narrow, intentional tool surface. Each server exposes at most 5–10 well-named tools with strict input schemas, not 40 thin wrappers around every endpoint. You’ve actively removed tools when the agent kept misusing them.
You wired auth, rate limiting, and observability before shipping. OAuth or scoped token for remote MCPs, request logs you can grep, a cost/rate alert when usage spikes. No production MCP runs on “API key in plaintext, hope for the best”.
You re-evaluate quarterly. Every quarter you look at which custom MCPs got used, which were abandoned, and whether survivors would now be better expressed as skills or slash commands.

Anything less — “I followed the tutorial once” or “I have a server but it has 30 tools and I never auth’d it” — is mid-tier.

Current landscape (web-verified)

MCP SDK in 2026

The two reference SDKs maintained by the Model Context Protocol working group are still the centre of gravity, and both have stabilised significantly through 2025–2026.

Python SDK (modelcontextprotocol/python-sdk) — minimum supported version is 1.2.0; most production servers run 1.5+. Install with uv add "mcp[cli]" httpx. The FastMCP builder (now part of the official SDK) is the path of least resistance — tools are decorated Python functions, schemas inferred from type hints, and the same server runs over stdio or Streamable HTTP with a one-line transport switch.
TypeScript SDK (modelcontextprotocol/typescript-sdk) — npm package is @modelcontextprotocol/sdk (note: not @anthropic/* — Anthropic does not publish MCP servers to npm under its own scope). Install with npm install @modelcontextprotocol/sdk zod. Covers full MCP spec — tools, resources, prompts, sampling — with stdio, SSE and Streamable HTTP transports.
Community SDKs exist for Go, Rust, Java (Quarkus), C# and Kotlin. Use them if your service is already in that language; otherwise default to Python or TypeScript where docs and tooling are deepest.

Higher-level frameworks (FastMCP for TS, MCP-Framework, EasyMCP, Higress) add middleware, auth and observability on top. For a first MCP, prefer the official SDK directly — once you have two or three servers, reach for a framework.

Hosting options

How you deploy depends on who needs to call the server.

Local stdio. The original MCP transport. Server is a binary or script on your machine; the client (Claude Code, Cursor, Codex) spawns it as a child process and talks over stdin/stdout. Good for personal tools, local filesystem wrappers, or internal CLI wrappers. Almost zero ops; near-zero shareability beyond npx/uvx.
Remote HTTP (Streamable HTTP). The current standard for shared and cloud-hosted MCP servers — replaces the old SSE-based transport, now deprecated. One HTTP endpoint, supports streaming, easy to put behind OAuth. Every serious public MCP in 2026 uses Streamable HTTP.
Cloudflare Workers. The most popular remote-MCP hosting in 2026 — effectively free at low volume, globally distributed, with a maintained template plus Durable Objects integration for session state. Endpoint pattern is https://your-mcp-server.<account>.workers.dev/mcp, and Cloudflare’s own MCPs follow the canonical {name}.mcp.cloudflare.com/mcp shape.
Vercel / Fly / Railway / your own infra. All viable; pick whatever your team operates. The MCP spec is transport-agnostic — the same code runs on any HTTP host that supports streaming responses.

For a first remote MCP, npm create cloudflare@latest -- my-mcp-server --template=cloudflare/ai/demos/remote-mcp-authless gets you deployed in under ten minutes. Add OAuth before anything sensitive.

MCP vs skill vs slash command — when each wins

The three extension primitives overlap but each has a clear sweet spot.

MCP wins when the agent needs to do something outside its context — fetch live data, call an API, write to a database, drive a third-party SaaS. MCP tools are typed, schema-validated, callable mid-loop, and reusable across every MCP-aware client. If the answer to “where does the data live?” is “anywhere except the agent’s context window”, that’s MCP territory.
Agent skill wins when you have a playbook — prompt-shaped knowledge plus optional reference files — the agent should load on demand. Skills (the SKILL.md+resources convention popularised by Anthropic’s agent-skills) live as text. They’re the right shape for “how do I review a PR here”, “how do I write a migration”, “how do I author a postmortem”. No external system, no tool calls — just compressed expertise the model pulls in when relevant.
Slash command wins when you want a named entry point triggering a deterministic sequence or a fixed prompt. Slash commands are the keyboard shortcuts of agent workflows: /review-pr, /ship, /refactor-this-component. They often call MCP tools and load skills — but the command itself is just the trigger.
Subagent wins when you have a long-running specialised loop — code review, code exploration, doc writing — you want to delegate so the orchestrator doesn’t drown in tool output. Subagents use MCPs and skills, but own their own context window and model choice (Haiku 4.5 for fan-out, Sonnet 5 for serious reasoning).

Decision tree, condensed:

External system the agent should drive? → MCP.
Prompt + reference docs the agent pulls on demand? → skill.
Named trigger for an existing sequence? → slash command.
Long parallel loop with its own context? → subagent.

When unsure, default to skill — cheapest to ship, easiest to delete. MCPs are the most expensive extension to maintain, so the bar should be highest.

Hello-world MCP code snippet

A minimal Python MCP server with two tools, using the official SDK:

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("releases")

@mcp.tool()
def list_recent_releases(repo: str, limit: int = 5) -> list[dict]:
    """Return the most recent releases for an internal repo."""
    # call your internal API here
    return [{"tag": "v1.2.3", "date": "2026-05-01"}]

@mcp.tool()
def promote_release(repo: str, tag: str, env: str) -> str:
    """Promote a release to staging or prod."""
    assert env in {"staging", "prod"}
    # call your internal release tool here
    return f"promoted {repo}@{tag} to {env}"

if __name__ == "__main__":
    mcp.run()  # stdio by default; pass transport="streamable-http" for remote

Two tools, narrow surface, typed inputs, docstrings as descriptions. That’s the shape to aim for on your first MCP — not a 30-tool wrapper.

Step-by-step: shipping your first MCP

Pick one annoying, repeated task that touches an external system. Don’t pick “I’ll wrap GitHub” — github MCP exists. Pick the internal admin tool, the partner API, the legacy database, the one staging CLI everyone uses. Something with a clear input and a clear output that you do at least once a week.
Sketch the tool surface on paper first. Write the 3–7 tools you’d want the agent to call: their names, inputs, outputs. If you cross 10 tools, stop — split into two MCPs or push the long tail into skills. Narrow surfaces are how the agent stays good at calling your server.
Scaffold with the official template. Local stdio: uv init my-mcp && uv add "mcp[cli]" for Python, or for TypeScript npm install @modelcontextprotocol/sdk zod and start from the SDK’s quickstart (community scaffolders like create-mcp-server exist but are early — there’s no official first-party npm init mcp-server). Remote: npm create cloudflare@latest -- my-mcp-server --template=cloudflare/ai/demos/remote-mcp-authless is the path of least resistance.
Implement the tools with strict input schemas. Use type hints (Python) or Zod schemas (TS). Reject bad inputs early — the agent learns from your error messages, so make them sharp (“tag must match ^v\\d+\\.\\d+\\.\\d+$”, not “invalid input”).
Test with the MCP Inspector first, then a real client. npx @modelcontextprotocol/inspector gives you a UI that lists tools, lets you call them with sample inputs, and shows raw protocol traffic. Get the calls clean in Inspector before pointing Claude Code or Cursor at the server.
Wire auth before you wire production data. For local stdio MCPs, pass secrets via env vars from your client config — never hardcode. For remote MCPs, put OAuth in front (the Cloudflare template includes a GitHub OAuth example) or at minimum a scoped, rotatable bearer token in the Authorization header.
Add rate limiting and logging. Even a personal MCP should log every tool call with timestamp, tool name, input hash and outcome. Cloudflare Workers gets you per-IP/token rate limiting with a few lines of config; locally, a 100-calls-per-minute soft cap is enough to catch runaway agent loops before they nuke your API quota.
Register the MCP with your daily-driver clients. Claude Code: add to .mcp.json for the project or ~/.claude.json globally, using the documented claude mcp add syntax with --transport http for remote servers. Cursor and Codex have equivalent UIs. Use the server for a real task within 48 hours — if you don’t, kill it now and rethink.
Write the README and the “when to use this” line. One paragraph in your dotfiles or team docs: what the MCP does, when to reach for it, when not to. This is what stops the future-you (or your teammates) from rebuilding the same thing six months later.
Iterate on the tool surface. After a week of use, look at the call log. Tools that never got called: delete them. Tools that got misused: rename or sharpen the input schema. Tools that got called constantly but with the same arguments: consider promoting into a slash command instead.

Common pitfalls

Over-broad tool surface area. A 40-tool MCP is almost always worse than a 6-tool one. Every additional tool is more context the model reads each turn, more chances to pick the wrong tool, more code to maintain. If your wrapped API has 200 endpoints, your MCP exposes the 6 that matter. Add more only when you feel concrete pain.
No auth on a remote MCP. “It’s just a personal server” is how internal credentials leak. Any remote endpoint needs OAuth or at minimum a scoped, rotatable bearer token, plus IP allowlisting if the audience is small. Treat MCP endpoints like internal admin APIs.
No rate limiting. Agents loop. A bug in your prompt can call list_releases 400 times in a minute. With no cap, you’ve DOS’d your own infra. Cap at the MCP layer, separately from upstream API caps.
Putting prompts into MCPs instead of skills. If your “tool” is really “render this prompt, read these docs, output this format”, it should be a skill. Skills are cheaper to write, easier to update, version-controlled with the project, and don’t burn tool-context tokens.
Wrapping things that already have a good MCP. GitHub, Linear, Sentry, Cloudflare, Context7, Stripe, Notion, Slack, Figma — these all have first-party or canonical community MCPs. Building your own duplicate is wasted work and you’ll lose every upstream spec/auth update.
Deploying once, never reviewing. Custom MCPs rot — APIs change, tokens expire, use cases fade. Without quarterly review, you accumulate a graveyard of half-working servers that confuse the agent and pollute the tool list.

How to verify you’re there

You can name, from memory, at least two MCP servers you have personally written and deployed.
At least one of those MCPs is remote (Cloudflare Workers, Vercel, or your own infra) and a teammate or a cloud agent has used it.
Each of your MCPs exposes ≤10 tools, with typed input schemas and short descriptions.
You have a written rule (in CLAUDE.md, dotfiles, or your team handbook) for MCP vs skill vs slash command vs subagent.
Each remote MCP has auth (OAuth or scoped token) and a request log you can grep.
You used MCP Inspector to validate the server before pointing a coding agent at it.
You’ve deleted at least one tool from one of your MCPs because the agent kept misusing it.
You can point to one extension that you deliberately shipped as a skill or slash command rather than an MCP, because the decision tree pointed that way.
You re-audit your custom MCP list every quarter and have killed at least one that stopped paying its way.