Distributed Systems Development with AI
You change one field on the Order service’s API and three other services start returning 500s in staging. The trace is incomplete because two services never propagated the trace context, the saga that processes payments silently skipped its compensation step, and your on-call dashboard shows green while customers can’t check out. Distributed systems fail in the gaps between services—and that’s exactly where AI assistants are most useful and most dangerous: they generate plausible cross-service scaffolding fast, but pasting “generate the entire production system” gets you code you can’t verify.
This guide shows how to use Cursor, Claude Code, and Codex to do the parts AI is genuinely good at—drafting service skeletons, propagating trace context, writing the boring compensation logic—while keeping the verification loop tight enough that you’d ship the result.
What You’ll Walk Away With
- A workflow for coordinating a single feature across multiple service repos in each of the three tools
- Copy-paste prompts for designing service boundaries, writing a saga step with its compensation and a failing test, and instrumenting OpenTelemetry incrementally
- The real, verified MCP servers for monitoring (Sentry, Grafana, Dynatrace) and infrastructure (Docker, Kubernetes, AWS)—with the exact install commands
- A “When This Breaks” section covering the failure modes that actually bite: broken trace context, missing saga compensation, and MCP auth failures
MCP Servers That Actually Exist
Half the value of AI on distributed systems is letting it query live infrastructure instead of guessing. But the ecosystem is full of look-alike npm packages: sentry-mcp is a low-traffic stub, not Sentry’s server. Use the verified servers below. Each tool keeps its MCP configuration in a different place (.mcp.json for Claude Code, .cursor/mcp.json for Cursor, ~/.codex/config.toml for Codex), but a server definition is the same everywhere: a command, its arguments, and its environment variables. The commands below use Claude Code’s claude mcp add; for Cursor or Codex, put the same server command and environment variables into that tool’s config file.
Monitoring and observability
- Sentry (errors, traces, releases): use the official hosted server with OAuth, so there is no token to manage:

      claude mcp add --transport http sentry https://mcp.sentry.dev/mcp

  For a self-hosted Sentry, the official npm package is @sentry/mcp-server:

      claude mcp add sentry -- npx -y @sentry/mcp-server@latest --access-token=YOUR_TOKEN

- Grafana (dashboards, Loki/Prometheus queries, incidents): the official server is grafana/mcp-grafana, a Go binary distributed via Docker (there is no mcp-grafana npm package):

      claude mcp add grafana -- docker run --rm -i \
        -e GRAFANA_URL=http://localhost:3000 \
        -e GRAFANA_SERVICE_ACCOUNT_TOKEN=YOUR_TOKEN \
        grafana/mcp-grafana -t stdio

- Dynatrace (APM, AI anomaly detection): the official package is published by the Dynatrace OSS org and needs Node 22.10+. Pass the environment URL with --env so it is stored with the server definition rather than set only for the one-off add command:

      claude mcp add --transport stdio \
        --env DT_ENVIRONMENT=https://YOUR.apps.dynatrace.com \
        dynatrace -- npx -y @dynatrace-oss/dynatrace-mcp-server@latest
Containers and infrastructure
- Docker: the official MCP ships with Docker Desktop’s MCP Toolkit; you run the gateway rather than an npm package:

      claude mcp add docker -- docker mcp gateway run

  In Cursor, add a command-type server in Settings → MCP pointing at the same docker mcp gateway run.

- Kubernetes: kubernetes-mcp-server is a real package; it uses your current kubeconfig context:

      claude mcp add k8s -- npx -y kubernetes-mcp-server@latest

- AWS: AWS Labs publishes purpose-specific servers (not one monolithic image). Pick the one you need and rely on the standard AWS credential chain rather than inlining keys:

      claude mcp add aws-api -- uvx awslabs.aws-api-mcp-server@latest

  Browse the full catalog at awslabs.github.io/mcp. For Google Cloud, deploy a custom MCP server on Cloud Run; see cloud.google.com/run/docs.
Designing Service Boundaries
AI drafts bounded-context proposals quickly, but the boundaries are a business decision, so treat the output as a first draft to argue with, not a verdict. Start narrow: ask for boundaries plus the reasoning, so you can spot where the model conflated a technical layer with a domain.
When you’ve agreed on boundaries, design one service at a time. Resist “generate all services”—you can’t review a dump of seven services, and the contracts between them are where bugs hide.
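One way to keep a contract reviewable is to write down exactly what a consumer reads from a message and check it at the service boundary. The sketch below is a minimal, hand-rolled illustration of that idea: the `OrderCreated` event and its field names are hypothetical, and a real project would more likely lean on a schema or contract-testing tool than on code like this.

```typescript
// Hypothetical event contract: the fields a consumer of "OrderCreated" relies on.
interface OrderCreated {
  orderId: string;
  customerId: string;
  totalCents: number;
}

type ParseResult =
  | { ok: true; value: OrderCreated }
  | { ok: false; errors: string[] };

// Validate an untrusted payload at the service boundary.
// Unknown extra fields are ignored, so producers can add fields safely;
// missing or mistyped required fields are reported by name.
function parseOrderCreated(payload: unknown): ParseResult {
  if (typeof payload !== "object" || payload === null) {
    return { ok: false, errors: ["payload is not an object"] };
  }
  const fields = payload as Record<string, unknown>;
  const orderId = fields["orderId"];
  const customerId = fields["customerId"];
  const totalCents = fields["totalCents"];

  if (
    typeof orderId === "string" &&
    typeof customerId === "string" &&
    typeof totalCents === "number" &&
    Number.isInteger(totalCents)
  ) {
    return { ok: true, value: { orderId, customerId, totalCents } };
  }

  // Something is wrong: collect one message per broken field.
  const errors: string[] = [];
  if (typeof orderId !== "string") errors.push("orderId must be a string");
  if (typeof customerId !== "string") errors.push("customerId must be a string");
  if (typeof totalCents !== "number" || !Number.isInteger(totalCents)) {
    errors.push("totalCents must be an integer");
  }
  return { ok: false, errors };
}
```

With a check like this in place, a renamed field fails loudly at the consumer's boundary with the field named in the error, instead of surfacing as a 500 several hops away.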
Saga Patterns You Can Actually Verify
The saga pattern is where AI-generated distributed code most often looks right and is wrong. The failure mode is usually the same: the happy path is fine, but a compensation step is non-idempotent, or a timeout budget is missing. The fix is to build one step at a time, each with its compensation and a failing test first, then watch it go green.
In Cursor, open the Order service repo and switch to Agent mode. Ask for one saga step plus a failing test, run the test in Cursor’s terminal, and only accept the diff once it’s green. Use a checkpoint before each step so you can roll back a bad compensation without losing the prior steps. Cursor’s inline diff view makes it easy to spot when the model “fixed” the test by weakening the assertion instead of the code.
With Claude Code, drive it from the terminal so the test run is part of the loop. It can run the test, read the failure, and iterate without you copy-pasting output. The prompt is single-quoted so the shell passes the backticked command through as text instead of executing it:

    claude 'Implement step 3 of the order saga (process payment) plus its
    compensation (refund). Write a failing test that asserts the refund fires
    when step 4 throws, then make it pass. Run the test with `npm test -- saga`
    and show me the diff before committing.'

Add a hook in .claude/settings.json that runs the saga test suite on every edit to saga/, so a regression in an earlier step surfaces immediately.
Use a Codex worktree so the saga work is isolated in its own local checkout, then have Codex run the suite per step. Because Codex spans CLI, IDE, and Cloud, you can kick the step off from the CLI and review the resulting diff in the IDE:

    codex --ask-for-approval on-request "Implement the payment step of the order
    saga with an idempotent compensation. Add a test that injects a failure at
    the inventory step and asserts payment is refunded exactly once on retry.
    Run the suite and stop for my review before applying."

Tie every step to an observable check: after the model claims a step works, run the one test that proves the compensation fires. If you can’t articulate the test, you can’t trust the code.
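To make the "compensation fires exactly once" check concrete, here is a minimal sketch of a saga runner with one payment step and an injected inventory failure. Everything in it is an assumption for illustration: `runSaga`, `FakePayments`, and `buildOrderSaga` are hypothetical names, the payment gateway is an in-memory fake, and the steps are synchronous for brevity where a real saga would be asynchronous and would persist its progress.

```typescript
// One saga step: an action plus the compensation that undoes it.
interface SagaStep {
  name: string;
  action: () => void;
  compensate: () => void;
}

// Run steps in order. On failure, compensate the completed steps in reverse
// order, then rethrow so the caller still sees the original error.
function runSaga(steps: SagaStep[]): void {
  const completed: SagaStep[] = [];
  for (const step of steps) {
    try {
      step.action();
      completed.push(step);
    } catch (err) {
      for (const done of completed.reverse()) {
        done.compensate();
      }
      throw err;
    }
  }
}

// Hypothetical in-memory payment gateway. Refund is idempotent per order ID.
class FakePayments {
  readonly charged = new Set<string>();
  refundCount = 0;

  charge(orderId: string): void {
    this.charged.add(orderId);
  }

  refund(orderId: string): void {
    // Refund only while a charge is outstanding, so a retry is a no-op.
    if (this.charged.delete(orderId)) {
      this.refundCount += 1;
    }
  }
}

// Build the saga for one order. The inventory step always throws here,
// standing in for the injected downstream failure.
function buildOrderSaga(payments: FakePayments, orderId: string): SagaStep[] {
  return [
    {
      name: "process-payment",
      action: () => payments.charge(orderId),
      compensate: () => payments.refund(orderId),
    },
    {
      name: "reserve-inventory",
      action: () => {
        throw new Error("inventory unavailable");
      },
      compensate: () => {},
    },
  ];
}
```

A test against this sketch would run the saga, expect it to throw, assert that `refundCount` is 1, then call `refund` again and assert the count is still 1. That second assertion is the one a non-idempotent compensation fails.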
Inter-Service Communication and Trace Context
Service mesh and gateway configs are high-leverage for AI, but again, incrementally. Start with the smallest config that you can verify with a single command (curl, istioctl analyze), then layer on canary weights and circuit breakers.
For event-driven flows, the recurring production bug is a broken trace: a service consumes a Kafka message but never extracts and re-injects the trace context, so the trace dead-ends. When you ask AI to wire up consumers, make context propagation an explicit, tested requirement—not an afterthought.
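The sketch below shows what "extract and re-inject" amounts to at a message boundary, using the W3C `traceparent` header format (`00-<trace id>-<span id>-<flags>`). It is a simplified illustration rather than a replacement for a tracing library's propagator: the function names are hypothetical, headers are modeled as plain strings (Kafka clients typically expose them as bytes), and only version `00` is handled.

```typescript
// The IDs carried by a W3C `traceparent` header.
interface TraceContext {
  traceId: string; // 32 lowercase hex characters
  spanId: string; // 16 lowercase hex characters
  sampled: boolean;
}

// Message headers modeled as a plain string map (an assumption for this sketch).
type MessageHeaders = Record<string, string>;

const TRACEPARENT_PATTERN = /^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;

// Producer side: write the current context into the outgoing message headers.
function injectTraceContext(ctx: TraceContext, headers: MessageHeaders): MessageHeaders {
  const flags = ctx.sampled ? "01" : "00";
  return { ...headers, traceparent: `00-${ctx.traceId}-${ctx.spanId}-${flags}` };
}

// Consumer side: read the context back. Returns undefined when the header is
// absent or malformed, which is exactly the case where a trace dead-ends.
function extractTraceContext(headers: MessageHeaders): TraceContext | undefined {
  const match = TRACEPARENT_PATTERN.exec(headers["traceparent"] ?? "");
  if (match === null) return undefined;
  const traceId = match[1];
  const spanId = match[2];
  const flags = match[3];
  if (traceId === undefined || spanId === undefined || flags === undefined) {
    return undefined;
  }
  return { traceId, spanId, sampled: (parseInt(flags, 16) & 1) === 1 };
}

// The consumer continues the trace: same traceId, a new spanId for its own work.
function childContext(parent: TraceContext, newSpanId: string): TraceContext {
  return { traceId: parent.traceId, spanId: newSpanId, sampled: parent.sampled };
}
```

The tested requirement then becomes simple to state: inject a known `traceId` on the producer side, run the message through the consumer, and assert that the context the consumer re-injects downstream carries the same `traceId`.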
Observability: Instrument Incrementally
Modern observability has moved beyond dashboards to AI-driven anomaly detection and topology-aware root-cause analysis, but you still earn it one service at a time. The “instrument 8 services and 3 databases in one prompt” approach produces config you can’t validate. Instrument one service end to end, confirm a span shows up in Jaeger, then template it.
With the Grafana and Sentry MCP servers connected, you can close the loop without leaving your editor: ask the AI to pull the actual error rate or the slowest trace for a service and reason about it, instead of you screenshotting a dashboard.
Coordinating Changes Across Repos
A feature like “loyalty points” touches Customer, Order, Payment, and Notification. The coordination problem, not the per-service code, is what makes this hard, and the three tools take genuinely different approaches.
In Cursor, open all four service repos in a single multi-root workspace so the agent can see every contract at once. Design the OpenAPI/event contracts first, then use a background agent to implement each service in dependency order while you review diffs per repo. Cursor’s per-file checkpoints let you revert one service’s changes without unwinding the others. Best when you want to watch and steer each service’s diff visually.
With Claude Code, script the coordination. It runs headless, so you can drive each repo non-interactively and gate on contract tests:

    for svc in customer order payment notification; do
      (cd "../$svc" && claude -p "Implement the loyalty-points changes per ../contracts/loyalty.openapi.yaml. Add Pact contract tests against the services you call. Stop if any contract test fails." \
        --allowedTools Read Edit Bash)
    done

Sub-agents and the -p headless mode make Claude Code the strongest fit when the change is mechanical across many repos and you want it auditable in CI.
Use Codex Cloud—one task per service, each in its own cloud environment—so changes are isolated and reviewable as separate units, then let Codex open the PRs. Its GitHub and Linear integrations mean you can drive the whole feature from an issue: link the Linear ticket, and Codex tracks the cross-repo work and reports status back. Best when the coordination should live in your issue tracker rather than a shell script.
When This Breaks
Distributed systems fail in ways a single-service mindset misses. Here are the failure modes that actually surface with AI-assisted work and how to recover.
- Trace context dead-ends at an async boundary. A request shows up in Jaeger for two hops then vanishes. The consumer didn’t extract the trace context from message headers. Search the consumer for context extraction; if it’s missing, ask the AI to add header-based propagation and a test that asserts a known traceId survives the hop (see Inter-Service Communication and Trace Context above). Don’t trust “I added tracing”: verify the traceId end to end.

- A saga leaves orphaned state. Payment succeeded, inventory was never released after a downstream failure. The compensation is missing or non-idempotent. Reproduce by injecting a failure at the step after the one you suspect, and assert compensation fires exactly once. Rebuild that step with the failing-test-first prompt; never accept compensation logic without a test that triggers it.

- MCP server auth fails or returns nothing. The tool connects but every query errors or returns empty. Usually a missing/expired token or wrong env var (GRAFANA_SERVICE_ACCOUNT_TOKEN, DT_ENVIRONMENT, Sentry OAuth not completed). Run claude mcp list to confirm the server is connected, re-check the env vars against the install commands above, and for the hosted Sentry server re-run the OAuth flow. If npm view <pkg> shows a suspiciously low download count, you installed a look-alike; reinstall the official scoped package.

- The AI generated a “distributed monolith.” Services that must deploy together, or two services writing the same table. This is a design failure the model won’t flag on its own. Ask it to audit: “List every place two services share a database, a write path, or must deploy in lockstep.” Resolve those before splitting further, because shared write paths defeat the point of microservices.

- Canary auto-rollback never triggers. The deploy went bad but stayed at 100%. The rollback threshold references a metric that isn’t being emitted, or the metric name is wrong. Confirm the golden-signal metrics exist in Prometheus/Grafana (use the Grafana MCP to query them) before relying on automated rollback, and test the rollback path in staging with a deliberately failing build.
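The canary failure above comes down to a rule that silently passes when its metric is missing. One way to guard against that is to make the decision fail closed, as in the sketch below. The rule shape, the metric names, and `evaluateCanary` are hypothetical; a real rollout controller has its own analysis configuration, and this only illustrates the logic worth asking for.

```typescript
type CanaryDecision = "promote" | "rollback";

// Hypothetical metric snapshot: metric name mapped to its latest value.
// A metric that is not being emitted is simply absent from the map.
type MetricSnapshot = Record<string, number>;

interface RollbackRule {
  metric: string; // metric name the rule reads
  maxAllowed: number; // roll back when the value exceeds this
}

interface CanaryResult {
  decision: CanaryDecision;
  reasons: string[];
}

// Decide whether to promote or roll back a canary.
// Fail closed: a rule whose metric is missing or not a finite number counts
// as a violation, so a typo in a metric name cannot disable rollback.
function evaluateCanary(rules: RollbackRule[], metrics: MetricSnapshot): CanaryResult {
  const reasons: string[] = [];
  for (const rule of rules) {
    const value = metrics[rule.metric];
    if (value === undefined || !Number.isFinite(value)) {
      reasons.push(`${rule.metric}: metric not found`);
    } else if (value > rule.maxAllowed) {
      reasons.push(`${rule.metric}: ${value} exceeds ${rule.maxAllowed}`);
    }
  }
  return { decision: reasons.length > 0 ? "rollback" : "promote", reasons };
}
```

With this shape, a rule that reads `http_error_rate` while the service emits a differently named metric produces a rollback with the reason "metric not found", rather than a canary that quietly stays at 100%.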
What’s Next
- Monitoring and Observability: go deeper on OpenTelemetry, Grafana dashboards, and the Sentry MCP debugging loop
- Pipeline Automation with AI: change-aware builds and safe progressive deploys for the services you just split
- Infrastructure as Code with AI Assistants: Terraform, Pulumi, and the provider MCP servers that ground AI in real state
- Integration Test Patterns: contract testing and service-boundary verification without brittle mocks
- Must-Have MCP Servers for Every Developer: the foundational servers behind every workflow above