Distributed Systems Development with AI

You change one field on the Order service’s API and three other services start returning 500s in staging. The trace is incomplete because two services never propagated the trace context, the saga that processes payments silently skipped its compensation step, and your on-call dashboard shows green while customers can’t check out. Distributed systems fail in the gaps between services—and that’s exactly where AI assistants are most useful and most dangerous: they generate plausible cross-service scaffolding fast, but pasting “generate the entire production system” gets you code you can’t verify.

This guide shows how to use Cursor, Claude Code, and Codex to do the parts AI is genuinely good at—drafting service skeletons, propagating trace context, writing the boring compensation logic—while keeping the verification loop tight enough that you’d ship the result.

What You’ll Walk Away With

A workflow for coordinating a single feature across multiple service repos in each of the three tools
Copy-paste prompts for designing service boundaries, writing a saga step with its compensation and a failing test, and instrumenting OpenTelemetry incrementally
The real, verified MCP servers for monitoring (Sentry, Grafana, Dynatrace) and infrastructure (Docker, Kubernetes, AWS)—with the exact install commands
A “When This Breaks” section covering the failure modes that actually bite: broken trace context, missing saga compensation, and MCP auth failures

MCP Servers That Actually Exist

Half the value of AI on distributed systems is letting it query live infrastructure instead of guessing. But the ecosystem is full of look-alike npm packages: sentry-mcp is a low-traffic stub, not Sentry’s server. Use these verified servers. The server definitions carry over between Cursor, Claude Code, and Codex, but each tool stores them in its own file (.mcp.json for Claude Code, .cursor/mcp.json for Cursor, ~/.codex/config.toml for Codex). The commands below use the Claude Code CLI; in Cursor or Codex, register the same command and arguments in that tool’s config file.

Monitoring and observability

Sentry (errors, traces, releases) — use the official hosted server with OAuth, no token to manage:
```
claude mcp add --transport http sentry https://mcp.sentry.dev/mcp
```
For a self-hosted Sentry, the official npm package is @sentry/mcp-server:
```
claude mcp add sentry -- npx -y @sentry/mcp-server@latest --access-token=YOUR_TOKEN
```
Grafana (dashboards, Loki/Prometheus queries, incidents) — the official server is grafana/mcp-grafana, a Go binary distributed via Docker (there is no mcp-grafana npm package):
```
claude mcp add grafana -- docker run --rm -i \
  -e GRAFANA_URL=http://localhost:3000 \
  -e GRAFANA_SERVICE_ACCOUNT_TOKEN=YOUR_TOKEN \
  grafana/mcp-grafana -t stdio
```

Dynatrace (APM, AI anomaly detection) — the official package is published by the Dynatrace OSS org and needs Node 22.10+:

```
claude mcp add --env DT_ENVIRONMENT=https://YOUR.apps.dynatrace.com dynatrace \
  -- npx -y @dynatrace-oss/dynatrace-mcp-server@latest
```

Containers and infrastructure

Docker — the official MCP ships with Docker Desktop’s MCP Toolkit; you run the gateway rather than an npm package:
```
claude mcp add docker -- docker mcp gateway run
```
In Cursor, add a command-type server in Settings → MCP pointing at the same docker mcp gateway run.
Kubernetes — kubernetes-mcp-server is a real package; it uses your current kubeconfig context:
```
claude mcp add k8s -- npx -y kubernetes-mcp-server@latest
```
AWS — AWS Labs publishes purpose-specific servers (not one monolithic image). Pick the one you need and rely on the standard AWS credential chain rather than inlining keys:
```
claude mcp add aws-api -- uvx awslabs.aws-api-mcp-server@latest
```
Browse the full catalog at awslabs.github.io/mcp. For Google Cloud, deploy a custom MCP server on Cloud Run—see cloud.google.com/run/docs.

Designing Service Boundaries

AI drafts bounded-context proposals quickly, but the boundaries are a business decision—treat the output as a first draft to argue with, not a verdict. Start narrow: ask for boundaries plus the reasoning, so you can spot where the model conflated a technical layer with a domain.

When you’ve agreed on boundaries, design one service at a time. Resist “generate all services”—you can’t review a dump of seven services, and the contracts between them are where bugs hide.

Saga Patterns You Can Actually Verify

The saga pattern is where AI-generated distributed code most often looks right and is wrong. The failure mode is always the same: the happy path is fine, but a compensation step is non-idempotent, or a timeout budget is missing. The fix is to build one step at a time, each with its compensation and a failing test first, then watch it go green.
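
To make the shape concrete, here is a minimal sketch of a saga runner that pairs each step with its compensation and unwinds completed steps in reverse when a later step fails. It is written in Python for brevity; the step names, the SagaStep type, and run_saga are hypothetical, and a real orchestrator would also persist progress and handle compensations that fail.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SagaStep:
    name: str
    action: Callable[[], None]        # forward operation
    compensation: Callable[[], None]  # undoes the action if a later step fails


def run_saga(steps: List[SagaStep]) -> bool:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    completed: List[SagaStep] = []
    for step in steps:
        try:
            step.action()
        except Exception:
            # The failed step never completed, so only earlier steps are undone.
            for done in reversed(completed):
                done.compensation()
            return False
        completed.append(step)
    return True


# Hypothetical order saga: inventory fails after payment has succeeded.
log: List[str] = []


def fail_inventory() -> None:
    raise RuntimeError("inventory unavailable")


steps = [
    SagaStep("create_order", lambda: log.append("order created"),
             lambda: log.append("order cancelled")),
    SagaStep("process_payment", lambda: log.append("payment charged"),
             lambda: log.append("payment refunded")),
    SagaStep("reserve_inventory", fail_inventory,
             lambda: log.append("inventory released")),
]

ok = run_saga(steps)
```

Building one step at a time means adding one SagaStep to this list, writing the test that makes the next step fail, and checking that the new compensation appears in the log in the right position.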

In Cursor, open the Order service repo and switch to Agent mode. Ask for one saga step plus a failing test, run the test in Cursor’s terminal, and only accept the diff once it’s green. Use a checkpoint before each step so you can roll back a bad compensation without losing the prior steps. Cursor’s inline diff view makes it easy to spot when the model “fixed” the test by weakening the assertion instead of the code.

With Claude Code, drive it from the terminal so the test run is part of the loop. It can run the test, read the failure, and iterate without you copy-pasting output. The test command is in single quotes because backticks inside a double-quoted shell string would be executed by the shell before Claude ever saw the prompt:

```
claude "Implement step 3 of the order saga (process payment) plus its
compensation (refund). Write a failing test that asserts the refund fires
when step 4 throws, then make it pass. Run the test with 'npm test -- saga'
and show me the diff before committing."
```

Add a hook in .claude/settings.json that runs the saga test suite on every edit to saga/, so a regression in an earlier step surfaces immediately.

With Codex, use a dedicated git worktree so the saga work is isolated in its own local checkout, then have it run the suite per step. ChatGPT desktop can create an optional managed worktree; CLI and IDE tasks use a checkout you select:

```
codex --sandbox workspace-write -c approval_policy=on-request "Implement the payment step of the order
saga with an idempotent compensation. Add a test that injects a failure at
the inventory step and asserts payment is refunded exactly once on retry.
Run the suite and stop for my review before applying."
```

Tie every step to an observable check: after the model claims a step works, run the one test that proves the compensation fires. If you can’t articulate the test, you can’t trust the code.
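
As an illustration of what such a check can look like, the sketch below tests that a refund compensation is idempotent: a retried saga must not refund twice. PaymentLedger is a hypothetical in-memory stand-in for a payment provider, and the idempotency-key format is an assumption, not a real provider’s API.

```python
from typing import Dict


class PaymentLedger:
    """Hypothetical in-memory stand-in for a payment provider."""

    def __init__(self) -> None:
        self.refunds: Dict[str, int] = {}  # idempotency key -> amount in cents

    def refund(self, idempotency_key: str, amount_cents: int) -> bool:
        """Return True if a refund was issued, False if the key was already used."""
        if idempotency_key in self.refunds:
            return False
        self.refunds[idempotency_key] = amount_cents
        return True


def compensate_payment(ledger: PaymentLedger, order_id: str, amount_cents: int) -> bool:
    # Derive the key from the order and step so a retried saga reuses it.
    return ledger.refund(f"refund:{order_id}:process_payment", amount_cents)


def test_refund_fires_exactly_once_on_retry() -> None:
    ledger = PaymentLedger()
    first = compensate_payment(ledger, "order-42", 1999)
    retry = compensate_payment(ledger, "order-42", 1999)  # saga retried after a timeout
    assert first is True
    assert retry is False
    assert ledger.refunds == {"refund:order-42:process_payment": 1999}


test_refund_fires_exactly_once_on_retry()
```

If the model’s compensation cannot pass a test of this shape, the usual cause is a key derived from something that changes between attempts, such as a timestamp or a fresh UUID.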

Inter-Service Communication and Trace Context

Service mesh and gateway configs are high-leverage for AI—but again, incrementally. Start with the smallest config that you can verify with a single command (curl, istioctl analyze), then layer on canary weights and circuit breakers.

For event-driven flows, the recurring production bug is a broken trace: a service consumes a Kafka message but never extracts and re-injects the trace context, so the trace dead-ends. When you ask AI to wire up consumers, make context propagation an explicit, tested requirement—not an afterthought.
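
A sketch of what “extract and re-inject” means, using the W3C traceparent header format (version, trace ID, parent span ID, flags). This is hand-rolled with the standard library purely for illustration; production code would normally rely on the OpenTelemetry propagators, and the helper names and the dict-shaped headers are assumptions (Kafka clients typically expose headers as key/bytes pairs).

```python
import re
import secrets
from typing import Dict, Optional

# W3C Trace Context header: version-traceid-parentid-flags
TRACEPARENT = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def extract_trace_id(headers: Dict[str, str]) -> Optional[str]:
    """Return the trace ID from a message's headers, or None if absent or invalid."""
    match = TRACEPARENT.match(headers.get("traceparent", ""))
    return match.group(1) if match else None


def inject_child_context(incoming: Dict[str, str]) -> Dict[str, str]:
    """Build headers for an outgoing message that continue the incoming trace."""
    match = TRACEPARENT.match(incoming.get("traceparent", ""))
    if match:
        trace_id, flags = match.group(1), match.group(3)
    else:
        # No valid upstream context: start a new trace rather than fail.
        trace_id, flags = secrets.token_hex(16), "01"
    span_id = secrets.token_hex(8)  # new span ID for this hop
    return {"traceparent": f"00-{trace_id}-{span_id}-{flags}"}


# The tested requirement: a known trace ID survives the consume-then-produce hop.
incoming = {"traceparent": "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"}
outgoing = inject_child_context(incoming)
assert extract_trace_id(outgoing) == "4bf92f3577b34da6a3ce929d0e0e4736"
assert outgoing["traceparent"] != incoming["traceparent"]
```

The two assertions at the end are the part worth asking the AI for explicitly: the trace ID is unchanged across the hop, while the span ID is new.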

Observability: Instrument Incrementally

Modern observability has moved beyond dashboards to AI-driven anomaly detection and topology-aware root-cause analysis—but you still earn it one service at a time. The “instrument 8 services and 3 databases in one prompt” approach produces config you can’t validate. Instrument one service end to end, confirm a span shows up in Jaeger, then template it.

With the Grafana and Sentry MCP servers connected, you can close the loop without leaving your editor: ask the AI to pull the actual error rate or the slowest trace for a service and reason about it, instead of you screenshotting a dashboard.

Coordinating Changes Across Repos

A feature like “loyalty points” touches Customer, Order, Payment, and Notification. The coordination problem—not the per-service code—is what makes this hard, and the three tools take genuinely different approaches.

In Cursor, open all four service repos in a single multi-root workspace so the agent can see every contract at once. Design the OpenAPI/event contracts first, then use a background agent to implement each service in dependency order while you review diffs per repo. Cursor’s per-file checkpoints let you revert one service’s changes without unwinding the others. Best when you want to watch and steer each service’s diff visually.

With Claude Code, script the coordination. It runs headless, so you can drive each repo non-interactively and gate on contract tests:

```
for svc in customer order payment notification; do
  (cd "../$svc" && claude -p "Implement the loyalty-points changes per
    ../contracts/loyalty.openapi.yaml. Add Pact contract tests against the
    services you call. Stop if any contract test fails." \
    --allowedTools Read Edit Bash)
done
```

Sub-agents and the -p headless mode make Claude Code the strongest fit when the change is mechanical across many repos and you want it auditable in CI.
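
Whichever tool drives the work, the dependency order can be computed rather than remembered. The sketch below uses Python’s standard-library graphlib; the call graph is a hypothetical example for the loyalty-points feature, not something taken from a real system.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical call graph: each service maps to the services it calls.
calls = {
    "notification": {"customer"},
    "order": {"customer", "payment"},
    "payment": {"customer"},
    "customer": set(),
}

# static_order() yields each service only after everything it calls.
order = list(TopologicalSorter(calls).static_order())
print(order)
```

If the graph contains a cycle, static_order() raises graphlib.CycleError, which is a useful signal in itself: services that call each other in a loop usually have to change in lockstep.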

When This Breaks

Distributed systems fail in ways a single-service mindset misses. Here are the failure modes that actually surface with AI-assisted work and how to recover.

Trace context dead-ends at an async boundary. A request shows up in Jaeger for two hops then vanishes. The consumer didn’t extract the trace context from message headers. Search the consumer for context extraction; if it’s missing, ask the AI to add header-based propagation and a test that asserts a known traceId survives the hop (see Inter-Service Communication and Trace Context above). Don’t trust “I added tracing”; verify the traceId end to end.
A saga leaves orphaned state. Payment succeeded, inventory was never released after a downstream failure. The compensation is missing or non-idempotent. Reproduce by injecting a failure at the step after the one you suspect, and assert compensation fires exactly once. Rebuild that step with the failing-test-first prompt; never accept compensation logic without a test that triggers it.
MCP server auth fails or returns nothing. The tool connects but every query errors or returns empty. Usually a missing/expired token or wrong env var (GRAFANA_SERVICE_ACCOUNT_TOKEN, DT_ENVIRONMENT, Sentry OAuth not completed). Run claude mcp list to confirm the server is connected, re-check the env vars against the install commands above, and for the hosted Sentry server re-run the OAuth flow. npm view <pkg> does not report download counts, so check the package’s page on npmjs.com: if the weekly downloads are suspiciously low or the listed repository is not the vendor’s, you installed a look-alike. Reinstall the official package.
The AI generated a “distributed monolith.” Services that must deploy together, or two services writing the same table. This is a design failure the model won’t flag on its own. Ask it to audit: “List every place two services share a database, a write path, or must deploy in lockstep.” Resolve those before splitting further—shared write paths defeat the point of microservices.
Canary auto-rollback never triggers. The deploy went bad but stayed at 100%. The rollback threshold references a metric that isn’t being emitted, or the metric name is wrong. Confirm the golden-signal metrics exist in Prometheus/Grafana (use the Grafana MCP to query them) before relying on automated rollback, and test the rollback path in staging with a deliberately failing build.
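
The distributed-monolith audit above can also be checked mechanically once you have a list of which service writes to which table. A minimal sketch, assuming a hand-maintained (or AI-extracted) mapping; the service and table names are hypothetical:

```python
from collections import defaultdict
from typing import Dict, List, Set

# Hypothetical inventory: service -> tables it writes to.
writes: Dict[str, Set[str]] = {
    "order": {"orders", "order_items"},
    "payment": {"payments", "orders"},  # also writes orders: a shared write path
    "customer": {"customers"},
}


def shared_write_paths(writes: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Return tables written by more than one service, with their writers sorted."""
    writers = defaultdict(list)
    for service, tables in writes.items():
        for table in tables:
            writers[table].append(service)
    return {t: sorted(s) for t, s in writers.items() if len(s) > 1}


conflicts = shared_write_paths(writes)
print(conflicts)
```

An empty result does not prove the services are independent, since lockstep deploys and shared queues do not show up in a table list, but a non-empty one is a concrete finding to resolve before splitting further.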

What’s Next

Monitoring and Observability — go deeper on OpenTelemetry, Grafana dashboards, and the Sentry MCP debugging loop
Pipeline Automation with AI — change-aware builds and safe progressive deploys for the services you just split
Infrastructure as Code with AI Assistants — Terraform, Pulumi, and the provider MCP servers that ground AI in real state
Integration Test Patterns — contract testing and service-boundary verification without brittle mocks
Must-Have MCP Servers for Every Developer — the foundational servers behind every workflow above