Docker and Kubernetes Containerization
Your image is 1.2GB, Trivy is flagging 40 CVEs in the base layer, and the pod just got OOMKilled with exit code 137 in staging. You could spend the afternoon reading Dockerfile best-practice blog posts and kubectl describe output, or you could put an AI agent in the loop with the actual files and let it generate, scan, and explain the fix while you review.
This article shows how to drive Docker and Kubernetes work with Cursor, Claude Code, and Codex: generating multi-stage Dockerfiles, hardening manifests, debugging crash loops, and connecting the real Docker and Kubernetes MCP servers so the agent can read live cluster state instead of guessing.
What You’ll Walk Away With
- A repeatable prompt that turns a fat single-stage Dockerfile into a hardened, distroless multi-stage build under 80MB
- A debugging workflow for exit 137 OOMKills that correlates limits, kubectl describe, and recent code changes
- The correct, non-hallucinated setup for the Docker MCP gateway and the kubernetes-mcp-server (with the flags that actually exist)
- A clear rule for when an Agent Skill beats a persistent MCP server for container work
- Three copy-paste prompts you can run today against your own repo
The Workflow: Generating a Hardened Dockerfile
The highest-leverage move is to hand the agent your current Dockerfile plus your real constraints (runtime, port, build tool) and ask for a multi-stage rewrite. The setup differs per tool, but the prompt is nearly identical.
In Cursor, attach the file as context with @Dockerfile and run this in Agent mode (do not type an @agent prefix — selecting Agent mode is enough; @ is reserved for context references like @Dockerfile or @package.json):
    @Dockerfile @package.json Rewrite this as a multi-stage build for our Node 20
    service. Builder stage installs dev deps and runs the build; final stage is
    gcr.io/distroless/nodejs20-debian12, runs as UID 65534, copies only dist/ and
    production node_modules. Add a HEALTHCHECK hitting /health on 3000. Keep the
    final image under 80MB and explain each layer you cut.

A persistent rule keeps every future Dockerfile consistent. Add .cursor/rules/containers.md:

    ---
    description: Container build standards
    globs: ["**/Dockerfile", "**/*.dockerfile"]
    ---
    - Always use multi-stage builds; never ship build tooling in the final image.
    - Final stage: distroless or alpine, non-root USER, no shell unless required.
    - Pin base images by tag, never `latest`. Add a HEALTHCHECK.

From the repo root, Claude Code reads the file off disk, with no attachment step:

    claude "Rewrite ./Dockerfile as a multi-stage build for our Node 20 service.
    Builder stage installs dev deps and runs the build; final stage is
    gcr.io/distroless/nodejs20-debian12, runs as UID 65534, copies only dist/ and
    production node_modules. Add a HEALTHCHECK on /health:3000, keep it under 80MB,
    and tell me which layers you removed and why."

Codify the standard in CLAUDE.md at the repo root so it applies to every session:

    ## Containers
    - Multi-stage builds only; distroless/alpine final stage, non-root USER.
    - Pin base images by tag. Add HEALTHCHECK. Never COPY secrets into a layer.

Codex (running GPT-5.5) works the same from the terminal. Keep it in the workspace-write sandbox so it can edit the Dockerfile but not touch your registry credentials:

    codex --sandbox workspace-write --ask-for-approval on-request \
      "Rewrite ./Dockerfile as a multi-stage build for our Node 20 service. Final stage gcr.io/distroless/nodejs20-debian12, UID 65534, dist/ + prod node_modules only, HEALTHCHECK on /health:3000, under 80MB. List the layers you cut and why."

For the low-friction local loop, --full-auto is the shortcut for --sandbox workspace-write --ask-for-approval on-request.
The result should look roughly like this. The point is not the exact Dockerfile; it’s that every line is justified and you reviewed it:

    FROM node:20-alpine AS build
    WORKDIR /app
    COPY package*.json ./
    RUN npm ci
    COPY . .
    RUN npm run build && npm prune --omit=dev

    FROM gcr.io/distroless/nodejs20-debian12 AS production
    WORKDIR /app
    COPY --from=build /app/node_modules ./node_modules
    COPY --from=build /app/dist ./dist
    USER 65534:65534
    EXPOSE 3000
    # Distroless has no shell, so use exec form. The absolute path matches the
    # image's /nodejs/bin/node entrypoint and does not rely on node being on PATH.
    HEALTHCHECK --interval=30s --timeout=5s --retries=3 CMD ["/nodejs/bin/node", "dist/healthcheck.js"]
    CMD ["dist/server.js"]

Generation is only half the loop. What separates a reviewer from a copy-paster is the critique pass: make the agent attack its own output before you trust it.
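Before asking the agent to critique its own output, a mechanical pre-check can catch the obvious violations of the container standards listed earlier. The following is a minimal sketch, not a real linter: `check_dockerfile` is a hypothetical helper, it only understands one instruction per line (no backslash continuations), and its four checks are an assumed reading of the rules above.

```python
import re


def check_dockerfile(text: str) -> list[str]:
    """Return a list of rule violations found in a Dockerfile's text."""
    problems = []
    lines = [ln.strip() for ln in text.splitlines()]
    instructions = [ln for ln in lines if ln and not ln.startswith("#")]

    stage_names = set()
    from_count = 0
    final_stage = []
    for ln in instructions:
        if not ln.upper().startswith("FROM "):
            final_stage.append(ln)
            continue
        from_count += 1
        final_stage = []  # a new stage starts; forget the previous one
        # Drop options such as --platform=... to find the image token.
        tokens = [t for t in ln.split()[1:] if not t.startswith("--")]
        image = tokens[0]
        if len(tokens) >= 3 and tokens[1].upper() == "AS":
            stage_names.add(tokens[2].lower())
        if image.lower() in stage_names or image == "scratch":
            continue  # reference to an earlier stage, nothing to pin
        # Untagged images and ':latest' both float to whatever is newest.
        if image.endswith(":latest") or (":" not in image and "@" not in image):
            problems.append(f"base image not pinned by tag: {image}")

    if from_count < 2:
        problems.append("not a multi-stage build (fewer than two FROM lines)")

    users = [ln.split()[1] for ln in final_stage if ln.upper().startswith("USER ")]
    if not users or re.match(r"^(root|0)(:|$)", users[-1]):
        problems.append("final stage does not switch to a non-root USER")

    if not any(ln.upper().startswith("HEALTHCHECK ") for ln in final_stage):
        problems.append("final stage has no HEALTHCHECK")

    return problems
```

Run against the example above, this sketch would flag the untagged distroless image, which is a useful reminder that "pin by tag" applies to the final stage too. Treat any empty result as "nothing obvious", not as proof the image is hardened.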
Hardening Kubernetes Manifests
The same generate-then-critique loop applies to manifests, but the failure mode is different: agents love to emit a 200-line Deployment with every field set, and you cannot review what you cannot read. Keep the prompt scoped to one resource and one concern at a time.
A focused security-context prompt that works across all three tools (the request is identical — only the invocation differs):
Asking for a diff rather than a full rewrite is the key trick: you see exactly the five lines that changed instead of re-reviewing a wall of YAML the agent reproduced from memory (and may have subtly altered).
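If the agent hands back a full manifest anyway, you can recover the diff view yourself. This is a small sketch using Python's standard `difflib`; the function name `changed_lines` and the file labels are illustrative, and it compares text only, so it will not notice that two differently formatted YAML documents are semantically equal.

```python
import difflib


def changed_lines(before: str, after: str) -> list[str]:
    """Return only the added and removed lines between two manifest versions."""
    diff = list(
        difflib.unified_diff(
            before.splitlines(),
            after.splitlines(),
            fromfile="deployment.yaml",            # labels only, no files are read
            tofile="deployment.hardened.yaml",
            lineterm="",
            n=0,                                   # no surrounding context lines
        )
    )
    # The first two entries are the '---'/'+++' file headers; '@@' marks hunks.
    return [ln for ln in diff[2:] if not ln.startswith("@@")]
```

Each returned line starts with `+` or `-`, so a hardening pass that was supposed to add a handful of securityContext fields but touches the image tag or replica count as well stands out immediately.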
Debugging an OOMKilled Pod
This is where AI assistance pays for itself. Exit code 137 means the process received SIGKILL (128 + 9), and with the reason OOMKilled it was the kernel's OOM killer. Working out why takes correlating the limit, actual usage, and what changed. Feed the agent the evidence rather than asking it to speculate.
1. Collect the evidence into one place.

        kubectl describe pod -l app=api > /tmp/pod.txt
        kubectl top pod -l app=api >> /tmp/pod.txt
        git log --oneline -10 >> /tmp/pod.txt

2. Hand the bundle to the agent and ask for ranked hypotheses. In Claude Code: claude "Read /tmp/pod.txt ..."; in Cursor, attach the file with @pod.txt; in Codex, pass the path in the prompt. The ask is identical across tools.

3. Apply the smallest fix and verify. Usually a limit bump or a memory leak in the last deploy. Re-run kubectl top after the rollout and confirm the working set sits under the new limit with headroom.
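The "with headroom" check in the last step is simple arithmetic once the quantities are in the same unit. Here is a minimal sketch, assuming memory values in the form `kubectl top` and the pod spec print them (for example `384Mi`); `parse_memory` and `headroom` are hypothetical helper names, and the suffix table covers only the common cases.

```python
# Binary and decimal suffixes used by Kubernetes memory quantities.
_UNITS = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
}


def parse_memory(quantity: str) -> int:
    """Convert a quantity such as '512Mi' or '1G' to bytes."""
    q = quantity.strip()
    # Check two-letter suffixes first so 'Mi' is not read as 'M'.
    for suffix in sorted(_UNITS, key=len, reverse=True):
        if q.endswith(suffix):
            return int(float(q[: -len(suffix)]) * _UNITS[suffix])
    return int(q)  # no suffix: a plain byte count


def headroom(usage: str, limit: str) -> float:
    """Fraction of the limit still free; 0.25 means 25% headroom."""
    return 1 - parse_memory(usage) / parse_memory(limit)
```

For instance, `headroom("384Mi", "512Mi")` evaluates to `0.25`. What counts as enough headroom depends on how spiky the workload is, so pick the threshold from observed peaks and not from a single sample.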
Wiring Up the Docker and Kubernetes MCP Servers
Prompts that paste files are fine, but the real upgrade is letting the agent query live state through MCP. Two servers matter here, and both are commonly hallucinated in AI-generated guides, so here is the setup that actually works.
Docker MCP (the gateway model)
There is no mcp/docker-toolkit image and no localhost:8080 HTTP endpoint to point a client at. The Docker MCP Toolkit is a Docker Desktop feature powered by the docker mcp CLI plugin (the MCP Gateway). You enable the MCP Toolkit in Docker Desktop, enable the servers you want from the catalog, then connect a client over stdio to the gateway process:
Add the gateway to .cursor/mcp.json:

    {
      "mcpServers": {
        "docker": {
          "command": "docker",
          "args": ["mcp", "gateway", "run"]
        }
      }
    }

In Claude Code:

    claude mcp add docker -- docker mcp gateway run

Add it to ~/.codex/config.toml:

    [mcp_servers.docker]
    command = "docker"
    args = ["mcp", "gateway", "run"]

Enable servers from the catalog first with docker mcp server enable <name>; the gateway exposes their tools to every connected client. See the Docker MCP Toolkit docs and docker/mcp-gateway.
Kubernetes MCP (real flags only)
The kubernetes-mcp-server package (from containers/kubernetes-mcp-server) is real, but its flags are frequently invented. There is no --audit-log, --rbac-mode, --namespace-filter, or --context flag. The flags that exist are --kubeconfig, --read-only, --toolsets, --port, --disable-multi-cluster, and --config. RBAC is enforced by the ServiceAccount bound to the kubeconfig you point it at, not by a CLI switch.
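One cheap guard against invented flags is to check any generated args list against the flags named above before it lands in a config file. The sketch below is illustrative only: `suspect_flags` is a hypothetical helper, and `KNOWN_FLAGS` is copied from this article, so it may be incomplete or drift between releases. The server's own `--help` output remains the authority.

```python
# Flags this article lists as existing; anything else is treated as suspect.
KNOWN_FLAGS = {
    "--kubeconfig",
    "--read-only",
    "--toolsets",
    "--port",
    "--disable-multi-cluster",
    "--config",
}


def suspect_flags(args: list[str]) -> list[str]:
    """Return long flags in an MCP server args list that are not in KNOWN_FLAGS."""
    suspects = []
    for arg in args:
        if not arg.startswith("--"):
            continue  # package names, flag values, and short options like '-y'
        name = arg.split("=", 1)[0]  # treat '--port=8080' as '--port'
        if name not in KNOWN_FLAGS:
            suspects.append(name)
    return suspects
```

An empty result only means no unknown long flags were found; it says nothing about whether the values passed to the known flags are sensible.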
Start read-only, which is the only sane default for a tool an LLM drives:
.cursor/mcp.json:

    {
      "mcpServers": {
        "kubernetes": {
          "command": "npx",
          "args": ["-y", "kubernetes-mcp-server@latest", "--read-only"],
          "env": { "KUBECONFIG": "/path/to/restricted-kubeconfig" }
        }
      }
    }

Claude Code:

    # Safe default: read-only, scoped to a restricted kubeconfig
    claude mcp add kubernetes \
      --env KUBECONFIG=/path/to/restricted-kubeconfig \
      -- npx -y kubernetes-mcp-server@latest --read-only

    # Scope which tools are exposed instead of all of them
    claude mcp add kubernetes -- npx -y kubernetes-mcp-server@latest \
      --read-only --toolsets core,helm

~/.codex/config.toml:

    [mcp_servers.kubernetes]
    command = "npx"
    args = ["-y", "kubernetes-mcp-server@latest", "--read-only"]
    env = { KUBECONFIG = "/path/to/restricted-kubeconfig" }

With the server connected, cluster questions become conversational and grounded in real state:

    Audit the production cluster: list pods that are not Running, any with no
    resource limits set, and any running as root. Group by namespace and flag the
    three highest-risk findings.

MCP server or Agent Skill?
Not every task needs a persistent connection. Agent Skills are installed with a universal CLI, npx skills add <owner/repo> (from vercel-labs/skills), and work across Claude Code, Cursor, and Codex. They are the lighter-weight option for single-purpose, stateless augmentation: a Dockerfile-linting skill, a Helm-values generator, a deployment-checklist skill. Reach for a skill when you want repeatable knowledge or a one-shot transform; reach for an MCP server when the agent needs to read or act on live state (your running cluster, the Docker daemon). A Dockerfile-hardening skill plus the kubernetes-mcp-server for live reads is a common, complementary pairing.
When This Breaks
- The agent emits a removed API. PodSecurityPolicy, extensions/v1beta1, and autoscaling/v2beta2 still show up from stale training data. Pin it: tell the agent your cluster version and have it verify with kubectl api-resources.
- docker mcp gateway run exits immediately. The MCP Toolkit feature has to be enabled in Docker Desktop first, and you must enable at least one catalog server with docker mcp server enable <name>. A gateway with nothing enabled has no tools to serve.
- The Kubernetes MCP server can see the cluster but every write fails. That is --read-only and a scoped ServiceAccount doing their job. If you genuinely need a mutation, drop --read-only deliberately for that session; do not hand it admin.
- CI scan step uses a retired action. The old github/codeql-action/upload-sarif@v2 was retired in 2025; use @v3 (or @v4). Have the agent grep your workflows for pinned action versions and bump only the SARIF upload step:

        - name: Upload Trivy results
          uses: github/codeql-action/upload-sarif@v3
          if: always()
          with:
            sarif_file: 'trivy-results.sarif'

- A “secure” devcontainer silently disables permission prompts. If you template a Claude Code devcontainer, the VS Code setting is claudeCode.allowDangerouslySkipPermissions (with claudeCode.initialPermissionMode for the default mode), not a claude-code.dangerouslySkipPermissions key, which does nothing. And think twice before enabling bypass-permissions in a container you called “secure”: it contradicts the framing. Prefer the default prompting mode and an isolated, network-restricted devcontainer.
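The retired-action check from the list above is easy to run mechanically before involving an agent. This is a rough sketch, assuming workflows pin the action by major version tag (`@v2`, `@v3`); `retired_codeql_steps` is a hypothetical helper, and steps pinned by commit SHA will not match the pattern.

```python
import re

# Matches 'uses: github/codeql-action/<sub-action>@v<major>'.
_USES = re.compile(r"uses:\s*github/codeql-action/([\w-]+)@v(\d+)")


def retired_codeql_steps(workflow_text: str, minimum_major: int = 3) -> list[tuple[int, str]]:
    """Return (line_number, step) pairs pinned below minimum_major."""
    hits = []
    for number, line in enumerate(workflow_text.splitlines(), start=1):
        match = _USES.search(line)
        if match and int(match.group(2)) < minimum_major:
            hits.append((number, f"{match.group(1)}@v{match.group(2)}"))
    return hits
```

Feeding it the text of each file under `.github/workflows/` gives a short list of lines to bump, which keeps the follow-up prompt to the agent scoped to exactly those steps.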
What’s Next
- CI/CD Pipelines: wire these images into a build-scan-sign-deploy pipeline
- Infrastructure as Code: generate and review the Terraform/Helm around these workloads
- Monitoring and Observability: close the loop with metrics that catch the next OOMKill before staging does
- Incident Response: the live-debugging playbook when one of these pods pages you at 3am