Infrastructure as Code Patterns

You inherit a Terraform repo with a single 900-line main.tf, hardcoded CIDRs, no remote backend, and a terraform apply that wants to replace the production RDS instance. An AI agent will happily generate another 900 lines on top of that if you let it. The skill that ships stable infrastructure is not “make the AI write HCL” — it is steering the AI to produce small, reviewable diffs, then reading every line of the plan before anything touches a real account.

This recipe shows the prompts and guardrails that turn Cursor, Claude Code, and Codex into a careful IaC pair, across Terraform, CloudFormation, and Pulumi.

  • A reusable prompt that refactors a monolithic Terraform file into modules with a remote, locked backend
  • A “plan review” prompt that makes the agent explain every destroy/replace before you apply
  • A drift-detection and import workflow you can run when the console and state disagree
  • AWS/Azure/Kubernetes MCP setup so the agent reads live resources instead of guessing
  • A recovery playbook for the three failures that actually page you: state locks, drift, and accidental blast radius

Before any prompt, give the agent two things: a rules file so it stops reinventing your conventions, and MCP access so it can read live cloud state instead of hallucinating resource names.

The same MCP servers work in all three tools; only the registration step differs. The commands below use the Claude Code CLI. The AWS suite is a set of purpose-specific Python servers (run via uvx), not a single Docker image:

# AWS — pick the specific awslabs server you need (Python, via uvx)
claude mcp add aws-api -- uvx awslabs.aws-api-mcp-server@latest
claude mcp add aws-core -- uvx awslabs.core-mcp-server@latest
# Pass credentials by profile, never inline keys. --env stores the variable in the server's config:
# claude mcp add --env AWS_PROFILE=prod aws-api -- uvx awslabs.aws-api-mcp-server@latest
# Azure (npm)
claude mcp add azure -- npx -y @azure/mcp@latest
# Kubernetes (npm)
claude mcp add k8s -- npx -y kubernetes-mcp-server

In Cursor and Codex the same servers go in each tool's own MCP config (.cursor/mcp.json for Cursor, ~/.codex/config.toml for Codex, or the IDE's MCP settings) with identical command/args. With the AWS server connected, “list the RDS instances in this account and their engine versions” returns real data, so the agent's refactor targets resources that exist.

For rules, skip the legacy .cursorrules file (Cursor has deprecated it). Use .cursor/rules/iac.mdc for Cursor and CLAUDE.md / AGENTS.md for Claude Code and Codex:

# .cursor/rules/iac.mdc (CLAUDE.md / AGENTS.md for Claude Code & Codex)
- Remote state only: S3 backend + DynamoDB lock table. Never local state.
- Pin providers (~> 5.0) and pin engine versions to a currently-supported release.
- Tag every resource with Environment and ManagedBy.
- No hardcoded CIDRs, regions, or secrets — use variables and a secrets manager.
- Propose a plan and wait for approval before any apply.
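
A rule such as "tag every resource" is easier to enforce when something checks it mechanically. The sketch below is one possible check, not part of the original recipe: it assumes you have exported a plan with `terraform plan -out=tfplan` and `terraform show -json tfplan`, and it only inspects resources whose planned state exposes a `tags` attribute. The function name `missing_tags` and the `REQUIRED_TAGS` constant are hypothetical.

```python
# Hypothetical helper: report planned resources that lack required tags.
# Input is the dict parsed from `terraform show -json tfplan`.
REQUIRED_TAGS = ("Environment", "ManagedBy")


def missing_tags(plan: dict, required=REQUIRED_TAGS) -> dict:
    """Map resource address -> list of required tag keys that are absent."""
    problems = {}
    for rc in plan.get("resource_changes", []):
        after = rc.get("change", {}).get("after")
        # `after` is null for destroys; resources without a `tags`
        # attribute are skipped (assumed to be untaggable).
        if not after or "tags" not in after:
            continue
        # Merge provider default tags (tags_all) with explicit tags.
        tags = {**(after.get("tags_all") or {}), **(after.get("tags") or {})}
        absent = [key for key in required if key not in tags]
        if absent:
            problems[rc["address"]] = absent
    return problems
```

Tag values that are only known after apply do not appear in the planned state, so treat a report from a check like this as a prompt for review rather than a verdict.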

The dangerous moment in AI-assisted IaC is the gap between “the agent wrote HCL” and “I ran apply.” Close it with three steps: scope the refactor, force a plan review, then apply behind a human gate.

  1. Refactor the monolith into modules. Point the agent at the actual file and name the providers and backend explicitly, so you get a diff you can review module-by-module instead of a rewrite.

  2. Make the agent narrate the plan. Have it run terraform plan and explain every ~ (update-in-place), -/+ (replace), and - (destroy) line. Replacements on stateful resources (RDS, EBS) are where 3am incidents start.

  3. Apply behind a gate. Never let the agent apply autonomously to a real account. Run apply yourself, or gate it in CI behind a required review.
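
The plan-review step can be backed by a small script that reads the machine-readable plan, so the agent's narration is checked against the data. This is a minimal sketch under stated assumptions: the plan JSON comes from `terraform show -json tfplan`, a replace shows up as both `delete` and `create` in a resource's `actions` list, and the set of "stateful" resource types is an illustrative choice of mine rather than a list from the recipe. The names `risky_changes` and `STATEFUL_TYPES` are hypothetical.

```python
# Hypothetical helper: list every destroy/replace in a Terraform plan
# and mark the ones that hit stateful resources.
# Assumption: this set is illustrative; extend it for your own stack.
STATEFUL_TYPES = {
    "aws_db_instance",
    "aws_rds_cluster",
    "aws_ebs_volume",
    "aws_elasticache_cluster",
}


def risky_changes(plan: dict) -> list:
    """Return one finding per resource whose planned actions include a delete."""
    findings = []
    for rc in plan.get("resource_changes", []):
        actions = rc.get("change", {}).get("actions", [])
        if "delete" not in actions:
            continue  # create, update-in-place, or no-op
        # delete + create (in either order) is a replace; delete alone is a destroy.
        kind = "replace" if "create" in actions else "destroy"
        findings.append(
            {
                "address": rc["address"],
                "kind": kind,
                "stateful": rc.get("type") in STATEFUL_TYPES,
            }
        )
    return findings
```

In CI, a wrapper could load the JSON, call `risky_changes`, and fail the job when any finding has `stateful` set, which leaves the final decision with the human reviewer.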

Here is the refactor step across all three tools. The prompt is the load-bearing part — the tools differ only in how you hand it the file and how the agent runs the plan.

In Cursor's Agent mode, add the file to context and let it edit across the new module files, with checkpoints so you can revert a bad split:

@main.tf Refactor this into modules: network (VPC, subnets, NAT),
data (RDS, ElastiCache), and compute (ASG, ALB). Move every hardcoded
CIDR and region into variables.tf with sensible defaults. Add an S3
backend with a DynamoDB lock table in backend.tf. Tag every resource
with Environment and ManagedBy. Show me the module boundaries before
writing files.

After it writes the files, ask Cursor to run terraform plan in the integrated terminal and summarize the changes. Use a checkpoint before accepting so a wrong subnet split is one click to undo.

The same loop works for CloudFormation (aws cloudformation deploy --no-execute-changeset to get a change set the agent explains) and Pulumi (pulumi preview instead of terraform plan). The tool changes; the “explain the diff before you touch the account” discipline does not.
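
For CloudFormation, the equivalent review can run over the change set itself. The sketch below assumes the JSON shape returned by `aws cloudformation describe-change-set`, where each entry in `Changes` carries a `ResourceChange` with `Action` and `Replacement` fields. The function name `risky_stack_changes` is hypothetical, and flagging `Conditional` replacements as risky is a conservative choice of mine.

```python
# Hypothetical helper: summarize removals and replacements in a
# CloudFormation change set (dict parsed from describe-change-set output).
def risky_stack_changes(change_set: dict) -> list:
    """Return human-readable lines for every Remove or replacing Modify."""
    risky = []
    for item in change_set.get("Changes", []):
        rc = item.get("ResourceChange", {})
        action = rc.get("Action")
        replacement = rc.get("Replacement")
        if action == "Remove":
            label = "remove"
        elif action == "Modify" and replacement in ("True", "Conditional"):
            # "Conditional" means CloudFormation decides at execution time.
            label = f"replace ({replacement})"
        else:
            continue
        risky.append(
            f"{label}: {rc.get('LogicalResourceId')} [{rc.get('ResourceType')}]"
        )
    return risky
```

Handing the agent this list alongside the template diff gives it concrete lines to explain before anyone executes the change set.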

The AWS MCP suite (awslabs/mcp) is the highest-leverage addition: the aws-api and cost-analysis servers let the agent read live resources and pricing, so drift detection and right-sizing are grounded in your real account instead of assumptions. The Azure and Kubernetes servers do the same for those platforms.

When you only need a single-purpose augmentation — say, an opinionated Terraform module-layout convention — an Agent Skill is lighter than a persistent MCP connection. Skills install with one universal command and work across Claude Code, Cursor, and Codex:

npx skills add <owner/repo>

Rule of thumb: reach for an MCP server when the agent needs a live, persistent connection (read the account, query pricing); reach for a Skill when you just want to inject reusable conventions or a checklist.

CloudFormation has the same failure shape from a different angle: a stack stuck in UPDATE_ROLLBACK_FAILED usually needs aws cloudformation continue-update-rollback, with the failing resource passed in --resources-to-skip. Skipping marks that resource as rolled back without actually reverting it, so ask the agent to read the stack events first and identify the resource that blocked the rollback before you touch it.
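
Reading the stack events can be partly automated. This is a rough sketch, assuming the newest-first `StackEvents` list returned by `aws cloudformation describe-stack-events`, and assuming that a resource which blocked the rollback reports a `*_FAILED` status after the stack entered `UPDATE_ROLLBACK_IN_PROGRESS`. The function name `rollback_blockers` is hypothetical, and the heuristic may need adjusting for nested stacks.

```python
# Hypothetical helper: find resources that failed during the rollback phase.
# `events` is the StackEvents list from describe-stack-events (newest first).
def rollback_blockers(events: list) -> list:
    """Collect failed resource events newer than the start of the rollback."""
    blockers = []
    for ev in events:
        status = ev.get("ResourceStatus", "")
        # The stack's own events use the stack name as the logical ID.
        is_stack = (
            ev.get("ResourceType") == "AWS::CloudFormation::Stack"
            and ev.get("LogicalResourceId") == ev.get("StackName")
        )
        if is_stack and status == "UPDATE_ROLLBACK_IN_PROGRESS":
            break  # everything older belongs to the original update
        if not is_stack and status.endswith("_FAILED"):
            blockers.append(
                {
                    "resource": ev.get("LogicalResourceId"),
                    "status": status,
                    "reason": ev.get("ResourceStatusReason"),
                }
            )
    return blockers
```

The `reason` field is usually what tells you whether the resource can be fixed out of band or has to go into `--resources-to-skip`.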