Infrastructure as Code Patterns

You inherit a Terraform repo with a single 900-line main.tf, hardcoded CIDRs, no remote backend, and a terraform apply that wants to replace the production RDS instance. An AI agent will happily generate another 900 lines on top of that if you let it. The skill that ships stable infrastructure is not “make the AI write HCL” — it is steering the AI to produce small, reviewable diffs, then reading every line of the plan before anything touches a real account.

This recipe shows the prompts and guardrails that turn Cursor, Claude Code, and Codex into a careful IaC pair, across Terraform, CloudFormation, and Pulumi.

What You’ll Walk Away With

A reusable prompt that refactors a monolithic Terraform file into modules with a remote, locked backend
A “plan review” prompt that makes the agent explain every destroy/replace before you apply
A drift-detection and import workflow you can run when the console and state disagree
AWS/Azure/Kubernetes MCP setup so the agent reads live resources instead of guessing
A recovery playbook for the three failures that actually page you: state locks, drift, and accidental blast radius

Set Up Rules and MCP Servers First

Before any prompt, give the agent two things: a rules file so it stops reinventing your conventions, and MCP access so it can read live cloud state instead of hallucinating resource names.

The MCP servers are the same across all three tools; only the way you register them differs. The commands below use Claude Code's claude mcp add. The AWS suite is a set of purpose-specific Python servers (run via uvx), not a single Docker image:

# AWS — pick the specific awslabs server you need (Python, via uvx)
claude mcp add aws-api -- uvx awslabs.aws-api-mcp-server@latest
claude mcp add aws-core -- uvx awslabs.core-mcp-server@latest
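# Pass credentials by profile, never inline keys. Use --env so the profile is
# stored with the server entry instead of applying only to this one command:
#   claude mcp add --env AWS_PROFILE=prod aws-api -- uvx awslabs.aws-api-mcp-server@latest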

# Azure (npm)
claude mcp add azure -- npx -y @azure/mcp@latest server start

# Kubernetes (npm)
claude mcp add k8s -- npx -y kubernetes-mcp-server

In Cursor the same servers go in .cursor/mcp.json (or the IDE’s MCP settings), and in Codex in ~/.codex/config.toml, with the same command and args. With the AWS server connected, “list the RDS instances in this account and their engine versions” returns real data, so the agent’s refactor targets resources that exist.
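To make the shared command/args shape concrete, here is a minimal Python sketch that builds the JSON document for the same servers. It assumes the mcpServers layout used by Claude Code's project-level .mcp.json and Cursor's .cursor/mcp.json; the helper name mcp_config and the prod profile are illustrative, and Codex's TOML config is not covered here.

```python
import json
from typing import Optional


def mcp_config(servers: dict, env: Optional[dict] = None) -> dict:
    """Build an `mcpServers` document from {name: [command, *args]}.

    `env` optionally maps a server name to environment variables for it.
    """
    env = env or {}
    doc = {"mcpServers": {}}
    for name, argv in servers.items():
        entry = {"command": argv[0], "args": list(argv[1:])}
        if name in env:
            entry["env"] = dict(env[name])
        doc["mcpServers"][name] = entry
    return doc


# Same servers as the CLI commands above; the profile name is an assumption.
config = mcp_config(
    {
        "aws-api": ["uvx", "awslabs.aws-api-mcp-server@latest"],
        "k8s": ["npx", "-y", "kubernetes-mcp-server"],
    },
    env={"aws-api": {"AWS_PROFILE": "prod"}},
)
print(json.dumps(config, indent=2))
```

Redirect the printed JSON into the config file for your tool and review it before committing; only a profile name belongs in it, never access keys.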

For rules, skip the legacy .cursorrules file (Cursor has deprecated it). Use .cursor/rules/iac.mdc for Cursor, CLAUDE.md for Claude Code, and AGENTS.md for Codex:

# .cursor/rules/iac.mdc  (same rules go in CLAUDE.md for Claude Code, AGENTS.md for Codex)
- Remote state only: S3 backend + DynamoDB lock table. Never local state.
- Pin providers (~> 5.0) and pin engine versions to a currently-supported release.
- Tag every resource with Environment and ManagedBy.
- No hardcoded CIDRs, regions, or secrets — use variables and a secrets manager.
- Propose a plan and wait for approval before any apply.
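
A rules file only asks the agent to comply; the tagging rule can also be checked mechanically. The sketch below is one possible check, assuming the plan has been exported with terraform show -json into a dictionary. REQUIRED_TAGS and missing_tags are illustrative names, and tag values that are unknown until apply are invisible to it.

```python
REQUIRED_TAGS = {"Environment", "ManagedBy"}


def missing_tags(plan: dict) -> dict:
    """Map resource address -> sorted list of required tags it lacks.

    Only resources being created or updated are checked, and only those
    whose planned values expose `tags` or `tags_all` (i.e. taggable ones).
    """
    problems = {}
    for rc in plan.get("resource_changes", []):
        change = rc.get("change", {})
        if not {"create", "update"} & set(change.get("actions", [])):
            continue
        after = change.get("after") or {}
        if "tags" not in after and "tags_all" not in after:
            continue  # resource type has no tags in its planned values
        tags = {**(after.get("tags") or {}), **(after.get("tags_all") or {})}
        lacking = sorted(REQUIRED_TAGS - set(tags))
        if lacking:
            problems[rc["address"]] = lacking
    return problems


# Hand-written sample in the shape of `terraform show -json` output.
SAMPLE_PLAN = {
    "resource_changes": [
        {"address": "aws_vpc.main",
         "change": {"actions": ["create"],
                    "after": {"tags": {"Environment": "prod", "ManagedBy": "terraform"}}}},
        {"address": "aws_subnet.a",
         "change": {"actions": ["create"], "after": {"tags": {"Environment": "prod"}}}},
        {"address": "aws_route.default",
         "change": {"actions": ["create"], "after": {"destination_cidr_block": "0.0.0.0/0"}}},
    ]
}

for address, lacking in missing_tags(SAMPLE_PLAN).items():
    print(f"{address}: missing {', '.join(lacking)}")
```

A check like this fits naturally in CI next to the plan step, so an untagged resource fails review before anyone is asked to approve the apply.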

The Workflow: Refactor, Review, Apply

The dangerous moment in AI-assisted IaC is the gap between “the agent wrote HCL” and “I ran apply.” Close it with three steps: scope the refactor, force a plan review, then apply behind a human gate.

Refactor the monolith into modules. Point the agent at the actual file and name the providers and backend explicitly, so you get a diff you can review module-by-module instead of a rewrite.
Make the agent narrate the plan. Have it run terraform plan and explain every ~ (update-in-place), -/+ (replace), and - (destroy) line. Replacements on stateful resources (RDS, EBS) are where 3am incidents start.
Apply behind a gate. Never let the agent apply autonomously to a real account. Run apply yourself, or gate it in CI behind a required review.
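
The narration step can be backed by a mechanical cross-check, so you are not relying on the agent's summary alone. The sketch below assumes the plan was saved with terraform plan -out=plan.out and exported with terraform show -json plan.out, then loaded as a dictionary. It lists every replace and destroy and marks the ones on resource types you consider stateful. STATEFUL_TYPES and risky_changes are illustrative names, not part of Terraform.

```python
# Assumption: the resource types this team treats as stateful; adjust to your stack.
STATEFUL_TYPES = {"aws_db_instance", "aws_ebs_volume", "aws_elasticache_cluster"}


def risky_changes(plan: dict) -> list:
    """Return every replace/destroy found in a `terraform show -json` plan."""
    findings = []
    for rc in plan.get("resource_changes", []):
        actions = rc.get("change", {}).get("actions", [])
        if "delete" not in actions:
            continue  # create, update-in-place, read, and no-op destroy nothing
        # A replace is reported as delete+create (in either order).
        kind = "replace" if "create" in actions else "destroy"
        findings.append({
            "address": rc["address"],
            "kind": kind,
            "stateful": rc.get("type") in STATEFUL_TYPES,
        })
    return findings


# Hand-written sample in the shape of `terraform show -json` output.
SAMPLE_PLAN = {
    "resource_changes": [
        {"address": "module.data.aws_db_instance.main", "type": "aws_db_instance",
         "change": {"actions": ["delete", "create"]}},
        {"address": "module.network.aws_security_group.web", "type": "aws_security_group",
         "change": {"actions": ["update"]}},
        {"address": "aws_instance.legacy", "type": "aws_instance",
         "change": {"actions": ["delete"]}},
    ]
}

for finding in risky_changes(SAMPLE_PLAN):
    marker = "STATEFUL " if finding["stateful"] else ""
    print(f"{marker}{finding['kind']}: {finding['address']}")
```

If the agent's explanation and this list disagree about what gets replaced or destroyed, treat that as a reason to stop and reread the plan rather than apply.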

Here is the refactor step across all three tools. The prompt is the load-bearing part — the tools differ only in how you hand it the file and how the agent runs the plan.

In Cursor's Agent mode, add the file to context and let it edit across the new module files, with checkpoints so you can revert a bad split:

@main.tf Refactor this into modules: network (VPC, subnets, NAT),
data (RDS, ElastiCache), and compute (ASG, ALB). Move every hardcoded
CIDR and region into variables.tf with sensible defaults. Add an S3
backend with a DynamoDB lock table in backend.tf. Tag every resource
with Environment and ManagedBy. Show me the module boundaries before
writing files.

After it writes the files, ask Cursor to run terraform plan in the integrated terminal and summarize the changes. Use a checkpoint before accepting so a wrong subnet split is one click to undo.

Claude Code edits the files and runs the plan in one terminal session, which is ideal for the review loop:

claude "Refactor main.tf into network, data, and compute modules.
Move hardcoded CIDRs and regions into variables. Add an S3 backend
with DynamoDB state locking. Then run 'terraform plan' and walk me
through every resource that will be replaced or destroyed — I have
not approved an apply."

Because it runs plan itself and reads the output, it can flag, for example, that changing storage_encrypted on the existing RDS instance forces a replacement, before you find out the hard way.

Use the Codex CLI with an approval gate so it cannot apply without you, or hand the refactor to Codex Cloud on a worktree for a reviewable PR:

codex --sandbox workspace-write -c approval_policy=on-request \
  "Split main.tf into network/data/compute modules, extract CIDRs and
   regions into variables, add an S3 + DynamoDB locked backend, then run
   terraform plan and explain every replace/destroy. Do not apply."

For larger repos, kick it off in Codex Cloud against a branch worktree — it produces the module split as a pull request you review like any other, keeping the apply behind your existing branch protections.

The same loop works for CloudFormation (aws cloudformation deploy --no-execute-changeset to get a change set the agent explains) and Pulumi (pulumi preview instead of terraform plan). The tool changes; the “explain the diff before you touch the account” discipline does not.
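
For CloudFormation, the equivalent cross-check reads the change set instead of a plan. The sketch below assumes you have the JSON from aws cloudformation describe-change-set loaded as a dictionary, and it picks out removals and modifications that replace (or may replace) a resource. The function name destructive_changes and the sample resource names are illustrative.

```python
def destructive_changes(change_set: dict) -> list:
    """List removals and replacements from `describe-change-set` output."""
    findings = []
    for item in change_set.get("Changes", []):
        rc = item.get("ResourceChange", {})
        action = rc.get("Action")
        # On Modify, Replacement is the string "True", "False", or "Conditional".
        replacement = rc.get("Replacement")
        if action == "Remove":
            kind = "destroy"
        elif action == "Modify" and replacement == "True":
            kind = "replace"
        elif action == "Modify" and replacement == "Conditional":
            kind = "maybe-replace"
        else:
            continue
        findings.append({
            "id": rc.get("LogicalResourceId"),
            "type": rc.get("ResourceType"),
            "kind": kind,
        })
    return findings


# Hand-written sample in the shape of the API response.
SAMPLE_CHANGE_SET = {
    "Changes": [
        {"Type": "Resource", "ResourceChange": {
            "Action": "Modify", "LogicalResourceId": "Database",
            "ResourceType": "AWS::RDS::DBInstance", "Replacement": "True"}},
        {"Type": "Resource", "ResourceChange": {
            "Action": "Modify", "LogicalResourceId": "WebSecurityGroup",
            "ResourceType": "AWS::EC2::SecurityGroup", "Replacement": "False"}},
        {"Type": "Resource", "ResourceChange": {
            "Action": "Remove", "LogicalResourceId": "OldQueue",
            "ResourceType": "AWS::SQS::Queue"}},
    ]
}

for finding in destructive_changes(SAMPLE_CHANGE_SET):
    print(f"{finding['kind']}: {finding['id']} ({finding['type']})")
```

A Conditional replacement means CloudFormation cannot tell until execution, so it deserves the same scrutiny as a definite one.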

Copy-Paste Prompts

MCP and Skills That Change This Workflow

The AWS MCP suite (awslabs/mcp) is the highest-leverage addition: the aws-api and cost-analysis servers let the agent read live resources and pricing, so drift detection and right-sizing are grounded in your real account instead of assumptions. The Azure and Kubernetes servers do the same for those platforms.

When you only need a single-purpose augmentation — say, an opinionated Terraform module-layout convention — an Agent Skill is lighter than a persistent MCP connection. Skills install with one universal command and work across Claude Code, Cursor, and Codex:

npx skills add <owner/repo>

Rule of thumb: reach for an MCP server when the agent needs a live, persistent connection (read the account, query pricing); reach for a Skill when you just want to inject reusable conventions or a checklist.

When This Breaks

CloudFormation has the same failure shape from a different angle: a stack stuck in UPDATE_ROLLBACK_FAILED usually needs continue-update-rollback with the failing resource in --resources-to-skip. Ask the agent to read the stack events first and identify the resource that blocked the rollback before you touch it.
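
Reading the stack events for the blocking resource can be sketched as follows. This assumes the StackEvents list from aws cloudformation describe-stack-events, which arrives newest first, and it collects resources that reported a failed status after the most recent rollback began. The function name rollback_blockers and the sample events are illustrative, and the result is a list of candidates to investigate, not a verdict.

```python
def rollback_blockers(events: list) -> list:
    """Return resources that failed since the latest rollback started.

    `events` must be ordered newest first, as describe-stack-events returns them.
    """
    blockers = []
    for ev in events:
        is_stack = ev.get("LogicalResourceId") == ev.get("StackName")
        status = ev.get("ResourceStatus", "")
        if is_stack and status == "UPDATE_ROLLBACK_IN_PROGRESS":
            break  # older events belong to the forward update, not the rollback
        if not is_stack and status.endswith("_FAILED"):
            blockers.append({
                "id": ev.get("LogicalResourceId"),
                "status": status,
                "reason": ev.get("ResourceStatusReason", ""),
            })
    return blockers


# Hand-written sample events, newest first; the reasons are made up.
SAMPLE_EVENTS = [
    {"StackName": "app", "LogicalResourceId": "app",
     "ResourceStatus": "UPDATE_ROLLBACK_FAILED"},
    {"StackName": "app", "LogicalResourceId": "Database",
     "ResourceStatus": "UPDATE_FAILED", "ResourceStatusReason": "example reason"},
    {"StackName": "app", "LogicalResourceId": "WebServer",
     "ResourceStatus": "UPDATE_COMPLETE"},
    {"StackName": "app", "LogicalResourceId": "app",
     "ResourceStatus": "UPDATE_ROLLBACK_IN_PROGRESS"},
    {"StackName": "app", "LogicalResourceId": "WebServer",
     "ResourceStatus": "UPDATE_FAILED", "ResourceStatusReason": "forward update failure"},
]

for blocker in rollback_blockers(SAMPLE_EVENTS):
    print(f"{blocker['id']}: {blocker['status']} ({blocker['reason']})")
```

Skipping a resource tells CloudFormation to treat it as rolled back without touching it, so confirm its real state in the account before passing its ID to --resources-to-skip.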

What’s Next

CI/CD Pipelines: Gate terraform apply behind a reviewed pipeline instead of running it from your laptop.

Kubernetes Patterns: Once the cluster exists, drive workloads and GitOps with the same review discipline.

Docker Patterns: Build the images your IaC deploys, with AI help on multi-stage builds and slimming.

Monitoring & Logging: Wire observability into the infrastructure you just provisioned.