Infrastructure as Code with AI Assistants

You inherit a 2,000-line Terraform module with no docs, the previous platform engineer has left, and you have two days to add a multi-region read replica without taking down production. Reading the AWS provider docs tab-by-tab while reconstructing the author’s intent is exactly the slow, error-prone work AI assistants are good at compressing—if you ground them in real provider schemas instead of letting them guess from stale training data.

This guide shows the workflows that hold up in production: conversational design, MCP-grounded generation, security review, and drift remediation across Terraform, CloudFormation, Pulumi, and CDK—using Cursor, Claude Code, and Codex.

What You’ll Walk Away With

A repeatable pattern for framing infrastructure requests as business constraints, not resource lists
MCP setup for the HashiCorp Terraform, Pulumi, and AWS servers in all three tools, so suggestions are grounded in live registry/account data
Three copy-paste prompts you can use today: a security-framework review, a drift-detection sweep, and a cost-governance generator
A “When This Breaks” checklist for the failure modes that actually bite (state locks, drift, provider-pin breakage, MCP auth)

Frame Infrastructure as Constraints, Not Resources

The single biggest lever on output quality is the opening prompt. Engineers who get generic configs ask for generic things (“create an EKS cluster with 3 nodes”). Engineers who get production-ready configs describe the business and let the model derive the topology.

Close the prompt by asking the model to list the three riskiest assumptions it is making. That clause is what separates a usable answer from a wall of HCL. It forces the model to surface where it’s guessing (Is Atlas in the same region? Is the SLA per-region or global?) so you correct course before any code exists. This is identical across Cursor, Claude Code, and Codex; the discipline is in the prompt, not the tool.
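One way to make the constraint-first framing repeatable is to keep the constraints as data and render the prompt from them. The following is a minimal sketch only: the InfraBrief shape, the buildPrompt helper, and every sample value are hypothetical, not part of any tool’s API.

```typescript
// Hypothetical shape for a constraint-first infrastructure brief.
interface InfraBrief {
  business: string;      // what the system does and for whom
  constraints: string[]; // SLA, compliance, budget, team limits
  existing: string[];    // systems the design has to integrate with
}

// Render the brief as a prompt that asks for a topology rather than a
// resource list, and ends with the "three riskiest assumptions" clause.
function buildPrompt(brief: InfraBrief): string {
  const lines = [
    `Context: ${brief.business}`,
    'Constraints:',
    ...brief.constraints.map((c) => `- ${c}`),
    'Existing systems:',
    ...brief.existing.map((e) => `- ${e}`),
    'Propose a topology that satisfies these constraints before writing any code.',
    'List the three riskiest assumptions you are making.',
  ];
  return lines.join('\n');
}

// Made-up sample values, for illustration only.
const designPrompt = buildPrompt({
  business: 'B2B invoicing API serving customers in two regions',
  constraints: ['99.9% availability', 'fixed monthly budget', 'two-person platform team'],
  existing: ['MongoDB Atlas cluster', 'GitHub Actions CI'],
});
console.log(designPrompt);
```

Keeping the brief as data means the same constraints can be pasted into any of the three tools unchanged.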

Ground the AI in Real Provider Data with MCP

Models hallucinate resource arguments and lag behind provider releases. MCP servers fix this by giving the assistant live access to the Terraform Registry, the Pulumi Registry, and your AWS account. Setup is the same conceptually in every tool—register the server, then prompt normally—but the registration command differs.

HashiCorp Terraform MCP Server

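This is HashiCorp’s official server (terraform-mcp-server), providing Registry lookups, provider/module discovery, and HCP Terraform workspace management. HashiCorp distributes it as a Docker image (hashicorp/terraform-mcp-server), so confirm that the npx package name used in the commands below resolves to the server you intend before relying on it.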

In Cursor, add the server to .cursor/mcp.json in your project (or to the global config via Settings, MCP):

{
  "mcpServers": {
    "terraform": {
      "command": "npx",
      "args": ["-y", "terraform-mcp-server"]
    }
  }
}

Then in Agent mode: “List the current aws_db_instance arguments for Postgres and flag any that are deprecated in the latest provider.”

In Claude Code:

claude mcp add terraform -- npx -y terraform-mcp-server

# Confirm it registered
claude mcp list

Then prompt: “Using the Terraform MCP server, show the latest hashicorp/aws provider version and the recommended arguments for an Aurora Postgres cluster with read replicas.”

In Codex:

codex mcp add terraform -- npx -y terraform-mcp-server

Or add it to ~/.codex/config.toml directly:

[mcp_servers.terraform]
command = "npx"
args = ["-y", "terraform-mcp-server"]

Then run codex and ask it to look up current provider schemas before generating HCL.

AWS MCP Servers

AWS Labs publishes several servers. Use the right one and avoid the deprecated ones:

awslabs.cfn-mcp-server — CloudFormation and direct resource management via the Cloud Control API. Current and maintained.
awslabs.aws-iac-mcp-server — CloudFormation template validation, compliance checks, and deployment troubleshooting. This is the consolidated successor to the now-deprecated CDK server.

These ship as Python packages, so they run with uvx, not npx. In Cursor’s mcp.json:

{
  "mcpServers": {
    "aws-cfn": {
      "command": "uvx",
      "args": ["awslabs.cfn-mcp-server@latest"]
    },
    "aws-iac": {
      "command": "uvx",
      "args": ["awslabs.aws-iac-mcp-server@latest"]
    }
  }
}

Credentials come from your standard AWS profile chain; set AWS_PROFILE in the server env block if you use named profiles.

In Claude Code:

# CloudFormation / Cloud Control API
claude mcp add aws-cfn -- uvx awslabs.cfn-mcp-server@latest

# IaC validation + compliance (CDK/CFN)
claude mcp add aws-iac -- uvx awslabs.aws-iac-mcp-server@latest

Pass an AWS profile with claude mcp add aws-cfn --env AWS_PROFILE=prod -- uvx awslabs.cfn-mcp-server@latest.

In Codex:

codex mcp add aws-cfn -- uvx awslabs.cfn-mcp-server@latest
codex mcp add aws-iac -- uvx awslabs.aws-iac-mcp-server@latest

Use --env AWS_PROFILE=prod to scope to a profile. Codex stores these under [mcp_servers.*] in ~/.codex/config.toml.

Pulumi MCP Server

For programming-language infrastructure, the Pulumi server (@pulumi/mcp-server) runs pulumi preview/up, retrieves stack outputs, and reads the Pulumi Registry. In Cursor’s mcp.json:

{
  "mcpServers": {
    "pulumi": {
      "command": "npx",
      "args": ["@pulumi/mcp-server@latest", "stdio"]
    }
  }
}

In Claude Code:

claude mcp add pulumi -- npx @pulumi/mcp-server@latest stdio

In Codex:

codex mcp add pulumi -- npx @pulumi/mcp-server@latest stdio

Generate Production Terraform, Then Read It Critically

With the Terraform MCP server connected, generation grounds itself in current schemas. Ask for the latest provider versions explicitly—then verify the result rather than trusting it.

A grounded request produces a root config like this:

terraform {
  required_version = ">= 1.10" # use_lockfile needs Terraform 1.10 or later
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 6.0"
    }
  }

  backend "s3" {
    bucket       = "acme-tf-state-prod"
    key          = "platform/terraform.tfstate"
    region       = "us-east-1"
    encrypt      = true
    use_lockfile = true
  }
}

Two things to verify before you trust any AI-generated Terraform, because models routinely get them wrong:

Provider pin matches reality. The AWS provider is on the 6.x line (6.49.0 at the time of writing). If the model emits ~> 5.0 while claiming it’s “the latest,” that’s a stale-training tell—correct it to ~> 6.0 and re-run terraform init -upgrade.
State locking is current. Terraform 1.10 and later can lock S3 state natively with use_lockfile = true; the older dynamodb_table lock is deprecated and no longer required. If you see a DynamoDB lock table generated for a new project, ask why.
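Both checks are mechanical enough to script as a pre-review gate. The sketch below is one possible approach, not an official tool: satisfiesPessimistic and usesLegacyLock are hypothetical helper names, the constraint parser handles only the ~> form, and the regex scan is a rough text match rather than a real HCL parse.

```typescript
// True when `version` satisfies a pessimistic constraint such as "~> 6.0".
// Simplified: supports only "~> MAJOR.MINOR" and "~> MAJOR.MINOR.PATCH".
function satisfiesPessimistic(constraint: string, version: string): boolean {
  const match = /^~>\s*(\d+(?:\.\d+){1,2})$/.exec(constraint.trim());
  if (!match) throw new Error(`unsupported constraint: ${constraint}`);
  const want = (match[1] ?? '').split('.').map(Number);
  const have = version.split('.').map(Number);
  // Every component except the constraint's last one must match exactly.
  for (let i = 0; i < want.length - 1; i++) {
    if (have[i] !== want[i]) return false;
  }
  // The last constrained component may be equal or higher.
  const last = want.length - 1;
  return (have[last] ?? 0) >= (want[last] ?? 0);
}

// True when the generated HCL still sets the legacy DynamoDB lock argument.
function usesLegacyLock(hcl: string): boolean {
  return /^\s*dynamodb_table\s*=/m.test(hcl);
}

console.log(satisfiesPessimistic('~> 6.0', '6.49.0'));  // true
console.log(satisfiesPessimistic('~> 6.0', '5.100.0')); // false
console.log(usesLegacyLock('  dynamodb_table = "tf-locks"')); // true
```

A check like this only tells you the pin and the locked version disagree; deciding which one is right still means reading the provider’s upgrade guide.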

Next, ask for a security review of the generated configuration against your chosen framework, with the findings returned as a table. When the model returns that table, expect it to over-flag (e.g., insisting an internal ALB needs WAF). Treat it as a senior reviewer’s first pass: accept the encryption-at-rest and least-privilege findings, push back on the ones that don’t fit your threat model, and ask it to justify any “Critical” you disagree with.

Detect and Remediate Drift

Drift—someone clicking in the console, a hotfix that never made it back to code—is where IaC quietly rots. With the AWS MCP server connected, the assistant can read actual resource state and compare it to what your code declares.

A representative slice of what comes back:

URGENT  db-cluster-prod / SecurityGroupIngress
  expected: 10.0.0.0/16 on 5432
  current:  0.0.0.0/0 on 5432   <- opened manually 3 days ago
  risk:     Postgres exposed to the internet
  fix:      revert in console now, then re-apply stack to lock it

Before you act on it, verify the two things AI drift reports get wrong: that the “expected” value matches your current code (not an old plan), and that the manual change wasn’t a deliberate, undocumented break-glass fix. Confirm with whoever owns the stack, then let the assistant generate the corrective change set.
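The triage rule behind a report like the one above can be written down explicitly, which makes it easier to audit what the assistant labels as urgent. This is a sketch under stated assumptions: IngressRule, DriftSeverity, and classifyIngressDrift are hypothetical names, and the rule covers only the open-to-the-internet case from the example.

```typescript
interface IngressRule {
  cidr: string;
  port: number;
}

type DriftSeverity = 'NONE' | 'REVIEW' | 'URGENT';

// A CIDR that admits every IPv4 or IPv6 address.
function isOpenToWorld(cidr: string): boolean {
  return cidr === '0.0.0.0/0' || cidr === '::/0';
}

// Compare what the code declares with what the account actually has.
// URGENT only when the live rule is world-open and the declared one is not.
function classifyIngressDrift(expected: IngressRule, current: IngressRule): DriftSeverity {
  if (expected.cidr === current.cidr && expected.port === current.port) {
    return 'NONE';
  }
  if (isOpenToWorld(current.cidr) && !isOpenToWorld(expected.cidr)) {
    return 'URGENT';
  }
  return 'REVIEW';
}

// Values taken from the sample report above.
const severity = classifyIngressDrift(
  { cidr: '10.0.0.0/16', port: 5432 },
  { cidr: '0.0.0.0/0', port: 5432 },
);
console.log(severity); // URGENT
```

Anything that lands in REVIEW is where the break-glass question matters most, because the change may have been deliberate.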

Cost Governance as Code

Cost optimization is multi-dimensional—performance, reliability, and spend traded against each other—which is exactly the kind of reasoning to delegate, then audit.

For a batch workload, give the model the shape of the job and let it pick the cost-optimal topology:

Workload: 100GB/day batch, tolerates 4h delay, needs 16 vCPU / 64GB while running,
only 08:00-18:00 EST. Target under $500/month. Recommend compute (Spot vs. on-demand
vs. Fargate), storage tiering, and the scheduler. Show the trade-off you made to hit budget.

A good answer reaches for Spot with an on-demand fallback, a scheduled scale-to-zero outside business hours, and S3 Intelligent-Tiering—then names the trade-off (Spot interruptions add latency variance within the 4h SLA budget). If it silently picks always-on on-demand, push back: “Why not Spot, given the 4-hour tolerance?”
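It helps to sanity-check the model’s recommendation with back-of-the-envelope arithmetic before trusting its budget claim. The sketch below covers compute only (no storage or data transfer) and assumes a 30-day month; the hourly rates are placeholders chosen for illustration, not real AWS prices, so substitute current pricing for your region and instance type.

```typescript
// Monthly compute cost in integer cents (avoids floating-point rounding).
function monthlyComputeCents(hoursPerDay: number, daysPerMonth: number, centsPerHour: number): number {
  return hoursPerDay * daysPerMonth * centsPerHour;
}

// Placeholder rates for a 16 vCPU / 64 GB instance. NOT real AWS prices.
const ON_DEMAND_CENTS_PER_HOUR = 80;
const SPOT_CENTS_PER_HOUR = 30;
const DAYS_PER_MONTH = 30;       // assumption: job runs every day
const BUDGET_CENTS = 500 * 100;  // the $500/month target from the prompt

const costOptions = [
  { label: 'always-on, on-demand', cents: monthlyComputeCents(24, DAYS_PER_MONTH, ON_DEMAND_CENTS_PER_HOUR) },
  { label: '08:00-18:00, on-demand', cents: monthlyComputeCents(10, DAYS_PER_MONTH, ON_DEMAND_CENTS_PER_HOUR) },
  { label: '08:00-18:00, Spot', cents: monthlyComputeCents(10, DAYS_PER_MONTH, SPOT_CENTS_PER_HOUR) },
];

for (const option of costOptions) {
  const verdict = option.cents <= BUDGET_CENTS ? 'within budget' : 'over budget';
  console.log(`${option.label}: $${(option.cents / 100).toFixed(2)} (${verdict})`);
}
```

With these placeholder rates the 10-hour schedule does most of the work: it cuts billable hours from 720 to 300 per month before Spot pricing is applied at all.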

Pulumi and CDK: When Code Beats Templates

For teams that prefer real programming languages over HCL or YAML, Pulumi and CDK let the assistant generate typed, testable infrastructure.

Pulumi (TypeScript)

Ask: “Using Pulumi TypeScript, create a ComponentResource for a microservice with a configurable replica count, CPU/memory, and an ingress path. Default to 1 replica in non-prod and 3 in prod.” A grounded result looks like:

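import * as pulumi from '@pulumi/pulumi';

// Assumption: the stack name doubles as the environment name (e.g. 'production').
const env = pulumi.getStack();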

interface ServiceArgs {
  environment: string;
  replicas: number;
  cpu: string;
  memory: string;
}

class MicroserviceStack extends pulumi.ComponentResource {
  constructor(name: string, args: ServiceArgs) {
    super('acme:infra:MicroserviceStack', name, {}, {});
    // ...service, ingress, and monitoring resources, parented to this
    this.registerOutputs();
  }
}

for (const svc of ['user-service', 'order-service', 'payment-service']) {
  new MicroserviceStack(svc, {
    environment: env,
    replicas: env === 'production' ? 3 : 1,
    cpu: '500m',
    memory: '1Gi',
  });
}

Then: “Generate a @pulumi/policy policy pack that fails any production service with fewer than 2 replicas or no CPU limit set.”
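Before reaching for a policy pack, the same rule can be expressed as a plain function over the service arguments and run in an ordinary unit test. The sketch below deliberately does not use the @pulumi/policy API; ServiceSpec, checkProductionRules, and the sample services are hypothetical names for illustration.

```typescript
// Hypothetical, flattened view of the arguments passed to each service.
interface ServiceSpec {
  serviceName: string;
  environment: string;
  replicas: number;
  cpu: string;
}

// Return one message per violation; an empty array means the rules pass.
function checkProductionRules(specs: ServiceSpec[]): string[] {
  const violations: string[] = [];
  for (const spec of specs) {
    if (spec.environment !== 'production') continue; // rules apply to prod only
    if (spec.replicas < 2) {
      violations.push(`${spec.serviceName}: needs at least 2 replicas, has ${spec.replicas}`);
    }
    if (spec.cpu.trim() === '') {
      violations.push(`${spec.serviceName}: CPU limit is not set`);
    }
  }
  return violations;
}

const sampleViolations = checkProductionRules([
  { serviceName: 'user-service', environment: 'production', replicas: 3, cpu: '500m' },
  { serviceName: 'order-service', environment: 'production', replicas: 1, cpu: '' },
  { serviceName: 'payment-service', environment: 'staging', replicas: 1, cpu: '' },
]);
console.log(sampleViolations);
```

A function like this is also a useful oracle when reviewing an AI-generated policy pack: both should reject the same inputs.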

AWS CDK (TypeScript)

Ask: “Generate a CDK stack for a 3-AZ VPC with public/private/isolated subnets and an Aurora Postgres cluster in the isolated tier. Use a current engine version and snapshot on removal.” A grounded result looks like:

import * as cdk from 'aws-cdk-lib';
import { Construct } from 'constructs';
import * as ec2 from 'aws-cdk-lib/aws-ec2';
import * as rds from 'aws-cdk-lib/aws-rds';

export class PlatformStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

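    // Declare all three tiers explicitly; the default layout has no isolated subnets.
    const vpc = new ec2.Vpc(this, 'PlatformVpc', {
      maxAzs: 3,
      natGateways: 2,
      subnetConfiguration: [
        { name: 'public', subnetType: ec2.SubnetType.PUBLIC },
        { name: 'private', subnetType: ec2.SubnetType.PRIVATE_WITH_EGRESS },
        { name: 'isolated', subnetType: ec2.SubnetType.PRIVATE_ISOLATED },
      ],
    });

    new rds.DatabaseCluster(this, 'Db', {
      engine: rds.DatabaseClusterEngine.auroraPostgres({
        version: rds.AuroraPostgresEngineVersion.VER_16_4,
      }),
      // The cluster needs a writer instance; add readers alongside it as required.
      writer: rds.ClusterInstance.provisioned('writer'),
      vpc,
      vpcSubnets: { subnetType: ec2.SubnetType.PRIVATE_ISOLATED },
      removalPolicy: cdk.RemovalPolicy.SNAPSHOT,
    });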
  }
}

Run cdk synth and ask the assistant to add CDK Nag (cdk-nag) so security findings surface at synth time rather than in review.

Whichever you choose, pin the engine version against what Aurora currently supports (the CDK example uses the 16.x line; check the engine’s support calendar before committing to it) and don’t ship the first synth. Run the synthesized template through awslabs.aws-iac-mcp-server for a validation pass.

When This Breaks

Real IaC failures cluster into a few patterns. Recognize them fast:

State lock contention. terraform apply hangs on “Acquiring state lock.” Usually a CI run and a human applied at the same time, or a previous run crashed mid-apply. Prompt: “A terraform apply is blocked on a state lock. Check whether the lock is stale (look at the lock holder and timestamp), tell me whether terraform force-unlock <ID> is safe, and recommend a CI mutex so this stops happening.” Never force-unlock blindly: you can corrupt state if an apply is genuinely still running.
Drift after a manual console edit. Your plan shows changes you didn’t make. Someone fixed something by hand. Run the drift prompt above, then decide per-resource: import the change or revert it. Don’t blanket-revert—you may erase a real fix.
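Provider-pin breakage. terraform init fails or a plan explodes after the model wrote ~> 6.0 and your lockfile pins 5.x (or vice versa). Decide which major version you actually want, fix the constraint, then run terraform init -upgrade to update .terraform.lock.hcl (terraform providers lock can add hashes for other platforms if CI needs them). Never let the AI bump a major provider version without reading that provider’s upgrade guide first.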
MCP server returns nothing useful. The AWS server can’t see your account, or the Terraform server returns stale data. Almost always auth or region: confirm AWS_PROFILE/AWS_REGION are set in the server’s env, that the profile has read permissions, and that you registered the right package (not a deprecated one). Re-run claude mcp list / codex mcp list to confirm the server is actually connected.
Hallucinated resources or arguments. Without MCP grounding, models invent plausible-but-fake arguments. If terraform validate rejects a field, don’t argue with the model—reconnect the Terraform MCP server and ask it to confirm the argument against the live provider schema.
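For the state-lock case, the “is it stale?” question comes down to comparing the lock’s creation time with how long an apply could plausibly run. The sketch below is an illustration under stated assumptions: it uses the ID, Operation, Who, and Created fields from Terraform’s lock info, assumes Created is available in ISO 8601 form, and uses hypothetical helper names and sample values. A stale-looking lock is a hint, not proof, so still confirm that no apply is running before unlocking.

```typescript
// Subset of the lock info Terraform records for a held state lock.
interface TerraformLockInfo {
  ID: string;
  Operation: string;
  Who: string;
  Created: string; // assumption: ISO 8601 timestamp
}

// Minutes elapsed between lock creation and `now`.
function lockAgeMinutes(lock: TerraformLockInfo, now: Date): number {
  return (now.getTime() - new Date(lock.Created).getTime()) / 60000;
}

// Treat the lock as possibly stale once it outlives the longest expected apply.
function looksStale(lock: TerraformLockInfo, now: Date, maxApplyMinutes: number): boolean {
  return lockAgeMinutes(lock, now) > maxApplyMinutes;
}

// Made-up sample lock, for illustration only.
const sampleLock: TerraformLockInfo = {
  ID: 'example-lock-id',
  Operation: 'OperationTypeApply',
  Who: 'ci-runner@build-host',
  Created: '2025-01-01T00:00:00Z',
};

const checkedAt = new Date('2025-01-01T02:00:00Z');
console.log(lockAgeMinutes(sampleLock, checkedAt)); // 120
console.log(looksStale(sampleLock, checkedAt, 60)); // true
```

Choosing maxApplyMinutes from your slowest real apply keeps the check from flagging long but healthy runs.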

What’s Next

CI/CD Pipelines with AI: Wire the Terraform and CloudFormation workflows here into automated, gated deployment pipelines.

Incident Response: When an infrastructure change causes an incident, drive the investigation with AI and live telemetry.

MCP Ecosystem: Go further on configuring, scoping, and securing MCP servers across Cursor, Claude Code, and Codex.

Security Operations: Apply the framework-based security review here as continuous cloud security operations.