Skip to content

Database Migration Patterns

You shipped an innocent-looking ALTER TABLE products ALTER COLUMN price TYPE numeric at 2pm. Postgres took an ACCESS EXCLUSIVE lock to rewrite a 40-million-row table, every checkout query queued behind it, and the storefront started returning 500s for nine minutes while your PM watched the error graph climb. The migration was “correct” — it just wasn’t safe.

The fix isn’t to write SQL more carefully by hand. It’s to let an AI agent generate the migration, then make it review its own work against the exact hazards that cause incidents: long locks, unbatched backfills, and indexes built without CONCURRENTLY. This recipe shows the workflow across Cursor, Claude Code, and Codex.

  • A reusable expand-contract prompt that turns a risky one-shot ALTER into three safe, numbered migrations plus the dual-write application code
  • A migration safety-review prompt that flags ACCESS EXCLUSIVE locks, missing CONCURRENTLY, and unbatched UPDATEs before you run anything
  • A read-only Postgres MCP server wired in so the agent reads your live schema instead of guessing column types
  • A rollback recipe and a “When This Breaks” runbook for lock contention, half-finished backfills, and partial migration state

The single most useful pattern for zero-downtime change is expand-contract (a.k.a. parallel change): never mutate a column in place. Add the new shape, dual-write to both, backfill in batches, cut reads over, then drop the old shape in a later deploy. Each step is independently deployable and reversible.

The change-the-column-type incident above becomes three migrations instead of one:

-- 001_expand_add_price_cents.sql (deploy 1: add nullable column, no rewrite, no lock)
ALTER TABLE products ADD COLUMN price_cents integer;
-- 002_backfill_price_cents.sql (deploy 1+: run OUTSIDE a txn, in batches)
-- Repeat until 0 rows are updated. Each statement is a short, separate transaction.
UPDATE products
SET price_cents = round(price * 100)
WHERE price_cents IS NULL
AND id IN (
SELECT id FROM products WHERE price_cents IS NULL ORDER BY id LIMIT 1000
);
-- 003_contract_drop_price.sql (deploy 3, days later: only after dual-write is live)
ALTER TABLE products ALTER COLUMN price_cents SET NOT NULL;
ALTER TABLE products DROP COLUMN price;

Between deploy 1 and deploy 3 the application dual-writes and reads the new column with a fallback, so old and new code versions coexist safely:

// Dual-write while both columns exist (transition phase)
async function setProductPrice(productId, priceCents) {
await db.query(
`UPDATE products
SET price_cents = $1, price = $1 / 100.0
WHERE id = $2`,
[priceCents, productId]
);
}
// Read the new column, fall back to the old one until backfill completes
async function getProductPriceCents(productId) {
const { rows } = await db.query(
'SELECT price, price_cents FROM products WHERE id = $1',
[productId]
);
return rows[0].price_cents ?? Math.round(rows[0].price * 100);
}

You do not have to author this from memory. Hand the agent your real schema and let it produce the whole sequence.

Let the Agent See the Real Schema (Postgres MCP)

Section titled “Let the Agent See the Real Schema (Postgres MCP)”

The prompt above works, but you had to paste the schema by hand — and if you get a column type wrong, the agent generates a migration against a table that doesn’t exist. A Postgres MCP server closes that gap: it gives the agent a connection so it can inspect live table definitions, row counts, and existing indexes before writing a single line. Use a maintained server — Postgres MCP Pro (crystaldba/postgres-mcp) is the actively supported choice, and its --access-mode=restricted flag enforces read-only access at the server level. (Avoid the old @modelcontextprotocol/server-postgres package: it’s deprecated and archived, and carries a known SQL-injection issue that lets a stacked statement escape its read-only transaction.)

Without MCP: you copy \d products output into the prompt and hope it’s current. With MCP: the agent runs the equivalent of SELECT … FROM information_schema.columns itself, sees that price is numeric(10,2) and the table has 40M rows, and tailors the batch size and lock strategy accordingly.

Postgres MCP Pro is a Python tool — install it with pipx install postgres-mcp (or uv pip install postgres-mcp). Setup is then identical across all three tools, since they all speak MCP. Add the server to your MCP config, passing the connection string via DATABASE_URI and forcing read-only with --access-mode=restricted:

{
"mcpServers": {
"postgres": {
"command": "postgres-mcp",
"args": ["--access-mode=restricted"],
"env": {
"DATABASE_URI": "postgresql://readonly@localhost:5432/mydb"
}
}
}
}

For Claude Code you can add the same server from the CLI:

Terminal window
claude mcp add --transport stdio \
--env DATABASE_URI=postgresql://readonly@localhost:5432/mydb \
postgres -- postgres-mcp --access-mode=restricted

If all you need is “let the agent run safe read-only queries” without a persistent connection, a lighter-weight alternative is an Agent Skill such as supabase/postgres-best-practices, installed with npx skills add supabase/agent-skills. Skills are single-purpose augmentation; the MCP server is the better fit when you want the agent to repeatedly introspect a live database across a whole session.

The migration files and dual-write code are produced by the same prompt everywhere — but how you run that loop, review the diff, and ship it differs.

Open the migrations directory and switch the model picker to Claude Opus 4.8 (or Claude Fable 5 for the most complex, multi-file refactors where budget matters less than correctness). In Agent mode, paste the expand-contract prompt above. Cursor proposes the three SQL files plus the dual-write changes as a multi-file edit; review them inline in the diff view before accepting.

The visual diff is the point here: read the WHERE … LIMIT 1000 batch clause and confirm the contract migration is in a separate file from expand, so you can deploy them in different releases. Use a checkpoint before accepting so you can roll the edit back if the agent over-reaches. Then run the migration through your normal tool (npm run db:migrate) in the integrated terminal and watch for lock waits.

Generating the migration is half the job. The other half is catching the production hazards a human reviewer skims past at 2pm. The two failure modes from the opening incident — full-table locks and unbatched writes — are exactly what an LLM is good at spotting when you ask it to look.

A few corrections this review reliably surfaces:

  • Index builds: CREATE INDEX CONCURRENTLY idx_products_sku ON products(sku); instead of a bare CREATE INDEX. The concurrent form avoids the ACCESS EXCLUSIVE lock — at the cost that it cannot run inside a transaction block, so it must be its own migration step.
  • New constraints: add as NOT VALID, then ALTER TABLE … VALIDATE CONSTRAINT in a separate statement. Validation then takes only a SHARE UPDATE EXCLUSIVE lock and doesn’t block writes.
  • Backfills: batched in a loop (as in the expand-contract files above), never one giant UPDATE that holds row locks and bloats the table with dead tuples.

If you wrap online schema changes yourself, prefer a battle-tested tool over hand-rolled triggers. For MySQL, pt-online-schema-change drives the copy-and-swap with triggers, getting the explicit column lists right where hand-written INSERT … SELECT NEW.* triggers go wrong (MySQL NEW is a row reference, not a table you can SELECT from). gh-ost takes the opposite approach — it’s triggerless, tailing the binary log to replay writes onto the ghost table, which keeps trigger overhead off your source under heavy write traffic. Ask the agent to invoke one of these rather than reinventing it.

A Validated Data Migration You Can Actually Run

Section titled “A Validated Data Migration You Can Actually Run”

When you’re reshaping data — not just schema — pair the migration with a validating, batched copy. Here is a short, self-contained version centered on the loop the agent generates; keep it small and let the prompt produce the table-specific transform/validate:

// Batched, validated copy from a legacy table. db is a node-postgres-style client.
async function migrateUsers({ batchSize = 1000 } = {}) {
let cursor = 0;
let migrated = 0;
const errors = [];
for (;;) {
const { rows: batch } = await db.query(
'SELECT * FROM legacy_users WHERE id > $1 ORDER BY id LIMIT $2',
[cursor, batchSize]
);
if (batch.length === 0) break;
let successful = 0; // let, not const — this gets incremented below
for (const row of batch) {
const email = row.email_address?.toLowerCase();
if (!email || !email.includes('@')) {
errors.push({ id: row.id, reason: 'invalid email' });
continue;
}
await db.query(
'INSERT INTO users (email, legacy_id) VALUES ($1, $2) ON CONFLICT (email) DO NOTHING',
[email, row.id]
);
successful++;
}
migrated += successful;
cursor = batch[batch.length - 1].id;
console.info(`migrated ${migrated} (batch ${batch.length}, ${errors.length} skipped)`);
}
return { migrated, errors, ok: errors.length === 0 };
}

Note the deliberately boring details that make it production-safe: keyset pagination by id (not OFFSET, which slows down linearly), ON CONFLICT DO NOTHING for idempotent re-runs, a validation gate that records bad rows instead of crashing, and let successful = 0 so the per-batch counter can actually increment. Ask the agent to wrap each batch in a transaction if you need all-or-nothing semantics per batch.

  • SQL Patterns — query optimization the agent can apply to your slow paths
  • ORM Patterns — generating safe migrations through Drizzle, Prisma, and Knex
  • NoSQL Patterns — document-database migrations and backfills