API Development with AI

Your GET /posts resolver loads each post’s author with a separate query. It sailed through review and the demo, because the demo had three posts. In production a feed renders 50 posts, the resolver fires 51 queries, and the database connection pool is on fire by 9am. The AI wrote exactly what you asked for — it just didn’t know your access pattern, because the prompt never told it.

AI is genuinely fast at API work: spec-to-code, validation, error middleware, contract tests. But “fast” turns into “on fire in prod” when you let it improvise the shape of the system. The reliable workflow is spec-first: pin the contract (OpenAPI, GraphQL schema, or proto), make the AI generate against it, and let your tests — not the demo — decide when it’s done.

What You’ll Walk Away With

A spec-first loop where the contract drives generation, so the implementation can’t silently drift
Copy-paste prompts that name the stack (Express + TypeScript + Zod + Vitest, Pact for contracts, k6 for load) instead of leaving [placeholder] brackets
The Cursor / Claude Code / Codex variant for spec-to-code, CI contract runs, and SDK regeneration
The failure modes that bite AI-generated APIs: spec drift, N+1 resolvers, missing pagination cursors, auth middleware ordering

The Workflow

Pin the contract. Generate the OpenAPI spec, GraphQL schema, or proto file first and review it as a human. This is the artifact everything else is checked against.
Generate against the contract. Point the agent at the spec file and ask it to implement endpoints/resolvers with validation and error handling — not to invent the API as it goes.
Lock behavior with tests. Generate unit, integration, and contract tests. Make the response shapes in the tests match the handlers exactly, then run them in CI.
Regenerate clients. Re-run the SDK generator from the (now authoritative) spec so consumers stay in lockstep with the server.

Design the contract first

Whatever the protocol, get the AI to produce the contract before any implementation. Be specific about the maturity and conventions you want.

The AI returns a spec you can review and version. A trimmed slice of what to expect:

paths:
  /tasks:
    get:
      summary: List tasks
      parameters:
        - { name: status, in: query, schema: { type: string, enum: [todo, in_progress, done] } }
        - { name: limit, in: query, schema: { type: integer, default: 20, maximum: 100 } }
        - { name: cursor, in: query, schema: { type: string } }
      responses:
        '200':
          description: Paginated task list
          content:
            application/json:
              schema: { $ref: '#/components/schemas/TaskList' }

For GraphQL, ask for the schema with the connection types and subscriptions spelled out; for gRPC, ask for the .proto with streaming RPCs and field masks. The discipline is the same: contract first, review, then implement.
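For the REST slice above, it can help to pin the `TaskList` shape in code as well as in the spec. The following is a minimal sketch, not the author's schema: the `Task` fields (`id`, `title`) are assumptions for illustration, and only the `status` enum and the `{ data, nextCursor }` envelope come from this article.

```typescript
// Hypothetical mirror of the TaskList schema; Task's id and title fields are assumed.
type TaskStatus = 'todo' | 'in_progress' | 'done';

interface Task {
  id: string;
  title: string;
  status: TaskStatus;
}

interface TaskList {
  data: Task[];
  nextCursor: string | null; // null when there is no further page
}

const TASK_STATUSES: readonly string[] = ['todo', 'in_progress', 'done'];

// Runtime check for a single task, usable on untyped JSON.
function isTask(value: unknown): value is Task {
  if (typeof value !== 'object' || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.id === 'string' &&
    typeof v.title === 'string' &&
    typeof v.status === 'string' &&
    TASK_STATUSES.indexOf(v.status) !== -1
  );
}

// Runtime check for the paginated envelope.
function isTaskList(value: unknown): value is TaskList {
  if (typeof value !== 'object' || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    Array.isArray(v.data) &&
    v.data.every(isTask) &&
    (v.nextCursor === null || typeof v.nextCursor === 'string')
  );
}
```

A guard like this gives tests one place to assert the envelope, so a renamed field shows up as a single failing check rather than scattered assertions.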

Generate the implementation against the spec

Now point the agent at the spec and name the stack. The response shape it returns must match what your tests will assert — drift here is the number-one source of “passes locally, 500s in CI.”

A representative slice of what the agent produces:

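import { Router } from 'express';
import { z } from 'zod';
import { requireAuth } from '../middleware/auth';
import { taskService } from '../services/tasks'; // assumed path; point this at your own service module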

const listQuery = z.object({
  status: z.enum(['todo', 'in_progress', 'done']).optional(),
  limit: z.coerce.number().int().min(1).max(100).default(20),
  cursor: z.string().optional(),
});

const router = Router();

router.get('/tasks', requireAuth, async (req, res, next) => {
  const parsed = listQuery.safeParse(req.query);
  if (!parsed.success) {
    return res.status(400).json({
      type: 'about:blank',
      title: 'Invalid query parameters',
      status: 400,
      errors: parsed.error.issues,
    });
  }
  try {
    const { data, nextCursor } = await taskService.list({
      ...parsed.data,
      userId: req.user.id,
    });
    res.json({ data, nextCursor }); // shape matches the spec and the tests
  } catch (err) {
    next(err);
  }
});

Schema-aware generation with an MCP server

For database-backed endpoints, the single biggest quality jump comes from giving the agent your real schema instead of making it guess. A Postgres MCP server turns “generate a tasks endpoint” from blind scaffolding into schema-accurate code with the right column names, types, and indexes.

Without it: the AI invents taskService.list() and you spend a round correcting field names against your actual tables.

With it: the agent reads the live schema, generates queries that match it, and flags the missing index behind your status filter. For TypeScript teams, the Prisma Postgres MCP is built into the Prisma CLI and also manages migrations:

# Claude Code — register the Prisma Postgres MCP (schema + migrations)
claude mcp add prisma -- npx prisma mcp

The same server registers in Cursor (Settings -> MCP) and Codex (~/.codex/config.toml). The server command is the same in all three tools; only the place you register it differs. If you only need lightweight, single-purpose augmentation (say, linting the OpenAPI spec rather than holding a persistent DB connection), an Agent Skill is the lighter fit: install one from skills.sh with npx skills add <owner/repo> (the universal CLI from vercel-labs/skills), which works across Claude Code, Cursor, and Codex.

Auth, validation, and error handling

Generate the cross-cutting middleware once, and be explicit that ordering matters.
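Why ordering matters can be shown with a small sketch. It uses a hand-rolled chain rather than Express so it runs without dependencies; `run`, `Req`, `Res`, and the middleware functions here are illustrative names, not part of any library.

```typescript
// Minimal stand-in for a middleware chain; not Express, just enough to show ordering.
interface Req {
  headers: Record<string, string | undefined>;
  query: Record<string, string | undefined>;
}
interface Res {
  status: number;
  body: unknown;
}
// A middleware returns a Res to short-circuit, or null to pass control along.
type Middleware = (req: Req) => Res | null;

const requireAuth: Middleware = (req) =>
  req.headers['authorization'] ? null : { status: 401, body: { title: 'Unauthorized' } };

const validateQuery: Middleware = (req) => {
  const limit = Number(req.query['limit'] ?? '20');
  const valid = Math.floor(limit) === limit && limit >= 1 && limit <= 100;
  return valid ? null : { status: 400, body: { title: 'Invalid query parameters' } };
};

const handler: Middleware = () => ({ status: 200, body: { data: [], nextCursor: null } });

// Run each middleware in order and stop at the first response.
function run(chain: Middleware[], req: Req): Res {
  for (const mw of chain) {
    const res = mw(req);
    if (res) return res;
  }
  return { status: 404, body: { title: 'Not found' } };
}

// An anonymous caller sending an out-of-range limit.
const anonymousBadRequest: Req = { headers: {}, query: { limit: '9999' } };

const authFirst = run([requireAuth, validateQuery, handler], anonymousBadRequest);
const validationFirst = run([validateQuery, requireAuth, handler], anonymousBadRequest);
// authFirst.status is 401; validationFirst.status is 400.
```

With validation first, an unauthenticated caller learns which parameters the endpoint accepts before being challenged, which is the behavior the 401-not-400 test is meant to rule out.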

Lock the contract with tests

The point of tests here is to freeze the response shape and the error contract so a later AI edit can’t quietly change them.

Use Agent mode to generate the integration suite, then run it inline. In Settings -> Cursor Settings -> Agents -> Auto-Run, allowlist npx vitest so the suite runs without prompting, and watch the diff: reject any change where the test’s asserted body diverges from the handler’s actual response. Cursor’s multi-file edit is the sweet spot for “regenerate the handler and its test together so the shapes stay in sync.”

With Claude Code, run the contract suite headlessly so it doubles as a CI gate. In a GitHub Actions step:

claude -p "Run the Pact consumer tests with 'npx vitest run tests/contract'. If any contract fails, summarize which field broke and why." --allowedTools "Read,Bash"

For local work, a PostToolUse hook (matcher Edit|Write) that re-runs npx vitest run after every edit gives Claude the failing assertion immediately, so a shape mismatch surfaces in the same turn it’s introduced.

With Codex, generate and iterate on the suite in the TUI with the workspace-write sandbox and the on-request approval policy set explicitly, so routine edits and test runs inside the sandbox proceed without a prompt:

codex --sandbox workspace-write -c approval_policy=on-request \
  "Generate Pact contract tests for the tasks service in tests/contract, then run 'npx vitest run tests/contract' and fix any failures. Don't change the response shapes — fix the implementation."

For provider verification against a running service, a Codex Cloud automation can run the verification on each push and report back, keeping consumer and provider in lockstep.

The generated integration test must mirror the handler’s { data, nextCursor } shape:

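import request from 'supertest';
import { describe, expect, it } from 'vitest';
import { app } from '../src/app'; // assumed path to the Express app
import { getAuthToken } from './helpers/auth'; // assumed test helper

describe('GET /tasks', () => {
  it('returns a cursor-paginated list', async () => {
    const token = await getAuthToken();
    const res = await request(app)
      .get('/tasks?status=todo&limit=10')
      .set('Authorization', `Bearer ${token}`)
      .expect(200);

    expect(res.body.data).toBeInstanceOf(Array);
    // nextCursor is a string while more pages exist and null on the last page.
    // A bare typeof check against 'object' would also accept arrays and plain objects.
    expect(res.body.nextCursor === null || typeof res.body.nextCursor === 'string').toBe(true);
  });
});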

For load, generate a k6 script with explicit thresholds so a regression fails the run rather than just looking slow:

import http from 'k6/http';
import { check } from 'k6';

export const options = {
  stages: [
    { duration: '2m', target: 100 },
    { duration: '5m', target: 100 },
    { duration: '2m', target: 0 },
  ],
  thresholds: {
    http_req_duration: ['p(95)<500'],
    http_req_failed: ['rate<0.01'],
  },
};

export default function () {
  const res = http.get(`${__ENV.BASE_URL}/tasks`, {
    headers: { Authorization: `Bearer ${__ENV.TOKEN}` },
  });
  check(res, { 'status is 200': (r) => r.status === 200 });
}

Versioning and deprecation

When you cut a v2, generate the version middleware and emit a deprecation signal with a clearly future sunset date.

// Adds Deprecation and Sunset headers to every v1 response.
const deprecateV1 = (_req, res, next) => {
  // RFC 9745: Deprecation is an sf-date (RFC 9651), an @-prefixed Unix timestamp, not "true".
  res.setHeader('Deprecation', '@1780617600'); // 2026-06-05, the date v1 was deprecated
  res.setHeader('Sunset', 'Thu, 31 Dec 2026 23:59:59 GMT'); // RFC 8594: HTTP-date
  res.setHeader('Link', '<https://docs.example.com/migration-v2>; rel="deprecation"');
  next();
};

// Mount the middleware ahead of the v1 routes so it actually runs.
app.use('/api/v1', deprecateV1, v1Routes);
app.use('/api/v2', v2Routes);
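Hand-typed header dates are easy to get subtly wrong, for example a weekday that does not match the date. One option is to derive both values from `Date` objects. This is a minimal sketch; `deprecationValue` and `sunsetValue` are hypothetical helper names, and the two dates are the ones used in this section.

```typescript
// Hypothetical helpers: format header values from Date objects instead of hand-writing them.
function deprecationValue(at: Date): string {
  // RFC 9745 uses a structured-field date: "@" followed by whole seconds since the Unix epoch.
  return '@' + Math.floor(at.getTime() / 1000);
}

function sunsetValue(at: Date): string {
  // Date#toUTCString emits the HTTP-date form that RFC 8594 expects.
  return at.toUTCString();
}

const deprecatedAt = new Date(Date.UTC(2026, 5, 5)); // months are zero-based: 5 is June
const sunsetAt = new Date(Date.UTC(2026, 11, 31, 23, 59, 59));

const deprecationHeader = deprecationValue(deprecatedAt); // "@1780617600"
const sunsetHeader = sunsetValue(sunsetAt); // "Thu, 31 Dec 2026 23:59:59 GMT"
```

Keeping the dates as `Date` values also makes it simple to assert in a test that the sunset falls after the deprecation date.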

After any spec change, regenerate the clients so consumers move with you:

npx @openapitools/openapi-generator-cli generate -i openapi.yaml -g typescript-axios -o ./sdk/typescript

When This Breaks

The spec and the implementation drift apart. The AI updates a handler but not openapi.yaml (or vice versa), and the generated SDK no longer matches the server. Treat the spec as source of truth: regenerate from it, and add a CI step that diffs the live routes against the spec (or runs Schemathesis against the spec) so drift fails the build.
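A drift check of this kind can be sketched as a plain set difference. The sketch assumes you can already list routes from both sides as `"METHOD /path"` strings; how you extract them from `openapi.yaml` and from the running app is left out, and `diffRoutes` and `normalize` are illustrative names.

```typescript
// Hypothetical inputs: route keys as "METHOD /path", collected however your stack allows.
interface DriftReport {
  missingFromServer: string[]; // in the spec, not implemented
  missingFromSpec: string[]; // implemented, not documented
}

function normalize(route: string): string {
  const [method, path] = route.trim().split(/\s+/);
  // Treat Express-style ":id" and OpenAPI-style "{id}" params as the same thing,
  // and ignore trailing slashes.
  const canonicalPath = path.replace(/:([A-Za-z_]\w*)/g, '{$1}').replace(/\/+$/, '') || '/';
  return method.toUpperCase() + ' ' + canonicalPath;
}

function diffRoutes(specRoutes: string[], liveRoutes: string[]): DriftReport {
  const spec = specRoutes.map(normalize);
  const live = liveRoutes.map(normalize);
  return {
    missingFromServer: spec.filter((r) => live.indexOf(r) === -1).sort(),
    missingFromSpec: live.filter((r) => spec.indexOf(r) === -1).sort(),
  };
}

const report = diffRoutes(
  ['GET /tasks', 'POST /tasks', 'GET /tasks/{id}'],
  ['get /tasks', 'GET /tasks/:id', 'DELETE /tasks/:id'],
);
// A CI step would exit non-zero when hasDrift is true.
const hasDrift = report.missingFromServer.length + report.missingFromSpec.length > 0;
```

This only compares methods and paths. Request and response shapes still need a schema-level check such as the contract tests or a Schemathesis run.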

N+1 in resolvers and ORMs. Generated GraphQL resolvers and naive ORM calls love to query-per-row. Prompt explicitly for DataLoader (GraphQL) or a single batched query with a join, and load-test the list endpoints — the demo’s three rows will never reveal it.
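The difference is easy to see by counting queries. The sketch below uses a fake in-memory database, so every name in it (`FakeDb`, `feedNaive`, `feedBatched`) is illustrative; a real fix would use DataLoader or an ORM's batch or join API instead.

```typescript
// Fake in-memory "database" that counts queries; all names here are illustrative.
interface Post { id: number; authorId: number; }
interface Author { id: number; name: string; }

class FakeDb {
  queryCount = 0;
  private posts: Post[];
  private authors: Author[];

  constructor(posts: Post[], authors: Author[]) {
    this.posts = posts;
    this.authors = authors;
  }
  listPosts(limit: number): Post[] {
    this.queryCount++;
    return this.posts.slice(0, limit);
  }
  authorById(id: number): Author | undefined {
    this.queryCount++; // one query per call: this is the N in N+1
    return this.authors.filter((a) => a.id === id)[0];
  }
  authorsByIds(ids: number[]): Author[] {
    this.queryCount++; // one query for the whole batch (think WHERE id IN (...))
    return this.authors.filter((a) => ids.indexOf(a.id) !== -1);
  }
}

// Naive: one author lookup per post.
function feedNaive(db: FakeDb, limit: number) {
  return db.listPosts(limit).map((post) => ({ post, author: db.authorById(post.authorId) }));
}

// Batched: collect the distinct author ids, then fetch them in a single query.
function feedBatched(db: FakeDb, limit: number) {
  const posts = db.listPosts(limit);
  const uniqueIds = posts.map((p) => p.authorId).filter((id, i, all) => all.indexOf(id) === i);
  const byId: Record<number, Author> = {};
  for (const author of db.authorsByIds(uniqueIds)) byId[author.id] = author;
  return posts.map((post) => ({ post, author: byId[post.authorId] }));
}

function makeDb(): FakeDb {
  const authors: Author[] = [];
  const posts: Post[] = [];
  for (let i = 1; i <= 10; i++) authors.push({ id: i, name: 'author-' + i });
  for (let i = 1; i <= 50; i++) posts.push({ id: i, authorId: ((i - 1) % 10) + 1 });
  return new FakeDb(posts, authors);
}

const naiveDb = makeDb();
const naiveFeed = feedNaive(naiveDb, 50); // naiveDb.queryCount is now 51
const batchedDb = makeDb();
const batchedFeed = feedBatched(batchedDb, 50); // batchedDb.queryCount is now 2
```

Asserting on a query counter like this in an integration test catches the regression even when the fixture is too small for a load test to notice.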

Missing or broken pagination. AI frequently returns the first page and calls it done, or emits an offset where you asked for a cursor. Assert nextCursor round-trips in an integration test: fetch page one, feed nextCursor back, confirm you get distinct rows.
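The round-trip check can be sketched against an in-memory stand-in for `taskService.list`. This is an assumption-laden sketch: the cursor here is simply the last returned id as a string, whereas a real service would usually return an opaque, encoded cursor, and `listTasks` and `collectAll` are illustrative names.

```typescript
// In-memory stand-in for taskService.list; the cursor is simply the last id returned.
interface Task { id: number; title: string; }
interface Page { data: Task[]; nextCursor: string | null; }

const TASKS: Task[] = [];
for (let i = 1; i <= 25; i++) TASKS.push({ id: i, title: 'task-' + i });

function listTasks(limit: number, cursor?: string): Page {
  const afterId = cursor === undefined ? 0 : Number(cursor);
  // Fetch one extra row to learn whether another page exists.
  const rows = TASKS.filter((t) => t.id > afterId).slice(0, limit + 1);
  const data = rows.slice(0, limit);
  const hasMore = rows.length > limit;
  return { data, nextCursor: hasMore ? String(data[data.length - 1].id) : null };
}

// Walk every page by feeding nextCursor back in, as the integration test should.
function collectAll(limit: number): number[] {
  const seen: number[] = [];
  let cursor: string | undefined = undefined;
  for (;;) {
    const page: Page = listTasks(limit, cursor);
    for (const task of page.data) seen.push(task.id);
    if (page.nextCursor === null) return seen;
    cursor = page.nextCursor;
  }
}

const allIds = collectAll(10);
const distinctIds = allIds.filter((id, i) => allIds.indexOf(id) === i);
// Every row appears exactly once: allIds.length and distinctIds.length are both 25.
```

The same walk works against the real endpoint: replace `listTasks` with an HTTP call and keep the distinct-rows assertion.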

Auth middleware in the wrong order. If validation or the route runs before auth, unauthenticated requests reach your handlers. Pin the order in the prompt (auth first, error handler last) and add a test that an unauthenticated request gets 401, not 400.

Validation gaps. Generated handlers often validate the happy path and trust everything else. Make Zod (or equivalent) the gate on every input and test the rejection paths, not just acceptance.
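Testing the rejection paths might look like the sketch below. To stay dependency-free it uses a hand-rolled `parseListQuery` as a stand-in for the Zod schema; it is deliberately stricter than the schema shown earlier (for example, it rejects an empty cursor) and is not equivalent to Zod's coercion rules.

```typescript
// Hand-rolled stand-in for the Zod schema so the sketch has no dependencies.
type Status = 'todo' | 'in_progress' | 'done';
interface ListQuery { status?: Status; limit: number; cursor?: string; }
type Parsed = { ok: true; value: ListQuery } | { ok: false; errors: string[] };

const STATUSES: Status[] = ['todo', 'in_progress', 'done'];

function parseListQuery(raw: Record<string, unknown>): Parsed {
  const errors: string[] = [];
  const value: ListQuery = { limit: 20 }; // same default as the spec

  if (raw.status !== undefined) {
    const match = STATUSES.filter((s) => s === raw.status)[0];
    if (match) value.status = match;
    else errors.push('status: must be one of ' + STATUSES.join(', '));
  }
  if (raw.limit !== undefined) {
    // Query values arrive as strings; anything else (such as an array) is rejected.
    const n = typeof raw.limit === 'string' && raw.limit.trim() !== '' ? Number(raw.limit) : NaN;
    if (Math.floor(n) === n && n >= 1 && n <= 100) value.limit = n;
    else errors.push('limit: must be an integer from 1 to 100');
  }
  if (raw.cursor !== undefined) {
    if (typeof raw.cursor === 'string' && raw.cursor.length > 0) value.cursor = raw.cursor;
    else errors.push('cursor: must be a non-empty string');
  }
  return errors.length > 0 ? { ok: false, errors } : { ok: true, value };
}

// Inputs that must be rejected, not just the happy path.
const rejected = [
  { status: 'archived' },
  { limit: '0' },
  { limit: '101' },
  { limit: 'ten' },
  { limit: ['5', '6'] }, // repeated query params can arrive as arrays
  { cursor: '' },
].map((input) => parseListQuery(input));
```

A table of bad inputs like `rejected` is cheap to extend whenever a new edge case turns up in production logs.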

What’s Next

API Testing with AI Go deeper on contract, integration, and fuzz testing for the endpoints you just built.

Microservices with AI Take the contract-first approach across service boundaries and inter-service calls.

Database Development with AI Schema design and migrations behind the endpoints — where the schema-aware MCP earns its keep.