Workflow Transformation Guide

You installed Cursor, you have a Claude Max seat, your team lead keeps forwarding Codex demos. Everyone agrees AI should make you faster. But your actual workflow has not changed: you still read the ticket, hand-write the code, and the AI is a fancier autocomplete. The wins are real but small, and nobody can point to which step the tool actually replaced.

The problem is not the tool. It is that you bolted AI onto a process designed for humans typing every line. This guide rebuilds one workflow — the feature-delivery loop, from ticket to merged PR — so the AI does the first draft and the heavy mechanical work, and you spend your time on the decisions that need judgment. The same shape applies to bug-fixing, refactoring, and migrations.

What You’ll Walk Away With

A four-stage feature loop (plan, implement, verify, review) that puts AI on the draft and you on the decisions
The concrete mechanics for the same loop in Cursor, Claude Code, and Codex
Three copy-paste prompts: a workflow-audit prompt, a ticket-to-plan prompt, and a self-review prompt
A real GitHub Actions job that runs Claude Code or Codex as a CI reviewer (no hand-waving YAML)
The failure modes that make AI workflows slower than the old way, and how to catch them

Start With an Audit, Not a Rewrite

Do not transform everything at once. Pick the one workflow that eats the most of your week, usually feature delivery or bug triage, and map where the time actually goes before you change anything. The AI is good at this analysis if you point it at your own commit history.

The audit tells you which stage to hand to the AI first. For most teams it is the same two stages: locating the relevant code, and writing the first-draft change plus its tests. Both are mechanical and verifiable. Architecture and edge-case decisions stay with you.
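If you want a starting dataset for the audit, one option is to summarize recent commit churn by directory and hand that summary to the agent along with your own notes. This sketch assumes the output format of `git log --since="4 weeks ago" --pretty=format:@%H --numstat`; the `churnByArea` name, the two-level directory grouping, and the four-week window are illustrative choices. Churn shows where code changed, not where your hours went, so treat it as a prompt for the conversation rather than a measurement.

```typescript
// Summarize commit churn per directory from `git log --numstat` output.
// Churn shows where code changed, not where time went: treat it as a prompt.

interface AreaChurn {
  area: string;
  linesChanged: number;
  commits: number;
}

// Expects output of:
//   git log --since="4 weeks ago" --pretty=format:@%H --numstat
// Commit lines look like "@<hash>"; numstat lines are "<added>\t<deleted>\t<path>".
function churnByArea(log: string, depth: number = 2): AreaChurn[] {
  const linesByArea = new Map<string, number>();
  const commitsByArea = new Map<string, Set<string>>();
  let commit = "";

  for (const line of log.split(/\r?\n/)) {
    if (line.trim() === "") continue;
    if (line.startsWith("@")) {
      commit = line.slice(1).trim();
      continue;
    }
    const parts = line.split("\t");
    if (parts.length < 3) continue;
    const [added, deleted, path] = parts;
    // Binary files report "-" instead of counts; count them as zero lines.
    const changed = (Number(added) || 0) + (Number(deleted) || 0);
    // Group by the first `depth` directory levels; top-level files go to "(root)".
    const dirs = path.split("/").slice(0, -1);
    const area = dirs.length > 0 ? dirs.slice(0, depth).join("/") : "(root)";

    linesByArea.set(area, (linesByArea.get(area) ?? 0) + changed);
    const commits = commitsByArea.get(area) ?? new Set<string>();
    commits.add(commit);
    commitsByArea.set(area, commits);
  }

  return Array.from(linesByArea.entries())
    .map(([area, linesChanged]) => ({
      area,
      linesChanged,
      commits: commitsByArea.get(area)?.size ?? 0,
    }))
    .sort((a, b) => b.linesChanged - a.linesChanged);
}
```

Pasting the top rows, for example `console.table(churnByArea(logText).slice(0, 10))`, into the audit conversation gives the agent something concrete to reason about.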

The Transformed Feature Loop

The loop has four stages. The AI drafts; you decide and verify. The mechanics differ per tool, but the stages do not.

Plan. Give the agent the ticket and have it produce a written plan: the files it will touch, the approach, and open questions. You review and correct the plan before a line of code is written. This is the highest-leverage review point: fixing a plan costs seconds, while fixing a wrong implementation costs an hour.
Implement. The agent executes the approved plan. You watch the diff, not the keystrokes. Keep changes scoped to one logical unit so the diff stays reviewable.
Verify. Tests, types, and lint run on every change. The agent fixes its own failures in a loop until the suite is green. You never merge red.
Review. You read the final diff as a senior reviewer: correctness, security, edge cases. A second AI pass (in CI) catches mechanical issues so your human attention goes to judgment calls.
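The four stages can be sketched as a small state machine, which makes the gates explicit: no implementation without an approved plan, and no review while tests are red. The `Agent` and `Human` interfaces are hypothetical stand-ins for whichever tool and reviewer you use, and the cap on fix attempts is an added assumption so that a stuck agent surfaces as "red" instead of looping forever.

```typescript
// The four-stage feature loop with its gates made explicit.
// Agent and Human are hypothetical stand-ins; nothing here calls a real tool API.

interface Plan {
  files: string[];
  approach: string;
  openQuestions: string[];
}

interface TestResult {
  green: boolean;
  failures: string[];
}

interface Agent {
  plan(ticket: string): Plan;
  implement(plan: Plan): void;
  fix(failures: string[]): void;
}

interface Human {
  // Returns the (possibly corrected) plan, or null to reject it.
  reviewPlan(plan: Plan): Plan | null;
  approveDiff(): boolean;
}

type Outcome = "rejected-plan" | "red" | "rejected-diff" | "merge";

function featureLoop(
  ticket: string,
  agent: Agent,
  human: Human,
  runTests: () => TestResult,
  maxFixAttempts: number = 5,
): Outcome {
  // 1. Plan: nothing is implemented until a human approves the plan.
  const approved = human.reviewPlan(agent.plan(ticket));
  if (approved === null) return "rejected-plan";

  // 2. Implement the human-approved plan, not the agent's first draft.
  agent.implement(approved);

  // 3. Verify: the agent fixes its own failures, up to a bounded number of tries.
  let result = runTests();
  for (let attempt = 0; !result.green && attempt < maxFixAttempts; attempt++) {
    agent.fix(result.failures);
    result = runTests();
  }
  if (!result.green) return "red"; // never merge red

  // 4. Review: a human reads the final diff (a CI reviewer can run alongside).
  return human.approveDiff() ? "merge" : "rejected-diff";
}
```

The detail worth copying is that `implement` receives the plan returned by the human reviewer, not the agent's original draft.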

Stages 1-3: Plan, Implement, Verify per Tool

In Cursor, open Agent mode and use the model picker to match the model to the task: Claude Fable 5 for the hardest features where peak intelligence matters, Opus 5 for non-trivial work, and Sonnet 5 for routine work (see model comparison for a full breakdown). Paste the ticket and ask for a plan first; review it inline before approving execution.

Cursor’s checkpoints are your safety net for stage 2: every agent action is a restore point, so let it run a multi-file change and roll back instantly if the approach is wrong rather than babysitting each edit. For stage 3, keep your test command in a terminal and tell the agent to run it after each change:

Implement the approved plan. After each file, run `npm test -- --run`
and fix any failures before continuing. Do not move on while tests are red.

Use the background agent to run a second, independent task (for example, drafting the rollback script for a migration) while you review the main diff.

In Claude Code, drive the loop from the terminal. For an interactive session, start Claude Code in the repo and ask for a plan; for repeatable or batch work, go headless with -p:

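# Stage 1: get a written plan without touching files, and save it so you can
# review and correct it before approving (docs/plans/ is an illustrative path)
mkdir -p docs/plans
claude -p "Read TICKET-482 in docs/tickets/. Produce an implementation \
plan: files to change, approach, and open questions. Do not write code yet." \
  --permission-mode plan > docs/plans/TICKET-482.md

# Stage 2-3: execute the reviewed plan, with tests gating each step.
# Each -p call starts a fresh session, so point it at the saved plan. Single
# quotes stop the shell from running `npm test` as a command substitution.
claude -p 'Implement the approved plan in docs/plans/TICKET-482.md. Run `npm test`
after each change and fix failures before continuing.' \
  --allowedTools "Read,Edit,Write,Bash"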
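For batch work you might drive the headless calls from a script. This TypeScript sketch builds argument arrays for the same flags used above (`-p`, `--permission-mode plan`, `--allowedTools`) and leaves execution to an injected runner. Passing an array straight to the binary, for example with Node's `execFileSync("claude", args, { encoding: "utf8" })`, involves no shell, so backticks in a prompt are never treated as command substitution. The function names and the batch shape are illustrative assumptions.

```typescript
// Build headless Claude Code argument lists for a batch of tickets.
// Only flags shown in this guide are used; names and shapes are illustrative.

type Runner = (command: string, args: string[]) => string;

function planArgs(ticketId: string): string[] {
  return [
    "-p",
    `Read ${ticketId} in docs/tickets/. Produce an implementation plan: ` +
      "files to change, approach, and open questions. Do not write code yet.",
    "--permission-mode",
    "plan",
  ];
}

function implementArgs(ticketId: string, approvedPlan: string): string[] {
  return [
    "-p",
    // Each -p call is a fresh session, so the reviewed plan travels in the prompt.
    `Implement this approved plan for ${ticketId}. Run \`npm test\` after each ` +
      `change and fix failures before continuing.\n\n${approvedPlan}`,
    "--allowedTools",
    "Read,Edit,Write,Bash",
  ];
}

// Stage 1 for every ticket: collect plans for human review. Nothing is implemented.
function draftPlans(ticketIds: string[], run: Runner): Map<string, string> {
  const plans = new Map<string, string>();
  for (const id of ticketIds) {
    plans.set(id, run("claude", planArgs(id)));
  }
  return plans;
}
```

Once you have reviewed and edited a plan, `run("claude", implementArgs(id, editedPlan))` executes stages 2 and 3 for that ticket.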

Enforce stage 3 mechanically with a hook so you cannot forget it. In .claude/settings.json, a PostToolUse hook on Edit/Write can run your test command and feed failures back to Claude automatically:

{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          {
            "type": "command",
            "command": "npm test --silent >/dev/null 2>&1 || printf '{\"decision\":\"block\",\"reason\":\"Tests failed after this edit. Run npm test to see the failures and fix them before continuing.\"}'"
          }
        ]
      }
    ]
  }
}

npm test --silent runs after every Edit or Write. Its output is discarded because Claude Code reads the hook’s stdout as decision-control JSON, and test-runner output mixed into stdout would keep that JSON from parsing. If the suite fails, the hook prints decision: "block" with a reason, which is fed back to Claude as a prompt to fix the failure. Never swallow the failure with || true, which would let a red suite slip through silently.
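The hook contract is small enough to express as a function: print nothing when the suite passes, print decision-control JSON when it fails. This sketch assumes you would rather keep the logic in a TypeScript hook script than in a one-line shell command; `postToolUseOutput` and the injected `runTests` helper (which returns the test command's exit code) are hypothetical names.

```typescript
// PostToolUse hook logic: silent on green, decision-control JSON on red.
// runTests is a hypothetical injected helper returning the test command's exit code.

interface HookDecision {
  decision: "block";
  reason: string;
}

function postToolUseOutput(runTests: () => number): string {
  const exitCode = runTests();
  if (exitCode === 0) return ""; // green: print nothing, Claude carries on
  const payload: HookDecision = {
    decision: "block",
    reason: `Tests failed (exit code ${exitCode}) after this edit. Fix them before continuing.`,
  };
  return JSON.stringify(payload);
}
```

In a real script you might call it as `process.stdout.write(postToolUseOutput(() => spawnSync("npm", ["test", "--silent"], { stdio: "ignore" }).status ?? 1))`, where `stdio: "ignore"` keeps test output off stdout and `?? 1` treats a signal-killed run as a failure.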

Codex spans ChatGPT desktop, IDE, CLI, and Cloud. GPT-5.6 access is plan-dependent: Free/Go use Terra, while Plus and higher can choose Sol, Terra, or Luna and set effort. For the local loop, run a plan-then-implement pass with explicit approval gating:

# Stage 1: plan only, read-only sandbox so nothing changes. codex exec prints
# the final message to stdout, so save the plan for review (illustrative path)
mkdir -p docs/plans
codex exec --sandbox read-only \
  "Read docs/tickets/TICKET-482.md and output an implementation plan: \
   files, approach, open questions. Do not modify files." \
  > docs/plans/TICKET-482.md

# Stage 2-3: implement with workspace-write + approval on risky actions.
# This is a new session, so point it at the reviewed plan file.
codex --sandbox workspace-write -c approval_policy=on-request \
  "Implement the approved plan in docs/plans/TICKET-482.md. Run the test \
   suite and fix failures before finishing."

For larger work, hand the task to Codex Cloud (or run it in a local worktree) so it runs in an isolated environment and comes back as a reviewable diff. This is how you parallelize: implement feature A in the cloud while you review feature B locally.

Stage 4: Review, including a second AI pass in CI

Your human review is non-negotiable. But you can offload the mechanical sweep — security smells, missing error handling, obvious edge cases — to an AI reviewer in CI so your attention is reserved for judgment. Both Anthropic and OpenAI ship official, maintained GitHub Actions for this.

anthropics/claude-code-action@v1 runs the full Claude Code runtime inside a runner. This job posts a focused review on every PR:

name: AI PR Review
on:
  pull_request:
    types: [opened, synchronize]

jobs:
  review:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      pull-requests: write
      id-token: write
    steps:
      - uses: actions/checkout@v5
      - uses: anthropics/claude-code-action@v1
        with:
          anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }}
          prompt: |
            REPO: ${{ github.repository }}
            Review the diff in PR #${{ github.event.pull_request.number }}.
            Flag only: security issues, unhandled errors, and missing edge cases.
            Skip style nits. Post findings as a PR review comment.
          # Pre-approve the gh commands the review uses to read the PR and comment.
          claude_args: |
            --model opus
            --allowedTools "Bash(gh pr diff:*),Bash(gh pr view:*),Bash(gh pr comment:*)"

openai/codex-action@v1 installs the Codex CLI and runs codex exec under the sandbox you choose. Use read-only for a review job so it cannot modify the tree:

name: AI PR Review
on:
  pull_request:
    types: [opened, synchronize]

jobs:
  review:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      pull-requests: write
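    steps:
      - uses: actions/checkout@v5
        with:
          fetch-depth: 0 # full history, so the PR's base commit is available to diff
      - id: codex
        uses: openai/codex-action@v1
        with:
          openai-api-key: ${{ secrets.OPENAI_API_KEY }}
          sandbox: read-only
          prompt: |
            Review the changes in this pull request
            (git diff ${{ github.event.pull_request.base.sha }}...${{ github.event.pull_request.head.sha }}).
            Report only security issues, unhandled errors, and missing edge
            cases. Skip style.
      # The action returns the review as a step output instead of posting it,
      # so publish its final message as a PR comment.
      - uses: actions/github-script@v7
        env:
          REVIEW: ${{ steps.codex.outputs.final-message }}
        with:
          script: |
            await github.rest.issues.createComment({
              owner: context.repo.owner,
              repo: context.repo.repo,
              issue_number: context.payload.pull_request.number,
              body: process.env.REVIEW,
            });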

Worked Example: Adding a Rate-Limited Endpoint

Concretely, here is the loop on a real task — adding a rate-limited POST /api/exports endpoint to an Express service.

Plan. The agent proposes: a new route in src/routes/exports.ts, a rateLimit middleware, a Zod schema for the body, and three tests (happy path, validation failure, limit exceeded). You notice the middleware keeps its counters in memory; you correct the plan to use the existing shared Redis client so the limit holds across instances. Cost of that correction: one sentence.
Implement. It writes the route, middleware, schema, and tests against the corrected plan. You watch the diff land in those four files.
Verify. npm test fails once — the limit-exceeded test expects a 429 but the middleware returns 503. The agent fixes the status code and reruns until green.
Review. You read the diff: the Redis key includes the user ID and the window, errors return a structured AppError, no secrets are logged. The CI reviewer flags one thing you missed — the limiter has no fallback if Redis is down. You decide that is acceptable for v1 and note it in the PR. Merge.

The judgment calls (Redis vs in-memory, 429 vs 503, what happens when Redis is down) stayed with you. The typing did not.
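To make the worked example concrete, here is a minimal sketch of the limiter's core logic with no Express, Zod, or Redis dependencies. `CounterStore` is a hypothetical interface that mimics Redis `INCR` and `EXPIRE`, `MemoryStore` is a test double only, and `checkRateLimit` and its options are illustrative names. The key includes the user ID and the window start, the over-limit status is 429, and the Redis-down behavior is an explicit `failOpen` switch, off by default to match the v1 decision to ship without a fallback.

```typescript
// Fixed-window rate limiter core, as in the worked example (no Express/Redis/Zod).
// CounterStore is hypothetical: it mimics Redis INCR and EXPIRE.

interface CounterStore {
  incr(key: string): Promise<number>; // atomically increment, return new value
  expire(key: string, seconds: number): Promise<void>; // set a TTL in seconds
}

interface LimitResult {
  status: 200 | 429;
  remaining: number | null; // null when the store was unreachable
}

async function checkRateLimit(
  store: CounterStore,
  userId: string,
  limit: number,
  windowSeconds: number,
  options: { failOpen?: boolean; nowMs?: number } = {},
): Promise<LimitResult> {
  const nowMs = options.nowMs ?? Date.now();
  const windowStart = Math.floor(nowMs / 1000 / windowSeconds) * windowSeconds;
  // User ID and window in the key: per-user counters that reset every window.
  const key = `ratelimit:exports:${userId}:${windowStart}`;
  try {
    const count = await store.incr(key);
    if (count === 1) await store.expire(key, windowSeconds); // first hit sets TTL
    if (count > limit) return { status: 429, remaining: 0 }; // Too Many Requests
    return { status: 200, remaining: limit - count };
  } catch (err) {
    // Store down: v1 has no fallback, so the error propagates unless failOpen is set.
    if (options.failOpen) return { status: 200, remaining: null };
    throw err;
  }
}

// In-memory test double only. Production uses the shared Redis client so the
// limit holds across app instances (the correction made at the plan stage).
class MemoryStore implements CounterStore {
  private counts = new Map<string, number>();

  async incr(key: string): Promise<number> {
    const next = (this.counts.get(key) ?? 0) + 1;
    this.counts.set(key, next);
    return next;
  }

  async expire(): Promise<void> {
    // TTLs are not modeled; a new window produces a new key instead.
  }
}
```

One thing a reviewer would likely flag: `INCR` followed by `EXPIRE` is two round trips, so a crash between them leaves a key with no TTL. Wrapping both in a Redis `MULTI`/`EXEC` transaction closes that gap.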

When This Breaks

The AI confidently implements the wrong approach. Almost always because you skipped the plan-review gate. Reinstate it: no code until the written plan is approved. If the agent keeps drifting mid-implementation, your plan was too vague — name the files and the approach explicitly.

Tests pass but the code is subtly wrong. The agent wrote tests that assert its own (incorrect) behavior. Tests it generates are a starting point, not a safety net you can trust blindly. Spot-check that the test actually encodes the requirement, especially for the edge cases you care about.
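A quick spot check is to ask where each test's expected value comes from. In this sketch, `limitStatus` is a hypothetical function carrying the 503 bug from the worked example. The first test derives its expectation from the code under test and passes anyway; the second copies its expectation from the ticket and fails, which is exactly the signal you want.

```typescript
// Two tests for the limit-exceeded case. limitStatus is a hypothetical stand-in
// for agent-written code that still has the 503 bug from the worked example.

function limitStatus(count: number, limit: number): number {
  return count > limit ? 503 : 200; // bug: the ticket asks for 429
}

// Tautological: the expectation comes from the code under test, so it always passes.
function tautologicalTest(): boolean {
  const expected = limitStatus(4, 3);
  return limitStatus(4, 3) === expected;
}

// Requirement-based: the expectation is copied from the ticket, so it catches the bug.
const STATUS_WHEN_LIMITED = 429;
function requirementTest(): boolean {
  return limitStatus(4, 3) === STATUS_WHEN_LIMITED;
}
```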

The loop is slower than just writing it yourself. True for small, well-understood changes — a one-line config tweak does not need a four-stage loop. Reserve the full loop for changes that touch multiple files or where you would have spent real time locating code. Match the ceremony to the size of the task.

Context runs out on a large change. The agent forgets earlier decisions and contradicts itself. Scope each loop to one logical unit and start a fresh session per unit. For genuinely large work, split it into independent tasks, each in its own git worktree (ChatGPT desktop can optionally manage these for you), a Codex Cloud task, or Cursor’s background agent, rather than one giant session.
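For the worktree route, each independent unit gets its own branch and checkout, so each agent session starts with only that unit in scope. This sketch builds commands of the form `git worktree add -b <branch> <path>`, to be run from the repository root; the `worktreeCommands` name, the `ai/` branch prefix, and the `../wt-<slug>` sibling layout are assumptions, not conventions from any of the tools.

```typescript
// One branch and one worktree per independent unit of work, so each agent
// session starts fresh with only that unit in scope. Run from the repo root.

function worktreeCommands(unitIds: string[], branchPrefix: string = "ai"): string[][] {
  const seen = new Set<string>();
  return unitIds.map((id) => {
    // Lowercase slug safe for branch and directory names.
    const slug = id.toLowerCase().replace(/[^a-z0-9]+/g, "-").replace(/^-+|-+$/g, "");
    if (seen.has(slug)) throw new Error(`two units map to the same slug: ${slug}`);
    seen.add(slug);
    // Sibling directory so the new checkout does not nest inside this one.
    return ["git", "worktree", "add", "-b", `${branchPrefix}/${slug}`, `../wt-${slug}`];
  });
}
```

When a unit is merged, `git worktree remove ../wt-<slug>` cleans up its checkout.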

Every developer gets different output. The prompts and conventions live in people’s heads, so the AI behaves inconsistently. Commit your conventions as a project rule or skill (a CLAUDE.md for Claude Code, an AGENTS.md for Codex, a Cursor rule, or a shared skill) so every session, and every teammate, starts from the same instructions.

What’s Next

The PRD to Plan to Todo Methodology. Go deeper on the plan-first gate: turning a PRD into a reviewable plan and a tracked todo list across all three tools.

Team Migration Strategies. Roll this loop out across a team: standards, shared skills, and how to onboard without chaos.

Project Conversion Playbook. Bring an existing codebase under AI-assisted workflows: config, conventions, and the first safe changes.

Migrating from Traditional IDEs. The before-and-after for developers coming from a non-AI editor, with the habits to unlearn.