Workflow Transformation Guide
You installed Cursor, you have a Claude Max seat, your team lead keeps forwarding Codex demos. Everyone agrees AI should make you faster. But your actual workflow has not changed: you still read the ticket, hand-write the code, and the AI is a fancier autocomplete. The wins are real but small, and nobody can point to which step the tool actually replaced.
The problem is not the tool. It is that you bolted AI onto a process designed for humans typing every line. This guide rebuilds one workflow — the feature-delivery loop, from ticket to merged PR — so the AI does the first draft and the heavy mechanical work, and you spend your time on the decisions that need judgment. The same shape applies to bug-fixing, refactoring, and migrations.
What You’ll Walk Away With
- A four-stage feature loop (plan, implement, verify, review) that puts AI on the draft and you on the decisions
- The concrete mechanics for the same loop in Cursor, Claude Code, and Codex
- Three copy-paste prompts: a workflow-audit prompt, a ticket-to-plan prompt, and a self-review prompt
- A real GitHub Actions job that runs Claude Code or Codex as a CI reviewer (no hand-waving YAML)
- The failure modes that make AI workflows slower than the old way, and how to catch them
Start With an Audit, Not a Rewrite
Do not transform everything at once. Pick the one workflow that wastes the most of your week (usually feature delivery or bug triage) and map where the time actually goes before you change anything. The AI is good at this analysis if you point it at your own commit history.
The output tells you which stage to hand to the AI first. For most teams it is the same two: locating relevant code and writing the first-draft change plus its tests. Those are mechanical and verifiable. Architecture and edge-case decisions stay with you.
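Before handing the history to the AI, it can help to have rough numbers of your own. The sketch below is an illustration, not part of the audit itself: `commitShares` is a hypothetical helper, and it assumes your commit subjects follow Conventional Commits prefixes such as `feat:` and `fix:`. Commit counts are only a rough proxy for where time goes.

```typescript
// Hypothetical helper: buckets commit subjects by Conventional Commits type
// and returns each type's share of the total as a whole-number percentage.
// Expected input is the text produced by:
//   git log --since="3 months ago" --pretty=format:%s
function commitShares(gitLogSubjects: string): Record<string, number> {
  const counts: Record<string, number> = {};
  let total = 0;

  for (const line of gitLogSubjects.split("\n")) {
    const subject = line.trim();
    if (subject === "") continue;
    // "fix(api): handle 429" -> "fix"; subjects without a prefix -> "other"
    const match = /^([a-z]+)(\([^)]*\))?!?:/.exec(subject);
    const kind = match?.[1] ?? "other";
    counts[kind] = (counts[kind] ?? 0) + 1;
    total += 1;
  }

  const shares: Record<string, number> = {};
  for (const kind of Object.keys(counts)) {
    shares[kind] = Math.round(((counts[kind] ?? 0) / total) * 100);
  }
  return shares;
}
```

A large `fix` share relative to `feat` would suggest starting with bug triage rather than feature delivery; treat the numbers as a prompt for the conversation, not a verdict.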
The Transformed Feature Loop
The loop has four stages. The AI drafts; you decide and verify. The mechanics differ per tool, but the stages do not.
- Plan. Feed the ticket and let the agent produce a written plan: files it will touch, the approach, and open questions. You review and correct the plan before a line of code is written. This is the highest-leverage review point: fixing a plan costs seconds, fixing a wrong implementation costs an hour.
- Implement. The agent executes the approved plan. You watch the diff, not the keystrokes. Keep changes scoped to one logical unit so the diff stays reviewable.
- Verify. Tests, types, and lint run on every change. The agent fixes its own failures in a loop until the suite is green. You never merge red.
- Review. You read the final diff as a senior reviewer: correctness, security, edge cases. A second AI pass (in CI) catches mechanical issues so your human attention goes to judgment calls.
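The control flow of the first three stages can be sketched in a few lines. This is an illustration only: the `LoopSteps` callbacks are hypothetical stand-ins for whatever tool runs each stage, and the `maxFixAttempts` cap is an assumption added here so a stuck agent hands control back to you instead of looping forever.

```typescript
// Hypothetical types and names; the callbacks stand in for your actual tools.
type LoopResult =
  | { outcome: "plan-rejected" }
  | { outcome: "still-red"; attempts: number }
  | { outcome: "ready-for-review"; attempts: number };

interface LoopSteps {
  draftPlan(ticket: string): string;   // Stage 1: the agent writes the plan
  approvePlan(plan: string): boolean;  // Stage 1: the human gate
  implement(plan: string): void;       // Stage 2: the agent edits files
  runChecks(): boolean;                // Stage 3: tests, types, lint
  fixFailures(): void;                 // Stage 3: the agent repairs failures
}

function runFeatureLoop(
  ticket: string,
  steps: LoopSteps,
  maxFixAttempts = 3, // assumption: escalate to a human after this many fixes
): LoopResult {
  const plan = steps.draftPlan(ticket);
  // No code is written until the plan is approved.
  if (!steps.approvePlan(plan)) return { outcome: "plan-rejected" };

  steps.implement(plan);

  // Verify: loop until green, or give up and escalate.
  let attempts = 0;
  while (!steps.runChecks()) {
    if (attempts >= maxFixAttempts) return { outcome: "still-red", attempts };
    steps.fixFailures();
    attempts += 1;
  }
  // Stage 4 (human review) starts only from a green suite.
  return { outcome: "ready-for-review", attempts };
}
```

The two early returns are the points where the human stays in charge: a rejected plan costs nothing, and a suite that stays red never reaches review.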
Stage 1-3: Plan, Implement, Verify per tool
In Cursor, open Agent mode. The model picker drives this: use Claude Fable 5 for the hardest features where peak intelligence matters, Opus 4.8 for non-trivial work, and Sonnet 4.6 for routine ones (see the model comparison for a full breakdown). Paste the ticket and ask for a plan first; review it inline before approving execution.
Cursor’s checkpoints are your safety net for stage 2: every agent action is a restore point, so let it run a multi-file change and roll back instantly if the approach is wrong rather than babysitting each edit. For stage 3, keep your test command in a terminal and tell the agent to run it after each change:
```
Implement the approved plan. After each file, run `npm test -- --run` and fix any failures before continuing. Do not move on while tests are red.
```

Use the background agent to run a second, independent task (e.g. drafting the migration’s rollback script) while you review the main diff.
Drive the loop from the terminal. For an interactive session, start Claude Code in the repo and ask for a plan; for repeatable or batch work, go headless with -p:
```shell
# Stage 1: get a written plan without touching files
claude -p "Read TICKET-482 in docs/tickets/. Produce an implementation \
plan: files to change, approach, and open questions. Do not write code yet." \
  --permission-mode plan

# Stage 2-3: execute, with tests gating each step.
# The test command is in single quotes: backticks inside a double-quoted
# string would make the shell run npm test before claude even starts.
claude -p "Implement the approved plan for TICKET-482. Run 'npm test' after \
each change and fix failures before continuing." \
  --allowedTools "Read,Edit,Write,Bash"
```

Enforce stage 3 mechanically with a hook so you cannot forget it. In .claude/settings.json, a PostToolUse hook on Edit/Write can run your test command and feed failures back to Claude automatically:
```json
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          {
            "type": "command",
            "command": "npm test --silent >/dev/null 2>&1 || printf '{\"decision\":\"block\",\"reason\":\"Tests failed after this edit. Fix them before continuing.\"}'"
          }
        ]
      }
    ]
  }
}
```

npm test --silent runs after every Edit or Write. Its own output is discarded so that the only thing the hook prints is the JSON, which is what Claude Code reads from the hook's stdout. If the suite fails, the hook prints PostToolUse decision-control JSON (decision: "block" with a reason), so the failure is fed back to Claude as a prompt to fix it. Never swallow the failure with || true, which would let a red suite slip through silently.
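Hand-escaping JSON inside a shell string gets fragile once you want the reason to carry real failure details. As a hedged sketch, a small script could build the payload instead. `postToolUseOutput` is a hypothetical helper; only the `decision` and `reason` fields come from the hook example above, and how you capture the test summary is left to your setup.

```typescript
// Hypothetical helper: builds the stdout payload for a PostToolUse hook.
// Returns an empty string when tests pass (nothing to report), otherwise
// a JSON object with decision "block" and a reason for the agent.
function postToolUseOutput(testsPassed: boolean, failureSummary = ""): string {
  if (testsPassed) return "";

  const base = "Tests failed after this edit. Fix them before continuing.";
  const reason = failureSummary === "" ? base : base + "\n" + failureSummary;

  // JSON.stringify escapes quotes and newlines in the summary, which a
  // hand-written printf string would not.
  return JSON.stringify({ decision: "block", reason });
}
```

A hook script would call this with the exit status of the test command and write the result to stdout, for example `postToolUseOutput(false, "exports.test.ts: expected 429, got 503")`.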
Codex spans the App, IDE, CLI, and Cloud, all on GPT-5.5 by default (use gpt-5.2-codex for API-key-authenticated runs). For the local loop, run a plan-then-implement pass with explicit approval gating:
```shell
# Stage 1: plan only, read-only sandbox so nothing changes
codex exec --sandbox read-only \
  "Read docs/tickets/TICKET-482.md and output an implementation plan: \
  files, approach, open questions. Do not modify files."

# Stage 2-3: implement with workspace-write + approval on risky actions
codex --ask-for-approval on-request --sandbox workspace-write \
  "Implement the approved plan for TICKET-482. Run the test suite and fix \
  failures before finishing."
```

For larger work, kick the task to a Codex cloud task (or a local worktree) so it runs in an isolated environment and comes back as a reviewable diff. This is how you parallelize: implement feature A in the cloud while you review feature B locally.
Stage 4: Review, including a second AI pass in CI
Your human review is non-negotiable. But you can offload the mechanical sweep (security smells, missing error handling, obvious edge cases) to an AI reviewer in CI so your attention is reserved for judgment. Both Anthropic and OpenAI ship official, maintained GitHub Actions for this.
anthropics/claude-code-action@v1 runs the full Claude Code runtime inside a runner. This job posts a focused review on every PR:
```yaml
name: AI PR Review
on:
  pull_request:
    types: [opened, synchronize]

jobs:
  review:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      pull-requests: write
    steps:
      - uses: actions/checkout@v5
      - uses: anthropics/claude-code-action@v1
        with:
          anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }}
          prompt: |
            Review the diff in PR #${{ github.event.pull_request.number }}.
            Flag only: security issues, unhandled errors, and missing edge
            cases. Skip style nits. Post findings as a PR review comment.
          claude_args: "--model opus"
```

openai/codex-action@v1 installs the Codex CLI and runs codex exec under the sandbox you choose. Use read-only for a review job so it cannot modify the tree:
```yaml
name: AI PR Review
on:
  pull_request:
    types: [opened, synchronize]

jobs:
  review:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      pull-requests: write
    steps:
      - uses: actions/checkout@v5
      - uses: openai/codex-action@v1
        with:
          openai-api-key: ${{ secrets.OPENAI_API_KEY }}
          sandbox: read-only
          prompt: |
            Review the changes in this pull request. Report only security
            issues, unhandled errors, and missing edge cases. Skip style.
```

Cursor is IDE-first and has no CI Action of its own, so run one of the CLI-based jobs (Claude Code or Codex) above for the automated CI pass. Inside the editor, use Cursor’s built-in review on the agent’s diff before you push: select the changed files and ask the agent “Review this diff for security issues, unhandled errors, and missing edge cases. List them, do not fix yet,” then triage its list as the human reviewer.
Worked Example: Adding a Rate-Limited Endpoint
Concretely, here is the loop on a real task: adding a rate-limited POST /api/exports endpoint to an Express service.
- Plan. The agent proposes: a new route in src/routes/exports.ts, a rateLimit middleware, a Zod schema for the body, and three tests (happy path, validation failure, limit exceeded). You notice it planned an in-memory limiter; you correct it to use the existing shared Redis client so the limit holds across instances. Cost of that correction: one sentence.
- Implement. It writes the route, middleware, schema, and tests against the corrected plan. You watch the diff land in those four files.
- Verify. npm test fails once: the limit-exceeded test expects a 429 but the middleware returns 503. The agent fixes the status code and reruns until green.
- Review. You read the diff: the Redis key includes the user ID and the window, errors return a structured AppError, no secrets are logged. The CI reviewer flags one thing you missed: the limiter has no fallback if Redis is down. You decide that is acceptable for v1 and note it in the PR. Merge.
The judgment calls (Redis vs in-memory, 503-vs-429 intent, fail-open behavior) stayed with you. The typing did not.
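To make those three judgment calls concrete, here is a minimal sketch of the limiter's decision logic. It is not the code from the example: every name (`CounterStore`, `checkLimit`, `windowKey`, the `ratelimit:exports:` key prefix) is hypothetical, the store is synchronous for brevity where a Redis-backed one would be async, and `MemoryStore` exists only as a test stand-in, since an in-memory store is exactly what the plan review rejected for production.

```typescript
// Hypothetical sketch of a fixed-window limiter keyed by user ID and window.
type Decision = { allowed: boolean; status: 200 | 429 | 503; key: string };

interface CounterStore {
  // Increments the counter at `key` and returns the new count.
  // A Redis-backed version would be async and set the TTL on the key.
  increment(key: string, ttlSeconds: number): number;
}

interface LimitOptions {
  limit: number;         // max requests per window
  windowSeconds: number; // window length
  failOpen: boolean;     // allow requests when the store is unavailable?
}

// The key includes the user ID and the window index, so counts reset
// automatically when the window rolls over.
function windowKey(userId: string, nowMs: number, windowSeconds: number): string {
  const windowIndex = Math.floor(nowMs / (windowSeconds * 1000));
  return `ratelimit:exports:${userId}:${windowIndex}`;
}

function checkLimit(
  store: CounterStore,
  userId: string,
  nowMs: number,
  opts: LimitOptions,
): Decision {
  const key = windowKey(userId, nowMs, opts.windowSeconds);
  try {
    const count = store.increment(key, opts.windowSeconds);
    // Over the limit is the client's problem: 429, not 503.
    return count <= opts.limit
      ? { allowed: true, status: 200, key }
      : { allowed: false, status: 429, key };
  } catch {
    // Store unavailable: this branch is the fail-open decision.
    return opts.failOpen
      ? { allowed: true, status: 200, key }
      : { allowed: false, status: 503, key };
  }
}

// Test stand-in only; it ignores the TTL and does not work across instances.
class MemoryStore implements CounterStore {
  private counts = new Map<string, number>();
  increment(key: string, _ttlSeconds: number): number {
    const next = (this.counts.get(key) ?? 0) + 1;
    this.counts.set(key, next);
    return next;
  }
}
```

Each decision is visible as one line: the store behind `CounterStore`, the 429 in the over-limit branch, and the `failOpen` flag in the `catch`. That is the part of the diff worth your review time.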