The Human in the Loop: The Developer's Evolving Role

It is 4pm and the AI has just generated a 600-line diff that refactors your Next.js checkout flow. It compiles. The tests pass. It looks plausible. The only question that matters is the one no model can answer for you: do you understand what it changed well enough to put your name on the commit?

That moment is the whole job now. Adopting AI assistants doesn’t make the developer obsolete — it moves your work up a level, from typing code to directing, constraining, and reviewing it. The teams shipping stable AI-assisted code aren’t the ones who type fastest; they’re the ones who never let the AI close the loop on its own.

What You’ll Walk Away With

A concrete plan-execute-review loop that keeps a human decision at each handoff
A planning prompt that forces the AI to surface assumptions before it writes code
A supervision prompt that constrains execution to one reviewable step
A reviewer prompt that turns the AI into a hostile critic of its own diff
The per-tool review gates (Cursor checkpoints, Claude Code hooks, Codex PR review) that catch what tired human eyes miss

The Three Hats

In a human-in-the-loop workflow you wear three hats in sequence. Each one is a deliberate checkpoint, not a vibe.

1. The Architect (Planning)

Before any code is written, you own the vision and the plan. You give the AI the high-level requirements, have it explore the codebase in read-only mode, and critically evaluate its proposed plan. Your experience is what catches architectural flaws and missed requirements while they are still cheap to fix.

2. The Supervisor (Execution)

During implementation you break the approved plan into small steps and feed them to the AI one at a time. You review each diff before the next step starts. You are the hand on the wheel, not a spectator watching a 600-line commit materialize.

3. The Reviewer (Verification)

You are the final gatekeeper of quality. Treat every AI diff like a pull request from a fast but naive junior: read every line, check edge cases and error paths, and confirm it didn’t guess at business rules. The AI accelerates writing — you remain responsible for correctness.

The Workflow: Plan, Execute, Review

Here is the loop applied to a real task — adding rate limiting to a Next.js API route — with the prompt that operationalizes each hat.

Architect: get a reviewable plan, not code. In read-only mode (Cursor Ask, Claude Code plan mode, or codex --sandbox read-only), force the AI to plan and flag its assumptions before touching anything.

Planning prompt that surfaces assumptions:
In read-only/plan mode: propose a plan to add per-user rate limiting to app/api/checkout/route.ts. List the files you’d change, the library you’d use and why, and where the limit state would live. Before the plan, write an “Assumptions” section listing every business rule you are guessing at (limit value, window, what happens on limit, identity source) and mark each as CONFIRMED or GUESS. Do not write code yet.
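
One way to make the Assumptions section enforceable is to scan the plan text for unresolved guesses before you approve it. The sketch below assumes the AI writes one assumption per line, ending in ": CONFIRMED" or ": GUESS". That line format, the sample plan, and the names parseAssumptions and unresolved are hypothetical, not part of any tool.

```typescript
// Hypothetical helper: extract CONFIRMED/GUESS markers from a plan's
// Assumptions section so unresolved guesses are visible before approval.
type Status = "CONFIRMED" | "GUESS";

interface Assumption {
  text: string;
  status: Status;
}

// Assumes one assumption per line in the form "- <text>: GUESS" or
// "- <text>: CONFIRMED". Lines that do not match are ignored.
function parseAssumptions(plan: string): Assumption[] {
  const found: Assumption[] = [];
  for (const rawLine of plan.split("\n")) {
    const line = rawLine.trim().replace(/^[-*]\s*/, ""); // drop a leading bullet
    const match = line.match(/^(.*\S)\s*[:\-]\s*(CONFIRMED|GUESS)$/);
    if (match) {
      found.push({
        text: String(match[1]),
        status: match[2] === "GUESS" ? "GUESS" : "CONFIRMED",
      });
    }
  }
  return found;
}

// Returns the text of every assumption still marked GUESS.
function unresolved(plan: string): string[] {
  return parseAssumptions(plan)
    .filter((a) => a.status === "GUESS")
    .map((a) => a.text);
}

// Made-up plan text, for illustration only.
const samplePlan = [
  "Assumptions",
  "- Limit is 10 requests per minute: GUESS",
  "- Identity comes from the session user id: CONFIRMED",
  "- Over-limit requests get HTTP 429: GUESS",
  "Plan",
  "1. Add lib/rate-limit.ts",
].join("\n");

const openGuesses = unresolved(samplePlan);
console.log(openGuesses);
```

If the returned list is not empty, answer those questions yourself before approving the plan; the point is that the human, not the model, resolves each guess.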

Supervisor: execute one step, then stop. Approve the plan, switch to execution mode, and constrain the AI to a single step with a hard checkpoint so you review before it continues.

Supervision prompt that constrains scope:
Execute step 1 only: add the rateLimit helper in lib/rate-limit.ts using a sliding-window counter in our existing Redis client. Do not wire it into the route yet. Show me the new file and run npm test -- rate-limit. Stop and wait for my approval before step 2.
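
For orientation while reviewing step 1, here is a minimal sketch of what a sliding-window limiter might look like. It is a stand-in under stated assumptions, not the project's code: it keeps timestamps in an in-memory Map instead of the Redis client the prompt refers to, and createRateLimiter, limit, windowMs, and the injected now clock are hypothetical names.

```typescript
// Sketch only: in-memory sliding-window log. A real lib/rate-limit.ts would
// keep this state in Redis so it is shared across server instances.
interface RateLimitOptions {
  limit: number; // max requests allowed per window (hypothetical value)
  windowMs: number; // window length in milliseconds
  now?: () => number; // injectable clock so the helper is testable
}

interface RateLimitResult {
  allowed: boolean;
  remaining: number;
}

function createRateLimiter(options: RateLimitOptions) {
  const { limit, windowMs } = options;
  const now = options.now ?? (() => Date.now());
  const hits = new Map<string, number[]>(); // user id -> request timestamps

  return function rateLimit(userId: string): RateLimitResult {
    const current = now();
    const windowStart = current - windowMs;
    // Keep only the timestamps that still fall inside the sliding window.
    const recent = (hits.get(userId) ?? []).filter((t) => t > windowStart);

    if (recent.length >= limit) {
      hits.set(userId, recent);
      return { allowed: false, remaining: 0 };
    }
    recent.push(current);
    hits.set(userId, recent);
    return { allowed: true, remaining: limit - recent.length };
  };
}

// Usage with a fake clock so the window can be advanced deterministically.
let fakeTime = 0;
const rateLimit = createRateLimiter({ limit: 2, windowMs: 1000, now: () => fakeTime });
const first = rateLimit("user-1"); // allowed
const second = rateLimit("user-1"); // allowed
const third = rateLimit("user-1"); // rejected: limit reached inside the window
fakeTime = 1001;
const afterWindow = rateLimit("user-1"); // allowed again: earlier hits expired
```

This in-memory version sidesteps shared-state races and a Redis outage, which is exactly why those are the first things to look for when the real diff arrives.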

Reviewer: make the AI attack its own diff. Before you accept anything, turn the AI into a skeptical reviewer of its own work, then do your own read on top.

Reviewer prompt that finds what you’d miss:
Review your own diff as a hostile senior reviewer: list every edge case, error path, and security concern you did NOT handle (race conditions on the counter, Redis unavailable, clock skew, IP spoofing if identity is IP-based), and call out every place you guessed at a business rule instead of reading it from the code. Rank the findings by severity.

Per-Tool Review Gates

The plan-execute-review loop is the same everywhere, but each tool gives you different mechanical gates to enforce the Reviewer hat. Use them.

Cursor: Review the AI’s plan in Ask mode before switching to Agent. During execution, Cursor writes a checkpoint before each set of edits, so if a step goes wrong you can restore a prior state instead of untangling a half-applied change. Accept or reject each hunk in the review pane rather than bulk-accepting, so nothing off-plan slips in.

Claude Code: Use plan mode (Shift+Tab or claude --permission-mode plan) for the Architect phase, then review diffs directly in the terminal during execution. Turn review into an automated gate with hooks: a PreToolUse hook can block edits to protected paths (migrations, .env, CI config), and a PostToolUse hook can auto-run your linter or test suite after each edit so a failing change can’t quietly accumulate.
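
As an illustration of the protected-paths gate, the sketch below shows the decision logic a PreToolUse hook script could run. It assumes the hook receives a JSON payload with tool_name and tool_input.file_path, that file edits arrive under the tool names listed, and that the script blocks a call by exiting with code 2; verify all three against the current Claude Code hooks documentation. The pattern list and the decide function are hypothetical.

```typescript
// Assumed shape of the payload a PreToolUse hook reads from stdin.
interface HookPayload {
  tool_name: string;
  tool_input: { file_path?: string };
}

interface Decision {
  block: boolean;
  reason: string;
}

// Hypothetical protected paths: migrations, .env files, CI workflows.
const PROTECTED_PATHS: RegExp[] = [
  /(^|\/)migrations\//,
  /(^|\/)\.env(\.|$)/,
  /(^|\/)\.github\/workflows\//,
];

// Tool names assumed to perform file edits.
const EDIT_TOOLS = new Set(["Edit", "Write", "MultiEdit"]);

function decide(payload: HookPayload): Decision {
  const filePath = payload.tool_input.file_path;
  if (!EDIT_TOOLS.has(payload.tool_name) || filePath === undefined) {
    return { block: false, reason: "not a file edit" };
  }
  const normalized = filePath.replace(/\\/g, "/"); // tolerate Windows separators
  const hit = PROTECTED_PATHS.some((pattern) => pattern.test(normalized));
  return hit
    ? { block: true, reason: `${normalized} is a protected path` }
    : { block: false, reason: "path allowed" };
}

// In a real hook script you would parse stdin into a HookPayload, call
// decide(), and when block is true print the reason to stderr and exit
// with code 2 (assumed convention, see the lead-in).
const migrationEdit = decide({
  tool_name: "Edit",
  tool_input: { file_path: "db/migrations/001_init.sql" },
});
const helperEdit = decide({
  tool_name: "Edit",
  tool_input: { file_path: "lib/rate-limit.ts" },
});
console.log(migrationEdit, helperEdit);
```

Keeping the decision in a pure function means the gate itself can be unit-tested, rather than trusted because it happens to exist.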

Why Your Expertise Matters More Than Ever

The AI is excellent at pattern-matching and generation, but it lacks the judgment the three hats supply:

It doesn’t know your business context. It can’t know that a “harmless” change violates a billing rule or breaks a downstream consumer.
It makes confident, subtle mistakes. Code that is 99% right with an off-by-one, a race condition, or a missing auth check is more dangerous than code that is obviously broken.
It can’t make strategic trade-offs. Performance vs. readability, ship-now vs. build-to-scale — these need a human who owns the consequences.

When This Breaks

Reviewer fatigue on large diffs. A 600-line AI diff defeats line-by-line review. Fix it upstream: constrain execution to small steps (the Supervisor prompt) so each diff is small enough to actually read.
Rubber-stamping. Once the AI is right ten times, you stop reading the eleventh. That’s the one that ships the bug. Keep the hostile self-review prompt in the loop and lean on automated gates (hooks, CI) that don’t get tired.
Losing the plan. Long sessions drift from the approved plan. Re-paste the numbered plan and ask “which step are we on, and what changed from the plan?” before continuing.
Trusting green tests. AI-written tests can be tautological. Spot-check that tests actually fail when the implementation is broken.
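
The spot-check in the last item can be done by hand, or sketched as a tiny mutation check: run the same test against a deliberately broken implementation and confirm that it fails. Everything below (withinLimit, the off-by-one mutant, and both example checks) is hypothetical and only illustrates the idea.

```typescript
type Impl = (count: number, limit: number) => boolean;
type Check = (impl: Impl) => boolean; // returns true when the test passes

// Intended rule: a request is allowed while the prior count is below the limit.
const withinLimit: Impl = (count, limit) => count < limit;
// Mutant with a deliberate off-by-one bug.
const offByOne: Impl = (count, limit) => count <= limit;

// Tautological: compares the implementation to itself, so it can never fail.
const tautologicalCheck: Check = (impl) => impl(3, 3) === impl(3, 3);
// Meaningful: pins the expected behaviour on both sides of the boundary.
const boundaryCheck: Check = (impl) => impl(2, 3) === true && impl(3, 3) === false;

// A check earns its keep only if it passes on good code and fails on the mutant.
function catchesMutant(check: Check, good: Impl, mutant: Impl): boolean {
  return check(good) && !check(mutant);
}

const tautologyIsUseful = catchesMutant(tautologicalCheck, withinLimit, offByOne);
const boundaryIsUseful = catchesMutant(boundaryCheck, withinLimit, offByOne);
console.log({ tautologyIsUseful, boundaryIsUseful });
```

Here the tautological check stays green against the mutant while the boundary check goes red, which is the signal you want from any AI-written test before you trust it.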

What’s Next

Choosing the Right Mode: Agent vs. Ask — the read-only-then-execute discipline that powers the Architect and Supervisor hats
Grill Me & Grill With Docs — make the Architect hat a relentless interview: the agent grills you until the plan’s assumptions are resolved
Best Practices for AI-Assisted Development — the full set of workflows expert AI-powered developers rely on