The Human in the Loop: The Developer's Evolving Role

It is 4 p.m. and the AI has just generated a 600-line diff that refactors your Next.js checkout flow. It compiles. The tests pass. It looks plausible. The only question that matters is the one no model can answer for you: do you understand what it changed well enough to put your name on the commit?

That moment is the whole job now. Adopting AI assistants doesn’t make the developer obsolete — it moves your work up a level, from typing code to directing, constraining, and reviewing it. The teams shipping stable AI-assisted code aren’t the ones who type fastest; they’re the ones who never let the AI close the loop on its own.

What you will take away:

  • A concrete plan-execute-review loop that keeps a human decision at each handoff
  • A planning prompt that forces the AI to surface assumptions before it writes code
  • A supervision prompt that constrains execution to one reviewable step
  • A reviewer prompt that turns the AI into a hostile critic of its own diff
  • The per-tool review gates (Cursor checkpoints, Claude Code hooks, Codex PR review) that catch what tired human eyes miss

In a human-in-the-loop workflow you wear three hats in sequence. Each one is a deliberate checkpoint, not a vibe.

1. The Architect (Planning)

Before any code is written, you own the vision and the plan. You give the AI the high-level requirements, have it explore the codebase in read-only mode, and critically evaluate its proposed plan. Your experience is what catches architectural flaws and missed requirements while they are still cheap to fix.

2. The Supervisor (Execution)

During implementation you break the approved plan into small steps and feed them to the AI one at a time. You review each diff before the next step starts. You are the hand on the wheel, not a spectator watching a 600-line commit materialize.
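One way to keep each step reviewable is a mechanical size budget on the diff. The sketch below is a hypothetical gate, not a feature of any tool named here: it parses the text produced by `git diff --numstat` and reports whether the total of added and deleted lines fits a budget. The names `parseNumstat` and `checkDiffSize` and the 200-line default are assumptions chosen for illustration.

```typescript
// Hypothetical review-size gate. Names and the 200-line default are illustrative.

export interface DiffStat {
  path: string;
  added: number;
  deleted: number;
}

// Parse `git diff --numstat` output: one "added<TAB>deleted<TAB>path" line per file.
// Binary files report "-" for both counts; they are counted as zero lines here.
export function parseNumstat(numstat: string): DiffStat[] {
  return numstat
    .split("\n")
    .filter((line) => line.trim().length > 0)
    .map((line) => {
      const parts = line.split("\t");
      const added = parts[0] ?? "0";
      const deleted = parts[1] ?? "0";
      return {
        path: parts.slice(2).join("\t"),
        added: added === "-" ? 0 : parseInt(added, 10),
        deleted: deleted === "-" ? 0 : parseInt(deleted, 10),
      };
    });
}

// Sum added and deleted lines and compare the total against the review budget.
export function checkDiffSize(
  numstat: string,
  maxLines: number = 200
): { ok: boolean; total: number } {
  const total = parseNumstat(numstat).reduce(
    (sum, stat) => sum + stat.added + stat.deleted,
    0
  );
  return { ok: total <= maxLines, total };
}
```

A diff totalling 600 changed lines fails a 200-line budget, which is the signal to split the step before anyone tries to review it. The right budget is a team decision; the point is that the check runs before reviewer fatigue sets in.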

3. The Reviewer (Verification)

You are the final gatekeeper of quality. Treat every AI diff like a pull request from a fast but naive junior: read every line, check edge cases and error paths, and confirm it didn’t guess at business rules. The AI accelerates writing — you remain responsible for correctness.

Here is the loop applied to a real task — adding rate limiting to a Next.js API route — with the prompt that operationalizes each hat.

  1. Architect: get a reviewable plan, not code. In read-only mode (Cursor Ask, Claude Code plan mode, or codex --sandbox read-only), force the AI to plan and flag its assumptions before touching anything.

  2. Supervisor: execute one step, then stop. Approve the plan, switch to execution mode, and constrain the AI to a single step with a hard checkpoint so you review before it continues.

  3. Reviewer: make the AI attack its own diff. Before you accept anything, turn the AI into a skeptical reviewer of its own work, then do your own read on top.
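For a sense of scale, a single approved step in the Supervisor stage might produce something like the following. This is a minimal sketch, assuming a fixed-window, in-memory limiter; `createRateLimiter`, its parameters, and the result shape are hypothetical names, and the Next.js route wiring is left out so the logic can be read and tested on its own.

```typescript
// Minimal fixed-window rate limiter (illustrative sketch, not production code).
// State lives in process memory, so it is not shared across server instances.

export interface RateLimitResult {
  allowed: boolean;
  remaining: number;
  retryAfterMs: number;
}

export function createRateLimiter(limit: number, windowMs: number) {
  if (limit <= 0 || windowMs <= 0) {
    throw new RangeError("limit and windowMs must be positive");
  }

  // One counter per key (for example, per client identifier).
  const windows = new Map<string, { start: number; count: number }>();

  // `now` is injectable so the behavior can be tested without real time passing.
  return function check(key: string, now: number = Date.now()): RateLimitResult {
    const current = windows.get(key);

    // No window yet, or the previous window has expired: start a fresh one.
    if (current === undefined || now - current.start >= windowMs) {
      windows.set(key, { start: now, count: 1 });
      return { allowed: true, remaining: limit - 1, retryAfterMs: 0 };
    }

    // Still inside the window with budget left.
    if (current.count < limit) {
      current.count += 1;
      return { allowed: true, remaining: limit - current.count, retryAfterMs: 0 };
    }

    // Budget exhausted: report how long until the window resets.
    return {
      allowed: false,
      remaining: 0,
      retryAfterMs: current.start + windowMs - now,
    };
  };
}
```

In a route handler, the caller would pass a client identifier as `key` and answer with HTTP 429 when `allowed` is false. The diff is small enough to read in full, and it still leaves the questions only a human can settle: whether per-process memory is acceptable on your deployment target, and what counts as a client.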

The plan-execute-review loop is the same everywhere, but each tool gives you different mechanical gates to enforce the Reviewer hat. Use them.

In Cursor, review the AI’s plan in Ask mode before switching to Agent. During execution, Cursor writes a checkpoint before each set of edits, so if a step goes wrong you can restore a prior state instead of untangling a half-applied change. Accept or reject each diff hunk by hunk in the review pane rather than bulk-accepting, so nothing off-plan slips in.

The AI is excellent at pattern-matching and generation, but it lacks the judgment the three hats supply:

  • It doesn’t know your business context. It can’t know that a “harmless” change violates a billing rule or breaks a downstream consumer.
  • It makes confident, subtle mistakes. Code that is 99% right with an off-by-one, a race condition, or a missing auth check is more dangerous than code that is obviously broken.
  • It can’t make strategic trade-offs. Performance vs. readability, ship-now vs. build-to-scale — these need a human who owns the consequences.

The loop itself also fails in predictable ways:

  • Reviewer fatigue on large diffs. A 600-line AI diff defeats line-by-line review. Fix it upstream: constrain execution to small steps (the Supervisor prompt) so each diff is small enough to actually read.
  • Rubber-stamping. Once the AI is right ten times, you stop reading the eleventh. That’s the one that ships the bug. Keep the hostile self-review prompt in the loop and lean on automated gates (hooks, CI) that don’t get tired.
  • Losing the plan. Long sessions drift from the approved plan. Re-paste the numbered plan and ask “which step are we on, and what changed from the plan?” before continuing.
  • Trusting green tests. AI-written tests can be tautological. Spot-check that tests actually fail when the implementation is broken.
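The spot-check in the last point can be made concrete. The sketch below is a toy illustration under stated assumptions: `applyDiscount`, the deliberately broken variant, and both suites are hypothetical, and a suite is modeled as a function that returns true when all of its assertions pass. It shows why a tautological test stays green against a broken implementation while a test with independently stated expectations does not.

```typescript
// Toy illustration: does a test suite actually fail when the code is broken?
// All names here are hypothetical.

export type PriceFn = (cents: number, percent: number) => number;
export type Suite = (impl: PriceFn) => boolean; // true when every assertion passes

export const applyDiscount: PriceFn = (cents, percent) =>
  Math.round(cents * (1 - percent / 100));

// Deliberately broken variant: adds the discount instead of subtracting it.
export const brokenDiscount: PriceFn = (cents, percent) =>
  Math.round(cents * (1 + percent / 100));

// Tautological: the expected value is computed by the code under test.
export const tautologicalSuite: Suite = (impl) =>
  impl(1000, 10) === impl(1000, 10);

// Meaningful: expected values are stated independently of the implementation.
export const meaningfulSuite: Suite = (impl) =>
  impl(1000, 10) === 900 && impl(1000, 0) === 1000;

// A suite earns trust only if it passes on the good implementation
// and fails on the broken one.
export function suiteCatchesBreakage(
  suite: Suite,
  good: PriceFn,
  broken: PriceFn
): boolean {
  return suite(good) && !suite(broken);
}
```

Here `suiteCatchesBreakage(tautologicalSuite, applyDiscount, brokenDiscount)` is false and the same call with `meaningfulSuite` is true. On a real codebase the manual equivalent is to break one line of the implementation, rerun the tests, and confirm something goes red before reverting.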