You are watching the AI rewrite your authentication module. It looks confident. The code is flowing. Fifteen files modified, 800 lines changed. You accept everything because it “looks right.” Two weeks later, you discover the AI silently removed a CSRF protection check that was in the original code. Nobody caught it because nobody was really reviewing.
The opposite extreme is equally wasteful: you approve every single file write, review every line as it is generated, and spend more time supervising the AI than it would take to write the code yourself.
Effective human-AI collaboration is neither blind trust nor constant supervision. It is a deliberate set of patterns for when to engage deeply, when to let the AI work autonomously, and how to review efficiently.
Your role shifts throughout a development session. Recognizing which mode you should be in saves time and catches problems early.
Architect Mode
You are defining the approach. This happens during planning, when you are writing requirements, reviewing proposals, and making architectural decisions. The AI proposes, you decide. Spend the most time here — mistakes in architecture are the most expensive to fix.
Supervisor Mode
The AI is implementing while you monitor. You are watching the stream of changes, scanning for red flags, and ready to hit Escape if something goes wrong. You do not need to understand every line in real time, but you should notice when the AI is modifying files it should not touch or taking an approach you did not agree on.
Reviewer Mode
The AI has finished a task. You review the complete change, run tests, and decide whether to commit. This is where you catch subtle issues: security vulnerabilities, performance problems, missing edge cases, and violations of your team’s conventions.
Cursor gives you several controls for staying engaged without approving every keystroke:
Escape stops the agent mid-action while preserving context
Checkpoints let you rewind to any previous state
Review mode shows all pending changes before they are applied
Background Agent runs tasks asynchronously while you work on something else, with review before merge
For example, a scope check before the AI starts writing:
Before you start implementing, list all the files you plan to modify.
I want to verify the scope before you begin.
Use Cursor’s diff view to review changes file by file after the AI finishes. Accept or reject individual file changes rather than accepting everything at once.
Claude Code provides fine-grained control:
Escape stops Claude mid-action
Escape + Escape opens the rewind menu to restore previous state
Plan Mode (Shift+Tab) prevents all file modifications
Permission system requires approval for file writes and commands
/compact summarizes context when the session gets long
When a change is risky enough to warrant step-by-step oversight, say so explicitly:
Implement the rate limiter from the plan. After each file you modify,
stop and tell me what you changed and why. Wait for my approval
before moving to the next file.
For maximum autonomy with safety, use Claude Code’s sandbox mode (/sandbox). Claude can work freely within filesystem and network boundaries you define.
Codex offers review at multiple levels:
App review shows complete diffs before merging to your branch
IDE inline review lets you accept or reject individual changes
Approval modes control how much autonomy Codex has (on-request, on-failure, never)
Cloud threads run in isolated environments, so nothing touches your local code until you review
A hand-off prompt that makes the review boundary explicit:
Implement the rate limiter. Show me the full diff when done.
Do not push or merge until I review.
Codex’s worktree-based approach means the AI works on a separate copy of your code. You review and merge the changes explicitly, similar to reviewing a PR.
Reviewing AI-generated code is different from reviewing human code. The AI does not get tired, cut corners, or have bad days — but it does have systematic blind spots. Focus your review on these areas:
Scope check. Did the AI only modify the files it should have? Run git diff --stat to see the full picture before diving into individual files.
Deleted code. Any code the AI removed should be justified. Deletions are the highest-risk changes because they are easy to miss in a diff.
Security. Look for hardcoded secrets, missing input validation, weakened authentication, and unescaped user input. AI models are trained on code that often lacks security best practices.
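To make the security pass concrete, here is a minimal sketch of the pattern to reject next to the pattern to ask for instead. The Express route, pg pool, and users table are hypothetical, purely for illustration:

```typescript
// Hypothetical route handlers, for illustration only.
import express from "express";
import { Pool } from "pg";

const app = express();
const db = new Pool();

// Red flag: user input interpolated straight into SQL (injection risk).
app.get("/users/unsafe", async (req, res) => {
  const result = await db.query(
    `SELECT * FROM users WHERE name = '${req.query.name}'`
  );
  res.json(result.rows);
});

// Ask for this instead: validate the input and use a parameterized query.
app.get("/users", async (req, res) => {
  const name = String(req.query.name ?? "").trim();
  if (!name) {
    res.status(400).json({ error: "name is required" });
    return;
  }
  const result = await db.query("SELECT * FROM users WHERE name = $1", [name]);
  res.json(result.rows);
});
```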
Error handling. Check that errors are handled, not silently swallowed. Look for empty catch blocks, missing null checks, and ignored return values.
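A minimal sketch of what a swallowed error looks like next to the version worth accepting; the saveUser helpers and the db module are assumptions for illustration:

```typescript
// Hypothetical persistence helpers, for illustration only.
import { db } from "./db"; // assumed query helper, e.g. a pg Pool

// Red flag: empty catch block. The caller sees success even when the write fails.
async function saveUserSilently(id: string, email: string): Promise<void> {
  try {
    await db.query("INSERT INTO users (id, email) VALUES ($1, $2)", [id, email]);
  } catch {
    // failure disappears here
  }
}

// Ask for this instead: log with context and let the caller handle the failure.
async function saveUser(id: string, email: string): Promise<void> {
  try {
    await db.query("INSERT INTO users (id, email) VALUES ($1, $2)", [id, email]);
  } catch (err) {
    console.error("saveUser failed", { id, err });
    throw err;
  }
}
```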
Test coverage. Are the new tests meaningful, or did the AI write tests that pass trivially? Check that assertions are specific and that edge cases are covered.
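And a sketch of the difference between a test that passes trivially and tests that actually pin down behavior; applyDiscount and the Vitest setup are assumptions for illustration:

```typescript
import { describe, expect, it } from "vitest";
import { applyDiscount } from "./pricing"; // hypothetical module under test

describe("applyDiscount", () => {
  // Red flag: passes as long as the function returns anything at all.
  it("works", () => {
    expect(applyDiscount(100, 0.1)).toBeDefined();
  });

  // Ask for this instead: a specific expected value plus an edge case.
  it("applies a 10% discount to the base price", () => {
    expect(applyDiscount(100, 0.1)).toBe(90);
  });

  it("rejects a discount rate above 100%", () => {
    expect(() => applyDiscount(100, 1.5)).toThrow();
  });
});
```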
When you notice mid-task that the AI is heading down the wrong path, interrupt and redirect it explicitly rather than waiting for it to finish:
Stop. The approach you're taking with the notification service
won't work because it doesn't account for our message queue.
Let me redirect: instead of direct database writes, use the
existing event bus in @src/services/eventBus.ts. Read that file
first, then revise your approach.
If Cursor has gone too far down the wrong path, use checkpoints to rewind to the last good state rather than trying to undo changes manually.
Undo that last change. The notification service should use the
event bus pattern, not direct database writes. Read
src/services/eventBus.ts for the pattern we use.
If the session context is cluttered with failed approaches, use /clear and start fresh with a better prompt. A clean context with a good prompt always outperforms a polluted context with corrections.
In the Codex App, you can add follow-up prompts to redirect:
Change approach: use the event bus pattern from
src/services/eventBus.ts instead of direct database writes.
Revert the notification service changes and start over with
the event bus approach.
With cloud threads, you can abandon a thread entirely and start a new one with a revised prompt. The original thread’s work is discarded cleanly.
A few failure patterns show up repeatedly, each with a straightforward correction.
You become the bottleneck. If you are spending more time approving individual file writes than the AI spends generating them, you are over-supervising. Batch your review — let the AI complete the task, then review the full diff.
You trust too much. If you find bugs in production that came from AI-generated code, tighten your review process. Add mandatory test coverage thresholds. Use the writer/reviewer pattern with two separate sessions.
The AI keeps going off-plan. If course corrections are not sticking, the issue is likely prompt quality or context overload. Reference the plan file explicitly in every prompt. Keep sessions focused on single tasks.
Review fatigue. Reviewing 500 lines of AI-generated code is exhausting. Break large changes into multiple commits, each reviewed independently. Use the AI self-review prompt to pre-filter issues before you look at the code.
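If you do not already have a self-review prompt on hand, something along these lines works as a starting point (illustrative wording, adapt it to your own checklist):

Review the diff you just produced as if you were a skeptical senior
engineer. List any deleted code, missing error handling, security
concerns, and tests that pass trivially. Report the issues; do not
fix anything yet.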