AI Code Review Checklist

AI-generated code should pass the same review standard as human-written code. The difference is that AI tends to fail in predictable places: missing context, accidental scope creep, weak tests, unsafe assumptions, and changes that look plausible but do not fit the system. Use this checklist before you ask a teammate to review an AI-assisted pull request.

How to Use This Checklist

Run the checklist in three passes:

Scope pass: confirm the change only does what the issue asked for.
Correctness pass: verify behavior, tests, data handling, and failure modes.
Production pass: check security, operations, performance, and reviewability.

For small changes, the three passes take about five minutes. For changes that touch auth, billing, migrations, permissions, or customer data, run all three passes in full every time.

The 50+ Point Checklist

1. Scope and Intent

The PR has a clear problem statement and success criteria.
The diff only touches files needed for the requested change.
The agent did not mix feature work with unrelated refactors.
Generated names, copy, comments, and examples match the product domain.
The implementation follows existing local patterns instead of inventing a new architecture.
New abstractions remove real duplication or complexity, not just make the code look cleaner.
The PR description explains what was intentionally not changed.
Any assumptions are written down and easy for a reviewer to challenge.
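
The "diff only touches files needed" item can be made mechanical by comparing the changed paths against the paths the issue is expected to touch. This is a minimal sketch, assuming you already have the list of changed files (for example from git diff --name-only). The function name, file paths, and glob patterns are hypothetical.

```python
from fnmatch import fnmatch


def out_of_scope(changed_files: list[str], allowed_patterns: list[str]) -> list[str]:
    """Return changed paths that match none of the allowed glob patterns."""
    return [
        path
        for path in changed_files
        if not any(fnmatch(path, pattern) for pattern in allowed_patterns)
    ]


# Hypothetical example: the issue asked for a change to the invoice export only.
changed = [
    "billing/export.py",
    "billing/tests/test_export.py",
    "auth/session.py",  # unrelated edit the agent slipped in
]
allowed = ["billing/export.py", "billing/tests/*"]

unexpected = out_of_scope(changed, allowed)
print(unexpected)
```

Anything the function returns is not automatically wrong, but it should be either removed from the PR or explained in the description.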

2. Context Fit

The agent read the relevant files before editing.
The change respects existing module boundaries and ownership.
Public interfaces remain backward-compatible unless the PR explicitly says otherwise.
Naming matches surrounding code conventions.
Error handling style matches the rest of the codebase.
Logging, telemetry, and analytics use existing helpers.
The change does not duplicate an existing utility, type, hook, component, or service.
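
Keeping a public interface backward-compatible usually means new behavior is opt-in. The sketch below uses a hypothetical format_price helper: the new option is keyword-only and defaults to the old behavior, so existing call sites produce the same output as before.

```python
def format_price(amount_cents: int, currency: str = "USD", *, grouping: bool = False) -> str:
    """Format a non-negative integer amount of cents as a decimal string.

    `grouping` is the hypothetical new option. It is keyword-only and defaults
    to the old behavior, so existing calls keep working unchanged.
    """
    units, cents = divmod(amount_cents, 100)
    whole = f"{units:,}" if grouping else str(units)
    return f"{whole}.{cents:02d} {currency}"


old_call = format_price(123456)                 # existing call sites: unchanged output
new_call = format_price(123456, grouping=True)  # new behavior is opt-in
```

A change that instead reordered the positional parameters or flipped the default would break callers silently, which is the case this checklist item is meant to catch.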

3. Correctness

Happy path behavior is covered by a test or a reproducible manual check.
Important edge cases are covered: empty input, null/undefined, limits, duplicates, timeouts.
The implementation handles partial failure instead of assuming every dependency succeeds.
Async work is awaited or intentionally fire-and-forget with lifecycle handling.
State transitions are explicit and cannot silently skip required steps.
Date, currency, locale, and timezone logic is deterministic.
The code handles retries, idempotency, or duplicate events where relevant.
The output shape matches existing API contracts.
The change has been tested against realistic data, not only toy examples.
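
Duplicate-event handling is one of the easier correctness items to verify in code. The sketch below assumes each event carries a unique event_id. The PaymentLedger class and its in-memory set are illustrative stand-ins for a durable store with a unique constraint.

```python
class PaymentLedger:
    """Applies each payment event at most once, keyed by event ID."""

    def __init__(self) -> None:
        self.balance_cents = 0
        self._seen_event_ids: set[str] = set()  # stand-in for a durable store

    def apply(self, event_id: str, amount_cents: int) -> bool:
        """Apply the event; return False if it was already processed."""
        if event_id in self._seen_event_ids:
            return False  # duplicate delivery or retry: do nothing
        self._seen_event_ids.add(event_id)
        self.balance_cents += amount_cents
        return True


ledger = PaymentLedger()
first = ledger.apply("evt_1", 500)
retry = ledger.apply("evt_1", 500)  # same event delivered twice
```

In a real system the "seen" check and the balance update would need to happen atomically. The sketch only shows the shape a reviewer should look for.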

4. Tests

Tests describe behavior, not implementation details.
Each new test would fail if the feature were broken.
The test suite covers at least one failure path.
Mocks sit at system boundaries, not inside the logic being tested.
Snapshot updates are intentional and reviewed.
Flaky waits, sleeps, or network dependencies were avoided.
The relevant local check command was run and recorded in the PR.
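
The sketch below shows what several of these items can look like together: the fake sits at the system boundary (a hypothetical payment gateway), the assertions describe behavior, and two of the three tests cover a failure path. All names are illustrative.

```python
import unittest


class GatewayError(Exception):
    """Raised by the (hypothetical) payment gateway client on failure."""


class FakeGateway:
    """Fake at the system boundary: stands in for the real network client."""

    def __init__(self, fail: bool = False) -> None:
        self.fail = fail

    def charge(self, amount_cents: int) -> str:
        if self.fail:
            raise GatewayError("card declined")
        return "ch_123"


def checkout(gateway, amount_cents: int) -> dict:
    """Logic under test: it runs for real and is not mocked."""
    if amount_cents <= 0:
        raise ValueError("amount must be positive")
    try:
        charge_id = gateway.charge(amount_cents)
    except GatewayError as exc:
        return {"status": "failed", "reason": str(exc)}
    return {"status": "paid", "charge_id": charge_id}


class CheckoutTests(unittest.TestCase):
    def test_successful_charge_marks_order_paid(self):
        result = checkout(FakeGateway(), 1000)
        self.assertEqual(result, {"status": "paid", "charge_id": "ch_123"})

    def test_declined_card_is_reported_not_raised(self):
        result = checkout(FakeGateway(fail=True), 1000)
        self.assertEqual(result["status"], "failed")

    def test_non_positive_amount_is_rejected(self):
        with self.assertRaises(ValueError):
            checkout(FakeGateway(), 0)
```

Each test fails if the behavior it names is broken. For instance, removing the except branch in checkout makes the declined-card test error out instead of pass.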

5. Security and Privacy

New inputs are validated at the boundary.
Authorization is checked separately from authentication.
User-controlled values are not interpolated into SQL, shell commands, paths, HTML, or URLs unsafely.
Secrets are not logged, returned, committed, or exposed to the client bundle.
The change does not broaden permissions, OAuth scopes, CORS, CSP, or webhook trust without explanation.
Sensitive data is redacted in logs, analytics, and error messages.
File uploads, redirects, and callbacks are constrained to expected origins and types.
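
For the SQL interpolation item, the pattern to look for is bound parameters rather than string formatting. This is a minimal sketch using Python's built-in sqlite3 module. The users table and the find_user helper are hypothetical.

```python
import sqlite3


def find_user(conn: sqlite3.Connection, email: str):
    """Look up a user by email using a bound parameter, never string formatting."""
    # The `?` placeholder makes the driver treat `email` strictly as data.
    return conn.execute(
        "SELECT id, email FROM users WHERE email = ?", (email,)
    ).fetchone()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.execute("INSERT INTO users (email) VALUES (?)", ("ada@example.com",))

match = find_user(conn, "ada@example.com")
# A classic injection payload is compared as a literal string and matches nothing.
injected = find_user(conn, "' OR '1'='1")
```

If the generated code builds the query with an f-string or concatenation around a user-controlled value, treat it as a finding even when the tests pass.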

6. Data and Migrations

Schema changes are backward-compatible during deploy.
Migrations are idempotent or have a clear rollback strategy.
Existing data is preserved and transformed intentionally.
New queries use indexes or bounded scans where volume matters.
Background jobs and webhooks can tolerate duplicate delivery.
Deletes are soft, recoverable, or explicitly justified.
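
An idempotent migration is one that can run twice without failing or changing the result. The sketch below illustrates the idea with sqlite3. The add_column_if_missing helper and the orders table are hypothetical, and a real project would normally use its migration framework's equivalent.

```python
import sqlite3


def add_column_if_missing(conn: sqlite3.Connection, table: str, column: str, ddl_type: str) -> bool:
    """Add a column only if it is not already present. Returns True if it was added.

    `table`, `column`, and `ddl_type` must come from migration code, never from
    user input, because identifiers cannot be bound as SQL parameters.
    """
    # Each PRAGMA table_info row is (cid, name, type, notnull, dflt_value, pk).
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    if column in existing:
        return False  # already migrated: running again is a no-op
    conn.execute(f"ALTER TABLE {table} ADD COLUMN {column} {ddl_type}")
    return True


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total_cents INTEGER)")
conn.execute("INSERT INTO orders (total_cents) VALUES (1999)")

first_run = add_column_if_missing(conn, "orders", "currency", "TEXT DEFAULT 'USD'")
second_run = add_column_if_missing(conn, "orders", "currency", "TEXT DEFAULT 'USD'")
row = conn.execute("SELECT total_cents, currency FROM orders").fetchone()
```

The existing row keeps its total and picks up the default currency, which is the "preserved and transformed intentionally" property the checklist asks for.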

7. Performance and Operations

The change does not add an unbounded loop, N+1 query, or repeated network call.
Expensive work is cached, batched, queued, or paginated where needed.
Client-side bundles do not grow because of avoidable imports.
Errors include enough context to debug without leaking private data.
Monitoring, alerting, or analytics exists for new critical paths.
The feature fails closed for security-sensitive paths and fails gracefully for UX paths.
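
The N+1 item is easiest to recognize side by side with its batched equivalent. The sketch below uses a hypothetical in-memory FakeUserStore that counts round trips, standing in for a database or remote service.

```python
class FakeUserStore:
    """In-memory stand-in for a database; counts round trips."""

    def __init__(self, users: dict[int, str]) -> None:
        self._users = users
        self.round_trips = 0

    def get(self, user_id: int) -> str:
        self.round_trips += 1
        return self._users[user_id]

    def get_many(self, user_ids: set[int]) -> dict[int, str]:
        self.round_trips += 1
        return {uid: self._users[uid] for uid in user_ids}


def names_n_plus_one(orders: list[dict], store: FakeUserStore) -> list[str]:
    # One lookup per order: round trips grow with the number of orders.
    return [store.get(order["user_id"]) for order in orders]


def names_batched(orders: list[dict], store: FakeUserStore) -> list[str]:
    # One lookup for all distinct users, then join in memory.
    users = store.get_many({order["user_id"] for order in orders})
    return [users[order["user_id"]] for order in orders]


orders = [{"user_id": 1}, {"user_id": 2}, {"user_id": 1}]
slow_store = FakeUserStore({1: "Ada", 2: "Grace"})
fast_store = FakeUserStore({1: "Ada", 2: "Grace"})

slow = names_n_plus_one(orders, slow_store)
fast = names_batched(orders, fast_store)
```

Both functions return the same names, but the first makes one round trip per order while the second makes one in total. AI-generated code often produces the first form because it reads naturally and passes small tests.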

8. Reviewability

The diff is small enough to review in one sitting.
Generated code was simplified before review.
Comments explain non-obvious decisions, not what the code plainly does.
Formatting-only churn is isolated or removed.
The PR includes screenshots or traces when behavior is visual or operational.
A human can identify the riskiest part of the change within two minutes.

Copy-Paste Prompt

This prompt is self-contained. It carries the full checklist verbatim, so you can paste it straight into Cursor, Claude Code, or Codex before opening a PR:

Review this AI-assisted change before I open a pull request.

Score the diff against the checklist below. Prioritize correctness, security,
data safety, backward compatibility, tests, and reviewability. Do not comment
on style unless it hides a real bug or makes the diff harder to review.

For each finding, report:
- severity: P0, P1, P2, or P3
- exact file and line if available
- why it matters in production
- the smallest safe fix

Checklist:

Scope and Intent
- The PR has a clear problem statement and success criteria.
- The diff only touches files needed for the requested change.
- The agent did not mix feature work with unrelated refactors.
- Generated names, copy, comments, and examples match the product domain.
- The implementation follows existing local patterns instead of inventing a new architecture.
- New abstractions remove real duplication or complexity, not just make the code look cleaner.
- The PR description explains what was intentionally not changed.
- Any assumptions are written down and easy for a reviewer to challenge.

Context Fit
- The agent read the relevant files before editing.
- The change respects existing module boundaries and ownership.
- Public interfaces remain backward-compatible unless the PR explicitly says otherwise.
- Naming matches surrounding code conventions.
- Error handling style matches the rest of the codebase.
- Logging, telemetry, and analytics use existing helpers.
- The change does not duplicate an existing utility, type, hook, component, or service.

Correctness
- Happy path behavior is covered by a test or a reproducible manual check.
- Important edge cases are covered: empty input, null/undefined, limits, duplicates, timeouts.
- The implementation handles partial failure instead of assuming every dependency succeeds.
- Async work is awaited or intentionally fire-and-forget with lifecycle handling.
- State transitions are explicit and cannot silently skip required steps.
- Date, currency, locale, and timezone logic is deterministic.
- The code handles retries, idempotency, or duplicate events where relevant.
- The output shape matches existing API contracts.
- The change has been tested against realistic data, not only toy examples.

Tests
- Tests describe behavior, not implementation details.
- Each new test would fail if the feature were broken.
- The test suite covers at least one failure path.
- Mocks sit at system boundaries, not inside the logic being tested.
- Snapshot updates are intentional and reviewed.
- Flaky waits, sleeps, or network dependencies were avoided.
- The relevant local check command was run and recorded in the PR.

Security and Privacy
- New inputs are validated at the boundary.
- Authorization is checked separately from authentication.
- User-controlled values are not interpolated into SQL, shell commands, paths, HTML, or URLs unsafely.
- Secrets are not logged, returned, committed, or exposed to the client bundle.
- The change does not broaden permissions, OAuth scopes, CORS, CSP, or webhook trust without explanation.
- Sensitive data is redacted in logs, analytics, and error messages.
- File uploads, redirects, and callbacks are constrained to expected origins and types.

Data and Migrations
- Schema changes are backward-compatible during deploy.
- Migrations are idempotent or have a clear rollback strategy.
- Existing data is preserved and transformed intentionally.
- New queries use indexes or bounded scans where volume matters.
- Background jobs and webhooks can tolerate duplicate delivery.
- Deletes are soft, recoverable, or explicitly justified.

Performance and Operations
- The change does not add an unbounded loop, N+1 query, or repeated network call.
- Expensive work is cached, batched, queued, or paginated where needed.
- Client-side bundles do not grow because of avoidable imports.
- Errors include enough context to debug without leaking private data.
- Monitoring, alerting, or analytics exists for new critical paths.
- The feature fails closed for security-sensitive paths and fails gracefully for UX paths.

Reviewability
- The diff is small enough to review in one sitting.
- Generated code was simplified before review.
- Comments explain non-obvious decisions, not what the code plainly does.
- Formatting-only churn is isolated or removed.
- The PR includes screenshots or traces when behavior is visual or operational.
- A human can identify the riskiest part of the change within two minutes.

Tool-Specific Review Prompts

The prompt above is one-shot. To make the checklist a standing review standard, save it once in your tool’s rules file, then trigger a review with a short prompt whenever you need one. Because the rules file carries the full checklist, the short prompts below have all 50+ points to work from.

Cursor

Save the checklist to .cursor/rules/code-review.mdc, then in chat:

Review the current diff against the code-review rule. Flag accidental scope
creep, missing tests, unsafe assumptions, and changes that do not fit the
surrounding code. Suggest the smallest safe patch for each issue.

Claude Code

Paste the checklist into CLAUDE.md (or a /review command), then:

claude "Review my uncommitted changes against the code-review checklist in
CLAUDE.md. Run the relevant test/typecheck commands if they are obvious from
package.json. Report only findings that could matter in production."

Codex

Add the checklist to AGENTS.md, then:

/review Use the code-review checklist in AGENTS.md. Flag only P0/P1/P2
issues: correctness, security, data safety, backward compatibility,
operational risk, and tests that would not catch real regressions.

When to Escalate to Human Review

AI review is useful, but it is not a replacement for ownership. Escalate to a senior human reviewer when the change touches:

authentication, authorization, billing, payments, or refunds
migrations that modify existing production data
permission models, sharing, tenancy, or admin features
public API contracts or SDK behavior
privacy, deletion, export, consent, or compliance paths
incident response, rate limiting, abuse prevention, or security headers
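
The escalation list can be turned into a simple tripwire that flags a PR for senior review based on the paths it touches. This is a rough sketch, assuming sensitive code lives under recognizable path fragments. The fragments, the SENSITIVE_AREAS mapping, and the function name are all hypothetical and would need to match your repository layout.

```python
SENSITIVE_AREAS = {
    # Hypothetical path fragments; adjust to how your repository is laid out.
    "auth": "authentication or authorization",
    "billing": "billing, payments, or refunds",
    "migrations": "migrations on production data",
    "permissions": "permission model or tenancy",
    "api/public": "public API contract",
    "privacy": "privacy or compliance path",
}


def escalation_reasons(changed_files: list[str]) -> dict[str, str]:
    """Map each sensitive changed file to the reason it needs a senior reviewer."""
    reasons = {}
    for path in changed_files:
        for fragment, reason in SENSITIVE_AREAS.items():
            if fragment in path.lower():
                reasons[path] = reason
                break  # first matching area is enough to escalate
    return reasons


flagged = escalation_reasons(["src/billing/refund.py", "docs/README.md"])
```

Substring matching is deliberately crude and will produce some false positives (a file named author.py matches "auth"). For a tripwire that only decides whether to ask a human, over-flagging is the safer failure mode.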

What’s Next

Use the checklist together with "AI-Powered Code Review with /review" when you want Codex to review a diff, and with "Security Scanning and Vulnerability Testing" for deeper security passes.
