Code Review Checklist for AI-Generated Code

AI-generated code should pass the same review standard as hand-written code. The difference is that AI tends to fail in predictable places: missing context, accidental scope creep, weak tests, unsafe assumptions, and changes that look correct but do not fit the system. Use this checklist before you ask your team to review an AI-assisted pull request.

How to use this checklist

Go through it in three passes:

Scope: make sure the change does only what was asked.
Correctness: check behavior, tests, data, and failure modes.
Production: check security, operations, performance, and reviewability.

For small changes, this takes about five minutes. For auth, billing, migrations, permissions, or customer data, run the full pass every time.

The 50+ point checklist

1. Scope and intent

The PR has a clear problem statement and success criteria.
The diff touches only the files needed for this change.
The agent does not mix feature work with unrelated refactoring.
Names, copy, comments, and examples match the product domain.
The implementation follows local patterns instead of inventing a new architecture.
New abstractions remove real duplication or complexity.
The PR description explains what was intentionally not changed.
Assumptions are written down and easy for a reviewer to challenge.
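
The "diff touches only the files needed" item can be made mechanical by comparing the changed paths against the directories the task was supposed to touch. The following is a minimal sketch under that assumption; `out_of_scope` and the example paths are hypothetical names, not part of any tool mentioned here.

```python
from pathlib import PurePosixPath


def out_of_scope(changed_files, allowed_prefixes):
    """Return changed paths that fall outside every allowed directory prefix."""
    allowed = [PurePosixPath(p) for p in allowed_prefixes]
    flagged = []
    for name in changed_files:
        path = PurePosixPath(name)
        # A file is in scope if it equals an allowed prefix or sits under one.
        if not any(path == a or a in path.parents for a in allowed):
            flagged.append(name)
    return flagged


# Hypothetical diff for a billing-only task.
changed = ["billing/invoice.py", "billing/tests/test_invoice.py", "auth/session.py"]
print(out_of_scope(changed, ["billing"]))  # ['auth/session.py']
```

A flagged file is not automatically wrong, but it should be explained in the PR description or moved to a separate change.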

2. Context fit

The agent read the relevant files before editing.
The change respects module boundaries and ownership.
Public interfaces remain backward-compatible unless the PR explicitly says otherwise.
Naming matches the conventions of the surrounding code.
Error handling matches the rest of the codebase.
Logging, telemetry, and analytics use existing helpers.
The change does not duplicate an existing utility, type, hook, component, or service.

3. Correctness

The happy path is covered by a test or a reproducible manual check.
Important edge cases are covered: empty input, null/undefined, limits, duplicates, timeouts.
The implementation handles partial failure instead of assuming every dependency works.
Async work is awaited, or is intentionally fire-and-forget with lifecycle handling.
State transitions are explicit and cannot silently skip a required step.
Date, currency, locale, and timezone logic is deterministic.
The code handles retries, idempotency, or duplicate events where it matters.
The output shape matches existing API contracts.
The change was checked against realistic data, not only toy examples.
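
As an illustration of the retry and duplicate-event item, here is a minimal sketch of an idempotent handler that deduplicates by event ID. `PaymentHandler` and the event IDs are hypothetical, and the in-memory set stands in for whatever durable store a real system would use.

```python
class PaymentHandler:
    """Applies each payment event at most once, keyed by its event ID."""

    def __init__(self):
        self.balance_cents = 0
        # Assumption: a real system would keep seen IDs in durable storage.
        self._seen_event_ids = set()

    def handle(self, event_id: str, amount_cents: int) -> bool:
        """Return True if the event was applied, False if it was a duplicate."""
        if event_id in self._seen_event_ids:
            return False
        self._seen_event_ids.add(event_id)
        self.balance_cents += amount_cents
        return True


handler = PaymentHandler()
handler.handle("evt_1", 500)
handler.handle("evt_1", 500)  # redelivery of the same event is ignored
print(handler.balance_cents)  # 500
```

When reviewing generated code, look for the equivalent of the `event_id` check: without it, a retried webhook or job applies the same effect twice.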

4. Tests

Tests describe behavior, not implementation details.
Every new test would actually fail if the feature were broken.
The test suite covers at least one failure path.
Mocks sit at system boundaries, not inside the logic under test.
Snapshot updates are intentional and reviewed.
Flaky waits, sleeps, and network dependencies were avoided.
The relevant local check command was run and recorded in the PR.
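
The sketch below shows what "mocks at the boundary" and "at least one failure path" can look like together. The `greeting` function and the `directory` client are hypothetical; the only real API used is `unittest.mock.Mock` from the Python standard library.

```python
from unittest.mock import Mock


def greeting(user_id, directory):
    """Build a greeting, falling back to a generic one if the lookup times out."""
    try:
        name = directory.lookup(user_id)
    except TimeoutError:
        return "Hello!"
    return f"Hello, {name}!"


def test_greets_user_by_name():
    directory = Mock()  # the mock replaces the external service, not our logic
    directory.lookup.return_value = "Ada"
    assert greeting(42, directory) == "Hello, Ada!"


def test_falls_back_when_directory_times_out():
    directory = Mock()
    directory.lookup.side_effect = TimeoutError()
    assert greeting(42, directory) == "Hello!"


test_greets_user_by_name()
test_falls_back_when_directory_times_out()
```

Both tests would fail if the formatting or the fallback in `greeting` were removed, which is the property the second checklist item asks for.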

5. Security and privacy

New inputs are validated at the system boundary.
Authorization is checked separately from authentication.
User-controlled values do not flow unsafely into SQL, shell commands, paths, HTML, or URLs.
Secrets are not logged, returned, committed, or exposed to the client bundle.
The change does not broaden permissions, OAuth scopes, CORS, CSP, or webhook trust without explanation.
Sensitive data is redacted in logs, analytics, and error messages.
Uploads, redirects, and callbacks are constrained to expected origins and types.
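
For the SQL part of the user-controlled-values item, the pattern to look for is a bound parameter rather than string interpolation. This is a small sketch using the standard-library `sqlite3` module; the `users` table and `find_user_ids` are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.execute("INSERT INTO users (email) VALUES ('a@example.com'), ('b@example.com')")


def find_user_ids(conn, email):
    """Look up user IDs by email with a bound parameter."""
    # The ? placeholder sends the value separately from the SQL text,
    # so quotes inside `email` cannot change the structure of the query.
    rows = conn.execute("SELECT id FROM users WHERE email = ?", (email,)).fetchall()
    return [row[0] for row in rows]


print(find_user_ids(conn, "a@example.com"))  # [1]
print(find_user_ids(conn, "x' OR '1'='1"))   # []
```

The second call passes a classic injection string; because it is bound as data, it matches no row instead of matching every row.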

6. Data and migrations

Schema changes are backward-compatible during deploy.
Migrations are idempotent or have a clear rollback strategy.
Existing data is preserved and transformed intentionally.
New queries use indexes or bounded scans where volume matters.
Background jobs and webhooks tolerate duplicate delivery.
Deletes are soft, recoverable, or explicitly justified.
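
An idempotent migration is one that can run twice without failing or changing the result. The sketch below shows one way to get that property in SQLite by checking the schema first; `add_column_if_missing` and the `orders` table are hypothetical, and other databases offer different mechanisms.

```python
import sqlite3


def add_column_if_missing(conn, table, column, ddl_type):
    """Add a column only when it is absent, so re-running is a no-op."""
    # Identifiers cannot be bound as parameters. Assumption: these values come
    # from the migration author, never from user input.
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    if column in existing:
        return False
    conn.execute(f"ALTER TABLE {table} ADD COLUMN {column} {ddl_type}")
    return True


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY)")
print(add_column_if_missing(conn, "orders", "deleted_at", "TEXT"))  # True
print(add_column_if_missing(conn, "orders", "deleted_at", "TEXT"))  # False
```

The nullable `deleted_at` column also illustrates the soft-delete item: existing rows stay valid, and old code that ignores the column keeps working during deploy.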

7. Performance and operations

The change does not add an unbounded loop, an N+1 query, or a repeated network call.
Expensive work is cached, batched, queued, or paginated.
The client-side bundle does not grow because of easily avoidable imports.
Errors include enough context to debug without leaking private data.
Monitoring, alerting, or analytics exists for new critical paths.
The feature fails closed on security-sensitive paths and fails gracefully on UX paths.
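
An N+1 query usually shows up as a lookup inside a loop. A minimal sketch of the batched alternative, again with the standard-library `sqlite3` module; the `authors` table and `author_names` are illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO authors (id, name) VALUES (?, ?)",
    [(1, "Ada"), (2, "Grace"), (3, "Edsger")],
)


def author_names(conn, author_ids):
    """Fetch names for many IDs in one query instead of one query per ID."""
    ids = sorted(set(author_ids))
    if not ids:
        return {}
    # One placeholder per ID keeps the values bound while batching the lookup.
    placeholders = ", ".join("?" for _ in ids)
    rows = conn.execute(
        f"SELECT id, name FROM authors WHERE id IN ({placeholders}) ORDER BY id",
        ids,
    ).fetchall()
    return dict(rows)


print(author_names(conn, [1, 3, 3]))  # {1: 'Ada', 3: 'Edsger'}
```

For very large ID lists the batch itself should be chunked, since databases cap the number of bound parameters per statement.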

8. Reviewability

The diff can be reviewed in one sitting.
Generated code was simplified before review.
Comments explain non-obvious decisions, not what the code plainly does.
Formatting-only churn is isolated or removed.
The PR includes screenshots or traces when the behavior is visual or operational.
A human can identify the riskiest part of the change within two minutes.

Copy-paste prompt

This prompt is self-contained: it carries the full checklist, so you can paste it straight into Cursor, Claude Code, or Codex before opening a PR:

Review this AI-assisted change before I open a pull request.

Score the diff against the checklist below. Prioritize correctness, security,
data safety, backward compatibility, tests, and reviewability. Do not comment
on style unless it hides a real bug or makes the diff harder to review.

For each finding, report:
- severity: P0, P1, P2, or P3
- exact file and line if available
- why it matters in production
- the smallest safe fix

Checklist:

Scope and Intent
- The PR has a clear problem statement and success criteria.
- The diff only touches files needed for the requested change.
- The agent did not mix feature work with unrelated refactors.
- Generated names, copy, comments, and examples match the product domain.
- The implementation follows existing local patterns instead of inventing a new architecture.
- New abstractions remove real duplication or complexity, not just make the code look cleaner.
- The PR description explains what was intentionally not changed.
- Any assumptions are written down and easy for a reviewer to challenge.

Context Fit
- The agent read the relevant files before editing.
- The change respects existing module boundaries and ownership.
- Public interfaces remain backward-compatible unless the PR explicitly says otherwise.
- Naming matches surrounding code conventions.
- Error handling style matches the rest of the codebase.
- Logging, telemetry, and analytics use existing helpers.
- The change does not duplicate an existing utility, type, hook, component, or service.

Correctness
- Happy path behavior is covered by a test or a reproducible manual check.
- Important edge cases are covered: empty input, null/undefined, limits, duplicates, timeouts.
- The implementation handles partial failure instead of assuming every dependency succeeds.
- Async work is awaited or intentionally fire-and-forget with lifecycle handling.
- State transitions are explicit and cannot silently skip required steps.
- Date, currency, locale, and timezone logic is deterministic.
- The code handles retries, idempotency, or duplicate events where relevant.
- The output shape matches existing API contracts.
- The change has been tested against realistic data, not only toy examples.

Tests
- Tests describe behavior, not implementation details.
- Each new test would fail if the feature were broken.
- The test suite covers at least one failure path.
- Mocks sit at system boundaries, not inside the logic being tested.
- Snapshot updates are intentional and reviewed.
- Flaky waits, sleeps, or network dependencies were avoided.
- The relevant local check command was run and recorded in the PR.

Security and Privacy
- New inputs are validated at the boundary.
- Authorization is checked separately from authentication.
- User-controlled values are not interpolated into SQL, shell commands, paths, HTML, or URLs unsafely.
- Secrets are not logged, returned, committed, or exposed to the client bundle.
- The change does not broaden permissions, OAuth scopes, CORS, CSP, or webhook trust without explanation.
- Sensitive data is redacted in logs, analytics, and error messages.
- File uploads, redirects, and callbacks are constrained to expected origins and types.

Data and Migrations
- Schema changes are backward-compatible during deploy.
- Migrations are idempotent or have a clear rollback strategy.
- Existing data is preserved and transformed intentionally.
- New queries use indexes or bounded scans where volume matters.
- Background jobs and webhooks can tolerate duplicate delivery.
- Deletes are soft, recoverable, or explicitly justified.

Performance and Operations
- The change does not add an unbounded loop, N+1 query, or repeated network call.
- Expensive work is cached, batched, queued, or paginated where needed.
- Client-side bundles do not grow because of avoidable imports.
- Errors include enough context to debug without leaking private data.
- Monitoring, alerting, or analytics exists for new critical paths.
- The feature fails closed for security-sensitive paths and fails gracefully for UX paths.

Reviewability
- The diff is small enough to review in one sitting.
- Generated code was simplified before review.
- Comments explain non-obvious decisions, not what the code plainly does.
- Formatting-only churn is isolated or removed.
- The PR includes screenshots or traces when behavior is visual or operational.
- A human can identify the riskiest part of the change within two minutes.

Per-tool prompts

The prompt above is a one-off. To make the checklist a standing review standard, save it once as the tool's rules file, then trigger a review whenever you need one. That way the short prompt has the 50+ points to draw on.

Cursor

Save the checklist to .cursor/rules/code-review.mdc, then in chat:

Review the current diff against the code-review rule. Flag accidental scope
creep, missing tests, unsafe assumptions, and changes that do not fit the
surrounding code. Suggest the smallest safe patch for each issue.

Claude Code

Paste the checklist into CLAUDE.md (or a /review command), then:

claude "Review my uncommitted changes against the code-review checklist in
CLAUDE.md. Run the relevant test/typecheck commands if they are obvious from
package.json. Report only findings that could matter in production."

Codex

Add the checklist to AGENTS.md, then:

/review Use the code-review checklist in AGENTS.md. Flag only P0/P1/P2
issues: correctness, security, data safety, backward compatibility,
operational risk, and tests that would not catch real regressions.

When to escalate to human review

AI review is useful, but it does not replace ownership. Escalate to a senior human reviewer when the change touches:

authentication, authorization, billing, payments, or refunds
migrations that modify existing production data
permission models, sharing, tenancy, or admin features
public API contracts or SDK behavior
privacy, deletion, export, consent, or compliance
incident response, rate limiting, abuse prevention, or security headers

What's next

Use this checklist together with AI-Powered Code Review with /review when you want Codex to review a diff, and with Security Scanning and Vulnerability Testing for deeper security reviews.

C-Level AI Development Scorecard: check whether your team uses AI as a production system or only as a faster prototype generator.