AI-Powered Code Quality Gates

A null slipped past review, shipped on Friday, and paged you at 2am. The diff looked fine — three reviewers approved it — but nobody noticed the unguarded response.data.user.id on a path that only fires for SSO logins. The retro question is brutal and fair: “How did four humans miss this?”

They missed it because human review is the wrong tool for mechanical defects. Style violations, type holes, N+1 queries, missing input validation, and unhandled rejections are exactly what an AI agent wired into a quality gate catches consistently, before a human ever opens the PR. This article shows how to build that gate across Cursor, Claude Code, and Codex so reviewers spend their attention on architecture and intent instead of playing linter.

What You’ll Walk Away With

A shared standards file (.cursor/rules, CLAUDE.md, or AGENTS.md) that every agent enforces consistently
A working Claude Code hook that runs Prettier, ESLint, and tsc --noEmit on every file the agent edits — with the correct event-keyed schema and stdin file path
A headless AI PR-review step in GitHub Actions you can drop into any repo, on all three tools
Three copy-paste review prompts: a stack-aware PR audit, an any-to-typed-interface refactor, and a k6 load test with real thresholds
A failure-mode playbook for when the gate gets noisy, blocks CI on unrelated files, or hits diff limits

How Quality Gates Layer Up

You want defects caught as early and as cheaply as possible. That means three layers, each catching what the previous one let through:

Development-time — the agent fixes lint/type errors as it writes, inside the editor or hook loop. Cheapest possible feedback.
Pre-merge — a headless agent reviews the diff in CI and posts findings on the PR before a human looks.
Continuous — a tool like SonarQube tracks coverage, complexity, and duplication trends so quality regressions show up as a graph, not a surprise.

The rest of this article builds each layer. Layers 1 and 2 are where the three tools differ, so those sections cover each tool separately.

Layer 1: Shared Standards the Agent Enforces

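All three tools read a project-level rules file and apply it to everything they generate. The file format and location differ; the content is nearly identical. Keep it in version control so the whole team, and every agent, works from the same standard. The three versions below are, in order, Cursor's rules file (with its frontmatter), CLAUDE.md for Claude Code, and AGENTS.md for Codex.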

---
description: Enterprise Code Quality Standards
alwaysApply: true
---
## Style
- 2-space indentation, max line length 100
- Every exported function has a JSDoc block
- No `any` without a `// eslint-disable-next-line` and a reason

## Architecture
- Data access goes through the repository layer, never inline SQL in handlers
- Services receive dependencies via constructor injection
- All outbound HTTP calls go through the shared `httpClient` wrapper

## Performance
- Paginate any endpoint that returns a list
- No queries inside loops — batch with `IN (...)` or a join
- Memoize pure functions that run on every render

## Security
- Parameterized queries only
- Validate request bodies with the Zod schema in `schemas/`
- Never log tokens, passwords, or full request bodies

## Coding Standards

### Style
- ESLint config: `.eslintrc.json`; Prettier: `.prettierrc`
- TypeScript strict mode; no `any` without an inline justification comment
- No `console.log` in committed code — use the `logger` module

### Quality gates
- Coverage floor: 80% on changed lines
- Cyclomatic complexity limit: 10 (enforced by ESLint's built-in `complexity` rule)
- Every TODO references a ticket: `// TODO(PROJ-1234): ...`

### Before you finish a task
- Run `npm run lint && npm run typecheck && npm test`
- Add or update tests for new behavior
- Update the relevant doc in `docs/` if you changed a public API

## Project standards

Codex reads AGENTS.md from the repo root (and merges nested ones in
subdirectories). Same rules as the other tools — keep them in sync.

### Style
- 2-space indentation, max line length 100, Prettier-formatted
- TypeScript strict; no `any` without a justification comment
- Use `typescript-eslint` rules, not legacy formatting lint

### Quality gates
- 80% coverage on changed lines; complexity limit 10
- Parameterized queries only; validate inputs with Zod
- Run `npm run lint && npm run typecheck && npm test` before declaring done
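
To make the "no queries inside loops" rule concrete, here is a minimal TypeScript sketch. The `UserRepository` interface and its in-memory implementation are hypothetical stand-ins for the repository layer, not an API from any of the three tools; a real implementation would issue a single `WHERE id IN (...)` query where this one reads from a map.

```typescript
interface User {
  id: string;
  name: string;
}

// Hypothetical repository contract, standing in for a real data-access layer.
interface UserRepository {
  findById(id: string): Promise<User | undefined>;
  findByIds(ids: string[]): Promise<User[]>;
}

// In-memory implementation that counts round trips so the difference is visible.
class InMemoryUserRepository implements UserRepository {
  queryCount = 0;
  private readonly rows: Map<string, User>;

  constructor(rows: Map<string, User>) {
    this.rows = rows;
  }

  async findById(id: string): Promise<User | undefined> {
    this.queryCount += 1;
    return this.rows.get(id);
  }

  async findByIds(ids: string[]): Promise<User[]> {
    this.queryCount += 1; // one round trip, like WHERE id IN (...)
    const found: User[] = [];
    for (const id of ids) {
      const row = this.rows.get(id);
      if (row !== undefined) found.push(row);
    }
    return found;
  }
}

// Violates the rule: one query per id.
async function loadUsersNPlusOne(repo: UserRepository, ids: string[]): Promise<User[]> {
  const users: User[] = [];
  for (const id of ids) {
    const user = await repo.findById(id);
    if (user !== undefined) users.push(user);
  }
  return users;
}

// Follows the rule: a single batched query for the de-duplicated ids.
async function loadUsersBatched(repo: UserRepository, ids: string[]): Promise<User[]> {
  return repo.findByIds(Array.from(new Set(ids)));
}
```

Both functions return the same users; only the number of round trips differs, which is the pattern the review checklist later asks the agent to flag.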

Layer 1, Automated: A Claude Code Hook That Actually Loads

The most common mistake here is a hook config that silently never runs. Claude Code nests hook arrays under an event name (PostToolUse, PreToolUse) inside the top-level hooks object — a bare top-level hooks array will not load. And hooks do not receive the edited path in an environment variable; they read JSON on stdin and pull .tool_input.file_path.

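Put the logic in a script so the config stays readable. The first block is the hook entry in .claude/settings.json; the second is .claude/hooks/format-and-lint.sh, which must be executable: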

{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          {
            "type": "command",
            "command": "\"$CLAUDE_PROJECT_DIR\"/.claude/hooks/format-and-lint.sh"
          }
        ]
      }
    ]
  }
}

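#!/usr/bin/env bash
set -uo pipefail

# The edited path arrives as JSON on stdin, not as an env var.
FILE_PATH=$(jq -r '.tool_input.file_path // empty')
[ -z "$FILE_PATH" ] && exit 0

# Exit code 2 sends stderr back to Claude so it can fix the problem;
# any other non-zero exit is only reported to the user.
fail() { echo "$1 failed on $FILE_PATH" >&2; exit 2; }

# --ignore-unknown skips files Prettier has no parser for.
npx prettier --write --ignore-unknown "$FILE_PATH" >&2 || fail "prettier"

# Lint and type-check only the file types those tools understand.
case "$FILE_PATH" in
  *.js|*.jsx) npx eslint --fix "$FILE_PATH" >&2 || fail "eslint" ;;
  *.ts|*.tsx)
    npx eslint --fix "$FILE_PATH" >&2 || fail "eslint"
    # tsc does the type checking, not a linter.
    npx tsc --noEmit >&2 || fail "tsc"
    ;;
esac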

$CLAUDE_PROJECT_DIR is one of the few real hook variables (alongside $CLAUDE_ENV_FILE for SessionStart and $CLAUDE_CODE_REMOTE). Wrap it in quotes so paths with spaces survive.

In Cursor, the equivalent is the auto-fix loop: when ESLint errors land in the Problems panel, the agent fixes them and re-runs until clean. Codex applies the same eslint --fix step inside its sandbox when you ask it to “make lint pass” as part of a task.
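
The stdin contract described above can also be sketched in TypeScript, which is handy if the hook logic grows beyond a few shell lines. This is an illustration under assumptions: only the `tool_input.file_path` field named in this article is modelled, and the function names are hypothetical.

```typescript
// Minimal shape of the JSON a hook receives on stdin.
// Only the field used in this article is modelled; everything else is ignored.
interface HookPayload {
  tool_input?: { file_path?: unknown };
}

// Returns the edited file path, or null when the payload has none
// (the same case where the shell script exits 0 early).
function extractEditedPath(stdinJson: string): string | null {
  let payload: HookPayload | null;
  try {
    payload = JSON.parse(stdinJson) as HookPayload | null;
  } catch {
    return null; // malformed input: nothing to format or lint
  }
  const filePath = payload?.tool_input?.file_path;
  return typeof filePath === 'string' && filePath.length > 0 ? filePath : null;
}

// Mirrors the shell `case` statement: only TypeScript files trigger a type check.
function needsTypeCheck(filePath: string): boolean {
  return /\.tsx?$/.test(filePath);
}
```

A real hook would read all of stdin into a string, pass it to `extractEditedPath`, and exit quietly on `null`.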

Layer 1: Linters Worth Wiring In

JavaScript / TypeScript

ESLint with your shared config
Prettier for formatting
typescript-eslint for TypeScript-aware lint rules
tsc --noEmit for type checking

Python

Ruff for fast linting (and formatting, replacing Black)
mypy for type checking
bandit for security lint

Java

Checkstyle for standards
SpotBugs for bug detection
PMD for code analysis

Go

golangci-lint as the aggregator
gofmt for formatting
go vet plus staticcheck
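
One way to wire several of these into a single per-file hook is a dispatch table keyed by file extension. The sketch below is an assumption about how such a table could look, not a prescribed setup: the command selection is illustrative, and project-wide tools (tsc, mypy, golangci-lint, the Java analyzers) are left out because they run per project, not per file.

```typescript
// Per-extension command templates; the quoted file path is appended to each.
// The selection is illustrative only.
const COMMANDS_BY_EXTENSION = new Map<string, string[]>([
  ['ts', ['npx prettier --write', 'npx eslint --fix']],
  ['tsx', ['npx prettier --write', 'npx eslint --fix']],
  ['js', ['npx prettier --write', 'npx eslint --fix']],
  ['py', ['ruff check --fix', 'ruff format']],
  ['go', ['gofmt -w']],
]);

// Wraps a path in single quotes so spaces and shell metacharacters survive.
function shellQuote(value: string): string {
  return "'" + value.replace(/'/g, "'\\''") + "'";
}

// Returns the shell commands to run for one edited file, or [] if none apply.
function commandsFor(filePath: string): string[] {
  const match = /\.([A-Za-z0-9]+)$/.exec(filePath);
  const extension = match ? match[1].toLowerCase() : '';
  const templates = COMMANDS_BY_EXTENSION.get(extension) ?? [];
  return templates.map((template) => `${template} ${shellQuote(filePath)}`);
}
```

Keeping the table in one place means adding a language is a one-line change, and files with no entry are skipped instead of failing the hook.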

Layer 2: AI PR Review Before a Human Looks

This is where the gate earns its keep. All three tools support it: each runs a headless agent against the PR diff and posts its findings.

Wire up the GitHub integration

Cursor’s built-in PR review is BugBot. Enable it from the dashboard’s GitHub integration, then drop a .cursor/BUGBOT.md at the repo root to steer what it flags (see the review guidelines below). BugBot comments inline on the PR automatically once connected.

In Claude Code, add the GitHub MCP server over remote HTTP. There is no built-in github shorthand, so both the transport and the URL are required:

claude mcp add --transport http github https://api.githubcopilot.com/mcp/
# Auth via OAuth on first use, or pass a token:
#   --header "Authorization: Bearer $GITHUB_PAT"

For PR automation specifically, install the GitHub App so Claude can be mentioned on PRs:

/install-github-app

Use Codex Cloud code review: connect the repo in the Codex Cloud dashboard and enable automatic review on pull requests. Codex reads AGENTS.md for your standards and posts review comments. For ad-hoc local review, run the headless codex exec step shown in the next section.

Review guidelines the agent reads

Both Cursor’s .cursor/BUGBOT.md and a prompt fed to Claude Code or Codex benefit from an explicit checklist. Keep it focused on what humans reliably miss:

# .cursor/BUGBOT.md  (or paste into the review prompt)

## Security (block on any of these)
- Hardcoded credentials, tokens, or API keys
- Unparameterized SQL or string-concatenated queries
- Unvalidated request bodies reaching the database
- Missing auth check on a protected route
- User input rendered without escaping (XSS)

## Correctness
- Unhandled promise rejections / missing `await`
- Null/undefined dereferences on optional fields
- N+1 query patterns (a query inside a `.map`/loop)

## Quality
- New code without tests
- Functions over 50 lines or complexity over 10
- Logging that includes sensitive data
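
The "null/undefined dereferences on optional fields" item is the defect from the opening incident. A minimal TypeScript sketch of what the agent should flag and what the fix looks like follows; the `ApiResponse` shape is a hypothetical reconstruction, since the article only names the `response.data.user.id` access.

```typescript
// Hypothetical response shape: `user` is absent on some code paths.
interface ApiResponse {
  data?: {
    user?: { id: string };
  };
}

// Flag this: the non-null assertions silence the compiler,
// and the access throws at runtime when `user` is missing.
function userIdUnsafe(response: ApiResponse): string {
  return response.data!.user!.id;
}

// Prefer this: optional chaining makes the missing case an explicit
// `undefined` that the caller's types force it to handle.
function userIdSafe(response: ApiResponse): string | undefined {
  return response.data?.user?.id;
}
```

The safe version changes the return type, so every caller that assumed an id is always present now fails type checking, which is where the SSO-only path would have surfaced.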

Headless review in CI

Drop this into any repo. It runs the agent on every pull request. Note the actions/checkout@v6 pin: @v3 targets a Node runtime that GitHub has deprecated.

BugBot runs as a hosted GitHub integration, so there is no CI YAML to maintain; it reviews PRs automatically once enabled. Use the Claude Code or Codex workflow below if you want the review step to live in your own workflow file. The first workflow uses Claude Code, the second Codex.

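name: AI Code Review
on: [pull_request]

# Posting a PR comment needs write access to pull requests.
permissions:
  contents: read
  pull-requests: write

jobs:
  review:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v6
        with:
          fetch-depth: 0
      # The runner image does not ship the CLI, so install it first.
      - name: Install Claude Code
        run: npm install -g @anthropic-ai/claude-code
      - name: AI review
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
          GH_TOKEN: ${{ github.token }}
        run: |
          git diff origin/${{ github.base_ref }}...HEAD > diff.patch
          claude -p "Review the diff in diff.patch against .cursor/BUGBOT.md. \
            Report only real defects as 'file:line — issue — fix', \
            grouped by Security / Correctness / Quality. \
            If nothing is wrong, say 'No blocking issues.'" \
            --output-format json > review.json
          # The JSON output carries the review text in .result; post it on the PR.
          jq -r '.result' review.json > review.md
          gh pr comment ${{ github.event.pull_request.number }} --body-file review.md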

name: AI Code Review
on: [pull_request]

jobs:
  review:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v6
        with:
          fetch-depth: 0
      # The runner image does not ship the CLI, so install it first.
      - name: Install Codex CLI
        run: npm install -g @openai/codex
      - name: AI review
        env:
          CODEX_API_KEY: ${{ secrets.OPENAI_API_KEY }}
        run: |
          git diff origin/${{ github.base_ref }}...HEAD > diff.patch
          codex exec --sandbox read-only -c approval_policy=never \
            "Review diff.patch against AGENTS.md. Report only real \
             defects as 'file:line — issue — fix', grouped by \
             Security / Correctness / Quality."

A non-interactive run has nobody to answer an approval prompt, so approval_policy=never tells Codex not to ask. Use it only with a trusted, least-privilege CI identity. The read-only sandbox remains the enforced boundary: any action that needs more access fails.
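
Because both workflows ask for findings as 'file:line — issue — fix' under three group headings, a later step can parse the review text and decide whether to fail the job. The sketch below assumes the agent followed that format, which is not guaranteed; the function names are hypothetical, and blocking only on Security findings mirrors the "block on any of these" note in the checklist.

```typescript
type Category = 'Security' | 'Correctness' | 'Quality';

interface Finding {
  category: Category;
  file: string;
  line: number;
  issue: string;
  fix: string;
}

const CATEGORIES: readonly Category[] = ['Security', 'Correctness', 'Quality'];

// Parses review text into structured findings. Lines that do not match
// the requested format are ignored rather than guessed at.
function parseFindings(review: string): Finding[] {
  const findings: Finding[] = [];
  let current: Category | null = null;
  for (const rawLine of review.split('\n')) {
    const line = rawLine.trim().replace(/^[-*]\s+/, '');
    // A heading such as "## Security" or "Security:" switches the current group.
    const headingText = line.replace(/^#+\s*/, '').replace(/:$/, '').toLowerCase();
    const heading = CATEGORIES.find((c) => c.toLowerCase() === headingText);
    if (heading !== undefined) {
      current = heading;
      continue;
    }
    // \u2014 is the dash used as the field separator in the review prompt.
    const match = /^(.+?):(\d+)\s+\u2014\s+(.+?)\s+\u2014\s+(.+)$/.exec(line);
    if (match !== null && current !== null) {
      findings.push({
        category: current,
        file: match[1],
        line: Number(match[2]),
        issue: match[3],
        fix: match[4],
      });
    }
  }
  return findings;
}

// Security findings block the merge; the other groups are advisory.
function shouldBlock(findings: Finding[]): boolean {
  return findings.some((finding) => finding.category === 'Security');
}
```

Treating unparseable lines as noise keeps a chatty review from failing the build, at the cost of missing findings the agent phrased differently, so it is worth logging the raw text alongside the parsed result.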

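Copy-paste prompt for a stack-aware PR review (paste into Cursor agent, claude -p, or codex exec). The $(git diff ...) line expands only when the prompt passes through a shell inside double quotes; in an editor, paste the diff text in its place: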

You are reviewing a pull request for a Node.js + TypeScript service using
Express, Drizzle ORM (Postgres), and Zod for validation.

Review only the changed lines in this diff:
$(git diff origin/main...HEAD)

Flag, with file:line and a one-line fix for each:
1. SQL injection or any query not going through Drizzle's parameterized API
2. N+1 query patterns (a DB call inside a loop or .map)
3. Request handlers that touch req.body without a Zod parse
4. Unhandled promise rejections or missing await
5. Auth middleware missing on a route under /api/admin

Skip style nits — Prettier and ESLint already handle those.
Output "No blocking issues" if you find none.
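
When the prompt is assembled by a script instead of a shell substitution, the diff can be capped so a large PR does not blow past the model's input limit. The sketch below is one possible approach under assumptions: the 60000-character default is an arbitrary placeholder, not a documented limit of any tool, and the function names are hypothetical. It drops whole file sections so the agent never sees a half-truncated hunk.

```typescript
// Splits `git diff` output into one chunk per file.
// Each file section starts with a line beginning "diff --git ".
function splitDiffByFile(diff: string): string[] {
  return diff.split(/^(?=diff --git )/m).filter((section) => section.trim().length > 0);
}

// Builds the review prompt, keeping every file section that fits
// within the budget and skipping (and counting) the ones that do not.
function buildReviewPrompt(diff: string, maxDiffChars = 60000): string {
  const kept: string[] = [];
  let used = 0;
  let omitted = 0;
  for (const section of splitDiffByFile(diff)) {
    if (used + section.length <= maxDiffChars) {
      kept.push(section);
      used += section.length;
    } else {
      omitted += 1;
    }
  }
  const parts = ['Review only the changed lines in this diff:', kept.join('')];
  if (omitted > 0) {
    parts.push(`(${omitted} file(s) omitted to stay under the size budget; review them separately.)`);
  }
  parts.push('Output "No blocking issues" if you find none.');
  return parts.join('\n');
}
```

Telling the agent how many files were left out matters: without that line, a clean review of a partial diff reads like a clean review of the whole PR.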

Cleaning Up as the Agent Goes

The single most useful in-editor habit is replacing every any type the moment it appears. Cursor’s auto-fix loop does this whenever lint flags the untyped value, and the prompt below works in all three tools.

Copy-paste prompt to replace any with a real type:

This value is typed `any`:

  const data = response.data as any;

Infer a precise interface from how `data` is used in the surrounding
function (the property accesses, the JSON shape this endpoint returns).
Define an exported interface, replace the `any` cast with it, and update
any call sites that now type-check more strictly. Do not use `unknown`
as a cop-out unless the shape is genuinely dynamic.

A clean result looks like this — the cast becomes a named, checkable contract:

interface UserResponse {
  id: string;
  status: 'active' | 'inactive';
  metadata: Record<string, unknown>;
}

const data = response.data as UserResponse;
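
An `as` cast is checked only at compile time, so the response can still arrive in a different shape at runtime. A type guard closes that gap. The sketch below hand-writes the check to stay dependency-free; the function names are hypothetical, and in a project that already validates with Zod, a schema in `schemas/` would be the more natural home for this logic.

```typescript
interface UserResponse {
  id: string;
  status: 'active' | 'inactive';
  metadata: Record<string, unknown>;
}

// True for plain objects; excludes null and arrays.
function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === 'object' && value !== null && !Array.isArray(value);
}

// Runtime check that mirrors the interface field by field.
function isUserResponse(value: unknown): value is UserResponse {
  if (!isRecord(value)) return false;
  return (
    typeof value.id === 'string' &&
    (value.status === 'active' || value.status === 'inactive') &&
    isRecord(value.metadata)
  );
}

// Narrowing replacement for the cast: throws instead of trusting the shape.
function parseUserResponse(value: unknown): UserResponse {
  if (!isUserResponse(value)) {
    throw new TypeError('Response does not match UserResponse');
  }
  return value;
}
```

With this in place, `const data = parseUserResponse(response.data);` gives the same static type as the cast, plus a clear error at the boundary when the payload is wrong.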

Layer 3: Continuous Quality Monitoring

Real monitoring means real tooling, not a function that returns hardcoded numbers. SonarQube (or SonarCloud) is a common choice: it computes coverage, cyclomatic complexity, and duplication on every build and tracks the trend. Wire it into the same workflow:

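# add to .github/workflows/ai-review.yml
      - name: SonarQube scan
        uses: SonarSource/sonarqube-scan-action@v6
        env:
          SONAR_TOKEN: ${{ secrets.SONAR_TOKEN }}
          # Needed for a self-hosted SonarQube server; omit for SonarQube Cloud.
          SONAR_HOST_URL: ${{ secrets.SONAR_HOST_URL }}
        with:
          args: >
            -Dsonar.qualitygate.wait=true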

The qualitygate.wait=true flag blocks the PR if the project’s Sonar quality gate fails (for example, coverage on new code below 80% or a new blocker-severity issue). That is your enforcement point — concrete, measured, and not something an agent can fake.
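
The pass/fail logic behind the two example conditions can be sketched in a few lines of TypeScript. This is not how SonarQube evaluates its gate internally; the `NewCodeMetrics` shape and the function name are hypothetical, and the point is only to show that the decision is a deterministic comparison against measured numbers.

```typescript
// Hypothetical summary of a "new code" analysis; field names are assumptions.
interface NewCodeMetrics {
  coveragePercent: number;
  blockerIssues: number;
}

interface GateResult {
  passed: boolean;
  failures: string[];
}

// Applies the two example conditions: a coverage floor and zero new blockers.
function evaluateGate(metrics: NewCodeMetrics, minCoverage = 80): GateResult {
  const failures: string[] = [];
  if (metrics.coveragePercent < minCoverage) {
    failures.push(`coverage on new code ${metrics.coveragePercent}% is below ${minCoverage}%`);
  }
  if (metrics.blockerIssues > 0) {
    failures.push(`${metrics.blockerIssues} new blocker-severity issue(s)`);
  }
  return { passed: failures.length === 0, failures };
}
```

Returning the list of failed conditions, not just a boolean, gives the PR author something actionable when the step turns red.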

For the “what’s the AI’s read on this?” angle, feed Sonar’s findings to the agent rather than asking it to invent metrics:

Copy-paste prompt to triage a Sonar report:

Here is the SonarQube "new code" report for this PR (JSON below). For each
issue, tell me: is it a real risk worth fixing now, a fix-later, or a false
positive given our codebase? Group by severity and give a one-line rationale
per issue — do not just restate Sonar's message.

<paste sonar issues JSON>

Performance as a Quality Gate

A common production regression is a query or endpoint that works fine in review and falls over under load. Bake load testing into the gate with k6 — the thresholds are real and make the test pass/fail on its own.

Copy-paste prompt to generate a k6 load test:

Write a k6 load test for our POST /api/checkout endpoint.

- Ramp to 200 virtual users over 2 minutes, hold for 5 minutes, ramp down
- Send a realistic JSON body: { cartId, paymentMethodId, idempotencyKey }
- Thresholds that FAIL the test:
    - http_req_duration p(95) must be < 500ms
    - http_req_failed rate must be < 0.01
- Read the base URL from the BASE_URL env var
- Tag requests so the checkout endpoint is isolated in the summary

The generated test encodes the thresholds as gate conditions, so a regression turns the CI step red:

import http from 'k6/http';
import { check } from 'k6';

export const options = {
  stages: [
    { duration: '2m', target: 200 },
    { duration: '5m', target: 200 },
    { duration: '2m', target: 0 },
  ],
  thresholds: {
    http_req_duration: ['p(95)<500'],
    http_req_failed: ['rate<0.01'],
  },
};

export default function () {
  const res = http.post(
    `${__ENV.BASE_URL}/api/checkout`,
    JSON.stringify({ cartId: 'c_1', paymentMethodId: 'pm_1', idempotencyKey: `${__VU}-${__ITER}` }),
    { headers: { 'Content-Type': 'application/json' }, tags: { name: 'checkout' } },
  );
  check(res, { 'status 200': (r) => r.status === 200 });
}
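
To see what the two thresholds actually assert, here is a small TypeScript sketch that evaluates them over a list of recorded requests. It uses the nearest-rank percentile method as a simplifying assumption; k6 computes these statistics itself and may interpolate differently, so treat this as an explanation of the gate, not a reimplementation. The `RequestSample` shape is hypothetical.

```typescript
// Hypothetical record of one request; k6 tracks these metrics internally.
interface RequestSample {
  durationMs: number;
  failed: boolean;
}

// Nearest-rank percentile: the smallest value such that at least
// p percent of the samples are less than or equal to it.
function percentile(values: number[], p: number): number {
  if (values.length === 0) throw new RangeError('percentile needs at least one sample');
  const sorted = values.slice().sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  return sorted[rank - 1];
}

// Mirrors the two thresholds: p(95) < 500 ms and failure rate < 1%.
function passesThresholds(samples: RequestSample[]): boolean {
  const p95 = percentile(samples.map((sample) => sample.durationMs), 95);
  const failureRate = samples.filter((sample) => sample.failed).length / samples.length;
  return p95 < 500 && failureRate < 0.01;
}
```

The p(95) condition tolerates a few slow outliers but fails once more than 5% of requests cross 500 ms, which is why it catches load regressions that an average would hide.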

Custom Review Commands

Once the prompts above prove useful, save them as reusable slash commands. A file at .claude/commands/security-audit.md becomes the /security-audit command inside an interactive Claude Code session (subdirectories add namespacing — .claude/commands/review/pr.md is /review:pr). Invoke it in the REPL:

> /security-audit

with the command file holding your OWASP-focused prompt. Cursor exposes the same idea through saved prompts; Codex through AGENTS.md workflows and custom prompts.
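
The file-to-command naming rule described above is mechanical enough to express as a function, which can help when auditing a repo for command name clashes. The sketch below encodes only the mapping stated in this section (file name becomes the command, subdirectories become colon-separated namespaces); the function name is hypothetical.

```typescript
// Maps a command file path to the slash command it defines,
// following the naming rule described in this section.
// Returns null for paths outside .claude/commands or without a .md extension.
function slashCommandFor(filePath: string): string | null {
  const normalized = filePath.replace(/\\/g, '/');
  const match = /(?:^|\/)\.claude\/commands\/(.+)\.md$/.exec(normalized);
  if (match === null) return null;
  return '/' + match[1].split('/').join(':');
}
```

For example, `.claude/commands/security-audit.md` maps to `/security-audit` and `.claude/commands/review/pr.md` maps to `/review:pr`, matching the two cases in the text.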

When This Breaks

What’s Next

Performance Testing — load and stress testing strategies in depth
Security Compliance — security-focused development workflows
CI/CD Pipelines — wiring these gates into your full pipeline
Testing Excellence — comprehensive testing strategies across tools