
AI-Powered Code Quality Gates

A null slipped past review, shipped on Friday, and paged you at 2am. The diff looked fine — three reviewers approved it — but nobody noticed the unguarded response.data.user.id on a path that only fires for SSO logins. The retro question is brutal and fair: “How did four humans miss this?”

They missed it because human review is the wrong tool for mechanical defects. Style violations, type holes, N+1 queries, missing input validation, and unhandled rejections are exactly what an AI agent wired into a quality gate catches consistently, before a human ever opens the PR. This article shows how to build that gate across Cursor, Claude Code, and Codex so reviewers spend their attention on architecture and intent instead of playing linter.

  • A shared standards file (.cursor/rules, CLAUDE.md, or AGENTS.md) that every agent enforces consistently
  • A working Claude Code hook that runs Prettier, ESLint, and tsc --noEmit on every file the agent edits — with the correct event-keyed schema and stdin file path
  • An AI PR-review step for all three tools: BugBot for Cursor, and a headless GitHub Actions job you can drop into any repo for Claude Code or Codex
  • Three copy-paste review prompts: a stack-aware PR audit, an any-to-typed-interface refactor, and a k6 load test with real thresholds
  • A failure-mode playbook for when the gate gets noisy, blocks CI on unrelated files, or hits diff limits

You want defects caught as early and as cheaply as possible. That means three layers, each catching what the previous one let through:

  1. Development-time — the agent fixes lint/type errors as it writes, inside the editor or hook loop. Cheapest possible feedback.
  2. Pre-merge — a headless agent reviews the diff in CI and posts findings on the PR before a human looks.
  3. Continuous — a tool like SonarQube tracks coverage, complexity, and duplication trends so quality regressions show up as a graph, not a surprise.

The rest of this article builds each layer. Layers 1 and 2 are where the three tools differ, so those sections cover each tool in turn.

Layer 1: Shared Standards the Agent Enforces


All three tools read a project-level rules file and apply it to everything they generate. The file format and location differ; the content is nearly identical. Keep it in version control so the whole team — and every agent — works from the same standard.

.cursor/rules/code-standards.mdc
---
description: Enterprise Code Quality Standards
alwaysApply: true
---
## Style
- 2-space indentation, max line length 100
- Every exported function has a JSDoc block
- No `any` without a `// eslint-disable-next-line` and a reason
## Architecture
- Data access goes through the repository layer, never inline SQL in handlers
- Services receive dependencies via constructor injection
- All outbound HTTP calls go through the shared `httpClient` wrapper
## Performance
- Paginate any endpoint that returns a list
- No queries inside loops — batch with `IN (...)` or a join
- Memoize pure functions that run on every render
## Security
- Parameterized queries only
- Validate request bodies with the Zod schema in `schemas/`
- Never log tokens, passwords, or full request bodies
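To make the Performance and Security rules concrete, here is what a reviewer (or the agent) should reject versus accept. This is a minimal sketch: the Db interface and its $1-style placeholders are assumptions modeled on node-postgres, and CountingDb is a hypothetical in-memory stand-in that only counts round trips.

```typescript
// Hypothetical minimal DB interface: parameterized SQL plus positional params.
interface Db {
  query<T>(sql: string, params: unknown[]): Promise<T[]>;
}

interface Order {
  id: string;
  userId: string;
}

// Rejected: one query per user inside a loop (the N+1 pattern).
async function ordersNPlusOne(db: Db, userIds: string[]): Promise<Order[]> {
  const results: Order[] = [];
  for (const userId of userIds) {
    const rows = await db.query<Order>(
      `SELECT id, user_id AS "userId" FROM orders WHERE user_id = $1`,
      [userId],
    );
    results.push(...rows);
  }
  return results;
}

// Accepted: a single parameterized query using IN (...).
async function ordersBatched(db: Db, userIds: string[]): Promise<Order[]> {
  if (userIds.length === 0) return [];
  // Builds "$1, $2, ..." so values are never concatenated into the SQL text.
  const placeholders = userIds.map((_, i) => `$${i + 1}`).join(", ");
  return db.query<Order>(
    `SELECT id, user_id AS "userId" FROM orders WHERE user_id IN (${placeholders})`,
    userIds,
  );
}

// Hypothetical stand-in: ignores the SQL, filters rows by the params, counts calls.
class CountingDb implements Db {
  calls = 0;
  private readonly rows: Order[];
  constructor(rows: Order[]) {
    this.rows = rows;
  }
  async query<T>(_sql: string, params: unknown[]): Promise<T[]> {
    this.calls++;
    return this.rows.filter((r) => params.includes(r.userId)) as unknown as T[];
  }
}
```

With two users, the loop issues two queries and the batched version issues one; the gap grows linearly with the size of the list.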

Layer 1, Automated: A Claude Code Hook That Actually Loads


The most common mistake here is a hook config that silently never runs. Claude Code nests hook arrays under an event name (PostToolUse, PreToolUse) inside the top-level hooks object — a bare top-level hooks array will not load. And hooks do not receive the edited path in an environment variable; they read JSON on stdin and pull .tool_input.file_path.

Put the logic in a script so the config stays readable:

.claude/settings.json
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          {
            "type": "command",
            "command": "\"$CLAUDE_PROJECT_DIR\"/.claude/hooks/format-and-lint.sh"
          }
        ]
      }
    ]
  }
}
.claude/hooks/format-and-lint.sh
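#!/usr/bin/env bash
set -uo pipefail
# The edited path arrives as JSON on stdin, not as an env var.
FILE_PATH=$(jq -r '.tool_input.file_path // empty')
[ -z "$FILE_PATH" ] && exit 0
# Only JS/TS files; Prettier and ESLint would error on shell scripts, YAML, etc.
case "$FILE_PATH" in
  *.js|*.jsx|*.ts|*.tsx) ;;
  *) exit 0 ;;
esac
npx prettier --write "$FILE_PATH" > /dev/null
# Exit code 2 from a PostToolUse hook feeds stderr back to Claude,
# so it sees the remaining errors and fixes them.
npx eslint --fix "$FILE_PATH" >&2 || exit 2
# tsc does the type checking (whole project, via tsconfig.json).
case "$FILE_PATH" in
  *.ts|*.tsx) npx tsc --noEmit >&2 || exit 2 ;;
esac
exit 0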

$CLAUDE_PROJECT_DIR is one of the few real hook variables (alongside $CLAUDE_ENV_FILE for SessionStart and $CLAUDE_CODE_REMOTE). Wrap it in quotes so paths with spaces survive.
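If you would rather keep hook logic in TypeScript than bash, the part worth getting right is the same: read the payload from stdin and pull tool_input.file_path. A sketch, assuming only those two payload fields; planHookCommands is a hypothetical helper that returns the commands to run (JS/TS files get Prettier and ESLint, TS files also get tsc) and leaves execution to the caller, for example via Node's child_process module.

```typescript
// The only PostToolUse payload fields this sketch reads; others are ignored.
interface HookPayload {
  tool_name?: string;
  tool_input?: { file_path?: string };
}

// Hypothetical helper: maps the stdin JSON to the commands to run for the edited
// file. An empty list means "nothing to do" (no path, or not a JS/TS file).
function planHookCommands(stdinJson: string): string[][] {
  const payload = JSON.parse(stdinJson) as HookPayload;
  const filePath = payload.tool_input?.file_path;
  if (!filePath) return [];
  const commands: string[][] = [];
  if (/\.(jsx?|tsx?)$/.test(filePath)) {
    commands.push(["npx", "prettier", "--write", filePath]);
    commands.push(["npx", "eslint", "--fix", filePath]);
  }
  if (/\.tsx?$/.test(filePath)) {
    // tsc checks the whole project from tsconfig.json, not just this file.
    commands.push(["npx", "tsc", "--noEmit"]);
  }
  return commands;
}
```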

In Cursor, the equivalent is the auto-fix loop: when ESLint errors land in the Problems panel, the agent fixes them and re-runs until clean. Codex applies the same eslint --fix step inside its sandbox when you ask it to “make lint pass” as part of a task.

The same hook pattern works on any stack; swap in the equivalent tools:

JavaScript / TypeScript

  • ESLint with your shared config
  • Prettier for formatting
  • typescript-eslint for TypeScript-aware lint rules
  • tsc --noEmit for type checking

Python

  • Ruff for fast linting (and formatting, replacing Black)
  • mypy for type checking
  • bandit for security lint

Java

  • Checkstyle for standards
  • SpotBugs for bug detection
  • PMD for code analysis

Go

  • golangci-lint aggregator
  • gofmt for formatting
  • go vet plus staticcheck

Layer 2: AI PR Review Before a Human Looks


This is where the gate earns its keep. All three tools can review the PR diff and post findings before a human opens it: Cursor through its hosted BugBot integration, Claude Code and Codex as headless agents in your CI workflow.

Cursor’s built-in PR review is BugBot. Enable it from the dashboard’s GitHub integration, then drop a .cursor/BUGBOT.md at the repo root to steer what it flags (see the review guidelines below). BugBot comments inline on the PR automatically once connected.

Both Cursor’s .cursor/BUGBOT.md and a prompt fed to Claude Code or Codex benefit from an explicit checklist. Keep it focused on what humans reliably miss:

# .cursor/BUGBOT.md (or paste into the review prompt)
## Security (block on any of these)
- Hardcoded credentials, tokens, or API keys
- Unparameterized SQL or string-concatenated queries
- Unvalidated request bodies reaching the database
- Missing auth check on a protected route
- User input rendered without escaping (XSS)
## Correctness
- Unhandled promise rejections / missing `await`
- Null/undefined dereferences on optional fields
- N+1 query patterns (a query inside a `.map`/loop)
## Quality
- New code without tests
- Functions over 50 lines or complexity over 10
- Logging that includes sensitive data
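The correctness items are easiest to see in code. The opening incident was an unguarded response.data.user.id on a path where user can be missing; here is the guarded form a reviewer should expect instead. The AuthResponse shape and describeLogin are hypothetical, for illustration only.

```typescript
// Hypothetical response shape: user is absent (or null) on some login paths.
interface AuthResponse {
  data: { user?: { id: string } | null };
}

// Flagged: with strictNullChecks on, tsc rejects this line outright. It only
// compiles when the response is typed as any, which is how such bugs ship.
// const id = response.data.user.id;

// Guarded: optional chaining yields undefined instead of throwing a TypeError,
// and the return type forces every caller to handle the missing-user case.
function getUserId(response: AuthResponse): string | undefined {
  return response.data.user?.id;
}

// Usage: the missing case gets an explicit branch.
function describeLogin(response: AuthResponse): string {
  const id = getUserId(response);
  if (id === undefined) {
    return "login without a user record";
  }
  return `login for user ${id}`;
}
```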

For Claude Code or Codex, the review step is a workflow that runs on pull requests and posts the agent’s findings. Pin actions/checkout@v6 in it: @v3 is deprecated and runs on an unsupported Node runtime.
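However you wire the workflow, the CI step needs a rule for when findings fail the check. A minimal sketch, assuming you prompt the agent to return findings as a JSON array with the hypothetical category, file, line, and message fields below; posting the comment and setting the exit code are left to your workflow. It mirrors the checklist: any security finding blocks, everything else is advisory.

```typescript
// Hypothetical finding shape requested from the agent in the review prompt.
type Category = "security" | "correctness" | "quality";

interface Finding {
  category: Category;
  file: string;
  line: number;
  message: string;
}

// Block the merge on any security finding; the rest are advisory comments.
function shouldBlock(findings: Finding[]): boolean {
  return findings.some((f) => f.category === "security");
}

// Renders findings as a Markdown PR comment, security first.
function renderComment(findings: Finding[]): string {
  if (findings.length === 0) return "AI review: no findings.";
  const order: Category[] = ["security", "correctness", "quality"];
  const lines = ["AI review findings:", ""];
  for (const category of order) {
    for (const f of findings.filter((x) => x.category === category)) {
      lines.push(`- **${category}** \`${f.file}:${f.line}\`: ${f.message}`);
    }
  }
  return lines.join("\n");
}
```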

BugBot runs as a hosted GitHub integration, so there is no CI YAML to maintain: it reviews PRs automatically once enabled. Use Claude Code or Codex if you want the review step to live in your own workflow file instead.

The single most useful in-editor habit is replacing the any type the moment it appears. Cursor’s auto-fix loop handles this as soon as the lint error shows up, but the any-to-typed-interface prompt works in all three tools.

A clean result looks like this: the untyped value becomes a named contract that the compiler checks wherever the data is used.

interface UserResponse {
  id: string;
  status: 'active' | 'inactive';
  metadata: Record<string, unknown>;
}
const data = response.data as UserResponse;
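A cast like the one above is still trusted at runtime: if the API sends a different shape, TypeScript will not notice. A minimal sketch of a hand-written type guard that checks the shape before handing out a UserResponse; it is dependency-free for illustration, while the standards file points request validation at Zod schemas instead. isUserResponse and parseUserResponse are hypothetical names.

```typescript
interface UserResponse {
  id: string;
  status: "active" | "inactive";
  metadata: Record<string, unknown>;
}

// Runtime check, so the narrowing is earned rather than assumed.
function isUserResponse(value: unknown): value is UserResponse {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.id === "string" &&
    (v.status === "active" || v.status === "inactive") &&
    typeof v.metadata === "object" &&
    v.metadata !== null &&
    !Array.isArray(v.metadata)
  );
}

// Fails loudly at the boundary instead of deep inside business logic.
function parseUserResponse(value: unknown): UserResponse {
  if (!isUserResponse(value)) {
    throw new Error("response does not match UserResponse");
  }
  return value;
}
```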

Real monitoring means real tooling, not a function that returns hardcoded numbers. SonarQube (or SonarCloud) is the standard: it computes coverage, cyclomatic complexity, and duplication on every build and tracks the trend. Wire it into the same workflow:

# add to .github/workflows/ai-review.yml
- name: SonarQube scan
  uses: SonarSource/sonarqube-scan-action@v6
  env:
    SONAR_TOKEN: ${{ secrets.SONAR_TOKEN }}
  with:
    args: >
      -Dsonar.qualitygate.wait=true

The sonar.qualitygate.wait=true flag makes the step wait for the server’s verdict and fail if the project’s Sonar quality gate fails (for example, coverage on new code below 80% or a new blocker-severity issue). Mark the job as a required status check and that failure blocks the merge. That is your enforcement point: concrete, measured, and not something an agent can fake.

For the “what’s the AI’s read on this?” angle, feed Sonar’s findings to the agent rather than asking it to invent metrics.
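One way to do that is to turn Sonar’s issue list into the review prompt, so the agent explains and prioritizes measured problems instead of producing its own numbers. A sketch under assumptions: the SonarIssue fields are assumed to follow the issue objects returned by SonarQube’s issue search endpoint, and newer SonarQube versions also report an impacts field with a different severity scale, so check your server’s response. buildSonarPrompt is a hypothetical helper.

```typescript
// Assumed subset of a SonarQube issue object; verify against your server.
interface SonarIssue {
  rule: string;
  severity: string; // e.g. "BLOCKER", "CRITICAL", "MAJOR", "MINOR", "INFO"
  component: string;
  line?: number;
  message: string;
}

const SEVERITY_RANK: Record<string, number> = {
  BLOCKER: 0,
  CRITICAL: 1,
  MAJOR: 2,
  MINOR: 3,
  INFO: 4,
};

// Most severe first, capped at `limit` so the prompt stays small.
function buildSonarPrompt(issues: SonarIssue[], limit = 20): string {
  const ranked = [...issues].sort(
    (a, b) => (SEVERITY_RANK[a.severity] ?? 5) - (SEVERITY_RANK[b.severity] ?? 5),
  );
  const lines = ranked
    .slice(0, limit)
    .map((i) => `- [${i.severity}] ${i.component}${i.line ? `:${i.line}` : ""} ${i.rule}: ${i.message}`);
  return [
    "SonarQube reported these issues on this pull request.",
    "For each one, explain the risk and propose a minimal fix. Do not add issues that are not listed.",
    "",
    ...lines,
  ].join("\n");
}
```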

A common production regression is a query or endpoint that works fine in review and falls over under load. Bake load testing into the gate with k6: its thresholds are pass/fail conditions, and k6 exits with a non-zero code when one is crossed, so the CI step fails on its own.

The generated test encodes the thresholds as gate conditions, so a regression turns the CI step red:

checkout.load.test.js
import http from 'k6/http';
import { check } from 'k6';
export const options = {
  stages: [
    { duration: '2m', target: 200 },
    { duration: '5m', target: 200 },
    { duration: '2m', target: 0 },
  ],
  thresholds: {
    http_req_duration: ['p(95)<500'],
    http_req_failed: ['rate<0.01'],
  },
};
export default function () {
  const res = http.post(
    `${__ENV.BASE_URL}/api/checkout`,
    JSON.stringify({ cartId: 'c_1', paymentMethodId: 'pm_1', idempotencyKey: `${__VU}-${__ITER}` }),
    { headers: { 'Content-Type': 'application/json' }, tags: { name: 'checkout' } },
  );
  check(res, { 'status 200': (r) => r.status === 200 });
}
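To see exactly what the two thresholds assert, here they are restated as plain functions over a run’s samples. A sketch only: it uses the nearest-rank percentile, while k6 computes percentiles internally and may interpolate slightly differently, and RunSummary is a hypothetical shape, not k6’s summary output.

```typescript
// Nearest-rank percentile: the smallest sample with at least p% of samples
// at or below it. An approximation of k6's p(95), not its exact algorithm.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(rank, 1) - 1]!;
}

// Hypothetical summary of one load-test run.
interface RunSummary {
  durationsMs: number[];
  failed: number;
  total: number;
}

// Restates the script's thresholds: p(95)<500 and rate<0.01.
function thresholdsPass(run: RunSummary): { p95: boolean; errorRate: boolean } {
  return {
    p95: percentile(run.durationsMs, 95) < 500,
    errorRate: run.failed / run.total < 0.01,
  };
}
```

Note the strict inequality: exactly 1 failure in 100 requests is a 1% error rate, which fails rate<0.01.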

Once the prompts above prove useful, save them as reusable slash commands. A file at .claude/commands/security-audit.md becomes the /security-audit command inside an interactive Claude Code session (subdirectories such as .claude/commands/review/ organize commands and appear in their descriptions, but in current versions they do not add a prefix to the command name). Invoke it in the REPL:

> /security-audit

The command file holds your OWASP-focused prompt. Cursor exposes the same idea through saved prompts; Codex through AGENTS.md workflows and custom prompts.