AI-PR labelling — distinguish + apply extra gates

Scorecard question (Q11 · Quality gates): Do you distinguish AI-authored PRs from human-authored ones (label, check, signing)?
Max-score answer (3 pts): Label + extra gates (full test suite + security scan + 2× reviewer) on AI PRs.
Why it matters: You cannot manage what you cannot see. AI PRs deserve different scrutiny, but only if you can identify them.

Why this matters in 2026

By mid-2026 the AI-authored share of pull requests in real engineering orgs sits between 30% and 60%, and on greenfield teams using Claude Code, Cursor, and Codex Cloud aggressively it routinely tops 70%. The bug profile of AI-authored PRs is structurally different from that of human ones: CodeRabbit’s State of AI vs Human Code Generation report (December 2025) found AI-co-authored PRs ship 1.7× more issues (10.83 vs 6.45 per PR), with 1.4× more critical findings, 2.74× more cross-site-scripting findings, and roughly 2× more error-handling gaps. Industry telemetry from 2026 points the same way: incidents per pull request jumped 23.5% and change failure rates climbed 30%, even though 75% of developers say they manually review AI code. The problem is not “AI writes bad code.” It is that AI writes code fast enough that any single reviewer becomes the bottleneck, and if you cannot tell AI PRs from human PRs at the gate, you cannot scale the differential scrutiny they need.

The attribution question is now table stakes. Microsoft briefly defaulted VS Code’s git.addAICoAuthor setting to on (chatAndAgent) in late April 2026, then reverted it to off in May 2026 (1.119) after a faulty rollout mis-attributed human code to Copilot. The Co-authored-by: Copilot <copilot@github.com> trailer is therefore opt-in again, though its format is unchanged when enabled. Claude Code attaches Co-Authored-By: Claude <noreply@anthropic.com> to every agent-authored commit. The signals are there. Q11 asks whether you’ve wired them into CI.

What “max score” actually looks like (auto-label + 4 extra gates)

A max-score Q11 setup is mechanical and boring. An AI-authored PR opens. Within 30 seconds, a GitHub Action inspects commit trailers, branch name, and PR body, and applies an ai-authored label. The crucial mechanical detail: GitHub does not let you make a required status check fire only when a label is present — a required check must run on every PR to be satisfiable. So the gating is inverted. You register one always-required check per gate in your ruleset, and the workflow behind each check reads the ai-authored label and decides whether to run the heavy version or short-circuit to a passing no-op. That single design choice routes every labelled PR through four extra gates beyond your standard pipeline while keeping the required checks always-reported:

Full test suite, not the smoke subset. Where a human PR might run only the changed-package test subset for speed, an AI PR runs unit + integration + e2e because hallucinated-code failure modes don’t respect package boundaries. The full-tests job runs on every PR but exits 0 immediately when the label is absent.
SAST + SCA security scan with a hard severity gate. Semgrep/CodeQL/Snyk Code plus Snyk/Socket SCA run with a fail-the-build threshold at HIGH inside the workflow — this is where the hard severity gate lives, because deterministic scanners support severity-based failure and the semantic AI reviewer does not. Hallucinated dependencies and copy-pasted vulnerable patterns are the failure modes this gate catches.
Two human reviewers required. Require two approvals on main (globally, or via a CODEOWNERS/team rule). Because GitHub applies required_approving_review_count per branch ruleset target — not per label — the “second reviewer for AI PRs” is enforced by a workflow that fails a required status check when a labelled PR has fewer than two approvals. The second reviewer is the structural defence against “first reviewer trusted the AI, AI trusted itself, nobody read the diff.”
AI-aware reviewer in the stack. Either Claude Code Action with a code-reviewer subagent, or a custom /ai-review slash command that runs harder checks (security patterns, hallucination detection, edge-case probing) than default /review.
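
The two-reviewer gate above reduces to a small piece of logic inside the required check. The following is a minimal sketch, assuming the job has already fetched the PR's label names and its review list (the field names `user.login` and `state` follow the shape of GitHub's REST review objects; the function names are hypothetical). It counts each reviewer's most recent decisive review and only enforces the threshold on labelled PRs.

```python
# Sketch only: function names are illustrative, not part of any GitHub tooling.
NON_DECISIVE = {"COMMENTED", "PENDING"}  # these do not withdraw an earlier approval


def count_approvals(reviews):
    """Count reviewers whose latest decisive review is an approval.

    `reviews` is assumed to be ordered oldest-first.
    """
    latest = {}
    for review in reviews:
        state = review["state"]
        if state in NON_DECISIVE:
            continue
        latest[review["user"]["login"]] = state
    return sum(1 for state in latest.values() if state == "APPROVED")


def reviewers_gate_passes(labels, reviews, required=2):
    """Return True when the ai-reviewers check should report green."""
    if "ai-authored" not in labels:
        return True  # unlabelled PR: short-circuit to a passing no-op
    return count_approvals(reviews) >= required
```

A job wrapping this would exit non-zero when `reviewers_gate_passes` returns False, which is what turns the approval count into a merge-blocking status check.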

In parallel, you publish the AI-author % metric: a weekly job queries merged PRs by the ai-authored label and writes the percentage to your dashboard alongside DORA. If AI-author % rises but revert rate stays flat, the gates work; if both rise together, you have a gate problem.

Scoring tiers:
0 pts: every PR is identical in CI.
1 pt: developers self-label manually (unreliable).
2 pts: auto-label exists but no differential gates.
3 pts: auto-label + four gates + tracked metric.

Current landscape (web-search-verified)

Three things changed in 2025-2026 that make Q11 max score reachable without bespoke engineering: AI tools standardised on machine-readable attribution, the “always-required check that reads the label inside the workflow” pattern became the documented way to apply conditional gating on GitHub, and security scanners shipped first-class GitHub Action integrations that gate cleanly on severity.

Detection signals (Co-Authored-By trailers, PR-body footers, branch prefixes)

The attribution layer is now machine-readable across the three major coding agents, though the defaults differ. Claude Code adds Co-Authored-By: Claude <noreply@anthropic.com> to every agent-authored commit by default. GitHub Copilot writes Co-authored-by: Copilot <copilot@github.com> when the git.addAICoAuthor setting is enabled (it ships off by default as of VS Code 1.119, after the May 2026 revert, so don’t assume Copilot PRs carry the trailer unless your org turned it on). OpenAI Codex appends a configurable Co-authored-by trailer (default Codex <noreply@openai.com>) controlled by the commit_attribution key in .codex/config.toml; commit the file so the whole team gets consistent attribution (an empty string disables it). Secondary signals strengthen detection: agent workflows often include a generated-with-Claude-Code footer in the PR body, and parallel agent backlogs typically use branch prefixes claude/<slug> or codex/<slug>. The robust detection rule is a union: trailer regex (Co-Authored-By: (Claude|Copilot|Codex), matched case-insensitively because the tools capitalise the trailer key differently) OR PR-body footer match OR branch prefix match. False positives should be rare because these strings seldom appear in human-authored work; false negatives are the real risk, mitigated by adding signals as new tools emerge.
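
The union rule above can be sketched as a single function. This is an illustrative sketch, not a canonical detector: the footer pattern and the branch-prefix list are assumptions to adapt to whatever your agents actually emit, and `detect_ai_signals` is a hypothetical name.

```python
import re

# Case-insensitive: Claude Code writes "Co-Authored-By", Copilot "Co-authored-by".
TRAILER_RE = re.compile(
    r"^co-authored-by:\s*(claude|copilot|codex)\b",
    re.IGNORECASE | re.MULTILINE,
)
# Assumed footer wording and branch prefixes; adjust to your own tooling.
FOOTER_RE = re.compile(r"generated with .*claude code", re.IGNORECASE)
BRANCH_PREFIXES = ("claude/", "codex/")


def detect_ai_signals(commit_messages, pr_body, branch):
    """Return the names of the signals that fired; an empty list means none did."""
    signals = []
    if any(TRAILER_RE.search(message) for message in commit_messages):
        signals.append("trailer")
    if pr_body and FOOTER_RE.search(pr_body):
        signals.append("pr-body-footer")
    if branch.startswith(BRANCH_PREFIXES):
        signals.append("branch-prefix")
    return signals
```

Returning the list of fired signals rather than a bare boolean makes it easier to log why a PR was labelled, which helps when auditing false negatives later.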

Auto-labeling via GitHub Actions or pre-push hook

The labelling itself is a 30-line GitHub Action on pull_request: [opened, synchronize, reopened]. It reads commits via gh api, regex-matches trailers, reads the PR body, reads the branch ref, and calls gh pr edit <PR#> --add-label ai-authored if any signal fires. Two design notes: run it on synchronize (a developer can push an AI-authored commit, or amend a trailer, after opening a human PR, and you want the label to follow truth, not initial state), and make it idempotent so that re-runs on an already-labelled PR change nothing. A client-side pre-push hook can mirror the logic for early signalling, but treat the GitHub Action as the source of truth, because local hooks can be disabled.
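
The idempotency note can be made concrete with a small decision function. This is a sketch under stated assumptions: `signals` is whatever list of fired detection signals your workflow computes, `current_labels` is the list of label names already on the PR, and `label_command` is a hypothetical helper that only builds the `gh pr edit` invocation described above.

```python
AI_LABEL = "ai-authored"


def label_command(pr_number, current_labels, signals):
    """Return the gh argv to run, or None when there is nothing to do."""
    if not signals:
        return None  # no AI signal fired; leave the PR untouched
    if AI_LABEL in current_labels:
        return None  # already labelled; a re-run on synchronize is a no-op
    return ["gh", "pr", "edit", str(pr_number), "--add-label", AI_LABEL]
```

In a workflow step the result would be passed to `subprocess.run(cmd, check=True)` when it is not None. Note that the function never removes the label, which matches the "do not remove it manually" policy.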

Required extra checks (full test suite, SAST scan, 2× human review)

GitHub does not support making a required status check conditional on a PR label — there is no “fire this required check only when ai-authored is present” toggle in rulesets or branch protection. A required check must always run to be satisfiable. The documented pattern is the inverse: register a fixed set of required checks on main, and push the label logic inside each workflow.

Always-required checks (run on every PR): lint, type-check, smoke tests, secret scan, plus the gate jobs full-tests, sast-scan, sca-scan, and ai-reviewers.
What the gate jobs do when ai-authored is present: full-tests runs unit + integration + e2e; sast-scan runs Semgrep/CodeQL with a severity: high fail threshold; sca-scan runs Snyk/Socket with a severity: high fail threshold; ai-reviewers fails if fewer than two approvals are present. When the label is absent, each job exits 0 immediately, so the required check is always reported green and the gate only bites on labelled PRs.

Each gate job triggers on pull_request, reads the label from the GitHub Actions context (or gh api repos/{owner}/{repo}/issues/{PR#}/labels), and branches on it. Prefer the API call when the labelling workflow runs on the same event: the event payload is a snapshot taken when the workflow was triggered, so a label applied a few seconds later will not appear in it.

The cleanest 2026 GitHub Action integrations: Semgrep (run semgrep ci --config auto in the official semgrep/semgrep container, emitting SARIF; the old returntocorp/semgrep-action wrapper is deprecated), CodeQL (free on public repos, Advanced Security for private), Snyk Code (paid, broad language coverage), Socket (specialised in npm/PyPI supply-chain). Anthropic’s claude-code-security-review Action is worth stacking with a deterministic scanner because it catches semantic security issues (auth bypass via wrong middleware order, missing tenant filter) that pattern-based scanners miss. Note that it only comments findings and has no severity-fail input, so keep the hard severity gate on the deterministic scanners and treat the Anthropic action as a non-blocking semantic reviewer.

The 2× reviewer rule catches the failure mode where the first reviewer rubber-stamps an AI PR because “it looked clean”; a fresh second pair of eyes forces re-reading.
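
The label branch inside each gate job can be sketched as follows. The sketch assumes the job has captured the raw JSON returned by the issues labels endpoint (an array of objects with a `name` field); `gate_plan` and the `make test-all` command in the usage note are hypothetical stand-ins for your own heavy gate.

```python
import json

AI_LABEL = "ai-authored"


def has_ai_label(labels_json):
    """labels_json: raw JSON text from the issues/{PR#}/labels endpoint."""
    return any(item.get("name") == AI_LABEL for item in json.loads(labels_json))


def gate_plan(labels_json, heavy_command):
    """Return the argv this gate job should execute for the PR."""
    if has_ai_label(labels_json):
        return heavy_command  # labelled: run the real gate
    return ["true"]  # unlabelled: exit 0 so the required check still reports green
```

For example, `gate_plan(payload, ["make", "test-all"])` yields the full-suite command on a labelled PR and the trivial `true` command otherwise, so the same required check name is reported in both cases.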

Tracking AI-author % over time

The metric that makes Q11 legible to leadership is AI-author % of merged PRs, week over week. One gh api graphql call pulls merged PRs and their labels; filter to the last 7 days, classify by ai-authored, divide. Write the result to PostHog/Grafana/JSON and trend it alongside the four DORA metrics. Interpretation: AI-author % rising with stable DORA metrics means the gates are working. Rising alongside a degrading change failure rate means a gate problem. Flat despite reported high AI usage means false-negative detection, so audit the labelling. This metric feeds Q22 · AI metrics panel.
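
The filter, classify, divide step can be sketched as below. It assumes each merged PR has already been flattened to a dict with a `mergedAt` ISO 8601 timestamp and a list of label names (the key names are assumptions about how you shape the GraphQL response; `ai_author_pct` is a hypothetical helper).

```python
from datetime import datetime, timedelta, timezone


def ai_author_pct(merged_prs, now, days=7):
    """Percentage of PRs merged in the last `days` days that carry ai-authored."""
    cutoff = now - timedelta(days=days)
    recent = [
        pr for pr in merged_prs
        # GitHub timestamps end in "Z"; normalise so fromisoformat accepts them.
        if cutoff <= datetime.fromisoformat(pr["mergedAt"].replace("Z", "+00:00")) <= now
    ]
    if not recent:
        return None  # no merges in the window; avoid dividing by zero
    ai = sum(1 for pr in recent if "ai-authored" in pr["labels"])
    return 100.0 * ai / len(recent)


if __name__ == "__main__":
    now = datetime(2026, 6, 8, tzinfo=timezone.utc)  # fixed "now" for reproducibility
    sample = [
        {"mergedAt": "2026-06-07T10:00:00Z", "labels": ["ai-authored"]},
        {"mergedAt": "2026-06-05T10:00:00Z", "labels": []},
    ]
    print(ai_author_pct(sample, now))
```

Returning None for an empty window keeps a quiet week from being recorded as 0%, which would otherwise look like a labelling regression on the dashboard.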

Step-by-step: building AI-PR detection + gating

Audit current state. Pull your last 50 merged PRs via gh pr list --state merged --limit 50 --json number,labels,author,title,commits. Count: how many have a Co-Authored-By: (Claude|Copilot|Codex) trailer in any commit (match case-insensitively)? How many have an ai-authored (or similar) label? The gap between trailer count and label count shows where you stand on Q11. If labels = trailers, labelling is keeping up and you’re at 2 pts or better, depending on your gates. If labels = 0 and trailers > 0, you’re at 0-1 pts.
Pick the label name and document it. Standardise on ai-authored (or ai-assisted if you prefer softer phrasing). Document the rule in CONTRIBUTING.md: “PRs with any AI-tool-authored commit are labelled ai-authored automatically. This label triggers additional CI gates. Do not remove it manually.” Pre-create the label in every repo via gh label create ai-authored --color C00000 --description "Contains AI-authored commits — extra gates apply".
Write the auto-label GitHub Action. Create .github/workflows/ai-pr-label.yml triggered on pull_request: [opened, synchronize, reopened]. The job reads commits via gh api repos/{owner}/{repo}/pulls/{PR#}/commits, regex-matches the trailers, reads the PR body for the agent-generated footer, reads the branch name, and adds the label if any signal fires. Make it idempotent. Test it on a draft PR before rolling out.
Add the AI security review job. Wire anthropics/claude-code-security-review into a gate job. Pin it to a commit SHA (anthropics/claude-code-security-review@<sha>) rather than a tag — the repo publishes no release tags, and its README notes it is not hardened against prompt injection, so a SHA pin is the supply-chain-safe reference. It has no severity-fail input, so configure it to comment findings (comment-pr: true) and treat it as a non-blocking semantic reviewer. Run it in parallel with your deterministic SAST scanner — they catch different things, and the hard gate lives on the scanner, not here.
Add the deterministic SAST + SCA scanners. Pick one SAST (Semgrep is the lowest-friction starting point — run semgrep ci --config auto in the official semgrep/semgrep container; the returntocorp/semgrep-action / semgrep/semgrep-action wrappers are deprecated) and one SCA (Snyk or Socket). Each runs inside its gate job, reads the ai-authored label and runs the full scan when present, is configured with a severity: high fail threshold, and writes SARIF output that the GitHub Security tab ingests.
Configure branch protection with always-required gate checks. Open the repo’s Rulesets settings and create a ruleset targeting main. Under “Required status checks,” list every gate job by its check name: lint, type-check, smoke tests, secret scan, full-tests, sast-scan, sca-scan, and ai-reviewers. Do not look for a “conditional checks” option — it does not exist; the conditionality lives inside each workflow (heavy run when labelled, instant pass when not). Set required_approving_review_count to 2 at the branch level (or enforce the second reviewer for labelled PRs via the ai-reviewers job). Save and enable.
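Wire an AI-aware human reviewer route. CODEOWNERS matches on file paths, not labels, so it cannot route by the ai-authored label on its own. Add a step to the labelling workflow that requests review from a designated “AI PR reviewers” team whenever the label is applied (for example, gh pr edit <PR#> --add-reviewer <org>/<team>), and keep CODEOWNERS for path-based ownership. The team is a rotating roster of senior engineers trained to read AI-authored diffs for the specific failure modes (hallucinated APIs, copy-pasted vulnerable patterns, missing null/edge handling). It can be as small as 3 people; rotate weekly so no one burns out.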
Add a custom /ai-review slash command. Create .claude/commands/ai-review.md with a prompt that explicitly tells the reviewer the diff is AI-authored and lists the AI-specific failure modes to check (hallucinated function signatures, hallucinated imports, missing input validation, security regressions in changed auth/DB code, copy-pasted vulnerable snippets). Invoke the command from a workflow on ai-authored PRs (for example via Claude Code Action) so that its result is reported as the second AI-reviewer status check; the command file alone does not run in CI.
Wire the AI-author % metric. Write a weekly cron job (GitHub Action on schedule, or a small Cloudflare Worker, or a Vercel cron) that queries the GraphQL API, computes AI-author %, and posts to your metrics store (PostHog event, Grafana datasource, or a metrics-repo JSON file your dashboard reads). Set a Slack alert when AI-author % shifts by more than 10 percentage points week over week; a swing that size signals either adoption acceleration or a labelling regression.
Roll out to one repo, then expand. Pilot the full stack in your highest-volume repo for two weeks. Watch labelling accuracy, CI runtime impact (extra gates add 3-8 minutes typically), and reviewer fatigue. Tune. Then export workflow files and ruleset config as reusable templates and roll to every repo.
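
The audit in the first step can be scripted instead of counted by hand. The sketch below assumes the JSON printed by the gh pr list command in that step, with each commit exposing a `messageBody` field and each label a `name` field (treat those field names as assumptions to confirm against your gh version); `audit` is a hypothetical helper.

```python
import json
import re

TRAILER_RE = re.compile(
    r"^co-authored-by:\s*(claude|copilot|codex)\b",
    re.IGNORECASE | re.MULTILINE,
)


def audit(gh_json, label="ai-authored"):
    """Compare trailer-bearing PRs against labelled PRs."""
    with_trailer, with_label = set(), set()
    for pr in json.loads(gh_json):
        bodies = (commit.get("messageBody") or "" for commit in pr.get("commits", []))
        if any(TRAILER_RE.search(body) for body in bodies):
            with_trailer.add(pr["number"])
        if any(item.get("name") == label for item in pr.get("labels", [])):
            with_label.add(pr["number"])
    return {
        "trailer": len(with_trailer),
        "labelled": len(with_label),
        # PRs that carry an AI trailer but never received the label.
        "unlabelled_ai": sorted(with_trailer - with_label),
    }
```

The `unlabelled_ai` list is the actionable part: those PR numbers are the concrete false negatives to inspect before tuning the detection rule.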

Common pitfalls

Relying on developer discipline to add the label. Symptom: half your AI-authored PRs are unlabelled because developers forgot. Fix: automate the label via an Action with pull_request: [opened, synchronize, reopened] triggers; never trust manual labelling for policy-critical decisions.
No branch protection enforcing the extra gates. Symptom: the ai-authored label is correct but the gates are “optional” — developers can merge without the SAST scan green. Fix: the entire value of Q11 is in branch protection. Without it, the label is decoration. Register the gate jobs (full-tests, sast-scan, sca-scan, ai-reviewers) as always-required checks in a ruleset on main and verify by attempting to merge a labelled PR with the gate red — it should be blocked.
No metrics on AI-author %. Symptom: you have the label and the gates, but you cannot answer “what fraction of last quarter’s PRs were AI-authored?” Fix: the AI-author % time series is the leading indicator that ties Q11 to Q22 and ultimately Q23 (ROI). Without the metric, you cannot show leadership that the gates are scaling with AI adoption or that adoption is rising.
Treating Co-Authored-By trailers as the only signal. Symptom: a developer pastes Claude output into Cursor without the trailer being added; the PR ships unlabelled. Fix: union of signals — trailer OR PR body footer OR branch prefix. New tools emerge; revisit the detection signals every quarter.
Punitive framing of the label. Symptom: developers strip the trailer to avoid the extra gates. Fix: frame the label as engineering hygiene, not blame. The extra gates are how the org trusts AI-authored work, not how it punishes the author. The reviewer roster is rotating senior engineers, not a quality-cop team. Pair the policy rollout with an explicit message that AI-authoring is encouraged — what’s required is the visibility, not the abstention.
Single SAST/SCA tool with no AI security reviewer. Symptom: deterministic scanners miss semantic security issues (auth bypass, tenant leakage) that an AI reviewer would catch. Fix: stack Semgrep/CodeQL (pattern-based) with claude-code-security-review (semantic). They catch different classes of issues and the false-positive overlap is low.
--admin overrides on AI-authored PRs. Symptom: a senior overrides the failing gate “to ship Friday’s release.” Fix: the entire policy collapses on the first --admin override. Make the override audit-logged and reviewed weekly. The exception process should be a documented escalation, not a flag a senior can fire from CLI.
No re-evaluation when AI tooling changes. Symptom: a new agent ships with a different trailer format and your detection misses it for a month. Fix: review detection signals every quarter. Subscribe to release notes for Claude Code, Copilot, Codex CLI, and Cursor. Update the regex.

How to verify you’re there

The ai-authored label exists in every repo, with a documented description.
.github/workflows/ai-pr-label.yml (or equivalent) auto-applies the label on PR open + synchronize and is idempotent.
The label fires correctly for PRs with Co-Authored-By: (Claude|Copilot|Codex) trailers, agent-generated PR body footers, and claude/ or codex/ branch prefixes.
A ruleset on main lists full-tests, sast-scan, sca-scan, and ai-reviewers as always-required status checks; each job runs the heavy path only when ai-authored is present and instant-passes otherwise, and required_approving_review_count is 2 (branch-level, or enforced for labelled PRs by ai-reviewers).
A merge attempt on a labelled PR with any gate job reporting red (heavy path failed) is blocked at the branch-protection layer.
A workflow step requests review from a designated AI-PR reviewers team (rotating roster) on every labelled PR; CODEOWNERS alone cannot do this because it matches file paths, not labels.
.claude/commands/ai-review.md exists and is triggered on labelled PRs.
A weekly metric of AI-author % is published to your metrics store and visible on the engineering dashboard alongside the four DORA metrics.
Over the last quarter, change failure rate on AI-authored PRs is at or below human-authored baseline — measurable proof the gates are doing work.
No --admin overrides on AI-authored PR gates in the last 30 days (or any override is reviewed in a weekly engineering forum).
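
The ruleset item in this checklist lends itself to an automated check. The sketch below assumes a ruleset exported as JSON in the shape used by GitHub's repository rulesets API (a `rules` array containing a `required_status_checks` rule and a `pull_request` rule); confirm the shape against your own export before relying on it. `ruleset_gaps` is a hypothetical helper. Pass `min_approvals=0` if the second reviewer is enforced by the ai-reviewers job rather than at branch level.

```python
import json

REQUIRED_CHECKS = {"full-tests", "sast-scan", "sca-scan", "ai-reviewers"}


def ruleset_gaps(ruleset_json, min_approvals=2):
    """Return human-readable gaps between a ruleset export and the target setup."""
    contexts, approvals = set(), 0
    for rule in json.loads(ruleset_json).get("rules", []):
        params = rule.get("parameters") or {}
        if rule.get("type") == "required_status_checks":
            contexts |= {c["context"] for c in params.get("required_status_checks", [])}
        elif rule.get("type") == "pull_request":
            approvals = params.get("required_approving_review_count", 0)
    gaps = [f"missing required check: {name}" for name in sorted(REQUIRED_CHECKS - contexts)]
    if approvals < min_approvals:
        gaps.append(
            f"required_approving_review_count is {approvals}, expected {min_approvals}"
        )
    return gaps
```

An empty list means the export matches the checklist item. This only inspects configuration, so it complements rather than replaces the manual test of attempting to merge a labelled PR with a red gate.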