Agent hooks — the only way to enforce 'every time X'

Q13 · Extensibility How many hooks do you have active (Stop, PreToolUse, PostToolUse, SessionStart…)?

Max-score answer: “5+ orchestrated: auto-PR-watch, fix-on-review, exposing MCP tools inside hooks.”

Hooks are the only place in your AI setup where rules become guarantees. Everything else — CLAUDE.md, memory files, system prompts, slash commands — is a suggestion the model may follow, paraphrase, or quietly forget six turns into the loop. A hook is shell-level: it runs unconditionally, the harness blocks or proceeds based on its exit code, and the model has no vote. If you find yourself writing “always run the formatter after editing” or “every time you finish a feature, open a PR” into your memory file and wishing it would stick — you don’t need a better prompt, you need a hook.

In 2026 the hook surface has finally caught up with the ambition. Claude Code ships 12+ lifecycle events. Cursor matches Claude’s event set and adds two of its own — beforeShellExecution and beforeMCPExecution — covering the two riskiest tool surfaces. Codex CLI still leans on its older notify event for desktop alerts but rolled out a proper hooks.json system in early 2026. The walls between “what the agent wants to do” and “what you allow” are now porous in your favour — but only if you build the hooks.

A senior IC running 4–8 agent hours a day without hooks leaks 30–60 minutes daily on things hooks automate for free: re-running the formatter, fixing import order, catching rm -rf typos before they hit disk, hand-opening the PR after every feature. The “5+ orchestrated” bar isn’t hook-count vanity — it’s the threshold where your harness starts removing whole categories of babysitting from your day.

You get full marks on Q13 only when all four of these are true:

  • At least 5 hooks active across the agent lifecycle. Not 5 commands chained in one hook — 5 distinct entries firing on different events (SessionStart, UserPromptSubmit, PreToolUse, PostToolUse, Stop, or Cursor equivalents). A real spread, not a single overloaded PostToolUse.
  • At least one hook is orchestrated — it sets state the next hook reads. Classic example: a PostToolUse hook records “feature complete” markers; a Stop hook reads them and, if conditions match, pushes a branch and opens a PR. This is the leap from “hook” to “hook stack.”
  • At least one bot/review-loop hook. Either auto-PR-on-stop, fix-on-review (Stop hook polls the open PR for new CodeRabbit/Copilot/human comments and wakes the agent to address them), or a PostToolUse that re-runs the failing tests and pings you on Slack. Hooks that turn the agent into a worker that keeps working when you walk away.
  • At least one hook touches MCP. Cursor’s beforeMCPExecution is the obvious case (audit/gate every MCP call), but the same shape works in Claude Code: a PreToolUse hook with the matcher mcp__.* that logs to Honeycomb/Langfuse, requires approval for high-risk MCP servers (Stripe, GitHub-write, AWS), or rewrites tool inputs before they reach the server.

Anything less — “I have one Stop hook that plays a sound” — is mid-tier on Q13.

Claude Code’s hook surface (official docs) is the richest of any agent CLI in 2026. Events fall into three cadences:

  • Once per session: SessionStart, SessionEnd — inject environment context, pre-warm caches, log session metadata, or chime when you’re back at the keyboard.
  • Once per turn: UserPromptSubmit, Stop, StopFailure, Notification. UserPromptSubmit is the only hook that can block or enrich a user prompt before Claude sees it. Stop fires when Claude finishes generating; it can prevent stopping and force the agent to keep working.
  • Once per tool call: PreToolUse, PostToolUse, SubagentStop, plus MCP-specific events. PreToolUse is the only hook that can hard-block a tool (exit code 2 stops the call); PostToolUse runs after a successful call with both input and response on stdin.

Configuration lives in ~/.claude/settings.json (global) or .claude/settings.json (per-project, checked into the repo). Real shape:

{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hooks": [
          {
            "type": "command",
            "command": "$CLAUDE_PROJECT_DIR/.claude/hooks/deny-dangerous.sh"
          }
        ]
      },
      {
        "matcher": "mcp__github__.*",
        "hooks": [
          {
            "type": "command",
            "command": "echo \"GitHub MCP: $(jq -r '.tool_name')\" >> ~/.claude/logs/mcp-audit.log"
          }
        ]
      }
    ],
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          {
            "type": "command",
            "command": "$CLAUDE_PROJECT_DIR/.claude/hooks/format-and-typecheck.sh"
          }
        ]
      }
    ],
    "Stop": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "$CLAUDE_PROJECT_DIR/.claude/hooks/auto-pr-watch.sh"
          }
        ]
      }
    ],
    "SessionStart": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "echo \"session=$(jq -r '.session_id') branch=$(git rev-parse --abbrev-ref HEAD)\""
          }
        ]
      }
    ]
  }
}

The matcher field is a regex run against the tool name (Bash, Edit, Write, Read, Glob, Grep, Task, or mcp__<server>__<tool> for MCP). Exit code 2 from a PreToolUse hook blocks the tool with a feedback message to Claude; exit code 0 lets it proceed; any other non-zero is a hook error and Claude logs it but doesn’t block.
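Because the matcher is a regex, it is easy to write one that selects more or fewer tools than intended. The following is a minimal sketch for dry-running a matcher against a handful of tool names before it goes into settings.json. Both function names are hypothetical helpers, and the sketch assumes the harness matches against the whole tool name; the anchoring is an assumption, so confirm it against the official docs.

```shell
#!/usr/bin/env bash
# Dry-run a hook matcher against tool names.
# ASSUMPTION: the harness treats the matcher as a regex over the whole tool
# name; ^(...)$ models that. matcher_hits and preview_matcher are hypothetical
# helpers, not part of Claude Code.
matcher_hits() {
  local matcher=$1 tool=$2
  local re="^(${matcher})$"
  [[ $tool =~ $re ]]
}

# Print, for each sample tool name, whether the matcher would select it.
preview_matcher() {
  local matcher=$1; shift
  local tool
  for tool in "$@"; do
    if matcher_hits "$matcher" "$tool"; then
      printf 'match    %s\n' "$tool"
    else
      printf 'no match %s\n' "$tool"
    fi
  done
}
```

For example, `preview_matcher 'mcp__github__.*' mcp__github__create_issue mcp__stripe__create_refund Bash` reports a match for the first name only, which is a quick way to see that a GitHub audit hook will not fire on other servers.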

Cursor shipped its hook system in late 2025 with the same event names Claude Code uses, plus two that cover Cursor’s specific surfaces (Cursor docs):

  • beforeShellExecution — fires before the agent runs any shell command. Matcher runs against the command string. Set failClosed: true to block on hook error.
  • beforeMCPExecution — fires before any MCP tool call. The hook receives the server command, tool_name, and JSON-encoded tool_input. The cleanest place in any 2026 setup to audit, gate, or rewrite MCP calls — particularly for high-risk servers like Stripe, GitHub-write, AWS, or anything that moves money.

Cursor config lives in .cursor/hooks.json (project) or ~/.cursor/hooks.json (global):

{
  "hooks": {
    "beforeShellExecution": [
      {
        "matcher": "rm -rf|DROP TABLE|kubectl delete",
        "command": "~/.cursor/hooks/deny-dangerous.sh",
        "failClosed": true
      }
    ],
    "beforeMCPExecution": [
      {
        "matcher": "stripe.*",
        "command": "~/.cursor/hooks/require-approval.sh",
        "failClosed": true
      }
    ],
    "stop": [
      {
        "command": "~/.cursor/hooks/auto-pr-watch.sh"
      }
    ]
  }
}

As of January 2026 Cursor moved hook execution to an in-process runner — 10–20× faster than the previous shell-spawn model — so you can finally afford hooks that do real work (typecheck, lint, format) without making every tool call feel laggy.

Codex CLI’s hook story is the youngest of the three. It still supports the legacy notify config (the original “desktop toast when the agent finishes” feature) — but in early 2026 OpenAI shipped a proper hooks.json system covering five events: SessionStart, UserPromptSubmit, PreToolUse, PostToolUse, Stop. The shape mirrors Claude Code, lives in ~/.codex/hooks.json or .codex/hooks.json, and Codex passes JSON on stdin / reads JSON from stdout. The legacy notify array in config.toml still works for the narrow “fire-and-forget on completion” case:

~/.codex/config.toml
notify = ["bash", "-lc", "terminal-notifier -title 'Codex' -message 'Task complete'"]

If you’re starting fresh in 2026, skip notify and write hooks.json directly. If you’re already using notify, leave it alone but add the new system on top — they coexist.

The “5+ orchestrated” bar is easy to hit if you start from the five hooks that earn their keep on day one:

1. Auto-format + typecheck after every edit. A PostToolUse matcher on Edit|Write that runs your project’s formatter (Prettier/Biome/Ruff) and typechecker, feeding errors back to Claude. Left alone, the agent forgets to re-run the formatter roughly once every five edits; the hook removes that whole category.

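.claude/hooks/format-and-typecheck.sh
#!/usr/bin/env bash
set -uo pipefail
cd "$CLAUDE_PROJECT_DIR" || exit 1
# Run each step; on failure print the last 20 lines to stderr and exit 2,
# which is what feeds the errors back to Claude.
for step in "format -- --write" "type-check"; do
  # $step is left unquoted on purpose so it splits into npm arguments
  out=$(npm run $step 2>&1) || { tail -20 <<< "$out" >&2; exit 2; }
done
echo "format-and-typecheck: ok" >&2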
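Running the formatter over the whole project on every edit can get slow. A per-file variant is sketched below: it picks a formatter from the extension of the file that was just edited. The helper names and the extension mapping are illustrative assumptions, as is the expectation that the PostToolUse payload for Edit and Write carries the path in tool_input.file_path.

```shell
#!/usr/bin/env bash
# Per-file variant of the format hook: pick a formatter from the extension.
# ASSUMPTIONS: formatter_for and format_one are hypothetical helpers; the
# mapping is illustrative, and the formatters are assumed to be installed.
formatter_for() {
  case "$1" in
    *.ts|*.tsx|*.js|*.jsx|*.json|*.css|*.md) echo "npx prettier --write" ;;
    *.py)                                     echo "ruff format" ;;
    *)                                        echo "" ;;  # unknown type: skip
  esac
}

# Format a single file; status 2 means "report stderr back to the agent".
format_one() {
  local file=$1 fmt
  fmt=$(formatter_for "$file")
  [ -z "$fmt" ] && return 0          # nothing to do for this file type
  # $fmt is left unquoted on purpose so it splits into command + arguments
  $fmt "$file" >&2 || return 2
}

# In the hook itself the path would come from the payload on stdin, e.g.:
#   file=$(jq -r '.tool_input.file_path // empty')
#   format_one "$file" || exit 2
```

Formatting one file keeps the hook cheap enough for a hot event; the project-wide typecheck can then move to a less frequent event if it proves too slow.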

2. Deny dangerous shell commands. A PreToolUse matcher on Bash that scans the command for rm -rf /, DROP TABLE, kubectl delete -A, gh repo delete, etc., and exits 2 if it sees one.

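.claude/hooks/deny-dangerous.sh
#!/usr/bin/env bash
# Read the PreToolUse payload from stdin and pull out the shell command.
cmd=$(jq -r '.tool_input.command // empty')
# \$HOME matches the literal text "$HOME" in the command string.
if grep -qE 'rm -rf (/|~|\$HOME)|DROP TABLE|kubectl delete -A|gh repo delete' <<< "$cmd"; then
  echo "blocked: dangerous command pattern: $cmd" >&2
  exit 2  # exit 2 blocks the tool call
fi
exit 0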
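A deny pattern like this is worth testing on its own, since a single escaping mistake silently disables one of its branches. Below is a small table-driven check, written as a sketch: is_dangerous and check_deny_cases are hypothetical helpers, and the sample commands are made up for illustration.

```shell
#!/usr/bin/env bash
# Table-driven check of the deny pattern, kept separate from the hook itself.
# ASSUMPTION: is_dangerous and check_deny_cases are hypothetical helpers that
# reuse the same grep -E expression as the hook.
DENY_RE='rm -rf (/|~|\$HOME)|DROP TABLE|kubectl delete -A|gh repo delete'

is_dangerous() {
  grep -qE "$DENY_RE" <<< "$1"
}

# Each case line is "<expected> <command>". Prints a FAIL line for every wrong
# verdict and returns non-zero if there was at least one.
check_deny_cases() {
  local failures=0 expected cmd actual
  while read -r expected cmd; do
    if is_dangerous "$cmd"; then actual=block; else actual=allow; fi
    if [ "$actual" != "$expected" ]; then
      printf 'FAIL: expected %s, got %s: %s\n' "$expected" "$actual" "$cmd" >&2
      failures=$((failures + 1))
    fi
  done <<'CASES'
block rm -rf /
block rm -rf ~/projects
block rm -rf $HOME/.ssh
block psql -c "DROP TABLE users"
block gh repo delete acme/api
allow rm -rf ./node_modules
allow kubectl delete pod web-1
allow ls -la
CASES
  [ "$failures" -eq 0 ]
}
```

Note the limits of pattern matching: the expression is case-sensitive and tied to one spelling of each command, so `rm -fr /` or a lowercase `drop table` would pass. Adding such cases to the table is a cheap way to decide whether the pattern needs widening.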

3. Auto-PR-watch on Stop. A Stop hook that checks whether the current branch has commits ahead of origin/<default> and, if so, pushes the branch and opens a PR via gh. The single highest-leverage hook in any agent setup: it converts “code written” into “code shipped” without you having to remember.
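One way the Stop hook could be laid out is sketched here. It assumes git and an authenticated gh CLI on PATH and a remote named origin; the function names and the agent/<timestamp> branch naming are hypothetical. The decision logic is kept in a pure function so it can be tested without touching a repository.

```shell
#!/usr/bin/env bash
# Sketch of a Stop hook that pushes the branch and opens a PR.
# ASSUMPTIONS: git and an authenticated gh CLI are on PATH; the remote is
# named origin; all function names are hypothetical.

# Resolve the default branch from GitHub instead of hardcoding main/master.
default_branch() {
  gh repo view --json defaultBranchRef --jq .defaultBranchRef.name
}

# Number of commits on HEAD that are not on origin/<base>.
commits_ahead() {
  git rev-list --count "origin/$1..HEAD"
}

# Pure decision step. Prints one of: skip, branch-first, push-and-pr.
plan_action() {
  local current=$1 base=$2 ahead=$3
  if [ "$ahead" -eq 0 ]; then
    echo skip
  elif [ "$current" = "$base" ]; then
    echo branch-first
  else
    echo push-and-pr
  fi
}

auto_pr_watch() {
  local base current ahead
  base=$(default_branch) || return 0          # no gh or no remote: do nothing
  current=$(git rev-parse --abbrev-ref HEAD)
  ahead=$(commits_ahead "$base")
  case $(plan_action "$current" "$base" "$ahead") in
    skip) return 0 ;;
    branch-first)
      current="agent/$(date +%Y%m%d-%H%M%S)"   # hypothetical naming scheme
      git switch -c "$current" ;;
  esac
  git push -u origin "$current"
  # Only create a PR if this branch does not already have one.
  gh pr view "$current" >/dev/null 2>&1 || gh pr create --fill --base "$base"
}
```

The hook script would end with a call to `auto_pr_watch`. The sketch only considers committed work, so uncommitted changes in the working tree are left alone.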

4. Fix-on-review on Stop. Same Stop hook, second branch: if an open PR for this branch has new comments since the last signature you handled, wake the agent (via asyncRewake or re-prompt) to address them. This is the orchestration step — the hook reads state it wrote on a previous run.

5. MCP audit / approval gate. Either a Claude Code PreToolUse hook with the matcher mcp__.* that logs every MCP call, or a Cursor beforeMCPExecution hook requiring explicit approval for high-risk MCP servers. Even pure logging pays for itself the first time you debug “why did the agent call this MCP tool 80 times?”
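The audit side can be sketched with nothing more than string handling on the mcp__<server>__<tool> name. In the sketch below the helper names, the log format, and the high-risk list are assumptions chosen to mirror the servers named in the text; it also assumes server names contain no double underscore.

```shell
#!/usr/bin/env bash
# Split an MCP tool name of the form mcp__<server>__<tool> and classify it.
# ASSUMPTIONS: helper names, the log format, and the high-risk list are
# illustrative; server names are assumed not to contain "__".

mcp_server_of() {
  local name=${1#mcp__}      # drop the mcp__ prefix
  echo "${name%%__*}"        # keep everything before the next __
}

mcp_tool_of() {
  local name=${1#mcp__}
  echo "${name#*__}"         # keep everything after the first __
}

is_high_risk_server() {
  case "$1" in
    stripe|github|aws) return 0 ;;
    *)                 return 1 ;;
  esac
}

# Append one tab-separated audit line: UTC timestamp, server, tool, verdict.
audit_mcp_call() {
  local tool_name=$1 log_file=$2 server verdict=allow
  server=$(mcp_server_of "$tool_name")
  is_high_risk_server "$server" && verdict=review
  printf '%s\t%s\t%s\t%s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
    "$server" "$(mcp_tool_of "$tool_name")" "$verdict" >> "$log_file"
}
```

In a PreToolUse hook the name would come from the payload, for example `audit_mcp_call "$(jq -r '.tool_name')" ~/.claude/logs/mcp-audit.log`. Counting lines per tool in that log answers the "80 times" question directly.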

That’s five hooks. With auto-PR-watch and fix-on-review sharing state in ~/.claude/auto-pr-state/<hash>.json, you’ve also hit “orchestrated.” Add a SessionStart hook and a beforeMCPExecution gate and you’re past the bar.
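The shared state that makes the pair "orchestrated" can be very small. The sketch below stores a single signature line per repository and re-fires only when it changes. It is an assumption-laden simplification: the text describes a JSON state file, the function names are hypothetical, and the signature fields (branch, head SHA, highest comment id) are one reasonable choice among several.

```shell
#!/usr/bin/env bash
# Shared-state sketch for the auto-PR / fix-on-review pair.
# ASSUMPTIONS: the state file holds one signature line (a JSON file would work
# equally well); all function names are hypothetical.

# Build a signature from the facts that should trigger a re-run when changed.
make_signature() {
  local branch=$1 head_sha=$2 max_comment_id=$3
  printf '%s:%s:%s' "$branch" "$head_sha" "$max_comment_id"
}

# Succeeds only when the signature differs from the stored one
# (or when nothing has been stored yet).
signature_changed() {
  local state_file=$1 signature=$2
  [ ! -f "$state_file" ] && return 0
  [ "$(cat "$state_file")" != "$signature" ]
}

# Record the signature so the next Stop does not re-fire on the same state.
remember_signature() {
  local state_file=$1 signature=$2
  mkdir -p "$(dirname "$state_file")"
  printf '%s' "$signature" > "$state_file"
}
```

A Stop hook would call `signature_changed` first and return immediately when nothing moved, then call `remember_signature` before waking the agent. That ordering is what keeps a re-prompting Stop hook from looping on itself.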

  1. Pick the two hooks that earn their keep on day one. Auto-format-on-edit and deny-dangerous-bash. Uncontroversial, fail-safe, demonstrate value before you invest in orchestration. Plain shell scripts in .claude/hooks/.

  2. Wire them up in settings.json. Project-level (.claude/settings.json, checked in) for hooks the whole team runs; user-level (~/.claude/settings.json) for personal hooks like Slack notifications. Cursor mirrors this with .cursor/hooks.json and ~/.cursor/hooks.json.

  3. Verify the hooks fire. In Claude Code, run /hooks to see the active config. Trigger each event manually: edit a file (PostToolUse), run a benign bash command (PreToolUse passes), run rm -rf /tmp/test-bad (PreToolUse blocks). Read the stderr — that’s how you debug them.

  4. Add the third hook: auto-PR-watch on Stop. Resolve default branch with gh repo view --json defaultBranchRef --jq .defaultBranchRef.name — never hardcode main or master. Hook checks for unpushed commits, creates a feature branch if you’re on default, pushes, and gh pr create. Store a state file (~/.claude/auto-pr-state/<repo-hash>.json) with the last branch/PR/SHA you handled.

  5. Add the fourth hook: fix-on-review, reading the state file. Same Stop hook, second branch: if a PR exists, poll gh pr view <PR#> --comments, the inline review comments API, and gh pr checks <PR#>. If anything is newer than the state signature, exit 2 with a message on stderr; that blocks the stop and hands the message to Claude.

  6. Add the fifth hook: MCP audit or approval gate. Claude Code: PreToolUse matcher mcp__.* logging to a file or POSTing to your observability backend. Cursor: beforeMCPExecution gating high-risk servers (stripe.*, aws.*) behind interactive approval, with failClosed: true.

  7. Test the orchestration explicitly. Run a full feature end-to-end: code, commit, Stop hook opens PR, bot reviews land, Stop hook wakes the agent to address them, merge. The loop should run with you watching, not driving. Fix the bugs — orchestrated hooks fail in surprising ways the first few times.

  8. Add a SessionStart hook for context injection. Cheap, useful, pads your count: print session metadata, branch, and git status to stdout, which Claude Code adds to the session context for SessionStart hooks. Two lines of shell, immediate payoff for forensics later.

  9. Document the stack in CLAUDE.md. Tell future-you what each hook does, where it lives, how to disable it. Undocumented hooks get blamed when they fire in surprising ways.

  10. Review quarterly. Hooks rot. Formatters change flags, MCP servers get deprecated, gh ships breaking changes. Re-run each hook with known-good input and confirm exit codes match expectations.
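The verification and quarterly-review steps above can share one tiny harness: feed a hook a known payload and compare its exit code with what the policy says it should be. This is a sketch; expect_exit is a hypothetical helper, and the sample payloads in the usage note are hand-written to resemble the tool_input shape used earlier.

```shell
#!/usr/bin/env bash
# Feed a hook a known payload on stdin and compare its exit code with the
# expected one.
# ASSUMPTION: expect_exit is a hypothetical helper, not part of any agent CLI.
expect_exit() {
  local expected=$1 payload=$2; shift 2
  local actual=0
  # Run the hook command ("$@") with the payload on stdin, discarding output.
  "$@" <<< "$payload" >/dev/null 2>&1 || actual=$?
  if [ "$actual" -eq "$expected" ]; then
    echo "ok   exit=$actual  $*"
  else
    echo "FAIL exit=$actual (wanted $expected)  $*"
    return 1
  fi
}
```

Typical calls would be `expect_exit 0 '{"tool_input":{"command":"ls"}}' .claude/hooks/deny-dangerous.sh` and `expect_exit 2 '{"tool_input":{"command":"rm -rf /"}}' .claude/hooks/deny-dangerous.sh`. Keeping a short file of such calls per hook turns the quarterly review into a single command.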

  • No error handling. A hook that crashes silently is a ghost: Claude proceeds (or not) and you have no idea why. Every hook should start with set -euo pipefail and print a one-line summary to stderr on both success and failure.
  • Expensive hooks on hot paths. A PreToolUse hook on Bash that takes 4 seconds is a 4-second tax on every shell command. Profile your hooks; anything over ~200ms on a hot event needs to move to a less frequent event or get cached.
  • Blocking hooks with no timeout. A hook hanging on a network call or an approval prompt blocks the whole agent loop. Set a timeout (Claude Code’s HTTP hooks have timeout; for command hooks, wrap with timeout 10s), and decide per hook whether a timeout should block the action or let it proceed.
  • Infinite loops between hooks. A Stop hook that re-prompts and triggers another Stop is the classic foot-gun. Gate re-prompts on a state signature (branch, head_sha, max_comment_id); re-fire only when it changed.
  • Hooks that leak secrets. Hooks see full tool inputs/outputs on stdin — including secrets. Never echo the payload to a public log. Redact known patterns (sk-…, ghp_…) first.
  • Project hooks that don’t degrade gracefully. If your team checks in .claude/settings.json with a hook calling pnpm, anyone without pnpm has a broken setup. Detect missing deps and skip-with-warning.
  • Hooks that fight the agent instead of helping. A hook that exits 2 on any edit to src/ is just turning the agent off. Hooks should encode rules that are almost always right, not your feeling that the agent shouldn’t be trusted today.
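Two of the failure modes above, leaked secrets and missing dependencies, have small mechanical guards. The sketch below shows one possible shape for each. The helper names are hypothetical, and the token patterns only cover the sk-… and ghp_… shapes mentioned in the list, so treat them as a starting point, not a complete filter.

```shell
#!/usr/bin/env bash
# Two small guards for hook scripts: redact secrets before logging, and skip
# cleanly when a tool is missing.
# ASSUMPTIONS: redact and have_or_warn are hypothetical helpers; the token
# shapes follow the examples in the text and are not exhaustive.

# Replace anything on stdin that looks like an sk-... key or ghp_... token.
# The leading group keeps words such as "ask-..." from being treated as keys.
redact() {
  sed -E \
    -e 's/(^|[^A-Za-z0-9])sk-[A-Za-z0-9_-]{8,}/\1sk-[REDACTED]/g' \
    -e 's/(^|[^A-Za-z0-9])ghp_[A-Za-z0-9]{8,}/\1ghp_[REDACTED]/g'
}

# Return 0 if the command exists; otherwise warn on stderr and return 1 so the
# caller can exit 0 (skip) instead of breaking a teammate's session.
have_or_warn() {
  if command -v "$1" >/dev/null 2>&1; then
    return 0
  fi
  echo "hook skipped: '$1' not installed" >&2
  return 1
}
```

In a hook, the payload would pass through the filter before it reaches any log, for example `jq -c . | redact >> "$log_file"`, and a project hook that needs pnpm would begin with `have_or_warn pnpm || exit 0`.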
  • You can list, from memory, at least 5 hooks active in your setup right now, and which event each fires on.
  • .claude/settings.json (or ~/.claude/settings.json) has explicit entries for at least 3 distinct events — not 5 hooks all on PostToolUse.
  • At least one hook reads state another hook writes (auto-PR-watch + fix-on-review, or session-start logger + stop reporter).
  • You have at least one hook touching MCP: either logging mcp__.* tool calls in Claude Code, or a Cursor beforeMCPExecution gate.
  • Your auto-PR or auto-review hook has actually opened a PR or addressed a bot comment without you driving it in the last 7 days.
  • Every hook has stderr output you can read after it fires, and an exit code policy you can describe.
  • Your hooks are documented in CLAUDE.md (what they do, where they live, how to disable).
  • You can disable any single hook in under 10 seconds (commenting out a settings.json block), and you’ve actually done so once to debug.