Automation Workflows: AI-Powered Development Pipelines

Your nightly dependency-bump job has been red for three days, the “AI code review” bot your team bolted on posts generic praise on every PR, and the only person who understands the release script just went on leave. Interactive AI coding is great inside the editor — but the moment you try to script it, you hit a wall of fabricated CLI flags and brittle glue code.

Cursor ships a real headless agent (agent -p) built exactly for this. This guide shows how to wire it into git hooks, CI/CD, and scheduled jobs — with runnable scripts, not pseudo-code — and where Claude Code and Codex do the same job from the terminal.

What You’ll Walk Away With

A pre-commit hook that runs a real AI review on staged changes and blocks the commit on findings
A GitHub Actions job that posts an AI-generated PR review comment using agent -p --output-format text
A scheduled-maintenance script that proposes dependency bumps and opens a PR
The same three workflows shown for Cursor, Claude Code, and Codex so you can pick per repo
A “When This Breaks” checklist for runaway agents, cost ceilings, and MCP auth failures

The Headless Agent in 60 Seconds

Everything below is built on one command. -p (print mode) runs non-interactively; without --force the agent only proposes edits, with --force it writes them:

# Ask a question, print the answer, exit (no file changes)
agent -p "What does this codebase do?"

# Apply changes directly (scripts/CI)
agent -p --force "Refactor src/auth/token.ts to use the jose library"

# Structured output for parsing
agent -p --output-format json --model gpt-5.6-sol \
  "List the 3 riskiest changes in the current diff as a JSON array"

--output-format accepts text (default), json, and stream-json. In CI, set the CURSOR_API_KEY environment variable instead of logging in interactively.

Pre-Commit AI Checks

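Block commits when the staged diff has obvious problems. The hook pipes the staged diff into the agent and blocks the commit unless the first line of the reply is PASS; because of set -e, a failed agent call also aborts the commit. The invocation differs by tool, so here are all three, in the order Cursor, Claude Code, Codex: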

#!/usr/bin/env bash
# .git/hooks/pre-commit  (chmod +x)
set -euo pipefail

DIFF=$(git diff --cached --diff-filter=ACMR -U3)
[ -z "$DIFF" ] && exit 0

echo "Running Cursor pre-commit review..."
RESULT=$(printf '%s' "$DIFF" | agent -p --output-format text \
  "Review this staged diff for bugs, missing error handling, and leaked secrets.
   Reply with exactly 'PASS' on the first line if it is safe to commit,
   otherwise 'FAIL' followed by a bulleted list of blocking issues.")

echo "$RESULT"
echo "$RESULT" | head -1 | grep -q '^PASS' || { echo "Commit blocked."; exit 1; }

#!/usr/bin/env bash
# .git/hooks/pre-commit  (chmod +x)
set -euo pipefail

DIFF=$(git diff --cached --diff-filter=ACMR -U3)
[ -z "$DIFF" ] && exit 0

echo "Running Claude Code pre-commit review..."
RESULT=$(printf '%s' "$DIFF" | claude -p --max-turns 2 \
  "Review this staged diff for bugs, missing error handling, and leaked secrets.
   Reply 'PASS' on line 1 if safe to commit, else 'FAIL' and a bulleted list.")

echo "$RESULT"
echo "$RESULT" | head -1 | grep -q '^PASS' || { echo "Commit blocked."; exit 1; }

#!/usr/bin/env bash
# .git/hooks/pre-commit  (chmod +x)
set -euo pipefail

DIFF=$(git diff --cached --diff-filter=ACMR -U3)
[ -z "$DIFF" ] && exit 0

echo "Running Codex pre-commit review..."
RESULT=$(printf '%s' "$DIFF" | codex exec --sandbox read-only \
  "Review this staged diff for bugs, missing error handling, and leaked secrets.
   Reply 'PASS' on line 1 if safe to commit, else 'FAIL' and a bulleted list.")

echo "$RESULT"
echo "$RESULT" | head -1 | grep -q '^PASS' || { echo "Commit blocked."; exit 1; }

Use a prompt that forces a machine-parseable verdict so the hook can branch on it:

You are a strict pre-commit reviewer. Review ONLY the staged diff below.
Flag: real bugs, missing error handling on I/O and network calls, hardcoded
secrets or tokens, and obvious security issues (SQLi, command injection, path
traversal). Ignore style nits and naming.

Output line 1 = exactly "PASS" or "FAIL".
If FAIL, lines 2+ = one bullet per blocking issue as "file:line — problem — fix".
Do not explain anything else.
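
The hooks above test the verdict with head and grep, which also accepts a first line such as PASSED. A stricter check can be sketched as a small shell function. This is a minimal sketch, not part of any of the three CLIs: review_verdict is a hypothetical helper name, and it assumes the reply follows the line-1 contract in the prompt above.

```shell
# Hypothetical helper: prints PASS, FAIL, or INVALID for the review text in $1.
review_verdict() {
  local first_line
  # sed reads the whole reply (no early exit, so no SIGPIPE under pipefail)
  # and prints line 1 with surrounding whitespace, including a CR, removed.
  first_line=$(printf '%s\n' "$1" |
    sed -n '1{s/^[[:space:]]*//;s/[[:space:]]*$//;p;}')
  case "$first_line" in
    PASS) echo PASS ;;
    FAIL) echo FAIL ;;
    *)    echo INVALID ;;   # anything else breaks the contract
  esac
}
```

In a hook, the final grep line could become a case on "$(review_verdict "$RESULT")" that treats anything other than PASS as blocking, so a malformed reply fails closed.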

Automated Code Review in CI

Run the agent on pull requests and post the result as a comment. The recommended pattern is restricted autonomy: let the agent only modify the working directory or emit text, and keep git/PR operations as deterministic CI steps. This GitHub Actions workflow pins actions to major version tags (pin to a full commit SHA if you need stricter supply-chain guarantees) and installs the Cursor CLI directly:

name: AI Code Review

on:
  pull_request:
    types: [opened, synchronize]

permissions:
  contents: read
  pull-requests: write

jobs:
  ai-review:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - name: Install Cursor CLI
        run: |
          curl https://cursor.com/install -fsS | bash
          echo "$HOME/.cursor/bin" >> "$GITHUB_PATH"

      - name: Generate review
        env:
          CURSOR_API_KEY: ${{ secrets.CURSOR_API_KEY }}
        run: |
          git diff "origin/${{ github.base_ref }}...HEAD" > pr.diff
          # No --force here, so the agent cannot write files; capture stdout instead.
          agent -p --output-format text --model gpt-5.6-sol \
            "Review the diff in @pr.diff. Focus on correctness, security, and
             missing tests. Reply with concise, file-scoped feedback in Markdown.
             Do not modify any files." > review.md

      - name: Post review comment
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        run: gh pr comment "${{ github.event.pull_request.number }}" --body-file review.md
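
Because posting is a deterministic CI step, it is a natural place to sanitize whatever the agent produced. The sketch below guards against two failure modes: an empty review.md, and a body longer than GitHub's 65,536-character comment limit. cap_comment_file and the 60000-byte default are illustrative choices, not part of gh or the Cursor CLI.

```shell
# Hypothetical helper: make a review file safe to post as a PR comment.
# $1 = path to the review file, $2 = byte cap (assumed default: 60000).
cap_comment_file() {
  local file="$1" max_bytes="${2:-60000}" size tmp
  # A missing or empty review would make the comment step fail; post a stub instead.
  if [ ! -s "$file" ]; then
    printf 'No review output was produced.\n' > "$file"
    return 0
  fi
  size=$(( $(wc -c < "$file") ))
  if [ "$size" -gt "$max_bytes" ]; then
    tmp=$(mktemp)
    # head -c counts bytes, so a multi-byte character at the cut may be split.
    head -c "$max_bytes" "$file" > "$tmp"
    printf '\n\n_Review truncated after %s bytes._\n' "$max_bytes" >> "$tmp"
    mv "$tmp" "$file"
  fi
}
```

Calling cap_comment_file review.md just before the gh pr comment line would keep the posting step from failing on oversized or empty output.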

The review step for each tool, in the order Cursor, Claude Code, Codex:

# Cursor: no --force, so capture the reply from stdout
agent -p --output-format text --model gpt-5.6-sol \
  "Review the diff in @pr.diff for correctness, security, and missing tests.
   Reply with file-scoped feedback in Markdown. Do not modify any files." > review.md

# Claude Code: read-only tools, diff on stdin, JSON output captured from stdout
claude -p --output-format json --max-turns 3 \
  --allowedTools "Read,Grep" \
  "Review this diff for correctness, security, and missing tests.
   Return JSON: { verdict, comments: [{ path, line, severity, message }] }" \
  < pr.diff > review.json

# Codex: the read-only sandbox cannot write files, so capture stdout
codex exec --sandbox read-only \
  "Review the diff in pr.diff for correctness, security, and missing tests.
   Reply with file-scoped feedback in Markdown. Do not modify any files." > review.md

A reviewer prompt that earns its place (not generic praise):

Review this pull request diff as a senior engineer who will be on call for it.
For each issue, output: severity (blocker | warning | nit), the file:line, what
breaks, and the minimal fix. Specifically check:
- error paths and timeouts on every network/DB/file call
- input validation and authz on new endpoints
- N+1 queries, unbounded loops, and missing pagination
- tests: does every new branch have coverage? name the untested branches
Skip anything that is purely stylistic. If the diff is safe, say so in one line.
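
If reviews should gate merges and not only comment, the severity labels from this prompt can be counted by a deterministic step. This is a minimal sketch, assuming each issue starts its line with the severity word (optionally after a bullet or bold markers). The model's exact layout is not guaranteed, and count_blockers is a hypothetical helper.

```shell
# Hypothetical helper: count review lines whose leading severity is "blocker".
# Matches e.g. "- blocker | file:line ..." and "**Blocker**: ...", case-insensitively.
count_blockers() {
  # grep -c prints 0 and exits 1 when nothing matches; "|| true" keeps set -e happy.
  grep -Eic '^[[:space:]]*([-*][[:space:]]+)?[*_]*blocker([^[:alnum:]]|$)' "$1" || true
}
```

A CI step could then fail the job with a test such as [ "$(count_blockers review.md)" -eq 0 ], while warnings and nits stay advisory.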

Scheduled Maintenance

A weekly job that proposes dependency bumps, runs the test suite, and opens a PR. Keep the AI step scoped to file edits and let CI handle branching and PR creation:

name: Weekly Maintenance

on:
  schedule:
    - cron: '0 2 * * 0' # Sundays, 02:00 UTC

permissions:
  contents: write
  pull-requests: write

jobs:
  maintenance:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Install Cursor CLI
        run: |
          curl https://cursor.com/install -fsS | bash
          echo "$HOME/.cursor/bin" >> "$GITHUB_PATH"

      - name: Propose safe dependency bumps
        env:
          CURSOR_API_KEY: ${{ secrets.CURSOR_API_KEY }}
        run: |
          agent -p --force --model gpt-5.6-sol \
            "Bump dependencies in package.json to the latest non-breaking (caret)
             versions. Update the lockfile with 'npm install'. Do NOT bump across
             major versions. Do NOT touch source files. Summarize changes to
             CHANGES.md."

      - name: Verify
        run: npm ci && npm test

      - name: Open PR
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
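        run: |
          # Runners have no git identity by default; commits fail without one
          git config user.name "github-actions[bot]"
          git config user.email "41898282+github-actions[bot]@users.noreply.github.com"
          git checkout -B chore/weekly-deps
          git add -A
          # Nothing to bump this week: exit cleanly instead of failing on an empty commit
          git diff --cached --quiet && exit 0
          git commit -m "chore: weekly dependency bumps"
          git push -f origin chore/weekly-deps
          # "|| true" tolerates a PR that is still open from a previous run
          gh pr create --fill --title "chore: weekly dependency bumps" || true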

For Claude Code, swap the agent step for claude -p --max-budget-usd 2.00 --dangerously-skip-permissions "..." so the run self-terminates if it gets expensive. For Codex in trusted unattended CI, use codex exec --sandbox workspace-write -c approval_policy=never "..." only in an isolated checkout; never expose it to untrusted PR code, and require a timeout plus diff/test review. The CI scaffolding around the AI step is identical across all three tools.

Constrain the blast radius explicitly so the bot never ships a major upgrade unattended:

Update only the dependencies in package.json that have newer patch or minor
releases. Apply caret-compatible versions, regenerate the lockfile, and run the
test suite. If any update fails the build, revert that single package and note
it. Do NOT cross a major version boundary for any package. Do NOT edit
application source code. Write a one-line-per-package summary to CHANGES.md.
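
A prompt is a request, not a guarantee, so it helps to verify the no-major-bump rule deterministically before opening the PR. The sketch below compares the major component of an old and a new version spec. semver_major and is_major_bump are hypothetical helper names; reading the before and after versions out of package.json is left to whatever JSON tool the runner has.

```shell
# Hypothetical helper: print the major component of a version or range,
# e.g. ^4.17.21 -> 4, ~1.2.3 -> 1, v12.3.1 -> 12. Prints nothing if no digit is found.
semver_major() {
  local v="$1"
  v=${v#"${v%%[0-9]*}"}   # strip any leading non-digit prefix (^, ~, >=, v)
  echo "${v%%.*}"
}

# Hypothetical helper: succeeds (exit 0) when $2 crosses a major boundary from $1.
is_major_bump() {
  local old_major new_major
  old_major=$(semver_major "$1")
  new_major=$(semver_major "$2")
  # Specs without a plain numeric major (tags, complex ranges) are flagged for review.
  case "$old_major" in ''|*[!0-9]*) return 0 ;; esac
  case "$new_major" in ''|*[!0-9]*) return 0 ;; esac
  [ "$new_major" -ne "$old_major" ]
}
```

For 0.x packages a minor bump can also be breaking under caret semantics, which this sketch does not flag.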

Parallel Agents in Cursor 3.x

Cursor 3.0 reframed the IDE as an agent execution runtime, and 3.2–3.3 introduced three slash commands that orchestrate parallel work without leaving the chat input. Each fits a different shape of task.

`/worktree` — Isolated Edits, One Task

Spin up an isolated git worktree for the agent’s changes. Use when a task may produce edits you want quarantined from your current branch — experimental refactors, risky migrations, anything you may abandon.

/worktree refactor src/auth to use the new JWT library

Cursor creates the worktree, switches the agent into it, and surfaces a tracked location you can merge or discard at the end.

`/best-of-n` — Same Task, Multiple Models

Run the same task in parallel across multiple models (Composer 2.5, Opus 5, Fable 5, GPT-5.6 Sol), each in its own worktree, then compare outcomes. Useful when you don’t trust any single model to make a clean call — security-sensitive code, ambiguous specs, one-shot jobs you’d rather get right the first time. For tasks that demand peak intelligence, Fable 5 is the strongest option available in Cursor’s model picker; see model comparison for the full lineup.

/best-of-n 3 implement the rate-limit middleware described in docs/rate-limit-rfc.md

You pick how many runs (n); Cursor returns side-by-side diffs ranked by tests passed, lint cleanliness, and a quick semantic comparison.

`/multitask` — One Task, Multiple Subagents

Cursor breaks a task into chunks and dispatches each to an async subagent in parallel. Best for tasks that decompose naturally — cross-file refactors, parallel test fixes, multi-module updates.

/multitask migrate every import of "@old-pkg/auth" to "@new-pkg/auth" and update tests

Available in Cursor 3.2+; subagent controls (max depth, concurrency, cost ceiling) were added in 3.3.

Choose Your Parallel-Work Tool

Tool	Best for
`/worktree`	One task, edits you want quarantined
`/best-of-n`	Compare model strategies on the same task
`/multitask`	One task that naturally decomposes — async subagents in parallel
Background Agents	Cloud-hosted, long-running, runs while you’re offline

Background Agent Automation

For a repeatable cloud environment, commit .cursor/environment.json so every agent boots with your toolchain and services ready:

{
  "snapshot": "POPULATED_FROM_SETTINGS",
  "install": "npm ci",
  "start": "sudo service docker start",
  "terminals": [
    { "name": "Dev Server", "command": "npm run dev" },
    { "name": "Test Runner", "command": "npm run test:watch" }
  ]
}

A practical pattern: kick off three branches from Slack and monitor them at cursor.com/agents.

@Cursor [branch=feature/auth] implement OAuth 2.0 login: Passport.js with Google
and GitHub strategies, Redis-backed sessions, RBAC, and integration tests.

@Cursor [branch=feature/payments] add Stripe subscriptions: checkout endpoints,
webhook handlers, and error handling with retries.

@Cursor [branch=feature/notifications] build the email pipeline: Resend
integration, templated emails, a Bull queue, and an unsubscribe flow.

Keep each request narrow and verifiable. A vague “@Cursor build the whole platform” burns tokens iterating against an unclear target; three scoped tasks finish faster and review cleanly.

MCP Servers for Automation

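MCP servers give the agent first-class access to your delivery tools (GitHub, GitLab, Jenkins, Terraform, Kubernetes), so a single prompt can orchestrate a release instead of you wiring up bespoke API clients. The config lives in ~/.cursor/mcp.json (Cursor), .mcp.json (Claude Code), or ~/.codex/config.toml (Codex). The servers are the same across all three tools, but the file syntax is not: Cursor and Claude Code use a JSON mcpServers object, Codex uses TOML tables under mcp_servers, and the environment-variable interpolation syntax can differ between tools. The example below uses Cursor's format: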

{
  "mcpServers": {
    "github": {
      "url": "https://api.githubcopilot.com/mcp/",
      "headers": { "Authorization": "Bearer ${env:GITHUB_PAT}" }
    },
    "gitlab": {
      "type": "http",
      "url": "https://gitlab.com/api/v4/mcp",
      "headers": { "Authorization": "Bearer ${env:GITLAB_PAT}" }
    },
    "jenkins": {
      "command": "npx",
      "args": ["-y", "jenkins-mcp-server"],
      "env": {
        "JENKINS_URL": "https://jenkins.company.com",
        "JENKINS_USER": "your-username",
        "JENKINS_TOKEN": "your-api-token"
      }
    },
    "terraform": {
      "command": "npx",
      "args": ["-y", "terraform-mcp-server"]
    },
    "kubernetes": {
      "command": "npx",
      "args": ["-y", "kubernetes-mcp-server"],
      "env": { "KUBECONFIG": "/home/your-username/.kube/config" }
    }
  }
}

The @modelcontextprotocol/server-github and @modelcontextprotocol/server-gitlab npm packages are deprecated (npm view reports “no longer supported”). GitHub’s official server is now github/github-mcp-server, consumed as the hosted remote endpoint https://api.githubcopilot.com/mcp/ (or a Docker image), not as an npm package. GitLab ships an official HTTP MCP server at https://<your-instance>/api/v4/mcp. Likewise, there is no terraform-mcp package: HashiCorp’s official server is terraform-mcp-server. It reads the Terraform Registry and has no TF_WORKSPACE env var, so check its README for the env vars it actually uses. kubernetes-mcp-server and jenkins-mcp-server are unscoped packages. Always run npm view <package> version (or confirm the official endpoint) before pasting an MCP config from any guide; stale or fabricated server references are the most common error in AI-generated docs.
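
One way to catch auth problems before the agent starts is to check that every ${env:NAME} placeholder in the config resolves to a non-empty variable. This is a sketch under the assumption that the config uses the ${env:NAME} interpolation form shown in the JSON above; missing_mcp_env is a hypothetical helper, not an agent subcommand.

```shell
# Hypothetical helper: print the names of ${env:NAME} placeholders in a config
# file ($1) whose variables are unset or empty in the current shell.
missing_mcp_env() {
  local file="$1" name value
  { grep -oE '[$][{]env:[A-Za-z_][A-Za-z0-9_]*[}]' "$file" || true; } |
    sed -e 's/^[$]{env://' -e 's/}$//' | sort -u |
    while IFS= read -r name; do
      # name is restricted to [A-Za-z0-9_] by the grep above, so eval is safe here.
      eval "value=\${$name:-}"
      [ -n "$value" ] || echo "$name"
    done
}
```

A CI step could run missing_mcp_env ~/.cursor/mcp.json and stop the job if it prints anything, which surfaces a missing secret by name before any server connection is attempted.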

With these servers connected, a release becomes one prompt:

Cut release v2.1.0:
- GitHub MCP: generate a changelog from commits since the last tag and create a draft release
- Jenkins MCP: trigger the build-and-test job and wait for it to go green
- Kubernetes MCP: roll out the new image and watch pod health
- If health checks fail, pause the rollout and report which pods are unhealthy

This is the “before/after” MCP buys you: instead of a multi-hundred-line script juggling four API clients, the agent reads the live state from each server and decides the next step. The trade-off versus an Agent Skill (a lighter, single-purpose augmentation installed with npx skills add <owner/repo>) is depth: reach for MCP when you need a persistent, stateful connection to a tool; reach for a skill when you just need to teach the agent one focused capability.

When This Breaks

Runaway agents / cost blowups. A loop that “can’t finish” will keep spending. Cap it: Claude Code --max-turns N and --max-budget-usd X; Codex with an explicit --sandbox workspace-write -c approval_policy=never policy only in trusted, isolated CI plus a job timeout; Cursor by scoping the prompt and setting a spend limit in settings. Always set a timeout-minutes on the CI job and require human review before merge.
agent -p proposed changes but nothing was written. You forgot --force. Without it, print mode only proposes edits.
cursor: command not found in CI, or a fabricated subcommand fails. The automation binary is agent (from curl https://cursor.com/install -fsS | bash), not cursor. There is no cursor lint/review/security/analyze/docs — replace any such line with an agent -p prompt.
MCP auth failures. Tokens expired or the wrong scope. Run agent mcp list to see server status and agent mcp login <server> to re-authenticate; confirm the env vars match the server’s README.
/worktree or /best-of-n leaves stray worktrees. Clean up with git worktree list then git worktree remove <path>. If a single-agent fallback ran instead of N parallel runs, check you’re on Cursor 3.2+ and re-issue with an explicit n.
CI secrets leaked through a poisoned action. Rotate the exposed credentials immediately and re-pin every third-party action to a commit SHA.
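
Outside GitHub Actions, where timeout-minutes is not available (a cron job or a git hook, for example), a wall-clock cap can be approximated with GNU coreutils timeout. This is a minimal sketch, assuming a Linux machine with coreutils installed; run_capped is a hypothetical wrapper name.

```shell
# Hypothetical wrapper: run a command with a wall-clock cap.
# $1 = cap in seconds, remaining arguments = the command to run.
# GNU timeout exits with 124 when the cap is hit (137 if it had to send KILL).
run_capped() {
  local seconds="$1" status=0
  shift
  # Send TERM at the cap, then KILL 10 seconds later if the process ignores it.
  timeout --kill-after=10 "$seconds" "$@" || status=$?
  if [ "$status" -eq 124 ] || [ "$status" -eq 137 ]; then
    echo "Aborted after ${seconds}s: $*" >&2
  fi
  return "$status"
}
```

For example, run_capped 600 agent -p --force "..." would stop a maintenance run after ten minutes and return a non-zero status that the calling script can branch on.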

What’s Next

Parallel agents and worktrees

Go deeper on /worktree, /best-of-n, and /multitask in the Agent Modes Deep Dive.

MCP setup

Connect and debug the servers above across all three tools in Must-Have MCP Servers.

More Cursor automation

Background agents, checkpoints, and large-codebase strategies in Cursor Advanced Techniques.

CI/CD with AI

Wire AI review into your pipeline end-to-end in Pipeline Automation with AI.

Automation should amplify your judgment, not replace it. Start with a read-only check (a pre-commit reviewer that only warns), graduate to write operations behind a CI gate, and keep a human on every irreversible step — deploys, force-pushes, and anything touching production data.