Continuous Delivery Best Practices

Your PR has sat for two days waiting on a reviewer. The deploy needs three manual approvals across two Slack channels, the release notes are still a TODO, and the one time CI went red overnight, nobody triaged it until standup. Continuous Delivery promises to shrink that wait from days to minutes, but the glue work (reviews, YAML, gates, changelogs, failure triage) is exactly the tedium nobody wants to own.

That glue work is where an AI assistant earns its keep. Not “AI writes your app,” but AI as a tireless reviewer, YAML generator, and first responder wired directly into your pipeline.

What you’ll walk away with

A real GitHub Actions workflow that runs Claude Code headless on every PR diff and captures a structured review you can post back to the PR
Copy-paste prompts to generate pipeline YAML, gate a deploy, and draft release notes, one per tool
The Cursor / Claude Code / Codex split for where each tool fits in CD
A “when this breaks” checklist for the failure modes AI-generated pipelines hit in production

Where AI fits in the pipeline

The highest-leverage place to start is automated PR review: it is low-risk (comments only, no deploys) and pays off on day one. From there you move outward to generating the workflow files, gating the deploy, and triaging red builds.

The three tools occupy different surfaces of the pipeline. Pick based on where your team already lives.

Cursor’s BugBot reviews PRs automatically once enabled on the repo and posts inline comments on likely bugs. Re-trigger a review on demand by commenting “bugbot run” on the PR. When it flags something, Autofix (GA since February 2026) can spawn a background Cloud Agent that opens a follow-up PR with the proposed fix, so a reviewer approves a diff instead of writing one. As of May 2026 BugBot bills per review (roughly $1.20 for a default-effort pass, more for large diffs) on Teams and Individual plans instead of the old flat per-seat fee.

Use Cursor when your team reviews in the GitHub UI and wants fixes proposed as PRs they can eyeball.

Claude Code shines in headless CI. Run claude -p inside a GitHub Actions workflow to review a diff, gate a deploy, or draft a changelog, all scripted with no interactive TUI. Locally, pair it with a PreToolUse hook so a risky command (a raw kubectl apply, a force-push) pauses for confirmation before the agent runs it.

Use Claude Code when CD lives in your .github/workflows and you want the agent invoked from a script with an explicit list of allowed tools.

Run a reviewer on every PR

Here is a real, minimal GitHub Actions workflow that runs Claude Code headless against the PR diff and saves its review as JSON. The flags are the load-bearing part: --allowedTools (not --allow-tools) restricts which tools the agent may use, and --output-format json (not --json) makes the result parseable downstream.

name: AI PR Review
on: pull_request
permissions:
  contents: read
  pull-requests: write
jobs:
  review:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v6
        with:
          # Full history, so origin/<base branch> exists for the three-dot diff below.
          fetch-depth: 0
      - name: Run Claude Code review
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
        run: |
          # Keep the diff inside the workspace, the directory Claude Code starts in.
          git diff origin/${{ github.base_ref }}...HEAD > pr.diff
          # -p runs headless; --allowedTools is the tool allowlist; JSON lands in review.json.
          npx -y @anthropic-ai/claude-code -p \
            "Review the diff in pr.diff for security issues, logic bugs, and missing error handling. Be specific and cite file:line. Skip style nits." \
            --allowedTools "Read,Grep,Bash(git diff:*)" \
            --output-format json > review.json

The point is the inversion: you don’t paste a diff into a chat window; the pipeline feeds the diff to the agent and captures structured output you can post as a comment or fail the job on.
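A minimal sketch of that last hop, to append to the review job’s steps. It assumes the JSON output carries the agent’s final text in a top-level result field and an is_error boolean; confirm both against the output of your Claude Code version. It uses jq and the gh CLI, which GitHub-hosted Ubuntu runners ship with, and posts one summary comment rather than true line-anchored review comments (those need the pull request review API).

```yaml
# Extra steps for the end of the review job's steps: list.
- name: Post review as a PR comment
  env:
    # gh reads this token; the job's pull-requests: write permission lets it comment.
    GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
    GH_REPO: ${{ github.repository }}
    PR_NUMBER: ${{ github.event.pull_request.number }}
  run: |
    # Assumption: the agent's final text is in the top-level "result" field.
    jq -r '.result' review.json > review.md
    gh pr comment "$PR_NUMBER" --body-file review.md
- name: Fail if the agent run itself errored
  run: |
    # Assumption: the JSON output includes an is_error boolean.
    test "$(jq -r '.is_error' review.json)" = "false"
```

Splitting “post” from “fail” keeps the comment visible even when the second step turns the job red.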

Generate the pipeline, don’t hand-write it

Nobody should write CI YAML from a blank file. Describe the pipeline in plain English and let the AI emit it, then review the result against your runner and secret names.

In Cursor’s agent mode, open the repo and prompt the agent to create the workflow file. Because it can read your package.json and existing .github/workflows, it will match your real scripts and Node version instead of guessing.

From the terminal, let Claude Code read the project and write the file in one shot, then diff it before committing.

Run Codex with the workspace-write sandbox so it can create the file, and set its approval policy to on-request so it asks before stepping outside that sandbox. Routine edits and commands inside the workspace do not each trigger a prompt.
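Whichever tool writes it, the reviewed result should look roughly like this minimal sketch. It assumes a Node project whose package.json defines lint, test, and build scripts and whose Node version is pinned in an .nvmrc file; those names are hypothetical, so match them to your repo. The lines worth checking are the ones an AI tends to guess: action versions, where the Node version comes from, and any secrets.* reference (a plain CI job like this should need none).

```yaml
name: CI
on:
  pull_request:
  push:
    branches: [main]
permissions:
  contents: read          # CI only reads the repo
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v6
      - uses: actions/setup-node@v4   # confirm this is still the current major
        with:
          node-version-file: .nvmrc   # read the version from the repo instead of hard-coding it
          cache: npm                  # assumes a package-lock.json is committed
      - run: npm ci
      # Hypothetical script names: match them to your package.json.
      - run: npm run lint
      - run: npm test
      - run: npm run build
```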

Gate the deploy with a human in the loop

Full auto-deploy is the last thing to adopt, not the first. Start with the AI preparing the deploy and running pre-flight checks, then handing off to a human for the final yes. The approval can live in Slack, in a GitHub environment protection rule, or in a chat with the agent itself.
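A sketch of the GitHub-native version of that handoff: a preflight job runs the checks the AI prepared, and the deploy job targets a GitHub environment. If that environment has required reviewers configured in the repository’s environment settings, the deploy job waits until one of them approves (availability for private repositories depends on your GitHub plan). The environment name and both scripts are hypothetical placeholders.

```yaml
name: Deploy
on:
  push:
    branches: [main]
permissions:
  contents: read
jobs:
  preflight:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v6
      # Hypothetical script: config validation, migration dry-run, smoke tests.
      - run: ./scripts/preflight.sh
  deploy:
    needs: preflight          # never starts if preflight fails
    runs-on: ubuntu-latest
    # Required reviewers on this environment make the job wait for a human "yes".
    environment: production
    steps:
      - uses: actions/checkout@v6
      # Hypothetical deploy script; runs only after approval.
      - run: ./scripts/deploy.sh
```

The AI never holds the approval; it only gets the change to the point where a person can say yes with confidence.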

For the release-notes step, give the AI the commit range and a format, not a vague “summarize.”
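A minimal sketch of such a step, assuming releases are cut by pushing v* tags and that at least one earlier tag exists (git describe fails on the very first tag). The section names in the prompt are an example format, not a standard. It pipes the commit range into claude -p on stdin and creates a draft release, so a human still reads and publishes it.

```yaml
name: Release notes
on:
  push:
    tags: ["v*"]
permissions:
  contents: write            # needed to create the draft release
jobs:
  notes:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v6
        with:
          fetch-depth: 0     # full history and tags, so the previous tag can be found
      - name: Draft release notes
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        run: |
          # Most recent tag reachable from the commit before this one.
          PREV_TAG=$(git describe --tags --abbrev=0 "${GITHUB_REF_NAME}^")
          # Explicit commit range in, explicit format requested.
          git log --no-merges --pretty=format:'%h %s' "${PREV_TAG}..${GITHUB_REF_NAME}" \
            | npx -y @anthropic-ai/claude-code -p \
              "These are the commits from ${PREV_TAG} to ${GITHUB_REF_NAME}. Write release notes in Markdown with the sections Features, Fixes, and Breaking changes. One bullet per user-visible change, citing the short hash. Omit internal refactors and CI changes. Do not mention anything that is not in the list." \
            > RELEASE_NOTES.md
          # Draft only: a human reviews and publishes.
          gh release create "$GITHUB_REF_NAME" --draft --notes-file RELEASE_NOTES.md
```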

When this breaks

AI-assisted CD fails in specific, recognizable ways. Watch for these:

Hallucinated actions and secrets. Generated YAML references actions/checkout@v3 (outdated), a marketplace action that doesn’t exist, or secrets.DEPLOY_KEY you never set. Always diff the generated workflow and run it once on a throwaway branch before trusting it. Pin to a current major (@v6 for checkout) and grep the file for every secrets.* reference to confirm each one is actually configured.
Green-but-broken auto-fixes. An Autofix or agent PR makes the tests pass by weakening the test or by catching and swallowing the error, instead of fixing the bug. Treat AI-authored fix PRs as PRs: read the diff, don’t rubber-stamp a green check.
Over-permissive tokens. The agent grants permissions: write-all “to be safe.” Scope permissions: to the minimum each job needs (contents: read, pull-requests: write).
Review loops that never converge. The bot comments, the author “addresses” it, the bot re-flags the same thing. Cap re-reviews and escalate to a human after the second round instead of looping.
A reviewer that blocks the merge queue. If the AI review job is a required check and the API is down, nothing merges. Make the review job non-blocking, or add a timeout and a manual override.
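A sketch of the review job with the permission and merge-queue fixes applied. Scopes stay at the two the job needs, the job and the agent step both get time limits, and the agent step is marked continue-on-error so an API outage or timeout does not fail the review job. The minute values and the --max-turns cap are arbitrary choices to tune.

```yaml
jobs:
  review:
    runs-on: ubuntu-latest
    timeout-minutes: 15            # hard ceiling for the whole job
    permissions:                   # only the scopes a review job needs
      contents: read
      pull-requests: write
    steps:
      - uses: actions/checkout@v6
        with:
          fetch-depth: 0
      - name: Run Claude Code review
        timeout-minutes: 10        # stop a hung agent run
        continue-on-error: true    # a failed or timed-out run does not fail the job
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
        run: |
          git diff origin/${{ github.base_ref }}...HEAD > pr.diff
          npx -y @anthropic-ai/claude-code -p \
            "Review the diff in pr.diff for security issues, logic bugs, and missing error handling. Be specific and cite file:line. Skip style nits." \
            --allowedTools "Read,Grep,Bash(git diff:*)" \
            --max-turns 10 \
            --output-format json > review.json
```

Putting continue-on-error on the step rather than the job keeps the job itself green, which is what a required check looks at.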

What’s next

CI/CD Pipelines — deeper patterns for building the pipeline itself
Incident Response — when a deploy goes wrong and you need the AI on triage
Test-Driven Development — the tests that make the gate above meaningful