CI/CD Pipeline Patterns

Your deploy passed every check and still took prod down. The unit tests were green, the Docker build succeeded, and kubectl set image returned 0 - but the image shipped a CRITICAL CVE in a transitive dependency, because nothing in the pipeline ever scanned it. CI told you everything was fine right up until PagerDuty told you it wasn’t.

The fix isn’t “write a better YAML by hand.” It’s using your AI tool to generate a pipeline that already has the gates a senior platform engineer would add - dependency caching, a vulnerability scan that actually fails the build, and a production deploy that waits on a real health check - then reviewing what it produced instead of authoring 200 lines of boilerplate from scratch.

What You’ll Walk Away With

A copy-paste prompt that turns an existing package.json + Dockerfile into a complete GitHub Actions workflow with matrix Node versions, dependency caching, and a Trivy scan that gates the build.
A prompt that converts a red CI log into a root-cause diagnosis plus a concrete fix, so you stop bisecting failed runs by hand.
A prompt for adding a metric-gated canary rollout to an existing kubectl deploy.
The three-tool workflow (Cursor, Claude Code, Codex) for generating, reviewing, and running pipeline changes - including the official GitHub Actions for Claude Code and Codex so the same model that wrote the pipeline can review your PRs.
A “When This Breaks” map of the failure modes generated pipelines hit first: cache misses, OIDC role-assume errors, rollout timeouts, and scanner false positives.

The Workflow: Generate a Pipeline From Your Repo

The trap with “create a CI/CD pipeline” prompts is that they produce a plausible-looking workflow disconnected from your actual project - wrong package manager, invented test scripts, a Node version you don’t run. Anchor the prompt to the files already in the repo so the output matches reality.

Open the repo in Cursor, add package.json, Dockerfile, and any existing .github/workflows/* to the Agent context with @, then run the prompt below in Agent mode. Cursor edits the workflow file in place and you review the diff in the editor before accepting. Keep package.json in context so it reads your real scripts block instead of guessing npm test.

From the repo root, Claude Code already has filesystem access, so it reads package.json and Dockerfile itself - you don’t paste them. Run claude and give it the prompt. Add the GitHub MCP server first if you want Claude to read existing workflow runs and Actions secrets metadata while it works:

claude mcp add --transport http github https://api.githubcopilot.com/mcp/

The GitHub MCP server is the same HTTP endpoint across Cursor, Claude Code, and Codex - only the registration command differs. The deprecated /sse transport is gone; use --transport http.

From the repo root, run codex and give it the prompt - Codex reads package.json and Dockerfile from the working tree. Keep the local change reviewable with workspace-scoped writes:

codex --sandbox workspace-write -c approval_policy=on-request "<the prompt below>"

The sandbox confines writes to the workspace. The approval policy lets Codex request broader access, but does not require approval for each in-sandbox edit, so always review the generated workflow diff. Do not run untrusted PR code while generating it.

Copy-paste prompt - generate a hardened GitHub Actions workflow from the repo:

Read package.json and Dockerfile in this repo. Generate
.github/workflows/ci.yml for a Node.js service with:

- A test job: matrix over Node 20 and 22, npm dependency caching via
  actions/setup-node cache:'npm', running the exact scripts in
  package.json (do not invent test commands).
- A build job (needs: test): build and push a multi-arch image to GHCR
  with docker/build-push-action, GHA layer cache, and login via the
  job's short-lived GITHUB_TOKEN (no long-lived registry password).
- A scan step in the build job: aquasecurity/trivy-action pinned to a
  release tag, severity CRITICAL,HIGH, exit-code 1 so the build FAILS
  on a vulnerable image - not just uploads SARIF.
- A deploy-prod job gated on a GitHub Environment with required
  reviewers, triggered only on published releases.

Pin every third-party action to a released major tag, not @master.
Use actions/checkout@v6, actions/setup-node@v6, actions/cache@v5.
Output only the YAML and a one-paragraph explanation of the gates.

A good generated result looks like this skeleton - note the scan step that actually gates the build, which is the part hand-written pipelines forget:

name: CI
on:
  push: { branches: [main] }
  pull_request:
  release: { types: [published] }

permissions:
  contents: read

jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix: { node: [20, 22] }
    steps:
      - uses: actions/checkout@v6
      - uses: actions/setup-node@v6
        with:
          node-version: ${{ matrix.node }}
          cache: 'npm'
      - run: npm ci
      - run: npm test         # the real script, read from package.json

  build:
    needs: test
    runs-on: ubuntu-latest
    permissions:
      contents: read
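      packages: write          # lets GITHUB_TOKEN push to GHCR, no stored registry password
      id-token: write          # only needed if a step requests an OIDC token; GHCR login does not use it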
    steps:
      - uses: actions/checkout@v6
      - uses: docker/setup-buildx-action@v4
      - uses: docker/login-action@v4
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - uses: docker/build-push-action@v7
        id: build
        with:
          push: true
          tags: ghcr.io/${{ github.repository }}:${{ github.sha }}
          cache-from: type=gha
          cache-to: type=gha,mode=max
      - name: Scan image (fails build on CRITICAL/HIGH)
        uses: aquasecurity/trivy-action@v0.36.0
        with:
          image-ref: ghcr.io/${{ github.repository }}:${{ github.sha }}
          severity: 'CRITICAL,HIGH'
          exit-code: '1'
          ignore-unfixed: true

  deploy-prod:
    needs: build
    if: github.event_name == 'release'
    runs-on: ubuntu-latest
    environment: production    # required reviewers enforce the gate
    steps:
      - uses: actions/checkout@v6
      # ... cloud auth + rollout below

The reviewable decision here is exit-code: '1' on the Trivy step. Without it, the scan exits 0 whatever it finds, so it produces a report nobody reads and the build stays green. With it, a CRITICAL or HIGH finding fails the build job, and because deploy-prod declares needs: build, the deploy never starts. That single line is the difference between a scan that merely reports and the gate the pipeline in the hook was missing.
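To see what that setting does outside the action, here is a minimal shell sketch of the same gate. It assumes the Trivy CLI is installed and on the PATH; scan_gate and the sample image name are hypothetical, chosen only for illustration. The flags mirror the action inputs in the skeleton.

```shell
#!/usr/bin/env bash
# Hypothetical helper: scan_gate is an illustrative name, not a trivy command.
# Assumes the trivy CLI is installed and the image can be pulled.

scan_gate() {
  local image="$1"
  # --exit-code 1 makes trivy exit non-zero when a finding matches the
  # severity filter. With the default (0), findings are only reported.
  if trivy image \
      --severity CRITICAL,HIGH \
      --ignore-unfixed \
      --exit-code 1 \
      "$image"; then
    echo "gate: pass ($image)"
  else
    echo "gate: FAIL ($image) - CRITICAL/HIGH findings or scan error" >&2
    return 1
  fi
}

# Usage (commented out so sourcing this file has no side effects):
# scan_gate "ghcr.io/my-org/my-app:abc123" || exit 1
```

Running the same command locally before pushing is a quick way to confirm that a red build comes from a real finding and not from the workflow wiring.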

Diagnosing a Red Run

Generating a pipeline is the easy half. The half that eats your afternoon is a run where npm ci exits 1 somewhere in a 400-line log with no obvious reason. Paste the log into your AI tool instead of scrolling through it.

Copy-paste prompt - turn a failed CI log into a root cause and a fix:

This GitHub Actions run failed. Here is the failing step's full log:

<paste the raw log from the failed step>

And here is the relevant job from .github/workflows/ci.yml:

<paste the job YAML>

Tell me the single root cause (not a list of possibilities), why it
happened in CI but passes locally, and the exact YAML or lockfile
change that fixes it. If it's a cache-key or lockfile-drift issue,
say so explicitly and show the corrected key.

The “passes locally but fails in CI” framing matters: it pushes the model toward environment-specific causes - a stale actions/cache key restoring node_modules built against the wrong Node version, a lockfile committed out of sync with package.json, a missing service container - rather than generic “check your tests” advice.
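Collecting the log for that prompt can be scripted. The sketch below assumes the GitHub CLI (gh) is installed and authenticated against the repo. The function names and the 200-line default are hypothetical choices, not part of gh.

```shell
#!/usr/bin/env bash
# Hypothetical helpers around the GitHub CLI. Assumes `gh auth login` has
# already been run and the current directory is the repo.

# latest_failed_run_id: print the ID of the most recent failed workflow run.
latest_failed_run_id() {
  gh run list --status failure --limit 1 \
    --json databaseId --jq '.[0].databaseId'
}

# failed_step_log RUN_ID [LINES]
# Print the last LINES lines (default 200) of the failed steps' log.
failed_step_log() {
  local run_id="$1" lines="${2:-200}" log
  # Capture first so a gh failure is not masked by the pipe to tail.
  log="$(gh run view "$run_id" --log-failed)" || return 1
  printf '%s\n' "$log" | tail -n "$lines"
}

# Usage (commented out so sourcing this file has no side effects):
# failed_step_log "$(latest_failed_run_id)" 300 > failed-step.log
```

Trimming to the tail keeps the paste short, but the real error is sometimes earlier in the step, so widen the line count if the model's diagnosis looks generic.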

Adding a Metric-Gated Canary

Once the basic deploy works, the next request is usually “ship to 10% of traffic, watch error rate, promote or roll back automatically.” This is where a generated kubectl blob is dangerous if it’s not gated on a real signal - so make the gate explicit in the prompt.

Copy-paste prompt - add a metric-gated canary to an existing deploy:

My deploy job currently runs:
  kubectl set image deployment/app app=$IMAGE -n production
  kubectl rollout status deployment/app -n production

Rewrite it as a canary rollout:
1. Deploy a separate app-canary deployment with 1 replica at the new image.
2. Route 10% of traffic to it (we use an Istio VirtualService - generate it).
3. Query our Prometheus for the canary's 5xx rate over 5 minutes via a
   shell step hitting the Prometheus HTTP API; promote only if the rate
   is below 1%, otherwise delete the canary and exit 1.
Show the promotion/rollback as plain shell with explicit thresholds I can
edit - do not hide the threshold inside a black-box action.

Insist on “plain shell with explicit thresholds.” A canary that promotes on an opaque action input is unauditable; one where the 0.01 error-rate threshold is visible in the workflow is something your team can reason about and tune.
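As a rough picture of what a good answer to that prompt contains, here is a minimal sketch of the promote-or-rollback gate in plain shell. The metric name http_requests_total, its deployment and status labels, and the PROM_URL variable are all assumptions about your Prometheus setup. The function names are hypothetical, and the query step also assumes curl and jq are available on the runner.

```shell
#!/usr/bin/env bash
# Hypothetical canary gate. PROM_URL, the metric name, and the label
# selectors are assumptions; adjust them to your own Prometheus setup.

MAX_5XX_RATE="0.01"   # promote only if the canary 5xx ratio is below 1%

# canary_should_promote RATE [THRESHOLD]
# Returns 0 only when RATE is a plain number strictly below THRESHOLD.
# Anything else (empty result, "NaN", garbage) fails closed.
canary_should_promote() {
  local rate="$1" threshold="${2:-$MAX_5XX_RATE}"
  awk -v r="$rate" -v t="$threshold" 'BEGIN {
    num = "^[0-9]+(\\.[0-9]+)?([eE][-+]?[0-9]+)?$"
    if (r !~ num || t !~ num) exit 1
    exit !((r + 0) < (t + 0))
  }'
}

# query_canary_5xx_rate PROM_URL
# Instant query against the Prometheus HTTP API (/api/v1/query).
query_canary_5xx_rate() {
  local prom_url="$1"
  local query='sum(rate(http_requests_total{deployment="app-canary",status=~"5.."}[5m])) / sum(rate(http_requests_total{deployment="app-canary"}[5m]))'
  curl -fsS --get "$prom_url/api/v1/query" --data-urlencode "query=$query" \
    | jq -r '.data.result[0].value[1] // empty'
}

# Usage inside the deploy step (commented out: it needs a live cluster):
# rate="$(query_canary_5xx_rate "$PROM_URL")"
# if canary_should_promote "$rate"; then
#   kubectl set image deployment/app app="$IMAGE" -n production
# else
#   kubectl delete deployment app-canary -n production
#   exit 1
# fi
```

Failing closed on an empty or NaN result is a deliberate choice in this sketch: a canary that received no traffic has proven nothing, so it should not be promoted.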

GitLab CI and Jenkins

The same generate-then-review loop applies to other platforms - only the target file changes. The prompts above work verbatim if you swap “GitHub Actions / .github/workflows/ci.yml” for “.gitlab-ci.yml stages” or “a declarative Jenkinsfile.” Two platform-specific notes worth giving the AI:

GitLab CI: ask for built-in security templates (include: - template: Security/SAST.gitlab-ci.yml and - template: Security/Container-Scanning.gitlab-ci.yml) rather than hand-rolled scanner jobs - they’re maintained by GitLab and wire results into the MR widget. Pin job images to a real tag (node:22-alpine), and use cache:key:files: [package-lock.json] so the cache invalidates on lockfile change.
Jenkins: request a Jenkinsfile using parallel quality-gate stages and Docker agents, and have it call withSonarQubeEnv + waitForQualityGate abortPipeline: true so a failing quality gate actually stops the pipeline. Keep deploy logic in a shared library, not inline.

Closing the Loop: AI That Reviews Its Own Pipelines

The strongest pattern is letting the same model that wrote the pipeline review the PR that changes it. Anthropic and OpenAI each ship a first-party GitHub Action, so you don’t shell out to npm install -g and scrape git diff by hand.

Cursor doesn’t ship a CI action - it’s an IDE. The division of labor: use Cursor’s Agent and background agent to author and iterate on the workflow locally with full editor context, then let Claude Code’s or Codex’s GitHub Action handle the in-CI review step. Generate locally, review in CI.

Run /install-github-app inside claude to set up the app and ANTHROPIC_API_KEY secret, then drop in the official action. It detects whether to respond to @claude mentions or run a prompt automatically:

name: Claude Review
on:
  pull_request:
    types: [opened, synchronize]
jobs:
  review:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      pull-requests: write
    steps:
      - uses: actions/checkout@v6
      - uses: anthropics/claude-code-action@v1
        with:
          anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }}
          prompt: "/review"
          claude_args: "--max-turns 5"

Pin to a current release of anthropics/claude-code-action@v1 (a January 2026 advisory was patched in v1.0.94+); review the action’s security docs before granting pull-requests: write on PRs from forks.

Codex ships openai/codex-action@v1, which installs the CLI and runs codex exec under a sandbox you control. Store prompts as files in .github/codex/prompts/:

name: Codex Review
on:
  pull_request:
    types: [opened, synchronize, reopened]
jobs:
  codex:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      pull-requests: write
    steps:
      - uses: actions/checkout@v6
      - uses: openai/codex-action@v1
        with:
          openai-api-key: ${{ secrets.OPENAI_API_KEY }}
          prompt-file: .github/codex/prompts/review.md
          output-file: codex-output.md
          safety-strategy: drop-sudo
          sandbox: workspace-write

safety-strategy: drop-sudo (the default) removes sudo before Codex runs and is irreversible for the job - keep it on for any PR-triggered workflow. For an API-key Codex run, pin the intended GPT-5.6 tier through the model input instead of assuming the same product default across plans and surfaces.

When This Breaks

Cache restores but the build still recompiles everything. The actions/cache key didn’t change but the contents are wrong - usually node_modules cached against a different Node version, or a key missing the lockfile hash. Use a key like ${{ runner.os }}-node${{ matrix.node }}-${{ hashFiles('**/package-lock.json') }} and prefer actions/setup-node’s built-in cache: 'npm' (which caches the npm download cache, not node_modules) over caching node_modules directly.
OIDC deploy fails with “Not authorized to perform sts:AssumeRoleWithWebIdentity”. The job has permissions: id-token: write but the cloud-side trust policy doesn’t allow this repo/branch. The fix is on the IAM role’s trust relationship, not in the YAML: the sub condition must match repo:org/name:ref:refs/heads/main for a branch-scoped job, or repo:org/name:environment:production when the job declares environment: production, as deploy-prod does in the skeleton. Ask your AI to generate the trust policy from the workflow’s permissions and trigger, then verify the sub claim matches.
kubectl rollout status hangs until the job times out. New pods aren’t going Ready - failing readiness probe, image pull error, or insufficient cluster resources. Add --timeout=120s to fail fast, then run kubectl describe deployment/app -n <ns> and kubectl get events -n <ns> --sort-by=.lastTimestamp to see why. A rollout that never completes is a stuck deploy, not a slow one.
Trivy fails the build on a CVE you can’t fix. A CRITICAL in a transitive dependency with no patched version yet. Don’t delete the scan step - that’s how the hook happened. Use ignore-unfixed: true (already in the skeleton) to skip CVEs with no fix, and a reviewed .trivyignore with the specific CVE ID and an expiry note for the rest. Blanket-disabling the gate is the wrong fix.
The generated workflow pins actions to @master or @main. This is the single most common defect in AI-generated pipelines: it’s a supply-chain risk and non-reproducible. Pin every third-party action to a released tag (or a commit SHA for high-trust paths). Re-prompt with “pin all actions to released tags, no floating refs” or fix it on review.
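The floating-ref check in the last entry is easy to automate on review. The sketch below is one possible lint, assuming workflows live under .github/workflows; find_floating_refs is a hypothetical name, and the pattern only catches the branch names master, main, and HEAD.

```shell
#!/usr/bin/env bash
# Hypothetical lint: flags `uses:` lines that point at a moving branch.

# find_floating_refs [DIR]
# Prints each offending line with file and line number.
# Returns 0 when none are found, 1 when any are, 2 if DIR is missing.
find_floating_refs() {
  local dir="${1:-.github/workflows}"
  if [ ! -d "$dir" ]; then
    echo "no such directory: $dir" >&2
    return 2
  fi
  if grep -rnE --include='*.yml' --include='*.yaml' \
      'uses:[[:space:]]*[^[:space:]#]+@(master|main|HEAD)([[:space:]]|$)' \
      "$dir"; then
    echo "floating action refs found; pin to a release tag or commit SHA" >&2
    return 1
  fi
}

# Usage (commented out so sourcing this file has no side effects):
# find_floating_refs .github/workflows || exit 1
```

Run as an early step in the test job, it turns the pinning rule from a review habit into a failing check.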

What’s Next

Containerization: Generate and harden the Dockerfile your pipeline builds and scans - multi-stage builds, distroless bases, and slim images.

Deployment Operations: The full deploy lifecycle - rollout strategies, monitoring, incident response, and cost optimization with AI assistance.

Security Testing: Go deeper on SAST, dependency scanning, and secrets detection - the gates that should run before your pipeline ever builds an image.

Kubernetes Patterns: Generate the manifests and rollout logic your deploy job applies, including canary and blue-green strategies.