Pipeline Automation with AI

Your monorepo CI runs every job on every push. A one-line README edit triggers a 38-minute build of all six services, the test job goes red 30% of the time on the same three flaky specs, and last Friday’s deploy to Cloudflare Workers shipped a regression that nobody caught until support tickets rolled in. You don’t need “AI-powered self-healing pipelines” — you need the build to only run what changed, the flaky tests quarantined, and a deploy that rolls itself back when error rates spike.

This article shows the concrete workflow for getting an AI agent to write those pipeline changes for you in Cursor, Claude Code, and Codex, with the real GitHub Actions YAML they produce.

What You’ll Walk Away With

A dorny/paths-filter change-detection job that skips unaffected services (and the prompt that generates it)
A flaky-test triage workflow: feed the agent a failing run, get back a quarantine list and a retry policy
A progressive Cloudflare Workers deploy with wrangler versions + a gradual rollout and automatic rollback
Copy-paste prompts for each, tuned for Cursor, Claude Code, and Codex
The failure modes that bite when you let an agent edit CI, and how to recover

The Workflow: Change-Aware Builds

The single highest-leverage CI change in a monorepo is to stop building everything. dorny/paths-filter reads the diff and sets per-path outputs you gate jobs on. Ask the agent to write the filter against your actual directory layout, not a template.

The agent should produce something like this — a real, runnable job, not a description of one:

name: CI
on:
  pull_request:
  push:
    branches: [main]

jobs:
  changes:
    runs-on: ubuntu-latest
    outputs:
      api: ${{ steps.filter.outputs.api }}
      web: ${{ steps.filter.outputs.web }}
    steps:
      - uses: actions/checkout@v5
      - uses: dorny/paths-filter@v3
        id: filter
        with:
          filters: |
            api:
              - 'services/api/**'
              - 'packages/shared/**'
            web:
              - 'services/web/**'
              - 'packages/shared/**'

  test-api:
    needs: changes
    if: ${{ needs.changes.outputs.api == 'true' }}
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v5
      - run: npm ci && npm run test --workspace=services/api

Note packages/shared/** appears under both filters: a change to shared code correctly rebuilds both consumers. That cross-dependency mapping is exactly what you want the agent to infer from your repo rather than hand-maintaining.
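
To make the gating logic concrete, here is a rough shell approximation of what the two filters above compute. It is a sketch under stated assumptions: affected_services is a hypothetical helper, it uses simple prefix matching instead of the action's real glob engine, and it expects changed paths on stdin.

```shell
#!/usr/bin/env bash
# Sketch only: approximates the api/web filters above with prefix matching.
# affected_services is a hypothetical helper, not part of dorny/paths-filter.
# Reads changed file paths on stdin (one per line), prints api=... and web=...
affected_services() {
  local api=false web=false path
  while IFS= read -r path; do
    case "$path" in
      services/api/*)    api=true ;;
      services/web/*)    web=true ;;
      packages/shared/*) api=true; web=true ;;  # shared code rebuilds both consumers
    esac
  done
  printf 'api=%s\nweb=%s\n' "$api" "$web"
}

# Example: git diff --name-only origin/main...HEAD | affected_services
```

A README-only diff leaves both flags false, which is the case the gated test jobs skip.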

The interaction differs by tool. Cursor edits the YAML inline in the editor; Claude Code runs headless in the terminal and can verify with gh; Codex runs the same prompt as a non-interactive exec job or as a Cloud task off a GitHub issue.

In Cursor’s Agent mode, open the repo and point it at the real layout. Cursor reads package.json workspaces and the services/ tree, then writes .github/workflows/ci.yml as an inline diff you review before accepting:

Add a changes job using dorny/paths-filter@v3 to .github/workflows/ci.yml. Derive one filter per package in my workspaces array, and make every filter also match packages/shared/**. Gate each existing test job behind its matching needs.changes.outputs.* value. Use actions/checkout@v5.

Use a checkpoint before accepting so you can revert the whole edit in one click if the gating is wrong.

Run it headless (-p) from the repo root with file edits pre-approved, so Claude Code can inspect the tree and rewrite the workflow, then lint the result locally before you push:

claude -p --permission-mode acceptEdits "Read package.json workspaces and the services/ \
directory, then rewrite .github/workflows/ci.yml to add a dorny/paths-filter@v3 'changes' \
job. One filter per workspace, each also matching packages/shared/**, and gate every test \
job behind its needs.changes.outputs.* flag."

# Lint the workflow locally before you push. actionlint is a separate tool you install;
# gh workflow view only shows what is already on GitHub, so it cannot check an unpushed edit.
actionlint .github/workflows/ci.yml

For repeatable use, wrap the prompt in a custom slash command (.claude/commands/affected.md) and rerun it whenever you add a service.

Codex runs the same task non-interactively. Locally, scope it to the workspace sandbox so it can edit files in the repo but not write outside it:

codex exec --sandbox workspace-write \
  "Rewrite .github/workflows/ci.yml: add a dorny/paths-filter@v3 changes job with one \
filter per npm workspace (each also matching packages/shared/**), and gate each test \
job on its needs.changes.outputs.* flag."

To run it as a Cloud task off a GitHub issue, submit to your Codex environment with codex cloud exec --env <ENV_ID> "..." (list environments with codex cloud). Codex opens a branch and a PR with the edited workflow.

The Workflow: Flaky-Test Triage

Flaky tests don’t need an AI “stability analyzer” — they need someone to look at the last N runs, find the specs that fail nondeterministically, quarantine them, and open a ticket. That “someone” can be the agent, because it can read a failing run’s logs directly through the GitHub MCP server.
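
The "look at the last N runs" step can be sketched in a few lines of shell. This is an illustration under stated assumptions, not a reporter integration: classify_specs is a hypothetical helper, the "<spec-path> <pass|fail>" input format is made up for the example, and all runs are assumed to be of the same commit so that mixed outcomes really indicate nondeterminism.

```shell
#!/usr/bin/env bash
# Sketch only: classify specs from a history of runs of the same commit.
# Input on stdin: one "<spec-path> <pass|fail>" pair per line (a hypothetical
# format; produce it from your own test reporter).
# A spec seen with both outcomes is reported as flaky, a fail-only spec as broken.
classify_specs() {
  awk '
    $2 == "pass" { passed[$1] = 1 }
    $2 == "fail" { failed[$1] = 1 }
    END {
      for (spec in failed) {
        if (spec in passed) { print "flaky " spec } else { print "broken " spec }
      }
    }
  ' | sort
}
```

The "flaky" lines are quarantine candidates; the "broken" lines are the failures to fix by hand.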

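The GitHub MCP server is a remote Streamable HTTP endpoint. Register it once per tool. The endpoint URL is the same everywhere because MCP is a shared standard, but each tool has its own way of adding it; the command below is Claude Code’s, and Cursor and Codex take the same URL in their own MCP settings: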

claude mcp add --transport http github https://api.githubcopilot.com/mcp/

With it connected, the agent can pull the failed job’s logs instead of you copy-pasting them. The before/after is the whole point: without the MCP server you paste a log dump; with it, you say “the last run of CI on this PR” and the agent fetches the annotations, the failing spec names, and the surrounding output itself.

In Agent mode with the GitHub MCP server enabled, reference the failing run and let Cursor pull the logs, then propose a quarantine diff for your test config:

The last CI run on this branch failed. Pull the failing job’s logs via the GitHub MCP server, list the specs that fail intermittently (passing on retry), and mark them with test.skip plus a // FLAKY: <run-url> comment. Open a checklist of what you skipped.

Claude Code can drive the GitHub CLI directly, so you don’t strictly need the MCP server here. Pipe the failed log straight into a headless run (gh needs an explicit run ID when its output is piped):

RUN_ID=$(gh run list --status failure --limit 1 --json databaseId --jq '.[0].databaseId')
gh run view "$RUN_ID" --log-failed | claude -p "These are the failing jobs from our latest CI run. \
Identify which Vitest specs are flaky (timeouts, race conditions, order-dependence) \
versus genuinely broken. For the flaky ones, output a vitest.config.ts 'retry: 2' \
setting and a list of describe blocks to move into a @flaky tag we exclude from \
required checks."

With Codex, run the triage as a non-interactive job that reads the same log from a file and edits the config under the workspace sandbox:

RUN_ID=$(gh run list --status failure --limit 1 --json databaseId --jq '.[0].databaseId')
gh run view "$RUN_ID" --log-failed > failed.log
codex exec --sandbox workspace-write \
  "Read failed.log. Classify each failing Playwright test as flaky or real. For flaky \
ones, add 'test.fixme' with a link comment and add { retries: 2 } to playwright.config.ts \
for that project. Summarize the real failures separately so I can fix them by hand."

The Workflow: Progressive Deploy with Auto-Rollback

A binary deploy either fully succeeds or fully fails. A progressive deploy ships the new version to a slice of traffic, watches error rates, and rolls back automatically if they spike. On Cloudflare Workers the traffic split is built into wrangler versions, so no service mesh is required; the health check and the rollback trigger are steps you add around it.

Have the agent write a deploy job that uploads a new version, routes 10% of traffic to it, and gates the full rollout on a health check:

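# .github/workflows/deploy.yml (excerpt)
- name: Upload new version
  run: npx wrangler versions upload --json > version.json

- name: Canary 10% of traffic
  # PREV must hold the currently deployed version ID, for example written to
  # $GITHUB_ENV by an earlier step; nothing in this excerpt sets it.
  run: |
    VID=$(jq -r '.id' version.json)
    npx wrangler versions deploy "$VID@10%" "${PREV}@90%" --yes

- name: Health gate
  id: health
  run: ./scripts/check-error-rate.sh   # exits non-zero if 5xx rate > threshold

- name: Promote to 100%
  if: success()
  run: npx wrangler versions deploy "$(jq -r .id version.json)@100%" --yes

- name: Rollback on failure
  # Roll back only when the canary went out and the gate rejected it;
  # a bare failure() would also fire when the upload itself fails.
  if: failure() && steps.health.outcome == 'failure'
  run: npx wrangler rollback --message "Auto-rollback: health gate failed"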
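
The workflow leans on scripts/check-error-rate.sh without showing it. A minimal sketch of the decision logic might look like the following. The helper name error_rate_ok and its argument order are assumptions for illustration, and fetching the request counts (for example from Workers analytics) is deliberately left out.

```shell
#!/usr/bin/env bash
# Sketch of the decision logic inside scripts/check-error-rate.sh.
# error_rate_ok is a hypothetical helper; obtaining the counts is not shown.
#   $1 = total requests, $2 = 5xx responses, $3 = max allowed percent (default 1)
# Returns 0 when the 5xx rate is at or below the threshold, 1 otherwise.
error_rate_ok() {
  local total=$1 errors=$2 max_pct=${3:-1}
  # No traffic means no evidence either way; fail the gate so the canary
  # is not promoted blind (a conservative assumption).
  [ "$total" -gt 0 ] || return 1
  # Cross-multiplied to avoid division: errors/total > max_pct/100
  awk -v e="$errors" -v t="$total" -v m="$max_pct" \
    'BEGIN { exit ((e * 100 > m * t) ? 1 : 0) }'
}
```

The real script would fetch the two counts for the canary window and end with something like error_rate_ok "$total" "$errors" 1 || exit 1, so a rate above 1% fails the Health gate step.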

In Cursor’s Agent mode, ask for the whole workflow in one prompt:

Write .github/workflows/deploy.yml for a Cloudflare Worker. Use wrangler versions upload, then wrangler versions deploy to send 10% of traffic to the new version, run scripts/check-error-rate.sh as a gate, promote to 100% on success, and wrangler rollback on failure. Pin actions/checkout@v5.

In Claude Code, ask for the workflow and a stub of the gate script together:

claude "Generate .github/workflows/deploy.yml that does a gradual Cloudflare Workers \
rollout: wrangler versions upload, deploy at 10% canary, run scripts/check-error-rate.sh \
as a health gate, promote to 100% on pass, wrangler rollback on fail. Also stub \
check-error-rate.sh to query the Workers analytics 5xx rate and exit 1 above 1%."

Cloudflare’s wrangler agent skill and the wrangler MCP context help here — but the prompt above already produces a runnable workflow.

For deploys, run Codex in CI itself with the official action rather than locally. openai/codex-action@v1 installs the CLI and runs codex exec inside the job:

- uses: actions/checkout@v5
- name: Generate deploy workflow
  uses: openai/codex-action@v1
  with:
    openai-api-key: ${{ secrets.OPENAI_API_KEY }}
    prompt: 'Write a progressive Cloudflare Workers deploy using wrangler versions with a 10% canary, a health gate, and wrangler rollback on failure.'
    sandbox: workspace-write
    safety-strategy: drop-sudo

Picking the Model

For agent-driven CI edits, cross-file reasoning matters more than raw speed. When budget matters less than a correct first pass, use Claude Fable 5; use Claude Opus 5 for premium planning and Claude Sonnet 5 for mechanical triage. ChatGPT Codex offers GPT-5.6 Sol, Terra, and Luna by plan and task; use gpt-5.6-sol when you explicitly select Sol for an API-key-authenticated GitHub Action. See model comparison for pricing and the full capability ladder.

When This Breaks

CI is the one place where a confidently wrong agent edit costs you a broken main. The failure modes are specific.

If an agent-generated workflow fails to parse, don’t ask the agent to “fix the YAML” blind. Paste the exact error text from the Actions run back in. Pinning every action to a major version (@v5, @v3) also avoids the classic “it worked yesterday” break when an unpinned reference such as @main ships a breaking change. A major tag can still move within that major version, so pin to a full commit SHA where you need the reference to be immutable.
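
Pinning is easy to audit mechanically after an agent has edited a workflow. The sketch below lists uses: references that carry neither a version tag nor a full commit SHA. It is a rough text scan under stated assumptions: unpinned_actions is a hypothetical helper, it reads workflow YAML on stdin, and it ignores quoted values and docker:// references.

```shell
#!/usr/bin/env bash
# Sketch only: print `uses:` references that are not pinned to a vN tag or a
# 40-character commit SHA. unpinned_actions is a hypothetical helper.
# Local actions (./path) are skipped because they have no ref to pin.
unpinned_actions() {
  grep -E '^[[:space:]-]*uses:' \
    | sed -E 's/^[[:space:]-]*uses:[[:space:]]*//; s/[[:space:]]*(#.*)?$//' \
    | grep -Ev '^\./' \
    | grep -Ev '@(v[0-9]+(\.[0-9]+)*|[0-9a-f]{40})$' \
    || true  # an empty result is not an error
}

# Example: cat .github/workflows/*.yml | unpinned_actions
```

Any line it prints is a reference worth pinning before the change reaches main.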

What’s Next

Container Orchestration: Extend these pipelines to Docker and Kubernetes deployments with AI-assisted manifests and scaling.

Infrastructure as Code: Generate and review Terraform and Wrangler config with the same change-aware, agent-driven workflow.

Performance Monitoring: Wire the health gates above into real monitoring so progressive deploys roll back on the right signals.

Security Operations: Add SAST, dependency, and secret scanning to these pipelines without slowing the inner loop.