Pipeline Automation with AI
Your monorepo CI runs every job on every push. A one-line README edit triggers a 38-minute build of all six services, the test job goes red 30% of the time on the same three flaky specs, and last Friday’s deploy to Cloudflare Workers shipped a regression that nobody caught until support tickets rolled in. You don’t need “AI-powered self-healing pipelines” — you need the build to only run what changed, the flaky tests quarantined, and a deploy that rolls itself back when error rates spike.
This article shows the concrete workflow for getting an AI agent to write those pipeline changes for you in Cursor, Claude Code, and Codex, with the real GitHub Actions YAML they produce.
What You’ll Walk Away With
- A dorny/paths-filter change-detection job that skips unaffected services (and the prompt that generates it)
- A flaky-test triage workflow: feed the agent a failing run, get back a quarantine list and a retry policy
- A progressive Cloudflare Workers deploy with wrangler versions, a gradual rollout, and automatic rollback
- Copy-paste prompts for each, tuned for Cursor, Claude Code, and Codex
- The failure modes that bite when you let an agent edit CI, and how to recover
The Workflow: Change-Aware Builds
The single highest-leverage CI change in a monorepo is to stop building everything. dorny/paths-filter reads the diff and sets per-path outputs you gate jobs on. Ask the agent to write the filter against your actual directory layout, not a template.
The agent should produce something like this — a real, runnable job, not a description of one:
    name: CI
    on:
      pull_request:
      push:
        branches: [main]

    jobs:
      changes:
        runs-on: ubuntu-latest
        outputs:
          api: ${{ steps.filter.outputs.api }}
          web: ${{ steps.filter.outputs.web }}
        steps:
          - uses: actions/checkout@v5
          - uses: dorny/paths-filter@v3
            id: filter
            with:
              filters: |
                api:
                  - 'services/api/**'
                  - 'packages/shared/**'
                web:
                  - 'services/web/**'
                  - 'packages/shared/**'

      test-api:
        needs: changes
        if: ${{ needs.changes.outputs.api == 'true' }}
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v5
          - run: npm ci && npm run test --workspace=services/api

Note that packages/shared/** appears under both filters: a change to shared code correctly rebuilds both consumers. That cross-dependency mapping is exactly what you want the agent to infer from your repo rather than hand-maintaining.
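To make the gating logic concrete, here is a minimal shell sketch of the decision the filter encodes for the two services above. It is not how dorny/paths-filter is implemented; the function name `affected_services` is hypothetical, and the path prefixes simply mirror the example layout.

```shell
# Sketch only: mimics the api/web filters above for a list of changed paths.
# Reads one path per line on stdin and prints the affected service names.
affected_services() {
  local api=false web=false path out=""
  while IFS= read -r path; do
    case "$path" in
      services/api/*)    api=true ;;
      services/web/*)    web=true ;;
      packages/shared/*) api=true; web=true ;;  # shared code affects both consumers
    esac
  done
  if [ "$api" = true ]; then out="api"; fi
  if [ "$web" = true ]; then out="${out:+$out }web"; fi
  printf '%s\n' "$out"
}
```

For a local dry run you could pipe in a real diff, for example `git diff --name-only origin/main...HEAD | affected_services` (this assumes an `origin/main` ref exists). A README-only change prints an empty line, which corresponds to every gated job being skipped.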
The interaction differs by tool. Cursor edits the YAML inline in the editor; Claude Code runs headless in the terminal and can verify with gh; Codex runs the same prompt as a non-interactive exec job or as a Cloud task off a GitHub issue.
In Cursor's Agent mode, open the repo and point it at the real layout. Cursor reads package.json workspaces and the services/ tree, then writes .github/workflows/ci.yml as an inline diff you review before accepting:
Add a changes job using dorny/paths-filter@v3 to .github/workflows/ci.yml. Derive one filter per package in my workspaces array, and make every filter also match packages/shared/**. Gate each existing test job behind its matching needs.changes.outputs.* value. Use actions/checkout@v5.
Use a checkpoint before accepting so you can revert the whole edit in one click if the gating is wrong.
Run it headless from the repo root so Claude Code can inspect the tree and validate the result with the GitHub CLI:
    # -p runs Claude Code non-interactively; acceptEdits lets it write the workflow file
    claude -p --permission-mode acceptEdits "Read package.json workspaces and the services/ directory, then rewrite \
    .github/workflows/ci.yml to add a dorny/paths-filter@v3 'changes' job. One filter \
    per workspace, each also matching packages/shared/**, and gate every test job behind \
    its needs.changes.outputs.* flag."

    # After pushing the branch, confirm GitHub picked up the workflow and check its recent runs
    gh workflow view ci.yml

For repeatable use, wrap the prompt in a custom slash command (.claude/commands/affected.md) and rerun it whenever you add a service.
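A minimal sketch of setting up that slash command, assuming project-level commands are plain Markdown files under .claude/commands/ and that the file body is used as the prompt. The wording of the prompt is illustrative, not a required format.

```shell
# Sketch: save the change-detection prompt as a project slash command.
mkdir -p .claude/commands
cat > .claude/commands/affected.md <<'EOF'
Read package.json workspaces and the services/ directory, then update
.github/workflows/ci.yml so the dorny/paths-filter@v3 `changes` job has one
filter per workspace, each also matching packages/shared/**. Gate every test
job behind its needs.changes.outputs.* flag.
EOF
```

Committing the file keeps the prompt versioned alongside the workflow it maintains, so the whole team reruns the same instructions when a service is added.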
Codex runs the same task non-interactively. Locally, use the workspace-write sandbox so it can edit files in the repo but not write outside it:
    codex exec --sandbox workspace-write \
      "Rewrite .github/workflows/ci.yml: add a dorny/paths-filter@v3 changes job with one \
    filter per npm workspace (each also matching packages/shared/**), and gate each test \
    job on its needs.changes.outputs.* flag."

To run it as a Cloud task off a GitHub issue, submit to your Codex environment with codex cloud exec --env <ENV_ID> "..." (list environments with codex cloud). Codex opens a branch and a PR with the edited workflow.
The Workflow: Flaky-Test Triage
Flaky tests don’t need an AI “stability analyzer”. They need someone to look at the last N runs, find the specs that fail nondeterministically, quarantine them, and open a ticket. That “someone” can be the agent, because it can read a failing run’s logs directly through the GitHub MCP server.
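As a rough illustration of what "fails nondeterministically" means in practice, the sketch below flags any spec that both passed and failed across a set of runs. The input format (one `<spec> <pass|fail>` line per spec per run) and the function name `flaky_specs` are assumptions made for the example; real CI logs need to be parsed into that shape first.

```shell
# Sketch: a spec is treated as flaky if it has at least one pass and one fail.
# Input on stdin (hypothetical format): "<spec> <pass|fail>", one line per spec per run.
flaky_specs() {
  awk '
    $2 == "pass" { passed[$1] = 1 }
    $2 == "fail" { failed[$1] = 1 }
    END {
      for (spec in failed)
        if (spec in passed) print spec
    }
  ' | sort
}
```

Specs that only ever fail are left out on purpose: those are more likely genuinely broken and belong in the "fix by hand" pile rather than quarantine.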
The GitHub MCP server is a remote Streamable HTTP endpoint. Register it once per tool: the endpoint URL is the same everywhere because MCP is a shared standard, and only the registration step differs. In Claude Code:
    claude mcp add --transport http github https://api.githubcopilot.com/mcp/

With it connected, the agent can pull the failed job’s logs instead of you copy-pasting them. The before/after is the whole point: without the MCP server you paste a log dump; with it, you say “the last run of CI on this PR” and the agent fetches the annotations, the failing spec names, and the surrounding output itself.
In Agent mode with the GitHub MCP server enabled, reference the failing run and let Cursor pull the logs, then propose a quarantine diff for your test config:
The last CI run on this branch failed. Pull the failing job’s logs via the GitHub MCP server, list the specs that fail intermittently (passing on retry), and mark them with test.skip plus a // FLAKY: <run-url> comment. Open a checklist of what you skipped.
Claude Code can drive the GitHub CLI directly, so you don’t even strictly need the MCP server here — pipe the failed log straight in:
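    # gh needs an explicit run ID when its output is piped; take the most recent failed run
    RUN_ID=$(gh run list --status failure --limit 1 --json databaseId --jq '.[0].databaseId')
    gh run view "$RUN_ID" --log-failed | claude -p "These are the failing jobs from our latest CI run. \
    Identify which Vitest specs are flaky (timeouts, race conditions, order-dependence) \
    versus genuinely broken. For the flaky ones, output a vitest.config.ts 'retry: 2' \
    setting and a list of describe blocks to move into a @flaky tag we exclude from \
    required checks."

With Codex, run the triage as a non-interactive job that reads the same log from a file and edits the config inside the workspace sandbox: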
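    # gh needs an explicit run ID when its output is redirected; take the most recent failed run
    RUN_ID=$(gh run list --status failure --limit 1 --json databaseId --jq '.[0].databaseId')
    gh run view "$RUN_ID" --log-failed > failed.log
    codex exec --sandbox workspace-write \
      "Read failed.log. Classify each failing Playwright test as flaky or real. For flaky \
    ones, add 'test.fixme' with a link comment and add { retries: 2 } to playwright.config.ts \
    for that project. Summarize the real failures separately so I can fix them by hand."

The Workflow: Progressive Deploy with Auto-Rollback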
A binary deploy either fully succeeds or fully fails. A progressive deploy ships the new version to a slice of traffic, watches error rates, and rolls back automatically if they spike. On Cloudflare Workers the traffic split is built into wrangler versions, so no service mesh is required; the health check and the rollback trigger are the parts you script in the workflow.
Have the agent write a deploy job that uploads a new version, routes 10% of traffic to it, and gates the full rollout on a health check:
    # .github/workflows/deploy.yml (excerpt)
    - name: Upload new version
      run: npx wrangler versions upload --json > version.json

    - name: Canary 10% of traffic
      # PREV must already hold the currently deployed version ID (set in an earlier step, not shown)
      run: |
        VID=$(jq -r '.id' version.json)
        npx wrangler versions deploy "$VID@10%" "${PREV}@90%" --yes

    - name: Health gate
      run: ./scripts/check-error-rate.sh  # exits non-zero if 5xx rate > threshold

    - name: Promote to 100%
      if: success()
      run: npx wrangler versions deploy "$(jq -r .id version.json)@100%" --yes

    - name: Rollback on failure
      if: failure()
      run: npx wrangler rollback --message "Auto-rollback: health gate failed"

Write .github/workflows/deploy.yml for a Cloudflare Worker. Use wrangler versions upload, then wrangler versions deploy to send 10% of traffic to the new version, run scripts/check-error-rate.sh as a gate, promote to 100% on success, and wrangler rollback on failure. Pin actions/checkout@v5.
    claude "Generate .github/workflows/deploy.yml that does a gradual Cloudflare Workers \
    rollout: wrangler versions upload, deploy at 10% canary, run scripts/check-error-rate.sh \
    as a health gate, promote to 100% on pass, wrangler rollback on fail. Also stub \
    check-error-rate.sh to query the Workers analytics 5xx rate and exit 1 above 1%."

Cloudflare’s wrangler agent skill and the wrangler MCP context help here, but the prompt above already produces a runnable workflow.
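The health gate itself reduces to one comparison. The sketch below shows only that threshold check, in the spirit of scripts/check-error-rate.sh; how the request and 5xx counts are fetched from Workers analytics is deliberately left out, and the function name `error_rate_ok` and its argument order are hypothetical.

```shell
# Sketch: succeeds (exit 0) when the 5xx rate is at or below the threshold.
# Args: total request count, 5xx count, threshold in whole percent (default 1).
error_rate_ok() {
  local total="$1" errors="$2" threshold_pct="${3:-1}"
  # No traffic yet: nothing to judge. Treating this as healthy is an assumption;
  # a stricter gate could fail here instead.
  [ "$total" -gt 0 ] || return 0
  # Integer-only form of errors/total <= threshold/100
  [ $((errors * 100)) -le $((threshold_pct * total)) ]
}
```

Inside the gate script this would be used as `error_rate_ok "$TOTAL" "$ERRORS" 1 || exit 1`, where `TOTAL` and `ERRORS` come from whatever analytics query you wire in. Multiplying instead of dividing avoids floating point, which plain shell arithmetic does not support.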
For deploys, run Codex in CI itself with the official action rather than locally. openai/codex-action@v1 installs the CLI and runs codex exec inside the job:
    - uses: actions/checkout@v5
    - name: Generate deploy workflow
      uses: openai/codex-action@v1
      with:
        openai-api-key: ${{ secrets.OPENAI_API_KEY }}
        prompt: 'Write a progressive Cloudflare Workers deploy using wrangler versions with a 10% canary, a health gate, and wrangler rollback on failure.'
        sandbox: workspace-write
        safety-strategy: drop-sudo

Picking the Model
For agent-driven CI edits, cross-file reasoning matters more than raw speed: the agent has to read your workspace layout, infer dependencies, and produce valid YAML in one shot. When budget matters less than getting it right in one pass, use Claude Fable 5 (/model fable in Claude Code, or the Cursor model picker); it is Anthropic’s most capable model and excels at exactly this kind of multi-file work. When budget matters, use Claude Opus 4.8 for the change-detection and progressive-deploy passes and drop to Claude Sonnet 4.6 for the more mechanical flaky-test triage. See model comparison for pricing and a full capability ladder. Codex runs on GPT-5.5 by default across CLI, IDE, and Cloud; use gpt-5.2-codex when you authenticate the CLI with an API key (as the GitHub Action does above).
When This Breaks
CI is the one place where a confidently wrong agent edit costs you a broken main. The failure modes are specific.
If an agent-generated workflow fails to parse, don’t ask the agent to “fix the YAML” blind; paste the exact gh workflow view or Actions error back in. Pinning every action to a major version (@v5, @v3) rather than a branch also prevents the classic “it worked yesterday” break when upstream ships a breaking change. A major-version tag can still move within that major, so pin to a full commit SHA when you need the reference to be immutable.
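One way to catch unpinned references before they reach main is a quick grep over the workflow directory. This is a rough sketch, not a full linter: the function name `unpinned_actions` is hypothetical, the list of "floating" refs (main, master, latest) is an assumption, and lines with trailing comments after the action reference are not handled.

```shell
# Sketch: list `uses:` lines that have no @ref or point at a branch-like ref.
# Arg: workflow directory (defaults to .github/workflows).
unpinned_actions() {
  local dir="${1:-.github/workflows}"
  # Match "uses: owner/repo" with no @ref, or with @main/@master/@latest,
  # then drop local actions referenced as ./path
  grep -rnE 'uses:[[:space:]]*[^@[:space:]]+(@(main|master|latest))?[[:space:]]*$' "$dir" \
    | grep -vE 'uses:[[:space:]]*\./'
}
```

The function prints `file:line:content` for each offending reference and returns non-zero when nothing is found, so a CI step that should fail on unpinned actions would invert it: `if unpinned_actions; then exit 1; fi`.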