Workflow tier — from one session to autonomous overnight runs
Q14 · Parallelism & automation Which workflow tier best describes you?
Max-score answer: “Tier 3: I also run autonomous overnight backlog drains (Codex Cloud, Cursor Cloud Agents).”
Why this matters in 2026
The single biggest productivity gap between average and top-tier AI developers in 2026 isn’t model choice or prompt skill; it’s how many agents you can run in parallel without melting your review pipeline. A developer running one Claude Code session ships at roughly the rate of a fast solo engineer. A developer running 5–6 parallel agents (some local, some on cloud VMs that woke up at 2 a.m. to drain the backlog) ships at the rate of a small team. The frontier in 2026 sits at around 5–6 simultaneous agents per human reviewer; beyond that, review bandwidth becomes the bottleneck and the parallelism gains evaporate into merge conflicts and shallow approvals. Cursor’s own internal data is the cleanest signal of where this is going: more than 35% of pull requests merged by Cursor’s engineering team are now written by autonomous Cloud Agents, up from zero eighteen months ago. The teams that win this year aren’t the ones who write better prompts; they’re the ones who set up the orchestration layer so they can review three PRs while two more agents are still drafting code on cloud VMs and a third is silently working through a labelled backlog overnight.
What “max score” actually looks like
You get full marks on Q14 only when you can describe, concretely, how you operate at all three tiers and use the right one for the task at hand. The shape of a top-tier day:
- Morning, Tier 3 wake-up. Before you even open your laptop, two overnight Codex Cloud runs and one Cursor Cloud Agent have each produced a draft PR for issues you tagged the day before. You triage them like a regular code review: merge the one that’s clean, push back on the one with hand-wavy commits, kill the third because the approach was wrong. That’s an hour of “free” output before standup.
- Mid-morning, Tier 2 fan-out. You’re working on a real feature. You spawn 3 parallel local agents in git worktrees: one writing the migration, one building the API handler, one wiring up the UI component. You ping between them with `tmux` or Conductor / Claude Squad, reviewing whichever one needs human input next. None of them blocks the others.
- Afternoon, Tier 1 deep focus. The gnarliest part of the day, the bit that needs real architectural judgement, is a single Claude Code session in Plan mode with Opus 4.7: no fan-out, you in the loop on every step. Sometimes one focused session beats five mediocre ones, and you know which is which.
- Evening, Tier 3 queue. You leave the office having labelled 3–5 issues with the right tag (`codex-cloud`, `cursor-agent`, or just routed via your auto-PR hook) so they get picked up by autonomous agents while you sleep. Tomorrow’s morning triage repeats.
Anything less is sub-tier on Q14. Specifically:
- “I run one Claude Code or Cursor session at a time” — that’s Tier 1, mid-tier on Q14.
- “I run 2–3 parallel agents in worktrees, but I’ve never tried cloud agents” — that’s Tier 2, near-top but missing the asynchronous overnight loop where most of the leverage now lives.
- “I’ve tried Cloud Agents once, didn’t trust them” — doesn’t count. Trust comes from the orchestration setup (layered review, scoped tasks, sensible labels), not from the model.
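Choosing "the right tier for the task at hand" can be written down as a small routing rule. The sketch below is one possible reading of the rules of thumb in this article, not a prescribed algorithm: the `Task` fields and the `pick_tier` function are hypothetical names invented for illustration, and the three boolean attributes are an assumption about what you would actually know about a task up front.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """Hypothetical task description; the fields are illustrative only."""
    title: str
    needs_architectural_judgement: bool  # design calls, cross-service debugging
    well_scoped: bool                    # small, low-risk, clear definition of done
    independent: bool                    # does not overlap with another agent's files


def pick_tier(task: Task) -> int:
    """Return 1, 2, or 3 using the article's rules of thumb (an assumption-laden sketch)."""
    if task.needs_architectural_judgement:
        return 1  # focused single session, human in the loop on every step
    if task.well_scoped and task.independent:
        return 3  # candidate for an overnight cloud agent
    if task.independent:
        return 2  # parallel local agent in its own worktree
    return 1      # coupled or fuzzy work stays in one supervised session
```

For example, `pick_tier(Task("write tests for utils.py", False, True, True))` returns `3`, while anything flagged as needing architectural judgement returns `1` regardless of the other fields.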
Current landscape (web-verified)
Tier 1: one terminal/IDE session
A single Claude Code, Cursor, or Codex CLI session in front of you, one prompt at a time, you reading the screen as it works. This is where everyone starts and where most developers still live in 2026.
- Claude Code — Anthropic’s terminal-first agent. The default for deep work because Plan mode + Opus 4.7 + Sonnet 4.6 execution is still the strongest single-session reasoning loop available. Best when the task needs serious architectural judgement or touches >5 files.
- Cursor (IDE mode) — The “VS Code with an agent panel” experience. Best when you’re doing tight edit-test-edit loops and want the diff visible right next to the file. Strong for refactors where you want to scrub through changes hunk by hunk.
- Codex CLI — OpenAI’s terminal-first agent, the local counterpart to Codex Cloud. Strong on long agentic tool-use loops and structured outputs. Pair it with GPT-5.4 or GPT-5.5 for tasks where you want a non-Anthropic second opinion.
What you gain at Tier 1: full focus, full control, full understanding of every change. What you lose: throughput. One human + one agent ships at roughly one developer’s rate.
Tier 2: 2–4 parallel agents
Multiple local agents running at the same time, each on its own task, each in its own working directory (almost always git worktrees; see Q15). You’re orchestrating, not coding.
- Conductor — Mac app that runs multiple Claude Code agents in parallel, each in a sandboxed git worktree, with a kanban-style UI for switching between them. Made specifically for the “I’m running 4 agents at once” workflow.
- Claude Squad — Open-source terminal multiplexer for Claude Code / Codex / Aider / Gemini sessions, each in its own tmux pane and git worktree. Free and works for everyone, including Linux.
- Cursor + git worktrees — Open one Cursor window per worktree, point each at a different branch, run an agent in each. Free, no extra tooling, works today.
- tmux + Claude Code — The barebones version: one tmux window per agent, manually checked out into separate worktrees. Cheap, durable, no dependencies.
What you gain at Tier 2: 2–3× throughput on independent tasks. What you lose: context fidelity per agent (you’re scanning fast, missing things) and a real risk of merge conflicts if you don’t use worktrees properly.
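To make the "one worktree per agent" idea concrete, here is a minimal sketch that builds one `git worktree add` command per task without executing anything. The `agent/<task>` branch naming and the sibling-directory layout are assumptions made for illustration; the exact setup this guide recommends lives in Q15 and may differ.

```python
from pathlib import PurePosixPath


def worktree_commands(repo_root: str, tasks: list[str], base: str = "main") -> list[list[str]]:
    """Build one `git worktree add` command per task. Nothing is executed here."""
    root = PurePosixPath(repo_root)
    commands = []
    for task in tasks:
        branch = f"agent/{task}"                    # hypothetical branch naming scheme
        path = root.parent / f"{root.name}-{task}"  # sibling directory, one per agent
        commands.append(
            ["git", "-C", str(root), "worktree", "add", "-b", branch, str(path), base]
        )
    return commands
```

Calling `worktree_commands("/src/app", ["migration", "api-handler", "ui-component"])` yields three argument lists you could pass to `subprocess.run(cmd, check=True)`. Each agent then works in its own directory on its own branch, which is what keeps parallel edits from overwriting each other.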
Tier 3: autonomous overnight (Codex Cloud, Cursor Cloud Agents, Anthropic Computer Use)
Cloud-hosted agents running on isolated VMs that you don’t have to babysit. They wake up, run for an hour or eight, and surface a draft PR (or a failure to investigate). This is the 2026 frontier.
- Codex Cloud (OpenAI) — Async cloud agent on a sandboxed VM. Strong for overnight batched work — bug fixes, codemods, doc-writing across many files. The $200/month Codex Pro tier unlocks the heavier weekly limits that make this actually viable for daily use. Codex can now schedule future work and wake itself up to continue tasks that span days or weeks.
- Cursor Cloud Agents — Launched in February 2026. Each agent gets its own VM with a terminal, browser, and full desktop, plus your configured dev environment. Agents can build software, test it themselves, record video demos of their work, and produce merge-ready PRs. Cursor’s own engineering team merges >35% of PRs from these agents.
- Anthropic Computer Use — The most general-purpose of the three. Drives a real desktop, including the browser, and is strongest when the task requires GUI work outside the IDE (clicking through admin panels, running through onboarding flows, screenshot-driven QA).
What you gain at Tier 3: throughput that no single human can produce. Reasonable nightly output without human attention. What you lose: visibility — you’re committing to “trust the layered-review pipeline” rather than watching every change.
Cost/benefit: review bandwidth limits
The cap on parallelism in 2026 is not the cost of compute. It’s how many agent-generated diffs a single human can meaningfully review per day. Field data converging from multiple teams:
- Sustainable steady state: 5–6 parallel agents per reviewing human, mixed across all three tiers. Beyond that, review quality collapses — you start green-lighting code you don’t really understand, and the resulting bugs eat your apparent productivity gain.
- Burst capacity: 8–10 agents during a planned cleanup sprint (typo fixes, doc updates, dependency bumps), where the diffs are mechanical and reviewable in 30 seconds each. Don’t make this the everyday mode.
- Time per Tier 3 PR review: 5–15 minutes on average when your layered-review setup is right (Q17). That’s the unit cost of running an autonomous agent. If you can’t dedicate that time, the agent just creates a queue of unreviewed PRs that decays.
This is why Tier 3 only pays off when your review pipeline (Q17), structuring-changes practice (Q25), and auto-PR workflow (Q16) are already solid. Skip those and Tier 3 will make you slower, not faster.
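The review-time figure above turns into simple arithmetic. As a rough sketch, taking the 5–15 minutes per Tier 3 PR at face value, the function below returns how many PRs fit into one triage slot in the worst and best case. The function name and defaults are illustrative, and real review times will vary with diff size.

```python
def review_capacity(slot_minutes: int, fastest: int = 5, slowest: int = 15) -> tuple[int, int]:
    """Return (worst_case, best_case) number of PRs reviewable in one triage slot."""
    if slot_minutes < 0 or fastest <= 0 or slowest < fastest:
        raise ValueError("expected slot_minutes >= 0 and 0 < fastest <= slowest")
    # Worst case: every PR takes the slowest time. Best case: every PR takes the fastest.
    return slot_minutes // slowest, slot_minutes // fastest
```

With a 60-minute slot, `review_capacity(60)` gives `(4, 12)`; with 30 minutes, `(2, 6)`. Under these assumptions a nightly queue of 3–5 issues sits near the worst-case bound for an hour of triage, so a larger queue only works if most diffs turn out to be quick to review.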
Step-by-step: progressing through tiers
1. Establish Tier 1 fluency first. You should be running a Claude Code or Cursor session daily, comfortable with Plan mode, comfortable killing a session and restarting when it’s gone off the rails. If a single session still feels chaotic, fanning out 4 of them will be 4× the chaos. Spend 2–4 weeks here.
2. Set up git worktrees. This is the prerequisite for Tier 2 and most of Tier 3. Configure your repo so you can spin up a worktree per agent in a single command (see Q15 for the exact setup). Without this, parallel agents will overwrite each other’s files and you’ll spend more time resolving merge conflicts than shipping.
3. Run your first parallel session. Open two tmux panes, each in a separate worktree, each running Claude Code on a different small task. Don’t try to ship anything important yet; the goal is to feel what context-switching between two agents is actually like. Most people find two is easy, three is the wall, four needs tooling.
4. Add Conductor or Claude Squad. Once you’ve felt the pain of switching between 3+ raw tmux panes, install Conductor (Mac) or Claude Squad (cross-platform). The UI savings compound: being able to glance at all your agents’ statuses on one screen is what makes 4+ parallel agents viable.
5. Wire up your auto-PR workflow. Tier 2 is dramatically more useful when each agent finishes by automatically opening a PR (see Q16). That way you don’t have to manually run `git push && gh pr create` four times in a row. The Stop hook + `gh pr create` recipe from Q16 covers this.
6. Try one Codex Cloud or Cursor Cloud Agent task. Pick a small, well-scoped, low-risk task: fix a known bug, write tests for a single file, port a small util to another language. Tag it with `codex-cloud` or use Cursor’s “Send to Cloud Agent” button. Walk away. Come back in an hour and review the result like you would any other PR. Repeat 5–10 times until you trust the output as much as a junior contractor’s.
7. Build your nightly queue. Once one cloud agent works, scale to a small queue. End each working day by labelling 3–5 issues for overnight autonomous work. Morning triage becomes the first 30–60 minutes of your day. This is where the real Tier 3 leverage shows up: your throughput compounds across calendar time, not just working hours.
8. Cap at 5–6 concurrent agents. Once you’re comfortable, the natural temptation is to spawn 10. Don’t. Track your effective review throughput: if your PR backlog is growing faster than you’re merging, you’re past the cap. Pull back to the sustainable steady state.
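The "backlog growing faster than you’re merging" check in the final step can be tracked with a few lines. This is a minimal sketch, assuming you record how many agent PRs were opened and how many you merged or closed each day; the function names and the three-day window are arbitrary choices for illustration, not a recommended threshold.

```python
def backlog_trend(opened: list[int], resolved: list[int]) -> list[int]:
    """Running count of unreviewed agent PRs at the end of each day."""
    backlog, trend = 0, []
    for new, done in zip(opened, resolved):
        backlog = max(0, backlog + new - done)  # backlog cannot go negative
        trend.append(backlog)
    return trend


def past_the_cap(trend: list[int], window: int = 3) -> bool:
    """True when the backlog grew on each of the last `window` days."""
    recent = trend[-(window + 1):]
    if len(recent) < window + 1:
        return False  # not enough history to judge
    return all(later > earlier for earlier, later in zip(recent, recent[1:]))
```

If you opened `[5, 6, 6, 7]` PRs over four days and resolved `[5, 4, 4, 4]`, the trend is `[0, 2, 4, 7]` and `past_the_cap` returns `True`, which is the signal to reduce the number of concurrent agents rather than spawn more.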
Common pitfalls
- Jumping straight to Tier 3 without a layered review pipeline. Cloud Agents produce PRs faster than you can read them. If your only line of defence is “I’ll review them carefully”, you’ll either let bad code through or stop shipping. Build the Q17 layered-review setup first (CodeRabbit / Copilot review / Sentry review) so the agent output is pre-filtered before it hits your queue.
- Fanning out without git worktrees. Two agents in the same checkout will overwrite each other’s edits. Every parallel-agent veteran has this story; you don’t need to be the next. Use worktrees from day one of Tier 2.
- Treating Tier 3 as “fire and forget”. Autonomous overnight runs still need human review. The trust-but-verify loop is what makes them safe; “I just merged 3 cloud-agent PRs without reading them” is how you spend tomorrow debugging.
- Running 8+ agents because you can. Compute is cheap; your attention isn’t. Past 5–6 concurrent, review bandwidth becomes the binding constraint and you ship less, not more. Track your merge throughput, not your spawn count.
- Using Tier 3 for tasks that need real judgement. Architectural decisions, gnarly debugging across services, anything where the correct answer is “we shouldn’t build this at all” — these belong in a focused Tier 1 session with Opus in Plan mode. Don’t outsource judgement calls to async cloud agents.
- No labelling convention for cloud agents. Without a tagging scheme (`codex-cloud`, `cursor-agent`, `claude-overnight`), your backlog turns into a mess of “is this for me or for an agent?” issues. Define the tags once and stick to them.
- Ignoring per-agent budgets. A misconfigured Cloud Agent can burn $30 in a runaway loop. Set per-agent timeouts and per-day spend caps in your Codex Cloud / Cursor dashboard before you scale up the queue.
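A labelling convention only helps if something routes on it. The sketch below splits a backlog into one queue per agent label plus a human queue, using the three example tags from the list above. The issue shape (a dict with `number` and `labels`) is a hypothetical stand-in for whatever your issue tracker returns, and sending ambiguous issues to the human queue is an assumption, not a rule from this guide.

```python
AGENT_LABELS = ("codex-cloud", "cursor-agent", "claude-overnight")


def split_backlog(issues: list[dict]) -> dict[str, list[int]]:
    """Group issue numbers by agent label; untagged or ambiguous issues go to 'human'."""
    queues: dict[str, list[int]] = {label: [] for label in AGENT_LABELS}
    queues["human"] = []
    for issue in issues:
        labels = set(issue.get("labels", []))
        matches = [label for label in AGENT_LABELS if label in labels]
        if len(matches) == 1:
            queues[matches[0]].append(issue["number"])
        else:
            # Zero agent labels means a human task; several means the tagging is unclear.
            queues["human"].append(issue["number"])
    return queues
```

Running this at the end of the day gives a quick answer to "is this for me or for an agent?" for every open issue, and any issue carrying two agent labels surfaces in the human queue instead of being picked up twice.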
How to verify you’re there
- You can describe, without thinking, which task in your current backlog should go to Tier 1, Tier 2, or Tier 3, and why.
- You’ve run at least one autonomous Codex Cloud or Cursor Cloud Agent task in the last 7 days, all the way through to a merged PR.
- You have a labelling convention for “this issue should be picked up by a cloud agent” and you’ve used it this week.
- Your repo has a working git worktrees setup (Q15) and you’ve spawned at least 2 parallel local agents in it this month.
- You have an auto-PR workflow (Q16) so finishing an agent run ends with an opened PR, not “go run `gh pr create` manually”.
- You have a layered PR review (Q17), such as CodeRabbit, Copilot review, or equivalent, so cloud-agent PRs are pre-filtered before you triage them.
- You’ve capped your concurrent agent count at 5–6 in your head, and you can name the reason (review bandwidth, not compute cost).
- Your morning routine includes a 30–60 minute slot for triaging overnight cloud-agent output, and it’s blocked on your calendar.
- You have per-agent spend caps configured in Codex Cloud, Cursor, or whichever cloud-agent provider you use.