Workflow tier — from one session to autonomous overnight runs

Q14 · Parallelism & automation: Which workflow tier best describes you?

Max-score answer: “Tier 3: I also run autonomous overnight backlog drains (Codex Cloud, Cursor Cloud Agents).”

Why this matters in 2026

Parallel agents can improve throughput on independent, well-scoped tasks, but review bandwidth and merge risk become the constraint. The useful question is not a universal agent count or productivity multiplier; it is how many concurrent changes your team can review, test, and integrate without increasing rework. Start small, measure completion and rejection rates, and scale only while review quality holds.

What “max score” actually looks like

You get full marks on Q14 only when you can describe — concretely — how you operate at all three tiers and use the right one for the task at hand. The shape of a top-tier day:

Morning, Tier 3 wake-up. Before you even open your laptop, two overnight Codex Cloud runs and one Cursor Cloud Agent have each produced a draft PR for issues you tagged the day before. You triage them like a regular code review: merge the one that’s clean, push back on the one with hand-wavy commits, kill the third because the approach was wrong. That’s an hour of “free” output before standup.
Mid-morning, Tier 2 fan-out. You’re working on a real feature. You spawn 3 parallel local agents in git worktrees: one writing the migration, one building the API handler, one wiring up the UI component. You switch between them with tmux or Conductor / Claude Squad, reviewing whichever one needs human input next. None of them blocks the others.
Afternoon, Tier 1 deep focus. The gnarliest part of the day — the bit that needs real architectural judgement — is a single Claude Code session in Plan mode with Opus 4.8, no fan-out, you in the loop on every step. Sometimes one focused session beats five mediocre ones, and you know which is which.
Evening, Tier 3 queue. You leave the office having labelled 3–5 issues with the right tag (codex-cloud, cursor-agent, or just routed via your auto-PR hook) so they get picked up by autonomous agents while you sleep. Tomorrow’s morning triage repeats.

Anything less is sub-tier on Q14. Specifically:

“I run one Claude Code or Cursor session at a time” — that’s Tier 1, mid-tier on Q14.
“I run 2–3 parallel agents in worktrees, but I’ve never tried cloud agents” — that’s Tier 2, near-top but missing the asynchronous overnight loop where most of the leverage now lives.
“I’ve tried Cloud Agents once, didn’t trust them” — doesn’t count. Trust comes from the orchestration setup (layered review, scoped tasks, sensible labels), not from the model.
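The tier choice described in the day above can be sketched as a small routing function. This is only an illustration: the `Task` fields and the order of the checks are assumptions made for the example, not a scoring rule from the questionnaire.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """Hypothetical description of a backlog item; field names are illustrative."""
    title: str
    needs_judgement: bool     # architecture, cross-service debugging, "should we build this?"
    well_scoped: bool         # clear acceptance criteria, small blast radius
    independent: bool         # does not touch files another in-flight task owns
    can_wait_overnight: bool  # nobody is blocked on it today


def choose_tier(task: Task) -> int:
    """Return 1, 2, or 3 using the routing heuristics described in the text."""
    if task.needs_judgement or not task.well_scoped:
        return 1  # stay in the loop: one focused session
    if not task.independent:
        return 1  # overlapping work is safer done serially
    if task.can_wait_overnight:
        return 3  # queue it for an autonomous cloud agent
    return 2      # parallel local agent in its own worktree


if __name__ == "__main__":
    backlog = [
        Task("Redesign auth boundaries", True, True, True, True),
        Task("Add API handler", False, True, True, False),
        Task("Bump lint config", False, True, True, True),
    ]
    for task in backlog:
        print(f"Tier {choose_tier(task)}: {task.title}")
```

The useful part is the ordering: judgement and scoping are checked before anything else, so a task never reaches Tier 3 just because nobody needs it today.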

Current landscape (web-verified)

Tier 1: one terminal/IDE session

A single Claude Code, Cursor, or Codex CLI session in front of you, one prompt at a time, you reading the screen as it works. This is where everyone starts and where most developers still live in 2026.

Claude Code — Anthropic’s terminal-first agent. The default for deep work because Plan mode + Opus 4.8 + Sonnet 5 execution is a strong single-session reasoning loop; for the hardest architectural work, Claude Fable 5 (/model fable) now exceeds Opus 4.8 at 2× the cost — see model comparison for routing guidance. Best when the task needs serious architectural judgement or touches >5 files.
Cursor (IDE mode) — The “VS Code with an agent panel” experience. Best when you’re doing tight edit-test-edit loops and want the diff visible right next to the file. Strong for refactors where you want to scrub through changes hunk by hunk.
Codex CLI — OpenAI’s terminal-first agent, the local counterpart to Codex Cloud. Strong on long agentic tool-use loops and structured outputs. GPT-5.6 availability is plan-dependent: Free/Go use Terra, while Plus and higher can choose Sol, Terra, or Luna and set effort.

What you gain at Tier 1: full focus, full control, full understanding of every change. What you lose: throughput. One human + one agent ships at roughly one developer’s rate.

Tier 2: 2–4 parallel agents

Multiple local agents running at the same time, each on its own task, each in its own working directory (almost always git worktrees — see Q15). You’re orchestrating, not coding.

Conductor — Mac app that runs multiple Claude Code agents in parallel, each in a sandboxed git worktree, with a kanban-style UI for switching between them. Made specifically for the “I’m running 4 agents at once” workflow.
Claude Squad — Open-source terminal multiplexer for Claude Code / Codex / Aider / Gemini sessions, each in its own tmux pane and git worktree. Free, and it runs on Linux as well as macOS.
Cursor + git worktrees — Open one Cursor window per worktree, point each at a different branch, run an agent in each. Free, no extra tooling, works today.
tmux + Claude Code — The barebones version: one tmux window per agent, each started in a worktree you created by hand. Cheap, durable, and no tooling beyond tmux and git.

What you can gain at Tier 2 is concurrency on independent tasks. What you risk is lower review depth and merge conflicts if isolation and ownership are weak; measure the result rather than assuming a fixed multiplier.
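A minimal sketch of the "one worktree per agent" isolation that the Tier 2 tools rely on. It assumes `git` is on the PATH; the sibling `<repo>-wt/` directory and the `agent/<slug>` branch prefix are naming conventions invented for this example, and Q15 remains the reference for the actual setup.

```python
import subprocess
from pathlib import Path
from typing import List


def worktree_command(repo: Path, task_slug: str, base: str = "main") -> List[str]:
    """Build the `git worktree add` command for one agent's isolated checkout."""
    # Put worktrees in a sibling directory, e.g. ../shop-wt/add-migration (assumed layout).
    path = repo.parent / f"{repo.name}-wt" / task_slug
    branch = f"agent/{task_slug}"  # assumed branch naming
    return ["git", "-C", str(repo), "worktree", "add", "-b", branch, str(path), base]


def create_worktrees(repo: Path, task_slugs: List[str], base: str = "main") -> None:
    """Create one worktree and branch per task; raises if git reports an error."""
    for slug in task_slugs:
        subprocess.run(worktree_command(repo, slug, base), check=True)


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for slug in ["add-migration", "api-handler", "ui-component"]:
        print(" ".join(worktree_command(Path.cwd(), slug)))
```

Because each agent gets its own branch and directory, two agents can no longer overwrite each other’s files; conflicts surface at merge time, where they can be reviewed.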

Tier 3: autonomous overnight (Codex Cloud, Cursor Cloud Agents, Anthropic Computer Use)

Cloud-hosted agents running on isolated VMs that you don’t have to babysit. They wake up, run for an hour or eight, and surface a draft PR (or a failure to investigate). This is the 2026 frontier.

Codex Cloud (OpenAI) — Async cloud agent in a remote environment for batched work such as bug fixes, codemods, and documentation. ChatGPT Pro 5x is $100/month and Pro 20x is $200/month; included Codex usage can be extended with token credits, so viability depends on measured workload rather than a fixed weekly-message claim.
Cursor Cloud Agents — Each agent gets an isolated cloud environment with configured tools. Agents can build, test, record demos, and prepare PRs; validate their value against your own acceptance and rework metrics.
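Anthropic Computer Use — The most general-purpose of the three, but different in kind: it is a tool capability of the Claude API rather than a hosted service that opens PRs, so you provide the sandboxed desktop (a VM or container) it drives. It operates a real desktop, including the browser, and is strongest when the task requires GUI work outside the IDE (clicking through admin panels, running through onboarding flows, screenshot-driven QA).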

What you gain at Tier 3: output produced overnight without your attention, on tasks scoped well enough to run unattended. What you lose: visibility. You’re committing to “trust the layered-review pipeline” rather than watching every change.

Cost/benefit: review bandwidth limits

The cap on parallelism in 2026 is not the cost of compute. It’s how many agent-generated diffs a single human can meaningfully review per day. Working rules of thumb:

Sustainable steady state: The number of parallel agents whose output a reviewer can still understand and validate. Set the cap from queue growth, rejection rate, and escaped defects rather than a universal number.
Burst capacity: 8–10 agents during a planned cleanup sprint (typo fixes, doc updates, dependency bumps), where the diffs are mechanical and reviewable in 30 seconds each. Don’t make this the everyday mode.
Time per Tier 3 PR review: 5–15 minutes on average when your layered-review setup is right (Q17). That’s the unit cost of running an autonomous agent. If you can’t dedicate that time, the agent just creates a queue of unreviewed PRs that go stale.

This is why Tier 3 only pays off when your review pipeline (Q17), structuring-changes practice (Q25), and auto-PR workflow (Q16) are already solid. Skip those and Tier 3 will make you slower, not faster.
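The review-bandwidth limit is simple arithmetic, sketched here using the 5–15 minutes per PR quoted above and the 30–60 minute morning triage slot from the verification checklist. The function name is an assumption for illustration, and your own measured review times should replace these figures.

```python
def max_reviewable_prs(slot_minutes: int, minutes_per_pr: int) -> int:
    """How many agent PRs fit in a review slot at a given per-PR review time."""
    if slot_minutes < 0 or minutes_per_pr <= 0:
        raise ValueError("slot_minutes must be >= 0 and minutes_per_pr must be > 0")
    return slot_minutes // minutes_per_pr


if __name__ == "__main__":
    for slot in (30, 60):
        for per_pr in (5, 15):
            count = max_reviewable_prs(slot, per_pr)
            print(f"{slot} min slot at {per_pr} min/PR: {count} PRs")
```

At the slow end (15 minutes per PR) a 60-minute slot covers 4 PRs, which is consistent with labelling 3–5 issues each evening; queueing more than that only grows the unreviewed backlog.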

Step-by-step: progressing through tiers

Establish Tier 1 fluency first. You should be running a Claude Code or Cursor session daily, comfortable with Plan mode, comfortable killing a session and restarting when it’s gone off the rails. If a single session still feels chaotic, fanning out 4 of them will be 4× the chaos. Spend 2–4 weeks here.
Set up git worktrees. This is the prerequisite for Tier 2 and most of Tier 3. Configure your repo so you can spin up a worktree per agent in a single command — see Q15 for the exact setup. Without this, parallel agents will overwrite each other’s files and you’ll spend more time resolving merge conflicts than shipping.
Run your first parallel session. Open two tmux panes, each in a separate worktree, each running Claude Code on a different small task. Don’t try to ship anything important yet — the goal is to feel what context-switching between two agents is actually like. Most people find two is easy, three is the wall, four needs tooling.
Add Conductor or Claude Squad. Once you’ve felt the pain of switching between 3+ raw tmux panes, install Conductor (Mac) or Claude Squad (cross-platform). The UI savings compound — being able to glance at all your agents’ statuses on one screen is what makes 4+ parallel agents viable.
Wire up your auto-PR workflow. Tier 2 is dramatically more useful when each agent finishes by automatically opening a PR (see Q16). That way you don’t have to manually run git push && gh pr create four times in a row. The Stop hook + gh pr create recipe from Q16 covers this.
Try one Codex Cloud or Cursor Cloud Agent task. Pick a small, well-scoped, low-risk task, then review the result like any other PR. Repeat on representative tasks until you have evidence about acceptance and rework.
Build a small nightly queue. Once one cloud agent works reliably, queue only as much overnight work as you can review promptly the next morning.
Set an evidence-based concurrency cap. If the PR backlog or rejection/rework rate rises faster than merges, reduce fan-out.
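The last step can be sketched as a weekly adjustment rule. The `WeekStats` fields, the 30% rework threshold, and the ceiling of 4 agents are assumptions chosen for the example (the ceiling mirrors the 2–4 range used for Tier 2); tune them against your own data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WeekStats:
    """Hypothetical weekly counts for agent-generated PRs."""
    opened: int    # PRs opened by agents
    merged: int    # PRs merged after review
    rejected: int  # PRs closed or sent back for rework


def next_cap(current_cap: int, stats: WeekStats,
             max_rework_rate: float = 0.3, ceiling: int = 4) -> int:
    """Suggest next week's concurrent-agent cap from last week's outcomes."""
    if stats.opened == 0:
        return current_cap  # no evidence, no change
    undecided = stats.opened - (stats.merged + stats.rejected)
    rework_rate = stats.rejected / stats.opened
    if undecided > 0 or rework_rate > max_rework_rate:
        return max(1, current_cap - 1)  # queue is growing or quality is slipping
    return min(ceiling, current_cap + 1)  # review kept up, so try one more agent


if __name__ == "__main__":
    print(next_cap(3, WeekStats(opened=10, merged=6, rejected=1)))  # backlog grew
    print(next_cap(3, WeekStats(opened=10, merged=9, rejected=1)))  # review kept up
```

Moving the cap by one agent at a time keeps the signal readable: a jump from 2 to 6 agents would change too many variables to tell whether review quality held.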

Common pitfalls

Jumping straight to Tier 3 without a layered review pipeline. Cloud Agents produce PRs faster than you can read them. If your only line of defence is “I’ll review them carefully”, you’ll either let bad code through or stop shipping. Build the Q17 layered-review setup first (CodeRabbit / Copilot review / Sentry review) so the agent output is pre-filtered before it hits your queue.
Fanning out without git worktrees. Two agents in the same checkout will overwrite each other’s edits. Every parallel-agent veteran has this story; you don’t need to be the next one. Use worktrees from day one of Tier 2.
Treating Tier 3 as “fire and forget”. Autonomous overnight runs still need human review. The trust-but-verify loop is what makes them safe; “I just merged 3 cloud-agent PRs without reading them” is how you spend tomorrow debugging.
Running more agents because you can. Compute is cheap; attention is not. Track accepted throughput and review quality, not spawn count.
Using Tier 3 for tasks that need real judgement. Architectural decisions, gnarly debugging across services, anything where the correct answer is “we shouldn’t build this at all” — these belong in a focused Tier 1 session with Opus 4.8 (or Fable 5 when you need peak intelligence) in Plan mode. Don’t outsource judgement calls to async cloud agents.
No labelling convention for cloud agents. Without a tagging scheme (codex-cloud, cursor-agent, claude-overnight), your backlog turns into a mess of “is this for me or for an agent?” issues. Define the tags once and stick to them.
Ignoring per-agent budgets. A misconfigured Cloud Agent can burn $30 in a runaway loop. Set per-agent timeouts and per-day spend caps in your Codex Cloud / Cursor dashboard before you scale up the queue.
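The budget pitfall above comes down to two checks: a per-agent timeout and a per-day spend cap. The sketch below models that policy locally; `BudgetGuard` and its fields are hypothetical and do not correspond to any Codex Cloud or Cursor API, so the enforced limits still need to be set in the provider dashboard.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BudgetGuard:
    """Hypothetical local policy object; not a provider API."""
    per_agent_timeout_s: float
    daily_spend_cap_usd: float
    spent_today_usd: float = 0.0

    def record_spend(self, usd: float) -> None:
        """Add a reported cost to today's running total."""
        if usd < 0:
            raise ValueError("spend cannot be negative")
        self.spent_today_usd += usd

    def should_stop(self, agent_runtime_s: float) -> Optional[str]:
        """Return a reason to stop the agent, or None to let it continue."""
        if agent_runtime_s >= self.per_agent_timeout_s:
            return "timeout"
        if self.spent_today_usd >= self.daily_spend_cap_usd:
            return "daily spend cap reached"
        return None


if __name__ == "__main__":
    guard = BudgetGuard(per_agent_timeout_s=3600, daily_spend_cap_usd=30.0)
    guard.record_spend(12.5)
    print(guard.should_stop(agent_runtime_s=120))
```

Checking the timeout first means a runaway loop is stopped even when cost reporting lags behind actual usage.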

How to verify you’re there

You can say, without hesitating, which task in your current backlog should go to Tier 1, Tier 2, or Tier 3, and why.
You’ve run at least one autonomous Codex Cloud or Cursor Cloud Agent task in the last 7 days, all the way through to a merged PR.
You have a labelling convention for “this issue should be picked up by a cloud agent” and you’ve used it this week.
Your repo has a working git worktrees setup (Q15) and you’ve spawned at least 2 parallel local agents in it this month.
You have an auto-PR workflow (Q16) so each finished agent run ends with an opened PR, not “go run gh pr create manually”.
You have a layered PR review (Q17) — CodeRabbit, Copilot review, or equivalent — so cloud-agent PRs are pre-filtered before you triage them.
You have an evidence-based concurrency cap tied to review bandwidth, queue growth, and rework.
Your morning routine includes a 30–60 minute slot for triaging overnight cloud-agent output, and it’s blocked out on your calendar.
You have per-agent spend caps configured in Codex Cloud, Cursor, or whichever cloud-agent provider you use.
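The labelling convention in the checklist can be sketched as a lookup from issue labels to a destination. The three label names come from the pitfalls section; the destination strings and the rule that an issue may carry at most one agent label are assumptions for the example.

```python
from typing import Dict, Iterable

# Label names from the text; destination names are illustrative.
AGENT_LABELS: Dict[str, str] = {
    "codex-cloud": "Codex Cloud",
    "cursor-agent": "Cursor Cloud Agent",
    "claude-overnight": "overnight Claude queue",
}


def route_issue(labels: Iterable[str]) -> str:
    """Return the agent queue for an issue, or "human" if no agent label is set."""
    matches = sorted({AGENT_LABELS[label] for label in labels if label in AGENT_LABELS})
    if len(matches) > 1:
        # Two agents on one issue would produce competing PRs.
        raise ValueError(f"issue has conflicting agent labels: {matches}")
    return matches[0] if matches else "human"


if __name__ == "__main__":
    print(route_issue(["bug", "codex-cloud"]))
    print(route_issue(["bug", "needs-design"]))
```

Defaulting to "human" keeps unlabelled issues out of the overnight queue, so an agent only picks up work that someone deliberately tagged.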