Team parallelism (Tier 2) — 5-6 parallel agents with solved merge patterns

Scorecard question: Does the team use parallel‑agent tools (Conductor, Claude Squad, Cursor + git worktrees)? Max‑score answer (3 pts): Part of team workflow — 5‑6 parallel agents per dev ceiling, with solved merge‑conflict patterns.

Why this matters in 2026 (task per dev → feature per dev per day)

In 2024 the unit of engineering throughput was “task per developer.” A senior backend engineer claimed one Linear ticket, opened one branch, ran one AI pair‑programmer in their editor, and shipped one PR. The dev:AI ratio was effectively 1:1, the merge story was “rebase before push,” and the bottleneck sat at human typing speed.

In 2026 that unit broke. The same backend engineer now runs five or six agents in parallel: one drafting the migration, one writing the route handler, one fixing a flaky test that CI just surfaced, one drafting the changelog, one refactoring a sibling module that the migration touched. Each agent lives in its own git worktree on its own branch, on its own model (Sonnet 4.5 for routing, Opus for the hard refactor, GPT‑5‑Codex for the migration), and the human’s job is no longer “type code”; it is “review, merge, and orchestrate.” The unit of throughput is now closer to feature per developer per day.

That sounds like a 5‑6x productivity story, and at the artifact level it is. But the merge/review bottleneck arrives fast. The moment two agents touch the same file, you get a textual conflict that git is happy to flag but no agent is happy to resolve cleanly. The moment six agents finish at the same time, you get a review queue that backs up faster than any human can drain it. The moment your repo’s CI runs serially, your parallelism collapses into a queue at the merge stage. Tier 2 is the level where a team has shipped past the “I tried two agents and they fought” valley and into a working pattern: per‑dev parallelism with a real ceiling, real conflict tooling, and a real merge queue.

If you scored 0 on this question, your team is still on Tier 0 (one developer, one Cursor window, no worktrees). If you scored 1, one or two enthusiasts are running Claude Squad on their own laptop while everyone else watches. Tier 2 — three points — is when 5‑6 parallel agents per dev is the default workflow, and the merge story is solved enough that nobody is afraid to fan out.

What “max score” actually looks like (Tier 2 as default, merge playbook)

  • Every active developer runs ≥3 parallel agents on the average workday. Not “can run if they want to” but actually does, every day, as the default workflow. Pair‑programmer mode (one agent, sit and watch) is reserved for design discussion and the trickiest 10‑20% of tasks.
  • A documented per‑dev ceiling of 5‑6 concurrent agents. Empirically, this is where laptop fans, API rate limits, and the human’s review bandwidth all peak together. The team has agreed on a ceiling and the orchestrator tool enforces it.
  • Worktree isolation is mandatory, not optional. Every agent runs in its own git worktree directory on its own branch. Nobody runs two agents against the same working copy. Tools like Conductor and Claude Squad enforce this; teams that script their own worktree workflow document the helper script in the repo.
  • A merge playbook that survives 5‑6 PRs landing in one afternoon. Three components: (1) a merge queue (GitHub Merge Queue, Mergify, or Aviator) that linearizes integration; (2) a conflict‑resolving agent that is invoked when git detects overlap, with a documented prompt and reviewer pattern; (3) small‑scope task decomposition so agents rarely land on the same file in the first place.
  • Review automation keeps up with the agent fan‑out. CodeRabbit / Greptile / Diamond reviews land on every PR within minutes; humans review the review, not the raw diff. (See Q9 below.)
  • The team has a name for the failure modes. “Worktree drift,” “the duplicate implementation problem,” “the silent rebase loop” — Tier 2 teams have shared vocabulary because they have all hit these and learned the playbook.
  • Pair programming is still a tool, not the floor. A Tier 2 team does not abandon synchronous deep work; it picks pair mode deliberately for the 10‑20% of tasks where the cognitive load of orchestration would outweigh the parallelism gain.

Concretely: on a Wednesday afternoon, a senior engineer kicks off six worktrees from a sprint planning doc, walks to lunch, comes back to four green PRs in the merge queue, one PR with a flagged conflict that a resolver agent has already drafted three options for, and one agent that hit a rate limit and self‑paused. Total wall‑clock: 90 minutes. Total shipped: a feature that would have been a 2‑day ticket in 2024.

Current landscape (web‑search‑verified)

Conductor (from Melty Labs) is the canonical Mac‑first dashboard for running parallel Claude Code and Codex agents, each in an isolated git worktree. It sits next to your editor — not inside it — and presents a board view of every agent: which branch, which model, what it’s working on, current cost burn, and PR status. The pitch is “ship the orchestration UX that Cursor and the terminals don’t have.”

Three things make Conductor a Tier 2 default for many teams. First, worktree creation is one click — no shell aliases, no git worktree add muscle memory. Second, the dashboard surfaces all agents at once, which makes the 5‑6 ceiling enforceable visually (you can see the queue). Third, it ships with PR creation and merge‑queue integration built in, so the handoff from “agent finished” to “PR landed” is one keystroke rather than a manual context switch.

The constraint: Mac‑only and laptop‑bound. For Linux teams or anyone who wants to fan out to cloud agents, Conductor is not the answer — see Cursor Background Agents or Claude Code Cloud, covered in Q15.

Claude Squad is the Go‑based terminal app for zero‑setup parallelism. It manages multiple Claude Code (and Codex / Aider) instances simultaneously, each in its own tmux session, each rooted in its own git worktree on its own branch. The mental model is “I have one keyboard but six terminals, and they’re all running agents.”

Claude Squad’s traction comes from three properties: it works on any OS that runs tmux, it does not require leaving the terminal, and it composes with whatever editor and PR workflow you already use. For teams who already live in tmux/Neovim, it slots in cleanly. For teams who want a GUI dashboard, Conductor is the better fit.

Two gotchas worth flagging. First, tmux navigation gets unwieldy at 5‑6 concurrent sessions, so Tier 2 teams invest in a tmux config (prefix bindings, status bar showing agent state) that they share via dotfiles. Second, Claude Squad does not enforce a per‑dev ceiling; the human has to. Most teams set a soft cap in the team handbook and a hard cap via API rate limits.
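
Because the cap is left to the human, one lightweight option is a launcher guard that counts live worktrees before another agent is started. The sketch below is an illustration and not part of Claude Squad: it parses the text printed by git worktree list --porcelain, and the function names and the default ceiling of 6 are assumptions.

```python
def count_linked_worktrees(porcelain: str) -> int:
    """Count linked worktrees in `git worktree list --porcelain` output.

    Every record starts with a `worktree <path>` line. Git lists the main
    working tree first, so that record is excluded from the count.
    """
    records = [line for line in porcelain.splitlines() if line.startswith("worktree ")]
    return max(0, len(records) - 1)


def may_launch(porcelain: str, ceiling: int = 6) -> bool:
    """Return True when one more agent worktree stays within the ceiling."""
    return count_linked_worktrees(porcelain) < ceiling
```

In a wrapper script, the porcelain text would come from running git worktree list --porcelain in the repository; the guard then refuses to start a new agent when may_launch returns False.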

For teams already on Cursor (most paying engineering orgs in 2026), the cheapest path to Tier 2 is “Cursor + git worktrees + a few shell scripts.” Cursor 3’s multi‑window mode lets you open one Cursor instance per worktree, each running its own agent, each on its own branch. The worktree management is manual: you run git worktree add ../repo-feature-x feature-x yourself for an existing branch (or git worktree add -b feature-x ../repo-feature-x to create the branch at the same time), or you wrap it in a bin/wt script that the whole team uses.

This is the path for teams that want incremental adoption without buying a new tool. It works, but it demands more discipline: the team has to document the worktree script, the naming convention (../repo-<slug> or ~/wt/<slug> are common), and the cleanup ritual (git worktree remove after merge). Without the script and the convention, you get worktree drift — stale branches, half‑finished features, three copies of node_modules.
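
As a rough illustration of what such a helper might encode, the sketch below builds the git commands for the ~/wt/<slug> convention without running them. The function names, the one-branch-per-slug rule, and the default base branch main are assumptions; the source names bin/wt but does not show its contents.

```python
from pathlib import Path


def worktree_path(root: Path, slug: str) -> Path:
    """Return the worktree directory for a task slug, e.g. <root>/<slug>."""
    if not slug or "/" in slug or slug.startswith("."):
        raise ValueError(f"unsafe slug: {slug!r}")
    return root / slug


def new_worktree_cmd(root: Path, slug: str, base: str = "main") -> list[str]:
    """Build `git worktree add -b <slug> <path> <base>`, which creates the branch."""
    return ["git", "worktree", "add", "-b", slug, str(worktree_path(root, slug)), base]


def cleanup_cmds(root: Path, slug: str) -> list[list[str]]:
    """Build the post-merge cleanup: remove the worktree, then delete the branch."""
    return [
        ["git", "worktree", "remove", str(worktree_path(root, slug))],
        ["git", "branch", "-d", slug],
    ]
```

Returning argument lists keeps the convention testable; a real script would pass each list to subprocess.run. Note that git branch -d refuses to delete a branch git considers unmerged, which can happen after a squash merge.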

Cursor Background Agents (cloud‑side, not laptop‑side) are a separate story — they belong to Tier 3, covered in Q15.

5‑6 parallel ceiling (review bandwidth math)

The “why 5‑6, not 10” question gets asked at every team’s first parallelism retrospective. The reasoning: the practical ceiling on a single developer’s laptop is 5‑7 concurrent agents, the point where three independent constraints converge. These are API rate limits (Anthropic and OpenAI both rate‑limit at the organization level, and teams typically feel it at around 5‑10 concurrent heavy requests per seat, depending on plan and usage tier), local compute (six agents × Node/Python toolchains + a browser + the human’s editor saturates an M‑class MacBook), and review bandwidth (a human can get through at most 4‑6 PRs per hour, and that figure covers triage‑level acknowledgment rather than a line‑by‑line read).

The dominant constraint is the third. Six agents that finish in 20 minutes produce six PRs that need review now. If your CodeRabbit / Greptile setup isn’t fast enough to land a first pass before the human gets to the PR, the human spends review time on diff comprehension instead of judgment, and the throughput gain collapses. Tier 2 teams pick 5‑6 as the ceiling because it is the highest number where review automation can still front‑run the human.

Push past 5‑6 and you don’t get more throughput — you get a longer review queue, a higher merge‑conflict rate, and a slower median PR age. Tier 3 (the next question, Q15) is the answer for “I really do want 20 agents going” — but it lives on overnight runs and cloud agents, not on the developer’s laptop during the workday.
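
The ceiling argument reduces to simple arithmetic. The sketch below is a toy model under stated assumptions: each agent yields one PR per fan-out, a reviewer clears 5 PRs per hour (the middle of the 4-6 range quoted above), and a fan-out happens every 90 minutes (the wall-clock figure from the Wednesday example). The function names are illustrative.

```python
def drain_minutes(prs: int, reviews_per_hour: float) -> float:
    """Minutes a reviewer needs to clear a batch of PRs."""
    return prs * 60 / reviews_per_hour


def backlog_after(batches: int, agents: int, reviews_per_hour: float,
                  batch_interval_min: float) -> float:
    """PRs still unreviewed after `batches` fan-outs, one PR per agent per batch."""
    capacity = reviews_per_hour * batch_interval_min / 60  # reviews per interval
    backlog = 0.0
    for _ in range(batches):
        backlog = max(0.0, backlog + agents - capacity)
    return backlog


six = drain_minutes(6, 5)             # time to clear six PRs
ten = drain_minutes(10, 5)            # time to clear ten PRs
queue = backlog_after(3, 10, 5, 90)   # backlog after three fan-outs of ten agents
```

Under these assumptions six PRs drain in 72 minutes, inside the 90-minute window, while ten PRs need 120 minutes and leave 2.5 PRs of backlog after every fan-out.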

Merge conflict patterns (rebase queues, conflict‑resolving agents, smaller scopes)

The dominant Tier 2 finding by mid‑2026 is that merge conflicts at six‑agent parallelism are not the same problem as merge conflicts at one‑agent parallelism. Git detects text overlap; AI agents generate overlapping changes with partial context. You get more conflicts, and they’re subtler — two agents reimplementing the same helper under different names, two agents fixing the same bug with two different patches, one agent’s refactor breaking the import graph the other agent assumed.

Three patterns have converged as the Tier 2 playbook:

  1. Merge queues that linearize integration. GitHub Merge Queue, Mergify, and Aviator all do the same essential thing: when six PRs are ready, they integrate them in order against the freshest main, run CI on the combined result, and merge, so the second PR is tested on top of the merged first PR, not against the same main the first PR saw. This catches textual conflicts and refactor breakage (a changed import graph, a renamed symbol) before they land. It does not by itself catch duplicate implementations under different names, which neither conflict nor fail CI; those are handled by scoping and review. Tier 2 teams treat the merge queue as table stakes, not as a nice‑to‑have.

  2. Conflict‑resolving agents with a documented prompt. When a conflict surfaces, a dedicated resolver agent reads the conflict markers, the two PR descriptions, the relevant slice of git history, and proposes a merged version. The pattern most teams settle on: the resolver writes the merged file, opens a comment with its reasoning, and tags a human to confirm. The resolver is not allowed to merge — it drafts. The human merges. This pattern is what turns “merge hell” into “30 seconds of judgment per conflict.”

  3. Smaller scopes by design. The strongest predictor of low conflict rate is task decomposition. Tier 2 teams break work into smaller, more orthogonal tasks before they fan out. Instead of “build the user profile feature,” it’s “add the profile route,” “add the profile component,” “add the profile API,” “add the profile tests” — each agent owns a directory, the directories rarely overlap, and the conflict rate drops by 60‑80%. The discipline is in the planning, not in the merging.

The combination matters: a merge queue without small scopes still produces conflicts (just in a serialized order); small scopes without a merge queue still produce a race condition on main; conflict‑resolving agents without either still produce a backlog of conflicts to triage. All three together are what Tier 2 looks like in practice.
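
The “each agent owns a directory” rule from the third pattern can be checked mechanically before fan-out. The sketch below is a minimal illustration, assuming each task declares the path prefixes it intends to touch; the task names and directory layout are hypothetical.

```python
from itertools import combinations
from pathlib import PurePosixPath


def paths_overlap(a: str, b: str) -> bool:
    """True when one path equals or contains the other, compared by component."""
    pa, pb = PurePosixPath(a).parts, PurePosixPath(b).parts
    n = min(len(pa), len(pb))
    return pa[:n] == pb[:n]


def overlapping_tasks(scopes: dict[str, list[str]]) -> list[tuple[str, str]]:
    """Return every pair of tasks whose declared path prefixes overlap."""
    clashes = []
    for (t1, p1), (t2, p2) in combinations(scopes.items(), 2):
        if any(paths_overlap(a, b) for a in p1 for b in p2):
            clashes.append((t1, t2))
    return clashes


plan = {
    "route": ["src/routes/profile"],
    "component": ["src/components/profile"],
    "tests": ["tests/profile"],
    "refactor": ["src/routes"],  # deliberately overlaps with "route"
}
clashes = overlapping_tasks(plan)
```

Here the only flagged pair is route and refactor, which is the signal to re-split the work or run those two tasks sequentially rather than in parallel.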

Step‑by‑step: rolling out Tier 2 in the team

  1. Pick the orchestration tool that matches the team’s editor. If most engineers are on Cursor, start with Cursor + git worktrees + a shared bin/wt script. If most are on Mac with mixed editors, evaluate Conductor. If most live in tmux/Neovim, evaluate Claude Squad. Don’t try to standardize all three at once — pick one as the team default and let one or two contrarians keep their setup, documented.

  2. Document the worktree convention before any fan‑out. Pick a worktree root (~/wt/<slug> is common), a naming pattern, and a cleanup ritual. Commit the helper script (bin/wt new <slug>, bin/wt cleanup) to the repo. Add a one‑page README for the new pattern. Without this, you get worktree drift inside the first two weeks.

  3. Set the per‑dev ceiling explicitly. 5 or 6, pick one. Document it in the team handbook. Communicate that the ceiling is about review bandwidth, not laptop CPU — going above it doesn’t make you more productive, it makes your PR queue slower. Soft‑enforce via culture; hard‑enforce via API rate‑limit budgets per seat.

  4. Stand up the merge queue before you turn on parallelism. Configure GitHub Merge Queue (or Mergify / Aviator) on the default branch with the required CI checks. Test it with two manual PRs that conflict, observe the linearization, and tune the wait times. The queue is the safety net — turning on six agents without it is how teams get burned in week one.

  5. Wire review automation to the PR template. Add CodeRabbit, Greptile, or Diamond (whichever your team standardizes on — see Q9) to every PR opened by an agent. The first‑pass review must land before the human looks at the PR; otherwise the human reads the diff cold and the parallelism gain collapses.

  6. Author and commit the conflict‑resolver prompt. Write a prompts/resolver.md (or .claude/agents/resolver.md) that includes: the two PR descriptions, the conflict markers, the git log slice, and the instruction “draft the merged file and explain the reasoning; do not merge.” Make this resolver invocation a one‑liner in your bin/wt script. Test on a synthetic conflict.

  7. Run a one‑week pilot at 3 agents per dev. Don’t jump to 6 on day one. Three agents lets the team learn the worktree workflow, the merge queue, and the review pipeline without overwhelming any one human. After a week, post‑mortem at a team retro: where did time actually go? What broke?

  8. Ratchet to 5‑6 once the merge story holds. When the pilot shows zero merge‑queue failures and CodeRabbit lands before the human on >90% of PRs, raise the ceiling to 5 or 6. Re‑pilot for one more week. Watch the median PR age — if it goes up, you’re past your real ceiling and need to back off or invest in Q9 review automation.

  9. Define the “use pair mode” exceptions. Document the 10‑20% of tasks where parallel is the wrong tool: architecture spikes, security‑sensitive paths, anything that requires whiteboard‑level reasoning. Make pair‑programming an explicit team mode, not a default. This prevents the equally bad opposite drift: agents everywhere, no deep thinking ever.

  10. Run a quarterly Tier 2 retro. Tool capabilities shift fast. Review what the team is actually using vs the documented stack, where the merge playbook held vs failed, and whether the ceiling should move. Re‑evaluate Conductor / Claude Squad / Cursor every quarter — and budget for at least one tool swap per year.
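
For step 6, the resolver input can be assembled by a small function so that every engineer sends the same structure. This is a sketch under assumptions: the section titles and function name are invented for illustration, and only the four inputs and the closing instruction come from the step above.

```python
RESOLVER_INSTRUCTION = "Draft the merged file and explain the reasoning; do not merge."
CONFLICT_MARKERS = ("<<<<<<<", "=======", ">>>>>>>")


def build_resolver_prompt(pr_a: str, pr_b: str, conflict: str, git_log: str) -> str:
    """Assemble the resolver input from the four pieces the playbook names."""
    if not all(marker in conflict for marker in CONFLICT_MARKERS):
        raise ValueError("conflict text does not contain git conflict markers")
    sections = [
        ("PR A description", pr_a),
        ("PR B description", pr_b),
        ("Conflict markers", conflict),
        ("Git log slice", git_log),
        ("Instruction", RESOLVER_INSTRUCTION),
    ]
    # One titled section per input, with the instruction always last.
    return "\n\n".join(f"## {title}\n{text.strip()}" for title, text in sections) + "\n"
```

Keeping the instruction as a constant in the repo means the “draft, do not merge” rule cannot be dropped by accident when someone edits the prompt for a one-off conflict.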

Common pitfalls

  • No merge queue, six agents. The fastest way to burn the team’s trust in parallel agents is to land six PRs against main on the same afternoon with no linearization. The third PR breaks because it rebased against stale main; the team blames “AI flakiness” and reverts to one‑agent mode. Fix: merge queue first, agents second. Always.
  • No review automation, six PRs. Even with a merge queue, six PRs landing on a human at 4pm produces either a rubber‑stamp review (which silently degrades the codebase) or a backlog into next week (which kills the throughput gain). CodeRabbit / Greptile / Diamond is not optional at Tier 2 — see Q9.
  • Agent overload without a ceiling. The “if 5 is good, 15 is better” instinct is wrong. Past the ceiling, you spend more time orchestrating than the parallelism saves. Hard‑cap concurrency at the tool level where the orchestrator supports it (Claude Squad, as noted above, leaves enforcement to the human) and at the API‑budget level.
  • Worktree drift. Three weeks in, the team has 47 stale worktrees on their laptops, 200GB of duplicated node_modules, and three half‑finished branches per engineer. Fix: each engineer runs bin/wt cleanup weekly on their own machine (worktrees are local, so CI cannot remove them), and any worktree without a corresponding open PR after seven days gets archived.
  • Duplicate implementations. Two agents independently write the same helper function with different names because their context windows didn’t overlap. The merge queue does not reliably catch this, because differently named helpers neither conflict textually nor fail CI; at best it surfaces in review at integration time. Fix at planning time: shared scratch buffer / planning doc / Linear sub‑task assignment that lists the helpers each agent will touch.
  • Skipping the conflict‑resolver prompt. When a conflict surfaces, engineers paste the markers into Claude one‑off, get an answer, and don’t commit the prompt anywhere. Six months later, six engineers each have their own private resolver workflow. Fix: commit the resolver prompt to the repo from day one and treat it as a team asset.
  • Letting individual contributors set their own ceilings. “I work better with 9 agents” is almost always wrong, but it’s culturally hard to refuse without data. Capture per‑engineer median PR age and merge‑conflict rate; surface it in 1:1s. The data settles the argument.
  • Treating Tier 2 as a destination. Tier 2 is the team parallelism state. Tier 3 (overnight runs, cloud agents, autonomous loops) is the next question, Q15. Don’t conflate them. Tier 2 first, Tier 3 only after Tier 2 is solid for at least a quarter.
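
The seven-day rule from the worktree drift item is easy to express as a filter. The sketch below is one possible reading, assuming age is measured from the last commit on the branch and that the set of branches with open PRs is supplied by the caller; the Worktree type and function name are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Worktree:
    branch: str
    last_commit: datetime


def stale_worktrees(worktrees: list[Worktree], open_pr_branches: set[str],
                    now: datetime,
                    max_age: timedelta = timedelta(days=7)) -> list[Worktree]:
    """Return worktrees with no open PR whose last commit is older than max_age."""
    return [
        w for w in worktrees
        if w.branch not in open_pr_branches and now - w.last_commit > max_age
    ]
```

A cleanup command could print this list for confirmation rather than deleting anything automatically, since a quiet branch is not always an abandoned one.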

Signs the team is at Tier 2

  • A random workday’s Slack at 4pm shows multiple engineers reporting “shipped 3‑5 PRs today” without it being a notable event.
  • Your GitHub merge queue dashboard shows 5‑20 PRs serialized through it per day, with single‑digit failure rate.
  • The repo contains a bin/wt (or equivalent) script and a one‑page README of the worktree convention, both updated within the last quarter.
  • The repo contains a prompts/resolver.md (or equivalent) committed in the codebase and referenced from the contributing guide.
  • The team handbook states a per‑dev concurrent‑agent ceiling and the reason (review bandwidth, not laptop CPU).
  • A new hire can stand up the parallel workflow in their first week from documentation alone — no Slack DM required.
  • Median PR age is lower than it was 6 months ago, not higher. (If it’s higher, you’re past your ceiling.)
  • Every PR opened by an agent gets a first‑pass automated review before the human looks at it, in under 5 minutes.
  • The team has a name for at least three failure modes (“worktree drift,” “duplicate implementation,” “silent rebase loop”) and a documented fix for each.
  • Pair‑programming mode is still used — for the 10‑20% of tasks where it’s the right tool — and the team can name which tasks those are.
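
The pilot exit criteria from the rollout steps (zero merge-queue failures, automated review ahead of the human on more than 90% of PRs) and the median-PR-age signal above can be combined into one gate. This is a sketch only: the input shapes, the function name, and the choice of hours as the unit are assumptions, and the data would come from whatever PR analytics the team already collects.

```python
from statistics import median


def ready_to_raise_ceiling(queue_failures: int,
                           review_landed_first: list[bool],
                           pr_age_hours: list[float],
                           baseline_median_hours: float) -> bool:
    """Apply the three gates: no queue failures, review-first rate above 90%,
    and a median PR age that has not risen above the baseline."""
    if not review_landed_first or not pr_age_hours:
        return False  # no pilot data, so no decision
    review_first_rate = sum(review_landed_first) / len(review_landed_first)
    return (
        queue_failures == 0
        and review_first_rate > 0.90
        and median(pr_age_hours) <= baseline_median_hours
    )
```

If the gate fails on median PR age alone, that matches the text's reading that the team is past its real ceiling and should back off or invest in review automation first.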