Model routing — Opus/Sonnet/Haiku per task
Q3 · Tooling & subscription How do you pick the model for a task (Opus / Sonnet / Haiku / GPT-5 / Gemini)?
Max-score answer: “I mix models within a session — subagents on Haiku/Sonnet, planning on Opus, fallback to other vendors.”
Why this matters in 2026
Section titled “Why this matters in 2026”In 2026 the frontier model market moves too quickly to weld your workflow to a single model — Anthropic, OpenAI and Google each ship a new flagship roughly every quarter, and the price/quality leader rotates with every release. The cheap-fast tier (Haiku 4.5 at $1/$5 per million tokens) is now strong enough to drive subagents, codemods and bulk file edits at a fraction of Opus cost, while Opus 4.7 ($5/$25) still wins on long-chain planning and cross-file refactors that need 50+ tool calls of coherent reasoning. Routing intelligently between those tiers — and across vendors when one is down or worse at a specific task — is now a top-3 lever on both cost and shipped quality. Pick one model and commit, and you pay 3–5× more than you need to and lose during the 6–12 hours per quarter when your default provider has an incident. Mix models with intent and the same session can run 4 cheap parallel agents on Haiku, a deep architecture pass on Opus, and silently fail over to GPT-5.4 or Gemini 3.1 Pro when something breaks.
What “max score” actually looks like
Section titled “What “max score” actually looks like”You get full marks on Q3 only when all four of these are true inside a single working day:
- Planning runs on Opus 4.7. When you enter Plan mode for a non-trivial change, the planner model is
claude-opus-4-7, not Sonnet. Architecture decisions, multi-file refactors, debugging across a stack — these are the calls where 3× input price pays for itself many times over. - Execution defaults to Sonnet 4.6. Your normal coding loop — read files, edit, run tests, fix — runs on
claude-sonnet-4-6. It’s the sweet spot: 79.6% on SWE-bench Verified at $3/$15, roughly 97–99% of Opus quality on coding tasks at ~40% lower cost and 17% faster. - Subagents and bulk work run on Haiku 4.5. Subagent fan-out (code-explorer, code-reviewer, doc-writer), codemods, large refactors with mechanical edits, log triage, classification, and any “scan N files and report” loop runs on
claude-haiku-4-5-20251001. At $1/$5 you can launch 5 parallel Haiku agents for less than one Opus pass. - You have a cross-vendor fallback configured. GPT-5.4, Gemini 3.1 Pro, or both are wired up via OpenRouter, Claude Code’s external-provider flag, Cursor’s model picker, or a router like AI.cc — and you’ve actually used them at least once this month when Anthropic was degraded or when the task played to their strengths (huge multimodal context, agentic browser execution).
Anything less — “I always use Opus” or “I only use Sonnet” or “I keep meaning to set up a fallback” — is mid-tier on Q3.
Current landscape (web-verified)
Section titled “Current landscape (web-verified)”Claude family in 2026
Section titled “Claude family in 2026”Anthropic’s 4.x lineup keeps the same three-tier structure but the gap between tiers has narrowed dramatically — which is exactly what makes routing valuable.
- Opus 4.7 (
claude-opus-4-7) at $5 input / $25 output per million tokens. Best raw reasoning, best long-horizon planning, best at multi-file refactors that need to hold 30+ files in context. Watch out: Opus 4.7 ships a new tokenizer that can produce up to 35% more tokens for the same input text, so real spend per request rises even though the rate card is unchanged from 4.6. - Sonnet 4.6 (
claude-sonnet-4-6) at $3 input / $15 output. The default daily driver. Hits 79.6% on SWE-bench Verified — within a couple of points of Opus on most coding tasks while costing ~40% less and finishing ~17% faster. If you only learned one model name in 2026, this is it. - Haiku 4.5 (
claude-haiku-4-5-20251001) at $1 input / $5 output. Fast, cheap, surprisingly capable. Strong at classification, summarization, well-scoped edits, and bulk parallel work. Hands down the right choice for subagents and codemods.
All three include the full 1M-token context window at the rate-card price — no long-context surcharge. That changes the routing math: you can now hand Haiku the whole repo for a triage pass before deciding whether Opus needs to see anything.
Cross-vendor fallbacks
Section titled “Cross-vendor fallbacks”The “three-model frontier stack” pattern dominates 2026: Claude Opus 4.7 for rigorous long-running reasoning, GPT-5.4 for agentic execution and tool-heavy professional knowledge work, Gemini 3.1 Pro for multimodal synthesis and huge-context analysis. Each wins on different tasks, and prices are down 40–80% year-over-year, so running 2–3 vendors is now well within the budget of a serious solo developer.
- GPT-5.4 / GPT-5.5 — strongest on long agentic loops, tool-use chains, and structured outputs. Often a better fit when you need the model to drive a browser or hammer an API for an hour without losing the plot.
- Gemini 3.1 Pro / 2.5 Pro — strongest on multimodal (image + code + PDF), enormous context windows, and Google-ecosystem tasks. Pull it out for tasks where you need to feed the model a screenshot of a Figma frame plus 200k tokens of repo code.
- OpenRouter / AI.cc — single API surface that abstracts the provider. Switching between GPT-5.5, Claude Opus 4.7, Gemini 3.1 Pro, DeepSeek V4, Llama 4, Qwen 3.6-Plus is just a model parameter, and these routers will automatically fall back if your primary is down.
When you’d switch vendors
Section titled “When you’d switch vendors”Not “always”, but with intent:
- Anthropic is degraded. Status page red, sessions timing out, tool calls failing — flip to GPT-5.4 via Cursor or your OpenRouter alias and keep shipping.
- The task plays to a non-Anthropic strength. Pixel-perfect Figma-to-code from a screenshot? Gemini. A long-running autonomous browser agent that needs to stay coherent across 200 steps? GPT-5.4 with its agent-mode tuning.
- You want a second opinion. For high-stakes architecture decisions, asking the same plan question to Opus and GPT-5.4 and diffing the answers is a cheap form of redundancy.
- You’re price-sensitive on a specific loop. DeepSeek V4 and Qwen 3.6-Plus undercut even Haiku on some tasks; if you have a high-volume, low-judgment loop (e.g. translating commit messages), routing to one of them via OpenRouter halves the bill again.
Step-by-step: building a routing strategy
Section titled “Step-by-step: building a routing strategy”-
Pick your three tiers and assign them. Write down — literally, in a
MODELS.mdin your dotfiles — which model handles planning, execution, and bulk/parallel work. The default starting point: Opus 4.7 for planning, Sonnet 4.6 for execution, Haiku 4.5 for subagents. -
Set Sonnet 4.6 as your default in every tool. Claude Code:
claude --model claude-sonnet-4-6or set it in~/.claude/settings.json. Cursor: model picker → Sonnet 4.6. Codex CLI:model = "claude-sonnet-4-6"in~/.codex/config.toml(or your provider’s equivalent). This is the model you’ll be in 80% of the time. -
Wire Opus to Plan mode. In Claude Code, use the
/modelslash command to switch to Opus before entering Plan mode for anything non-trivial (anything touching >3 files, new architecture, debugging across services). When you exit Plan mode and start executing, drop back to Sonnet. -
Push subagents onto Haiku. Define your subagents (code-reviewer, code-explorer, doc-writer, refactor-bot) with
model: claude-haiku-4-5-20251001in their frontmatter. The orchestrator stays on Sonnet; the fan-out runs on Haiku. You’ll typically save 60–80% of the per-task cost without quality loss. -
Configure at least one cross-vendor fallback. Two easy paths: (a) sign up for OpenRouter, set
OPENROUTER_API_KEY, and addopenrouter/openai/gpt-5.4andopenrouter/google/gemini-3.1-proas alternates in your tool config; (b) in Cursor, just enable the OpenAI and Google providers in the model picker. Either way, use them once this week so the muscle memory exists. -
Set a hard budget per model. In your cost dashboard (Anthropic Console, OpenRouter, AI.cc) set per-model alerts: Opus at $X/day, Sonnet at $Y/day, Haiku at $Z/day. When Opus alerts trigger, that’s usually a sign you’re using it for execution when Sonnet would do.
-
Review the mix weekly. Open the spend dashboard every Friday. If Opus is more than ~25% of your spend, you’re probably over-using it on execution. If Haiku is under 10%, you’re probably not fanning out enough subagents.
-
Re-route when the leaderboard shifts. Roughly once a quarter, a new model release reshuffles the best-for-task ranking. Re-check the SWE-bench Verified and Terminal-Bench Hard leaderboards, and update your three tier assignments. The whole point of this setup is that re-routing is a one-line config change, not a migration.
Common pitfalls
Section titled “Common pitfalls”- Defaulting to Opus for everything “because it’s the smart one”. You’ll burn through your Anthropic plan in days. Opus is for planning and gnarly debugging — not for the loop that fixes a typo and runs the tests.
- Defaulting to Sonnet for planning too. Sonnet 4.6 is shockingly close to Opus on execution, but the gap re-opens on long-horizon planning and architectural reasoning. Pay the 3× for the plan; save it back on the execution.
- Running subagents on the same model as the orchestrator. If your code-reviewer subagent runs on Sonnet, you’re paying 3× per review for ~zero quality gain over Haiku 4.5, which is already at Sonnet 4-class quality for review/triage work.
- No fallback configured. When Anthropic has a 4-hour incident — and they will, every vendor does — you lose half a day. A
gpt-5.4alias in your config is free insurance. - Ignoring the Opus 4.7 tokenizer change. Same prompts can now cost 25–35% more on Opus 4.7 vs 4.6 because of how the new tokenizer chunks code. Watch your bill, not just the published rate card.
How to verify you’re there
Section titled “How to verify you’re there”- You can name, from memory, which Claude model handles planning, execution, and subagents in your setup — and they’re not all the same.
- Your tool config (
~/.claude/settings.json, Cursor settings, or Codexconfig.toml) has explicit model entries — not just defaults. - At least one subagent in your project runs on
claude-haiku-4-5-20251001. - At least one non-Anthropic provider (OpenAI, Google, or OpenRouter) is wired up and you’ve used it at least once this month.
- You’ve checked your per-model spend in the last 7 days and the distribution roughly matches the pattern: Sonnet majority, Haiku ~20–30%, Opus ~10–20%.
- You have a written rule (in
CLAUDE.md, dotfiles, or your team handbook) for when to switch from Sonnet to Opus, and when to fan out to Haiku. - You can switch your default model with a single config edit, not a multi-hour migration — the routing layer is in your control, not welded into prompts.