Model routing — Opus/Sonnet/Haiku per task
Q3 · Tooling & subscription
How do you pick the model for a task (Opus / Sonnet / Haiku / GPT-5 / Gemini)?
Max-score answer: “I mix models within a session — subagents on Haiku/Sonnet, planning on Opus, fallback to other vendors.”
Why this matters in 2026
In 2026 the frontier model market moves too quickly to weld your workflow to a single model: Anthropic, OpenAI and Google each ship a new flagship roughly every quarter, and the price/quality leader rotates with every release. The cheap-fast tier (Haiku 4.5 at $1/$5 per million tokens) is now strong enough to drive subagents, codemods and bulk file edits at a fraction of Opus cost, while Opus 4.8 ($5/$25) still wins on long-chain planning and cross-file refactors that need 50+ tool calls of coherent reasoning. Routing intelligently between those tiers, and across vendors when one is down or worse at a specific task, is now a top-3 lever on both cost and shipped quality. Commit to a single model and you pay 3–5× more than you need to, and you lose the 6–12 hours per quarter when your default provider has an incident. Mix models with intent and the same session can run 4 cheap parallel agents on Haiku, a deep architecture pass on Opus, and silently fail over to GPT-5.5 or Gemini 3.1 Pro when something breaks.
What “max score” actually looks like
You get full marks on Q3 only when all four of these are true inside a single working day:
- Planning runs on Fable 5 or Opus 4.8. When you enter Plan mode for a non-trivial change, the planner model is `claude-fable-5` or `claude-opus-4-8`, not Sonnet; run `/model fable` before Plan mode when the task warrants it. Architecture decisions, multi-file refactors, debugging across a stack: these are the calls where a premium planner pays for itself many times over.
- Execution defaults to Sonnet 4.6. Your normal coding loop (read files, edit, run tests, fix) runs on `claude-sonnet-4-6`. It’s the sweet spot: 79.6% on SWE-bench Verified at $3/$15, roughly 97–99% of Opus quality on coding tasks at ~40% lower cost and 17% faster.
- Subagents and bulk work run on Haiku 4.5. Subagent fan-out (code-explorer, code-reviewer, doc-writer), codemods, large refactors with mechanical edits, log triage, classification, and any “scan N files and report” loop run on `claude-haiku-4-5`. At $1/$5 against Opus’s $5/$25, five parallel Haiku agents cost about what one Opus pass does at equal token volume.
- You have a cross-vendor fallback configured. GPT-5.5, Gemini 3.1 Pro, or both are wired up via OpenRouter, Claude Code’s external-provider flag, Cursor’s model picker, or a router like AI.cc, and you’ve actually used them at least once this month when Anthropic was degraded or when the task played to their strengths (huge multimodal context, agentic browser execution).
Since Fable 5 landed above Opus on June 9, 2026, there are two canonical ways to run this. When budget matters less than velocity and quality, set Fable 5 as your default model — subagents still auto-run on Opus/Sonnet/Haiku, so cost stays contained while the main loop gets maximum intelligence. When budget matters, use Fable 5 for planning (Plan mode), Opus 4.8 or Sonnet 4.6 for implementation, then switch back to Fable 5 for the final verification/review pass.
Anything less — “I always use Opus” or “I only use Sonnet” or “I keep meaning to set up a fallback” — is mid-tier on Q3.
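The two setups described above can be written down as a small routing function. This is only a sketch: the model IDs come from the text, but the `Task` categories, the `pick_model` name, and the choice to always send subagents to Haiku are illustrative assumptions, not part of any tool's API.

```python
from enum import Enum


class Task(Enum):
    PLANNING = "planning"
    EXECUTION = "execution"
    SUBAGENT = "subagent"  # fan-out, codemods, log triage, "scan N files" loops
    REVIEW = "review"      # final verification pass


# Model IDs as named in the text; everything else here is hypothetical.
FABLE = "claude-fable-5"
SONNET = "claude-sonnet-4-6"
HAIKU = "claude-haiku-4-5"


def pick_model(task: Task, budget_sensitive: bool = True) -> str:
    """Return a model ID for a task under the two setups described above."""
    if task is Task.SUBAGENT:
        # Simplification: subagents always go to the cheapest tier.
        return HAIKU
    if not budget_sensitive:
        # Velocity-first setup: Fable 5 is the default for the main loop.
        return FABLE
    if task in (Task.PLANNING, Task.REVIEW):
        # Budget setup: premium model only for the plan and the final review.
        return FABLE
    return SONNET
```

With `budget_sensitive=True`, planning and review go to Fable 5 and execution to Sonnet 4.6; swapping in Opus 4.8 for either role is a one-line change.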
Current landscape (web-verified)
Claude family in 2026
Anthropic’s lineup is now four tiers (Fable 5 arrived above Opus on June 9, 2026), while the gap between the familiar three has narrowed dramatically, which is exactly what makes routing valuable.
- Fable 5 (`claude-fable-5`) at $10 input / $50 output per million tokens, exactly 2× Opus 4.8. The new tier above Opus, released June 9, 2026: much better than Opus at complex multi-file refactors, bug-fixing, building applications from scratch, and long-running tasks that demand peak intelligence. 1M-token context window, 128K max output. From June 9 through June 22, 2026 it is included on Pro, Max, Team, and seat-based Enterprise plans at no extra cost; on June 23, 2026 it drops off those plans and further use requires usage credits.
- Opus 4.8 (`claude-opus-4-8`) at $5 input / $25 output per million tokens. The strongest planner below Fable 5, and still the Claude Code default model. Excellent long-horizon reasoning and multi-file refactors that need to hold 30+ files in context. Watch out: Opus 4.7 introduced a new tokenizer (carried into 4.8) that can produce up to ~35% more tokens for the same input text, so if you are jumping from 4.6 your real spend per request rises even though the 4.8 rate card matches 4.6. Going 4.7 → 4.8 is roughly flat.
- Sonnet 4.6 (`claude-sonnet-4-6`) at $3 input / $15 output. The default daily driver. Hits 79.6% on SWE-bench Verified, within a couple of points of Opus on most coding tasks, while costing ~40% less and finishing ~17% faster. If you only learn one model name in 2026, this is it.
- Haiku 4.5 (`claude-haiku-4-5`) at $1 input / $5 output. Fast, cheap, surprisingly capable. Strong at classification, summarization, well-scoped edits, and bulk parallel work. Hands down the right choice for subagents and codemods.
Opus 4.8 and Sonnet 4.6 include the full 1M-token context window at the rate-card price — no long-context surcharge; Haiku 4.5 tops out at 200K tokens (still large enough for most single-package triage passes). That changes the routing math: you can hand Sonnet or Opus the whole monorepo, and reach for Haiku on scoped slices that fit in 200K before deciding whether Opus needs to see anything.
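To make the routing math concrete, here is a minimal cost sketch using the rate card and context limits quoted above. The function names and the example token counts are assumptions chosen for illustration, and `fits` checks only the prompt size, ignoring output tokens.

```python
# (input $, output $) per million tokens, copied from the rate card above.
RATES = {
    "claude-fable-5": (10.0, 50.0),
    "claude-opus-4-8": (5.0, 25.0),
    "claude-sonnet-4-6": (3.0, 15.0),
    "claude-haiku-4-5": (1.0, 5.0),
}

# Context windows in tokens, as stated in the text.
CONTEXT = {
    "claude-fable-5": 1_000_000,
    "claude-opus-4-8": 1_000_000,
    "claude-sonnet-4-6": 1_000_000,
    "claude-haiku-4-5": 200_000,
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request at the published per-million-token rates."""
    in_rate, out_rate = RATES[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000


def fits(model: str, prompt_tokens: int) -> bool:
    """True if the prompt alone fits in the model's context window."""
    return prompt_tokens <= CONTEXT[model]


if __name__ == "__main__":
    # Hypothetical workload: 150K tokens in, 10K tokens out.
    for model in RATES:
        cost = request_cost(model, 150_000, 10_000)
        print(f"{model}: ${cost:.2f} (fits: {fits(model, 150_000)})")
```

For that hypothetical 150K-in / 10K-out request the rates work out to $2.00 on Fable 5, $1.00 on Opus 4.8, $0.60 on Sonnet 4.6, and $0.20 on Haiku 4.5, so five Haiku passes cost the same as one Opus pass at equal token volume.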
Cross-vendor fallbacks
The “three-model frontier stack” pattern dominates 2026: Claude Opus 4.8 for rigorous long-running reasoning, GPT-5.5 for agentic execution and tool-heavy professional knowledge work, Gemini 3.1 Pro for multimodal synthesis and huge-context analysis. Each wins on different tasks, and prices are down 40–80% year-over-year, so running 2–3 vendors is now well within the budget of a serious solo developer.
- GPT-5.5 / GPT-5.4 — strongest on long agentic loops, tool-use chains, and structured outputs. Often a better fit when you need the model to drive a browser or hammer an API for an hour without losing the plot.
- Gemini 3.1 Pro / 2.5 Pro — strongest on multimodal (image + code + PDF), enormous context windows, and Google-ecosystem tasks. Pull it out for tasks where you need to feed the model a screenshot of a Figma frame plus 200k tokens of repo code.
- OpenRouter / AI.cc — single API surface that abstracts the provider. Switching between GPT-5.5, Claude Opus 4.8, Gemini 3.1 Pro, DeepSeek V4, Llama 4, Qwen 3.6-Plus is just a model parameter, and these routers will automatically fall back if your primary is down.
When you’d switch vendors
Not “always”, but with intent:
- Anthropic is degraded. Status page red, sessions timing out, tool calls failing — flip to GPT-5.5 via Cursor or your OpenRouter alias and keep shipping.
- The task plays to a non-Anthropic strength. Pixel-perfect Figma-to-code from a screenshot? Gemini. A long-running autonomous browser agent that needs to stay coherent across 200 steps? GPT-5.5 with its agent-mode tuning.
- You want a second opinion. For high-stakes architecture decisions, asking the same plan question to Opus and GPT-5.5 and diffing the answers is a cheap form of redundancy.
- You’re price-sensitive on a specific loop. DeepSeek V4 and Qwen 3.6-Plus undercut even Haiku on some tasks; if you have a high-volume, low-judgment loop (e.g. translating commit messages), routing to one of them via OpenRouter halves the bill again.
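The "provider is degraded, flip to the next one" case can be sketched as an ordered fallback chain. The provider callables below are stand-ins: a real setup would wrap each vendor's client (or a router such as OpenRouter) behind the same `str -> str` signature, and would catch that client's specific error types rather than a bare `Exception`. All names here are hypothetical.

```python
from typing import Callable, Sequence


class AllProvidersFailed(RuntimeError):
    """Raised when every provider in the chain errored."""


def complete_with_fallback(
    prompt: str,
    providers: Sequence[tuple[str, Callable[[str], str]]],
) -> tuple[str, str]:
    """Try each (name, call) pair in order; return (name, reply) from the first success."""
    errors: list[str] = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # sketch only: narrow this in real code
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


# Stub providers standing in for real API clients.
def degraded(prompt: str) -> str:
    raise TimeoutError("simulated outage")


def healthy(prompt: str) -> str:
    return f"echo: {prompt}"


chain = [("claude-opus-4-8", degraded), ("gpt-5.5", healthy)]
used, reply = complete_with_fallback("plan the migration", chain)
```

Here the first provider raises, so `used` ends up as `"gpt-5.5"`. Keeping the chain as plain data means reordering vendors for a task that plays to another model's strengths is a list edit, not a code change.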
Step-by-step: building a routing strategy
- Pick your three tiers and assign them. Write down (literally, in a `MODELS.md` in your dotfiles) which model handles planning, execution, and bulk/parallel work. The default starting point: Fable 5 or Opus 4.8 for planning, Sonnet 4.6 for execution, Haiku 4.5 for subagents.
- Set Sonnet 4.6 as your default in every tool. Claude Code: `claude --model claude-sonnet-4-6` or set it in `~/.claude/settings.json`. Cursor: model picker → Sonnet 4.6. Codex CLI: `model = "claude-sonnet-4-6"` in `~/.codex/config.toml` (or your provider’s equivalent). This is the model you’ll be in 80% of the time.
- Wire Fable or Opus to Plan mode. In Claude Code, use the `/model` slash command to switch up before entering Plan mode for anything non-trivial (anything touching >3 files, new architecture, debugging across services); `/model fable` is the top option for non-trivial plans, Opus 4.8 the cheaper one. When you exit Plan mode and start executing, drop back to Sonnet.
- Push subagents onto Haiku. Define your subagents (code-reviewer, code-explorer, doc-writer, refactor-bot) with `model: claude-haiku-4-5` in their frontmatter. The orchestrator stays on Sonnet; the fan-out runs on Haiku. You’ll typically save 60–80% of the per-task cost without quality loss.
- Configure at least one cross-vendor fallback. Two easy paths: (a) sign up for OpenRouter, set `OPENROUTER_API_KEY`, and add `openrouter/openai/gpt-5.5` and `openrouter/google/gemini-3.1-pro` as alternates in your tool config; (b) in Cursor, just enable the OpenAI and Google providers in the model picker. Either way, use them once this week so the muscle memory exists.
- Set a hard budget per model. In your cost dashboard (Anthropic Console, OpenRouter, AI.cc) set per-model alerts: Opus at $X/day, Sonnet at $Y/day, Haiku at $Z/day. When Opus alerts trigger, that’s usually a sign you’re using it for execution when Sonnet would do.
- Review the mix weekly. Open the spend dashboard every Friday. If Opus is more than ~25% of your spend, you’re probably over-using it on execution. If Haiku is under 10%, you’re probably not fanning out enough subagents.
- Re-route when the leaderboard shifts. Roughly once a quarter, a new model release reshuffles the best-for-task ranking. Re-check the SWE-bench Verified and Terminal-Bench Hard leaderboards, and update your three tier assignments. The whole point of this setup is that re-routing is a one-line config change, not a migration.
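The weekly review step above is simple enough to automate. The sketch below assumes you can export spend per tier as a plain dictionary (the `"opus"`, `"sonnet"`, `"haiku"` keys and the function names are hypothetical); the 25% and 10% thresholds are the ones suggested in the steps.

```python
def spend_shares(spend_by_tier: dict[str, float]) -> dict[str, float]:
    """Convert dollar spend per tier into fractions of total spend."""
    total = sum(spend_by_tier.values())
    if total <= 0:
        return {tier: 0.0 for tier in spend_by_tier}
    return {tier: amount / total for tier, amount in spend_by_tier.items()}


def review_mix(
    spend_by_tier: dict[str, float],
    opus_max: float = 0.25,
    haiku_min: float = 0.10,
) -> list[str]:
    """Return warnings when the spend mix drifts past the suggested thresholds."""
    if sum(spend_by_tier.values()) <= 0:
        return []  # nothing spent, nothing to review
    shares = spend_shares(spend_by_tier)
    warnings: list[str] = []
    opus = shares.get("opus", 0.0)
    haiku = shares.get("haiku", 0.0)
    if opus > opus_max:
        warnings.append(f"Opus is {opus:.0%} of spend: likely used for execution")
    if haiku < haiku_min:
        warnings.append(f"Haiku is {haiku:.0%} of spend: likely too little fan-out")
    return warnings


# Hypothetical week: $40 Opus, $55 Sonnet, $5 Haiku.
week = {"opus": 40.0, "sonnet": 55.0, "haiku": 5.0}
for line in review_mix(week):
    print(line)
```

For the hypothetical week shown, both warnings fire (Opus at 40%, Haiku at 5%); a mix of $15 / $60 / $25 produces none.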
Common pitfalls
- Defaulting to Opus for everything “because it’s the smart one”. You’ll burn through your Anthropic plan in days. Opus is for planning and gnarly debugging, not for the loop that fixes a typo and runs the tests.
- Running the whole session on Fable 5 when budget matters. It costs 2× Opus ($10/$50 vs $5/$25); reserve it for planning and the final verification pass, or consciously accept the cost as your default.
- Defaulting to Sonnet for planning too. Sonnet 4.6 is shockingly close to Opus on execution, but the gap re-opens on long-horizon planning and architectural reasoning. Pay the premium for the plan (about 1.7× Sonnet’s rate for Opus 4.8, about 3.3× for Fable 5); save it back on the execution.
- Running subagents on the same model as the orchestrator. If your code-reviewer subagent runs on Sonnet, you’re paying 3× per review for ~zero quality gain over Haiku 4.5, which is already at Sonnet 4-class quality for review/triage work.
- No fallback configured. When Anthropic has a 4-hour incident (and they will, every vendor does) you lose half a day. A `gpt-5.5` alias in your config is free insurance.
- Ignoring the 4.7-era tokenizer change. The same prompts can cost 0–35% more on Opus 4.8 than on 4.6, because the tokenizer introduced in 4.7 and carried into 4.8 chunks text and code into more tokens. Watch your bill, not just the published rate card.
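The tokenizer pitfall is easy to put a number on. The sketch below assumes the rate card is unchanged between 4.6 and 4.8 (as stated earlier) and that token inflation applies evenly to input and output, so cost scales linearly with token count; the function name and the $400 baseline are made up for illustration.

```python
def bill_range(baseline_cost: float, max_inflation: float = 0.35) -> tuple[float, float]:
    """Best and worst case for the same workload after a tokenizer change.

    baseline_cost: what the workload cost under the old tokenizer.
    max_inflation: upper bound on extra tokens (0.35 means up to 35% more).
    """
    if baseline_cost < 0 or max_inflation < 0:
        raise ValueError("baseline_cost and max_inflation must be non-negative")
    return baseline_cost, baseline_cost * (1.0 + max_inflation)


# Hypothetical: a workload that cost $400/month on Opus 4.6.
low, high = bill_range(400.0)
print(f"Expect between ${low:.0f} and ${high:.0f} on Opus 4.8")
```

A $400 baseline lands somewhere between $400 and roughly $540, which is why the bill, not the rate card, is the thing to watch.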
How to verify you’re there
- You can name, from memory, which Claude model handles planning, execution, and subagents in your setup, and they’re not all the same.
- Now that there are four tiers, you know which one runs planning: Fable 5 (`/model fable`) when the task warrants it, Opus 4.8 otherwise.
- Your tool config (`~/.claude/settings.json`, Cursor settings, or Codex `config.toml`) has explicit model entries, not just defaults.
- At least one subagent in your project runs on `claude-haiku-4-5`.
- At least one non-Anthropic provider (OpenAI, Google, or OpenRouter) is wired up and you’ve used it at least once this month.
- You’ve checked your per-model spend in the last 7 days and the distribution roughly matches the pattern: Sonnet majority, Haiku ~20–30%, Opus ~10–20%.
- You have a written rule (in `CLAUDE.md`, dotfiles, or your team handbook) for when to switch from Sonnet to Opus, and when to fan out to Haiku.
- You can switch your default model with a single config edit, not a multi-hour migration; the routing layer is in your control, not welded into prompts.
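Part of this checklist can be checked mechanically if you keep your role assignments in one place. The sketch below assumes a hypothetical role-to-model mapping and a list of fallback model names; neither corresponds to a real config schema, and the "non-Anthropic" test is a crude prefix check.

```python
def check_setup(roles: dict[str, str], fallbacks: list[str]) -> list[str]:
    """Return the checklist items that fail for a role -> model mapping."""
    failures: list[str] = []
    required = ("planning", "execution", "subagents")

    missing = [role for role in required if role not in roles]
    if missing:
        failures.append(f"no explicit model for: {', '.join(missing)}")
    elif len({roles[role] for role in required}) == 1:
        failures.append("planning, execution, and subagents all use the same model")

    if roles.get("subagents") != "claude-haiku-4-5":
        failures.append("subagents do not run on claude-haiku-4-5")

    # Crude heuristic: anything not named claude-* counts as another vendor.
    if not any(not name.startswith("claude-") for name in fallbacks):
        failures.append("no non-Anthropic fallback configured")

    return failures


# Hypothetical setup matching the default starting point in this guide.
roles = {
    "planning": "claude-fable-5",
    "execution": "claude-sonnet-4-6",
    "subagents": "claude-haiku-4-5",
}
problems = check_setup(roles, fallbacks=["gpt-5.5"])
```

An empty `problems` list covers only the structural items; whether you have actually used the fallback this month, or reviewed spend in the last 7 days, still needs a human answer.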