Model routing — Opus/Sonnet/Haiku per task

Q3 · Tooling & subscription: How do you pick the model for a task (Opus / Sonnet / Haiku / GPT-5 / Gemini)?

Max-score answer: “I mix models within a session — subagents on Haiku/Sonnet, planning on Opus, fallback to other vendors.”

Why this matters in 2026

The frontier model market moves quickly enough that a workflow pinned to one model will drift. Haiku 4.5 ($1/$5 per MTok) is useful for scoped subagents, codemods, and bulk edits; Opus 4.8 ($5/$25) and Fable 5 cover harder planning; GPT-5.6 and Gemini provide cross-vendor fallbacks. Route by measured task fit, total tokens, latency, and availability instead of assuming one model wins every workload.

What “max score” actually looks like

You get full marks on Q3 only when all four of these are true inside a single working day:

Planning runs on Fable 5 or Opus 4.8. When you enter Plan mode for a non-trivial change, the planner model is claude-fable-5 or claude-opus-4-8, not Sonnet — run /model fable before Plan mode when the task warrants it. Architecture decisions, multi-file refactors, debugging across a stack — these are the calls where a premium planner pays for itself many times over.
In this routing policy, execution defaults to Sonnet 5. Your normal coding loop — read files, edit, run tests, fix — runs on claude-sonnet-5. Current introductory pricing is $2/$10 through August 31, then $3/$15; benchmark results vary with harness and effort.
Scoped subagents test Haiku 4.5 explicitly. Pin selected explorers, triage agents, or doc writers to claude-haiku-4-5, then compare accepted output, retries, and total tokens with the inherited model. The lower rate card is not a task-level quality guarantee.
You have a cross-vendor fallback configured. GPT-5.6 Sol, Gemini 3.1 Pro, or both are wired up via OpenRouter, Claude Code’s external-provider flag, Cursor’s model picker, or a router like AI.cc — and you’ve actually used them at least once this month when Anthropic was degraded or when the task played to their strengths (huge multimodal context, agentic browser execution).

In the July 11 Artificial Analysis Coding Agent Index v1.1 snapshot, Fable 5 + Claude Code scored above Opus 4.8 + Claude Code, but that does not define a universal route. One opinionated pattern is Fable for difficult planning or final verification and Opus/Sonnet for implementation. Subagents may inherit the parent model unless you pin them explicitly, so verify the actual configuration and cost.

Anything less — “I always use Opus” or “I only use Sonnet” or “I keep meaning to set up a fallback” — is mid-tier on Q3.
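The role split described above can be sketched as a small lookup. This is a minimal illustration, not a Claude Code configuration format: the model IDs are the ones named in this guide, while the role names, the `demanding` flag, and the `pick_model` helper are assumptions made for the example.

```python
# Claude model IDs are the ones named in this guide; the role names and the
# `demanding` flag are illustrative assumptions, not part of any tool's API.
ROUTES = {
    "execution": "claude-sonnet-5",
    "subagent": "claude-haiku-4-5",
}


def pick_model(role: str, demanding: bool = False) -> str:
    """Return the model for a role; planning escalates to Fable for demanding tasks."""
    if role == "planning":
        return "claude-fable-5" if demanding else "claude-opus-4-8"
    if role not in ROUTES:
        raise ValueError(f"no model pinned for role {role!r}")
    return ROUTES[role]
```

Keeping the mapping in one place means a route change is a one-line edit, and an unknown role fails loudly instead of silently inheriting a default.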

Current landscape (web-verified)

Claude family in 2026

Anthropic’s lineup is now four tiers — Fable 5 arrived above Opus on June 9, 2026 — while the gap between the familiar three has narrowed dramatically, which is exactly what makes routing valuable.

Fable 5 (claude-fable-5) at $10 input / $50 output per million tokens — exactly 2× Opus 4.8. The tier above Opus excels at complex multi-file refactors, bug-fixing, from-scratch applications, and long-running tasks. It returned globally July 1; after temporary inclusion through July 7, it requires usage credits on eligible paid plans.
Opus 4.8 (claude-opus-4-8) at $5 input / $25 output per million tokens. The strongest premium planner below Fable 5. Excellent long-horizon reasoning and multi-file refactors. It is the account default on Max, Team Premium, Enterprise pay-as-you-go, and Anthropic API sessions; organization policy can override that mapping, and managed-cloud defaults differ.
Sonnet 5 (claude-sonnet-5) at an introductory $2 input / $10 output through August 31, then $3/$15. It is this guide’s recommended daily driver and the account default on Pro, Team Standard, and Enterprise subscription seats. Artificial Analysis notes that its high verbosity can make a max run cost more per evaluated task than Opus 4.8 at standard rates, so monitor total tokens rather than inferring cost from the rate card alone.
Haiku 4.5 (claude-haiku-4-5) at $1 input / $5 output per million tokens. Fast, cheap, surprisingly capable. Strong at classification, summarization, well-scoped edits, and bulk parallel work. A strong first candidate for scoped subagents and codemods, provided your own evaluations confirm the quality.
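To see why the rate card alone does not settle cost, the arithmetic can be sketched as follows. The rates are the per-million-token prices quoted in this list; the token counts are hypothetical, and the calculation ignores caching and any discounts.

```python
# Rate card in USD per million tokens (input, output), as quoted in this guide.
RATES = {
    "claude-fable-5": (10.0, 50.0),
    "claude-opus-4-8": (5.0, 25.0),
    "claude-sonnet-5": (2.0, 10.0),  # introductory rate; $3/$15 afterwards
    "claude-haiku-4-5": (1.0, 5.0),
}


def task_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one task, ignoring caching and discounts."""
    in_rate, out_rate = RATES[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000


# Hypothetical token counts: the same task, with a far more verbose Sonnet run.
opus_cost = task_cost("claude-opus-4-8", input_tokens=100_000, output_tokens=20_000)
sonnet_cost = task_cost("claude-sonnet-5", input_tokens=150_000, output_tokens=80_000)
```

With these made-up counts the Opus run costs about $1.00 and the Sonnet run about $1.10, so the model with the lower rate card ends up more expensive once verbosity is counted.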

Opus 4.8 and Sonnet 5 include the full 1M-token context window at the rate-card price, with no long-context surcharge; Haiku 4.5 tops out at 200K tokens (still large enough for most single-package triage passes). That changes the routing math: you can hand Sonnet or Opus any repository that fits within 1M tokens, and reach for Haiku on scoped slices that fit in 200K before deciding whether Opus needs to see anything.
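A minimal sketch of that context-fit check, assuming you already have a token estimate for the prompt. The window sizes come from this guide; the cheapest-first preference order, the output reserve, and the function name are assumptions for illustration.

```python
# Context windows in tokens, as stated in this guide.
CONTEXT_WINDOW = {
    "claude-haiku-4-5": 200_000,
    "claude-sonnet-5": 1_000_000,
    "claude-opus-4-8": 1_000_000,
}

# Cheapest-first preference order; an assumption based on the rate card.
PREFERENCE = ["claude-haiku-4-5", "claude-sonnet-5", "claude-opus-4-8"]


def smallest_model_that_fits(prompt_tokens: int, reserve_for_output: int = 16_000) -> str:
    """Return the first model in PREFERENCE whose window holds prompt plus output."""
    needed = prompt_tokens + reserve_for_output
    for model in PREFERENCE:
        if needed <= CONTEXT_WINDOW[model]:
            return model
    raise ValueError(f"{needed} tokens exceeds every configured context window")
```

Fit is only a first filter: a slice that fits in Haiku's window may still need a stronger model, so treat the result as the cheapest candidate to evaluate rather than a final route.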

Cross-vendor fallbacks

The “three-model frontier stack” pattern dominates 2026: Claude Opus 4.8 for rigorous long-running reasoning, GPT-5.6 Sol for agentic execution and tool-heavy professional knowledge work, Gemini 3.1 Pro for multimodal synthesis and huge-context analysis. Each wins on different tasks, and prices are down 40–80% year-over-year, so running 2–3 vendors is now well within the budget of a serious solo developer.

GPT-5.6 Sol — strongest on long agentic loops, tool-use chains, and structured outputs. Often a better fit when you need the model to drive a browser or hammer an API for an hour without losing the plot.
Gemini 3.1 Pro / 2.5 Pro — strongest on multimodal (image + code + PDF), enormous context windows, and Google-ecosystem tasks. Pull it out for tasks where you need to feed the model a screenshot of a Figma frame plus 200k tokens of repo code.
OpenRouter / AI.cc provide a single API surface that abstracts the provider. Switching between GPT-5.6 Sol, Claude Opus 4.8, Gemini 3.1 Pro, DeepSeek V4, Llama 4, and Qwen 3.6-Plus is just a model parameter, and these routers can fall back automatically when your primary is down, provided you have configured alternates.

When you’d switch vendors

Not “always”, but with intent:

Anthropic is degraded. Status page red, sessions timing out, tool calls failing — flip to GPT-5.6 Sol via Cursor or your OpenRouter alias and keep shipping.
The task plays to a non-Anthropic strength. Pixel-perfect Figma-to-code from a screenshot? Gemini. A long-running autonomous browser agent that needs to stay coherent across 200 steps? GPT-5.6 Sol with its agent-mode tuning.
You want a second opinion. For high-stakes architecture decisions, asking the same plan question to Opus and GPT-5.6 Sol and diffing the answers is a cheap form of redundancy.
You’re price-sensitive on a specific loop. DeepSeek V4 and Qwen 3.6-Plus undercut even Haiku on some tasks; if you have a high-volume, low-judgment loop (e.g. translating commit messages), routing it to one of them via OpenRouter can cut the bill further. Measure the saving on your own volume before committing.
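The "Anthropic is degraded" case can be sketched as an ordered fallback chain. This is a vendor-neutral illustration: the provider callables are stand-ins for whatever client you actually use, and `complete_with_fallback` and `AllProvidersFailed` are hypothetical names, not part of any SDK.

```python
from typing import Callable, Sequence


class AllProvidersFailed(RuntimeError):
    """Raised when every provider in the chain has failed."""


def complete_with_fallback(
    prompt: str,
    providers: Sequence[tuple[str, Callable[[str], str]]],
) -> tuple[str, str]:
    """Try each (name, call) pair in order; return (name, response) from the first success."""
    errors: list[str] = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # a real client would catch narrower error types
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


# Stand-in callables; replace with real client calls in your own setup.
def degraded_primary(prompt: str) -> str:
    raise TimeoutError("primary provider timed out")


def healthy_fallback(prompt: str) -> str:
    return f"fallback handled: {prompt}"


used, reply = complete_with_fallback(
    "summarize the diff",
    [("claude-sonnet-5", degraded_primary), ("gpt-5.6-sol", healthy_fallback)],
)
```

Returning the provider name alongside the response makes it easy to log how often the fallback actually fires, which is useful evidence when you review the route.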

Step-by-step: building a routing strategy

Pick a model for each of three roles. Write down (literally, in a MODELS.md in your dotfiles) which model handles planning, execution, and bulk/parallel work. The default starting point: Fable 5 or Opus 4.8 for planning, Sonnet 5 for execution, Haiku 4.5 for subagents.
Configure a tool-appropriate daily route. Claude Code can pin Sonnet 5 with claude --model claude-sonnet-5; Cursor can use Auto or a manually selected available model; Codex should route within its GPT-5.6 tiers rather than pretending a Claude slug is an OpenAI default. Treat Sonnet-for-execution as this guide’s starting hypothesis and validate it on your workload.
Evaluate Fable or Opus for difficult planning. In Claude Code, use /model before Plan mode when architecture, ambiguity, or cross-service reasoning warrants a more expensive model. File count alone is not a reliable threshold. Compare the resulting plan and rework with Sonnet before standardizing the route.
Test Haiku on scoped subagents. Define selected explorers, triage agents, or doc writers with model: claude-haiku-4-5. Compare total tokens, latency, accepted findings, and retries with Sonnet; the lower rate card does not guarantee a fixed task saving or equal quality.
Configure at least one cross-vendor fallback. Two easy paths: (a) sign up for OpenRouter, set OPENROUTER_API_KEY, and add openrouter/openai/gpt-5.6-sol and openrouter/google/gemini-3.1-pro as alternates in your tool config; (b) in Cursor, just enable the OpenAI and Google providers in the model picker. Either way, use them once this week so the muscle memory exists.
Set a budget alert per model. In your cost dashboard (Anthropic Console, OpenRouter, AI.cc) set per-model alerts: Opus at $X/day, Sonnet at $Y/day, Haiku at $Z/day. When the Opus alert fires, it is often a sign you are using Opus for execution where Sonnet would do, so check the task mix before raising the limit.
Review the mix weekly. Open the spend dashboard and compare model share with task outcomes. Set thresholds from your own baseline; no universal percentage proves overuse or underuse.
Re-route when evidence shifts. Re-check provider model cards and the explicitly versioned Artificial Analysis Coding Agent Index, then repeat your internal task set. Do not compare stale benchmark names or mix scores produced by different harnesses as if they were one leaderboard.
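The budget and weekly-review steps above can be sketched as two small checks over a spend export. The dictionary shape, the dollar figures, and the function names are assumptions for illustration; real dashboards export in their own formats.

```python
def spend_share(spend_by_model: dict[str, float]) -> dict[str, float]:
    """Fraction of total spend per model; empty dict when nothing was spent."""
    total = sum(spend_by_model.values())
    if total == 0:
        return {}
    return {model: amount / total for model, amount in spend_by_model.items()}


def over_budget(spend_by_model: dict[str, float], daily_budget: dict[str, float]) -> list[str]:
    """Models whose spend exceeds their daily budget; models without a budget are skipped."""
    return sorted(
        model
        for model, amount in spend_by_model.items()
        if model in daily_budget and amount > daily_budget[model]
    )


# Hypothetical one-day export in USD.
today = {"claude-opus-4-8": 12.0, "claude-sonnet-5": 6.0, "claude-haiku-4-5": 2.0}
budget = {"claude-opus-4-8": 10.0, "claude-sonnet-5": 15.0, "claude-haiku-4-5": 5.0}
```

With these made-up numbers, `over_budget(today, budget)` flags only Opus, and `spend_share(today)` shows Opus taking 60% of the day's spend, which is the kind of signal worth comparing against your own baseline.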

Common pitfalls

Defaulting to one premium model without measurement. A higher rate card can exhaust plan capacity faster, while a cheaper model can require retries. Route from accepted outcomes and total tokens.
Treating rate-card ratios as task-cost ratios. Fable 5 is priced at $10/$50 and Opus 4.8 at $5/$25 per MTok, but task length, caching, tool calls, and rework determine the actual multiple.
Assuming one model owns planning. Compare difficult plans across the models available to your account and pin a route only when your eval set supports it.
Assuming every subagent should use the cheapest model. Scope cheaper models deliberately and retain a quality fallback for tasks where your evaluations show misses.
No fallback configured. When Anthropic has a 4-hour incident — and they will, every vendor does — you lose half a day. A gpt-5.6-sol alias in your config is free insurance.
Inferring cost from the rate card alone. Tokenization, reasoning, context, tool calls, and caching change total cost. Watch measured usage for representative tasks.
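Several of these pitfalls come down to measuring cost per accepted outcome instead of cost per token. A minimal sketch, assuming you record attempts, accepted outputs, and spend for each model on the same task set; the `RunStats` fields and all figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RunStats:
    """Aggregate results for one model over the same evaluation task set."""
    attempts: int          # total runs, including retries
    accepted: int          # runs whose output was accepted in review
    total_cost_usd: float  # measured spend across all attempts


def cost_per_accepted(stats: RunStats) -> float:
    """Measured spend divided by accepted outputs; infinite when nothing was accepted."""
    if stats.accepted == 0:
        return float("inf")
    return stats.total_cost_usd / stats.accepted


# Hypothetical measurements, not benchmark results.
haiku = RunStats(attempts=60, accepted=20, total_cost_usd=7.0)
sonnet = RunStats(attempts=24, accepted=20, total_cost_usd=6.0)
```

In this invented example the cheaper model needs so many retries that each accepted result costs more than on Sonnet, which is exactly the case a rate-card comparison would miss.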

How to verify you’re there

You can name, from memory, which Claude model handles planning, execution, and subagents in your setup — and they’re not all the same.
Now that there are four tiers, you know which one runs planning — Fable 5 (/model fable) when the task warrants it, Opus 4.8 otherwise.
Your tool config (~/.claude/settings.json, Cursor settings, or Codex config.toml) has explicit model entries — not just defaults.
At least one subagent in your project runs on claude-haiku-4-5.
At least one non-Anthropic provider (OpenAI, Google, or OpenRouter) is wired up and you’ve used it at least once this month.
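You’ve checked your per-model spend in the last 7 days and the distribution is close to the baseline you set for your workload. One illustrative pattern is a Sonnet majority with Haiku at ~20–30% and Opus at ~10–20%, but no universal percentage proves overuse or underuse.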
You have a written rule (in CLAUDE.md, dotfiles, or your team handbook) for when to switch from Sonnet to Opus, and when to fan out to Haiku.
You can switch your default model with a single config edit, not a multi-hour migration — the routing layer is in your control, not welded into prompts.
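Parts of this checklist can be automated against a routing config. The sketch below assumes a hypothetical config shape (a `roles` mapping and a `fallbacks` list); it is not the schema of ~/.claude/settings.json, Cursor settings, or Codex config.toml, so adapt the lookups to whatever your tools actually store.

```python
def unmet_checks(config: dict) -> list[str]:
    """Return the checklist items a routing config fails; an empty list means all pass."""
    roles = config.get("roles", {})
    fallbacks = config.get("fallbacks", [])
    problems = []

    missing = [r for r in ("planning", "execution", "subagent") if r not in roles]
    if missing:
        problems.append("missing explicit model for: " + ", ".join(missing))
    elif len(set(roles.values())) == 1:
        problems.append("planning, execution, and subagents all use the same model")

    if roles.get("subagent") != "claude-haiku-4-5":
        problems.append("no subagent pinned to claude-haiku-4-5")

    # Any model ID that does not start with "claude-" counts as cross-vendor here.
    if not any(not model.startswith("claude-") for model in fallbacks):
        problems.append("no non-Anthropic fallback configured")

    return problems
```

A check like this covers only the items that live in config; whether you have used the fallback this month or reviewed spend in the last 7 days still has to come from logs and dashboards.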