Skip to content

Model routing — Opus/Sonnet/Haiku per task

Q3 · Tooling & subscription How do you pick the model for a task (Opus / Sonnet / Haiku / GPT-5 / Gemini)?

Max-score answer: “I mix models within a session — subagents on Haiku/Sonnet, planning on Opus, fallback to other vendors.”

In 2026 the frontier model market moves too quickly to weld your workflow to a single model — Anthropic, OpenAI and Google each ship a new flagship roughly every quarter, and the price/quality leader rotates with every release. The cheap-fast tier (Haiku 4.5 at $1/$5 per million tokens) is now strong enough to drive subagents, codemods and bulk file edits at a fraction of Opus cost, while Opus 4.7 ($5/$25) still wins on long-chain planning and cross-file refactors that need 50+ tool calls of coherent reasoning. Routing intelligently between those tiers — and across vendors when one is down or worse at a specific task — is now a top-3 lever on both cost and shipped quality. Pick one model and commit, and you pay 3–5× more than you need to and lose during the 6–12 hours per quarter when your default provider has an incident. Mix models with intent and the same session can run 4 cheap parallel agents on Haiku, a deep architecture pass on Opus, and silently fail over to GPT-5.4 or Gemini 3.1 Pro when something breaks.

You get full marks on Q3 only when all four of these are true inside a single working day:

  • Planning runs on Opus 4.7. When you enter Plan mode for a non-trivial change, the planner model is claude-opus-4-7, not Sonnet. Architecture decisions, multi-file refactors, debugging across a stack — these are the calls where 3× input price pays for itself many times over.
  • Execution defaults to Sonnet 4.6. Your normal coding loop — read files, edit, run tests, fix — runs on claude-sonnet-4-6. It’s the sweet spot: 79.6% on SWE-bench Verified at $3/$15, roughly 97–99% of Opus quality on coding tasks at ~40% lower cost and 17% faster.
  • Subagents and bulk work run on Haiku 4.5. Subagent fan-out (code-explorer, code-reviewer, doc-writer), codemods, large refactors with mechanical edits, log triage, classification, and any “scan N files and report” loop runs on claude-haiku-4-5-20251001. At $1/$5 you can launch 5 parallel Haiku agents for less than one Opus pass.
  • You have a cross-vendor fallback configured. GPT-5.4, Gemini 3.1 Pro, or both are wired up via OpenRouter, Claude Code’s external-provider flag, Cursor’s model picker, or a router like AI.cc — and you’ve actually used them at least once this month when Anthropic was degraded or when the task played to their strengths (huge multimodal context, agentic browser execution).

Anything less — “I always use Opus” or “I only use Sonnet” or “I keep meaning to set up a fallback” — is mid-tier on Q3.

Anthropic’s 4.x lineup keeps the same three-tier structure but the gap between tiers has narrowed dramatically — which is exactly what makes routing valuable.

  • Opus 4.7 (claude-opus-4-7) at $5 input / $25 output per million tokens. Best raw reasoning, best long-horizon planning, best at multi-file refactors that need to hold 30+ files in context. Watch out: Opus 4.7 ships a new tokenizer that can produce up to 35% more tokens for the same input text, so real spend per request rises even though the rate card is unchanged from 4.6.
  • Sonnet 4.6 (claude-sonnet-4-6) at $3 input / $15 output. The default daily driver. Hits 79.6% on SWE-bench Verified — within a couple of points of Opus on most coding tasks while costing ~40% less and finishing ~17% faster. If you only learned one model name in 2026, this is it.
  • Haiku 4.5 (claude-haiku-4-5-20251001) at $1 input / $5 output. Fast, cheap, surprisingly capable. Strong at classification, summarization, well-scoped edits, and bulk parallel work. Hands down the right choice for subagents and codemods.

All three include the full 1M-token context window at the rate-card price — no long-context surcharge. That changes the routing math: you can now hand Haiku the whole repo for a triage pass before deciding whether Opus needs to see anything.

The “three-model frontier stack” pattern dominates 2026: Claude Opus 4.7 for rigorous long-running reasoning, GPT-5.4 for agentic execution and tool-heavy professional knowledge work, Gemini 3.1 Pro for multimodal synthesis and huge-context analysis. Each wins on different tasks, and prices are down 40–80% year-over-year, so running 2–3 vendors is now well within the budget of a serious solo developer.

  • GPT-5.4 / GPT-5.5 — strongest on long agentic loops, tool-use chains, and structured outputs. Often a better fit when you need the model to drive a browser or hammer an API for an hour without losing the plot.
  • Gemini 3.1 Pro / 2.5 Pro — strongest on multimodal (image + code + PDF), enormous context windows, and Google-ecosystem tasks. Pull it out for tasks where you need to feed the model a screenshot of a Figma frame plus 200k tokens of repo code.
  • OpenRouter / AI.cc — single API surface that abstracts the provider. Switching between GPT-5.5, Claude Opus 4.7, Gemini 3.1 Pro, DeepSeek V4, Llama 4, Qwen 3.6-Plus is just a model parameter, and these routers will automatically fall back if your primary is down.

Not “always”, but with intent:

  • Anthropic is degraded. Status page red, sessions timing out, tool calls failing — flip to GPT-5.4 via Cursor or your OpenRouter alias and keep shipping.
  • The task plays to a non-Anthropic strength. Pixel-perfect Figma-to-code from a screenshot? Gemini. A long-running autonomous browser agent that needs to stay coherent across 200 steps? GPT-5.4 with its agent-mode tuning.
  • You want a second opinion. For high-stakes architecture decisions, asking the same plan question to Opus and GPT-5.4 and diffing the answers is a cheap form of redundancy.
  • You’re price-sensitive on a specific loop. DeepSeek V4 and Qwen 3.6-Plus undercut even Haiku on some tasks; if you have a high-volume, low-judgment loop (e.g. translating commit messages), routing to one of them via OpenRouter halves the bill again.
  1. Pick your three tiers and assign them. Write down — literally, in a MODELS.md in your dotfiles — which model handles planning, execution, and bulk/parallel work. The default starting point: Opus 4.7 for planning, Sonnet 4.6 for execution, Haiku 4.5 for subagents.

  2. Set Sonnet 4.6 as your default in every tool. Claude Code: claude --model claude-sonnet-4-6 or set it in ~/.claude/settings.json. Cursor: model picker → Sonnet 4.6. Codex CLI: model = "claude-sonnet-4-6" in ~/.codex/config.toml (or your provider’s equivalent). This is the model you’ll be in 80% of the time.

  3. Wire Opus to Plan mode. In Claude Code, use the /model slash command to switch to Opus before entering Plan mode for anything non-trivial (anything touching >3 files, new architecture, debugging across services). When you exit Plan mode and start executing, drop back to Sonnet.

  4. Push subagents onto Haiku. Define your subagents (code-reviewer, code-explorer, doc-writer, refactor-bot) with model: claude-haiku-4-5-20251001 in their frontmatter. The orchestrator stays on Sonnet; the fan-out runs on Haiku. You’ll typically save 60–80% of the per-task cost without quality loss.

  5. Configure at least one cross-vendor fallback. Two easy paths: (a) sign up for OpenRouter, set OPENROUTER_API_KEY, and add openrouter/openai/gpt-5.4 and openrouter/google/gemini-3.1-pro as alternates in your tool config; (b) in Cursor, just enable the OpenAI and Google providers in the model picker. Either way, use them once this week so the muscle memory exists.

  6. Set a hard budget per model. In your cost dashboard (Anthropic Console, OpenRouter, AI.cc) set per-model alerts: Opus at $X/day, Sonnet at $Y/day, Haiku at $Z/day. When Opus alerts trigger, that’s usually a sign you’re using it for execution when Sonnet would do.

  7. Review the mix weekly. Open the spend dashboard every Friday. If Opus is more than ~25% of your spend, you’re probably over-using it on execution. If Haiku is under 10%, you’re probably not fanning out enough subagents.

  8. Re-route when the leaderboard shifts. Roughly once a quarter, a new model release reshuffles the best-for-task ranking. Re-check the SWE-bench Verified and Terminal-Bench Hard leaderboards, and update your three tier assignments. The whole point of this setup is that re-routing is a one-line config change, not a migration.

  • Defaulting to Opus for everything “because it’s the smart one”. You’ll burn through your Anthropic plan in days. Opus is for planning and gnarly debugging — not for the loop that fixes a typo and runs the tests.
  • Defaulting to Sonnet for planning too. Sonnet 4.6 is shockingly close to Opus on execution, but the gap re-opens on long-horizon planning and architectural reasoning. Pay the 3× for the plan; save it back on the execution.
  • Running subagents on the same model as the orchestrator. If your code-reviewer subagent runs on Sonnet, you’re paying 3× per review for ~zero quality gain over Haiku 4.5, which is already at Sonnet 4-class quality for review/triage work.
  • No fallback configured. When Anthropic has a 4-hour incident — and they will, every vendor does — you lose half a day. A gpt-5.4 alias in your config is free insurance.
  • Ignoring the Opus 4.7 tokenizer change. Same prompts can now cost 25–35% more on Opus 4.7 vs 4.6 because of how the new tokenizer chunks code. Watch your bill, not just the published rate card.
  • You can name, from memory, which Claude model handles planning, execution, and subagents in your setup — and they’re not all the same.
  • Your tool config (~/.claude/settings.json, Cursor settings, or Codex config.toml) has explicit model entries — not just defaults.
  • At least one subagent in your project runs on claude-haiku-4-5-20251001.
  • At least one non-Anthropic provider (OpenAI, Google, or OpenRouter) is wired up and you’ve used it at least once this month.
  • You’ve checked your per-model spend in the last 7 days and the distribution roughly matches the pattern: Sonnet majority, Haiku ~20–30%, Opus ~10–20%.
  • You have a written rule (in CLAUDE.md, dotfiles, or your team handbook) for when to switch from Sonnet to Opus, and when to fan out to Haiku.
  • You can switch your default model with a single config edit, not a multi-hour migration — the routing layer is in your control, not welded into prompts.