AI Model Comparison Guide

You open the model picker and see several options. Each has different strengths, context windows, and price points. This guide tells you which model to use for which task, when to switch, and how much it costs.

  • A clear default model recommendation for each tool
  • Decision criteria for when to switch models
  • Pricing breakdowns per request type
  • A model routing strategy you can use immediately

| Task | Recommended Model | Why |
| --- | --- | --- |
| Hardest refactors, app builds, long-running tasks | Claude Fable 5 | New tier above Opus; exceeds any generally available model |
| Complex coding (default) | Claude Opus 4.8 | Top SWE-Bench scores, excellent agentic performance |
| Everyday coding (budget) | Claude Sonnet 4.6 | Excellent quality at a fraction of the cost |
| Cheap parallel / bulk work | Claude Haiku 4.5 | Drives subagents and codemods at ~1/3 of Sonnet cost (~1/5 of Opus) |
| All Codex tasks | GPT-5.5 | Default model across all Codex and ChatGPT surfaces |
| Fast iteration (Cursor) | Cursor Composer 2.5 | In-house frontier-speed coding model |
| Large codebase (>200K tokens) | Opus 4.8, GPT-5.5, Sonnet 4.6, or Gemini 3.1 Pro | 1M-token context windows |
| Multimodal (images, video) | Gemini 3.1 Pro | Best image/video analysis |
| Architecture and design | Claude Opus 4.8 | Deep reasoning capabilities |
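
| Model | Provider | Context | Output Limit | SWE-Bench | Input $/1M | Output $/1M | Speed |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Claude Fable 5 | Anthropic | 1M | 128K | - | $10 | $50 | Standard |
| Claude Opus 4.8 | Anthropic | 1M | 128K | Best | $5 | $25 | Standard |
| Claude Sonnet 4.6 | Anthropic | 1M | 64K | Strong | $3 | $15 | Standard |
| Claude Haiku 4.5 | Anthropic | 200K | 64K | Good | $1 | $5 | Fast |
| GPT-5.5 | OpenAI | 1M | 128K | Strong | $5 | $30 | Standard |
| Cursor Composer 2.5 | Cursor | 200K | - | Fast-frontier | $0.50 | $2.50 | Fast |
| Gemini 3.1 Pro | Google | 1M | - | Good | $2 | $12 | Standard |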

Claude Fable 5 is the new top tier above Opus, intended for the hardest work.

  • Released: June 9, 2026
  • Context window: 1M tokens with a 128K output limit
  • Key strength: Much better than Opus 4.8 at complex multi-file refactorings, bug-fixing, building applications from scratch, and long-running tasks demanding peak intelligence. On Cognition’s FrontierCode benchmark it posts the highest score among frontier models at medium effort.
  • Available in: Claude Code v2.1.170+ (/model fable), Cursor (model picker), Claude API (claude-fable-5)

When to use: When budget matters less than velocity and quality, set Fable 5 as your default model — subagents still auto-run on Opus, Sonnet, and Haiku, so cost stays contained while the main loop gets maximum intelligence. When budget matters, use Fable 5 for planning (Plan mode), Opus 4.8 or Sonnet 4.6 for implementation, then Fable 5 again for the final verification pass.

Pricing: $10 / $50 per 1M tokens (input/output) — exactly 2x Opus 4.8. Effort levels run low, medium, high, xhigh, and max; thinking is adaptive only.
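
The budget-conscious split described above (Fable 5 for planning and final verification, a cheaper model for implementation) can be costed with simple arithmetic. The sketch below is illustrative only: the per-phase token counts and the helper names are hypothetical, and the prices are the list prices quoted in this guide.

```python
# List prices in USD per 1M tokens (input, output), as quoted in this guide.
PRICES = {
    "Fable 5": (10.00, 50.00),
    "Opus 4.8": (5.00, 25.00),
    "Sonnet 4.6": (3.00, 15.00),
}

# Hypothetical token usage per phase: (input_tokens, output_tokens).
PHASE_TOKENS = {
    "plan": (200_000, 20_000),
    "implement": (1_500_000, 300_000),
    "verify": (300_000, 30_000),
}


def phase_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one phase at the given model's list price."""
    price_in, price_out = PRICES[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


def workflow_cost(routing: dict[str, str]) -> float:
    """Total cost for a routing that maps each phase name to a model name."""
    return sum(
        phase_cost(routing[phase], *tokens)
        for phase, tokens in PHASE_TOKENS.items()
    )


all_fable = {"plan": "Fable 5", "implement": "Fable 5", "verify": "Fable 5"}
split = {"plan": "Fable 5", "implement": "Sonnet 4.6", "verify": "Fable 5"}

print(f"All Fable 5:      ${workflow_cost(all_fable):.2f}")
print(f"Fable 5 + Sonnet: ${workflow_cost(split):.2f}")
```

With these made-up token counts, the split workflow comes to $16.50 against $37.50 for running every phase on Fable 5. Substitute your own usage figures; the ratio depends heavily on how much of the work is implementation.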

During early testing, Stripe reported that Fable 5 compressed months of engineering into days. In a 50-million-line Ruby codebase, the model performed a codebase-wide migration in a day that would otherwise have taken a whole team over two months by hand.

Fable 5 is the generally-available, safety-tuned member of the Mythos class — in Anthropic’s words, “a Mythos-class model that we’ve made safe for general use.” Its sibling, Claude Mythos 5, is the same underlying model with safeguards lifted in some areas; initial access is restricted to Project Glasswing cyber defenders and critical-infrastructure providers.

Claude Opus 4.8 is the default recommendation for complex coding tasks.

  • Released: May 28, 2026
  • Context window: 1M tokens with a 128K output limit
  • Key strength: Around four times less likely than Opus 4.7 to leave flaws in its own code unflagged; beats GPT-5.5 on coding benchmarks
  • Available in: Claude Code (default), Cursor (model picker), Anthropic API, Bedrock, Vertex AI

When to use: Architecture decisions, complex debugging, multi-step autonomous tasks, security audits, system design. This is the Opus-tier flagship and Claude Code’s default model — one tier below Fable 5. Start here and only switch when you have a specific reason. Tune the speed/reasoning trade-off with the effort level (low, medium, high) via /model or /effort, and lean on its automatic dynamic workflows for long multi-step tasks.

Pricing: $5 / $25 per 1M tokens (input/output) — unchanged from Opus 4.7. Fast mode runs at 2× the standard rate for 2.5× the speed.
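
To see what the fast-mode trade-off means for a single task, here is a minimal sketch. It assumes that "2.5× the speed" translates to wall-clock time divided by 2.5 and that both multipliers apply uniformly to the whole task; the function name and the example task are hypothetical.

```python
# Multipliers quoted in this guide for Opus 4.8 fast mode.
FAST_COST_MULTIPLIER = 2.0   # billed at 2x the standard rate
FAST_SPEED_MULTIPLIER = 2.5  # runs at 2.5x the standard speed


def fast_mode_estimate(standard_cost_usd: float, standard_minutes: float) -> tuple[float, float]:
    """Return (cost_usd, minutes) for the same work run in fast mode."""
    cost = standard_cost_usd * FAST_COST_MULTIPLIER
    minutes = standard_minutes / FAST_SPEED_MULTIPLIER
    return cost, minutes


# Hypothetical task: $3.00 and 10 minutes at the standard rate.
cost, minutes = fast_mode_estimate(3.00, 10.0)
print(f"Fast mode: ${cost:.2f}, about {minutes:.0f} minutes")
```

Under those assumptions, the example task costs $6.00 and takes about 4 minutes instead of 10.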

Claude Sonnet 4.6 is the budget-conscious workhorse with a 1M-token context window.

  • Released: early 2026
  • Context window: 1M tokens
  • Key strength: Excellent coding at a fraction of Opus cost. Best value per token for everyday work.
  • Available in: Claude Code, Cursor, Anthropic API

When to use: Everyday coding tasks, when budget matters, when you need more than 200K tokens of context (large codebase analysis), or when your Opus quota is running low.

Pricing: $3 / $15 per 1M tokens (input/output).

Claude Haiku 4.5 is the cheap, fast tier that powers parallel work.

  • Released: October 2025
  • Context window: 200K tokens
  • Key strength: Fast and inexpensive enough to drive subagents, codemods, and bulk file edits at roughly one-third of Sonnet cost and one-fifth of Opus cost (at the list prices in this guide)
  • Available in: Claude Code (subagents and /model), Anthropic API

When to use: Read-only exploration, bulk scans, fan-out subagents, and simple formatting where you don’t need frontier reasoning. The Tier 1 model in a model-routing strategy.

Pricing: $1 / $5 per 1M tokens (input/output).
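
A rough way to compare tiers for fan-out work is to price the same batch of subagents at each model's list price. The sketch below uses the prices quoted in this guide; the subagent count, the per-agent token figures, and the function name are hypothetical.

```python
# List prices in USD per 1M tokens (input, output), as quoted in this guide.
PRICES = {
    "Haiku 4.5": (1.00, 5.00),
    "Sonnet 4.6": (3.00, 15.00),
    "Opus 4.8": (5.00, 25.00),
}


def fan_out_cost(model: str, subagents: int,
                 input_tokens_each: int, output_tokens_each: int) -> float:
    """Total USD cost of running `subagents` identical jobs on one model."""
    price_in, price_out = PRICES[model]
    per_agent = (input_tokens_each * price_in
                 + output_tokens_each * price_out) / 1_000_000
    return subagents * per_agent


# Hypothetical bulk scan: 50 subagents, each reading 40K tokens and writing 2K.
for model in PRICES:
    print(f"{model}: ${fan_out_cost(model, 50, 40_000, 2_000):.2f}")
```

For this made-up workload the totals are $2.50 on Haiku 4.5, $7.50 on Sonnet 4.6, and $12.50 on Opus 4.8.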

GPT-5.5 is the default model across all Codex and ChatGPT surfaces.

  • Released: April 2026
  • Context window: Up to 1M tokens with a 128K output limit
  • Key strength: OpenAI’s newest frontier model for complex coding, computer use, and research. Leads Terminal-Bench 2.0 and is competitive on SWE-bench Verified.
  • Available in: Codex App, Codex CLI, Codex IDE, Codex Cloud, ChatGPT, API

When to use: All Codex workflows — this is the recommended default. Also strong for computer-use tasks and knowledge work. A GPT-5.5 Pro variant is available for maximum performance.

Pricing: $5 / $30 per 1M tokens (input/output); GPT-5.5 Pro is $30 / $180; Batch requests run at 50% of standard. Prompts above 272K input tokens are billed at 2× input / 1.5× output for the session. Also available via Codex subscription plans.
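
The long-context surcharge and the batch discount are easy to misjudge, so here is a minimal sketch of the arithmetic for a single GPT-5.5 request. Two things are assumptions rather than facts from this guide: the surcharge is applied per request (the guide says it applies "for the session"), and the batch discount is stacked on top of the surcharge when both apply. The function name is hypothetical.

```python
# GPT-5.5 list prices in USD per 1M tokens, as quoted in this guide.
INPUT_PER_M = 5.00
OUTPUT_PER_M = 30.00

LONG_CONTEXT_THRESHOLD = 272_000  # input tokens above this trigger the surcharge
LONG_INPUT_MULTIPLIER = 2.0
LONG_OUTPUT_MULTIPLIER = 1.5
BATCH_MULTIPLIER = 0.5


def gpt55_request_cost(input_tokens: int, output_tokens: int,
                       batch: bool = False) -> float:
    """Estimate the USD cost of one request under the assumptions above."""
    in_rate, out_rate = INPUT_PER_M, OUTPUT_PER_M
    if input_tokens > LONG_CONTEXT_THRESHOLD:
        in_rate *= LONG_INPUT_MULTIPLIER
        out_rate *= LONG_OUTPUT_MULTIPLIER
    cost = (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000
    if batch:
        # Assumption: the batch discount applies to the surcharged total.
        cost *= BATCH_MULTIPLIER
    return cost


print(f"100K in / 10K out:          ${gpt55_request_cost(100_000, 10_000):.2f}")
print(f"300K in / 10K out:          ${gpt55_request_cost(300_000, 10_000):.2f}")
print(f"100K in / 10K out (batch):  ${gpt55_request_cost(100_000, 10_000, batch=True):.2f}")
```

Crossing the 272K threshold is the expensive step: tripling the input from 100K to 300K tokens raises the estimate from $0.80 to $3.45, more than four times as much.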

Gemini 3.1 Pro is the best multimodal model, with a 1M-token context window.

  • Released: February 2026
  • Context window: 1M tokens
  • Key strength: Best image, audio, and video analysis. Deep Think mode for complex reasoning.
  • Available in: Cursor (model picker), direct API

When to use: Tasks requiring more than 200K tokens of context, multimodal analysis (diagrams, screenshots, video walkthroughs), or when you need Deep Think reasoning mode.

Pricing: $2 / $12 per 1M tokens (input/output).

Cursor Composer 2.5 is a frontier coding model built in-house by Cursor.

  • Released: May 18, 2026
  • Architecture: Mixture-of-Experts, enhanced with Cursor’s own continued pretraining and reinforcement learning
  • Context window: 200K tokens
  • Key strength: A substantial step up over Composer 2 — better at sustained work on long-running tasks and more reliable at following complex instructions
  • Available in: Cursor only

When to use: Fast local iteration in Cursor. Optimized for multi-file edits, code generation, refactoring, and long task chains across hundreds of actions.

Pricing: $0.50 / $2.50 per 1M tokens (standard); $3.00 / $15.00 (fast variant, the default).

Use this decision tree for day-to-day work:

  1. Start with your tool’s default: Opus 4.8 for Claude Code, GPT-5.5 for Codex
  2. Velocity and quality outweigh budget? Set Fable 5 as your default model — subagents still auto-run on Opus/Sonnet/Haiku, so cost stays contained while the main loop gets maximum intelligence. On a budget, route Fable 5 to Plan mode and the final verification pass only, with Opus or Sonnet doing the implementation
  3. Need speed in Cursor? Switch to Composer 2.5
  4. Need budget savings? Switch to Sonnet 4.6, or Haiku 4.5 for bulk/parallel work
  5. Context exceeds 200K? Use Opus 4.8, GPT-5.5, Sonnet 4.6, or Gemini 3.1 Pro (1M context)
  6. Multimodal analysis? Gemini 3.1 Pro
  7. Everything else? Stay with the default
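
The decision tree above can be sketched as a small routing function. This is one possible encoding, not an official API: the `Task` fields, the function name, and the precedence between rules are assumptions, the Cursor fallback to Opus 4.8 is a placeholder for "best available", and the returned strings are display names rather than API model identifiers. Tool availability is only partly modelled (for example, Codex always gets GPT-5.5).

```python
from dataclasses import dataclass

LONG_CONTEXT_LIMIT = 200_000  # Composer 2.5 and Haiku 4.5 top out at 200K tokens

# Step 1: each tool's default model.
DEFAULTS = {"claude-code": "Opus 4.8", "codex": "GPT-5.5"}


@dataclass
class Task:
    tool: str                          # "claude-code", "codex", or "cursor"
    context_tokens: int = 0
    multimodal: bool = False
    needs_speed: bool = False
    budget_sensitive: bool = False
    bulk_parallel: bool = False
    quality_over_budget: bool = False


def pick_model(task: Task) -> str:
    """Return a model display name for the task, following the steps above."""
    if task.tool == "codex":
        return DEFAULTS["codex"]       # GPT-5.5 is the default for all Codex work
    if task.multimodal:
        return "Gemini 3.1 Pro"        # step 6
    if task.quality_over_budget:
        return "Fable 5"               # step 2
    over_limit = task.context_tokens > LONG_CONTEXT_LIMIT
    if task.tool == "cursor" and task.needs_speed and not over_limit:
        return "Composer 2.5"          # step 3
    if task.budget_sensitive:          # step 4
        if task.bulk_parallel and not over_limit:
            return "Haiku 4.5"
        return "Sonnet 4.6"            # 1M context, so also covers step 5
    # Steps 5 and 7: Opus 4.8 has a 1M context window, so the default still fits.
    return DEFAULTS.get(task.tool, "Opus 4.8")


print(pick_model(Task(tool="claude-code")))
print(pick_model(Task(tool="cursor", needs_speed=True)))
print(pick_model(Task(tool="claude-code", budget_sensitive=True, bulk_parallel=True)))
```

Hard constraints (tool, modality, context size) are checked before preferences (speed, budget), so a speed request in Cursor never lands on a model whose context window is too small for the task.
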

| Request Type | Opus 4.8 | Sonnet 4.6 | GPT-5.5 | Composer 2.5 |
| --- | --- | --- | --- | --- |
| Simple completion (1K tokens) | ~$0.03 | ~$0.02 | ~$0.03 | ~$0.003 |
| Standard refactor (10K tokens) | ~$0.30 | ~$0.18 | ~$0.35 | ~$0.03 |
| Large analysis (50K tokens) | ~$1.50 | ~$0.90 | ~$1.75 | ~$0.15 |
| Complex architecture (100K tokens) | ~$3.00 | ~$1.80 | ~$3.50 | ~$0.30 |

A Claude Fable 5 request costs exactly 2x the Opus 4.8 column — $10 / $50 per 1M tokens versus $5 / $25.
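
The per-request estimates are consistent with counting the stated token figure once as input and once as output at list price. That reading is an inference from the numbers, not something this guide states, and the Composer 2.5 figures match its standard rate rather than the fast variant. Under that assumption, a minimal sketch for estimating other request sizes (the function name is hypothetical):

```python
# List prices in USD per 1M tokens (input, output), as quoted in this guide.
PRICES = {
    "Fable 5": (10.00, 50.00),
    "Opus 4.8": (5.00, 25.00),
    "Sonnet 4.6": (3.00, 15.00),
    "GPT-5.5": (5.00, 30.00),
    "Composer 2.5": (0.50, 2.50),  # standard rate, not the fast variant
}


def request_cost(model: str, tokens: int) -> float:
    """Estimate USD cost assuming `tokens` of input AND `tokens` of output."""
    price_in, price_out = PRICES[model]
    return tokens * (price_in + price_out) / 1_000_000


for tokens in (1_000, 10_000, 50_000, 100_000):
    row = "  ".join(f"{model}: ${request_cost(model, tokens):.3f}" for model in PRICES)
    print(f"{tokens:>7,} tokens  {row}")
```

Real requests rarely have equal input and output sizes, so treat these as order-of-magnitude figures. Agentic sessions in particular tend to be input-heavy.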

| Plan | Price | Models Included | Best For |
| --- | --- | --- | --- |
| Pro | $20/month | All models, ~500 fast requests | Everyday development |
| Ultra | $200/month | All models, ~10K requests | Power users |

Model switching is free within your plan. You pay per request, not per model choice.

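| Category | Fable 5 | Opus 4.8 | Sonnet 4.6 | Haiku 4.5 | GPT-5.5 | Gemini 3.1 Pro | Composer 2.5 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| SWE-Bench | - | Best | Strong | Good | Strong | Good | Strong |
| Code generation | Best | Excellent | Very good | Good | Very good | Good | Very good |
| Bug detection | Best | Excellent | Very good | Good | Very good | Good | Good |
| Architecture | Best | Excellent | Very good | Fair | Very good | Good | Good |
| Computer use | - | Yes | No | No | Yes | No | No |
| Context window | 1M | 1M | 1M | 200K | 1M | 1M | 200K |
| Cost efficiency | $10/$50 | Premium | Best value | Cheapest (Claude) | Premium | Good value | Cheapest |
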
  1. Identify your primary tool: Cursor, Claude Code, or Codex

  2. Start with the default model: Opus 4.8 (Claude Code), GPT-5.5 (Codex), or best available (Cursor)

  3. Evaluate task complexity: Simple tasks do not need the most expensive model

  4. Check context requirements: Files exceeding 200K tokens need Opus 4.8, Sonnet 4.6, GPT-5.5, or Gemini 3.1 Pro

  5. Consider budget: Track with /cost (Claude Code), Settings > Usage (Cursor), or Codex dashboard

  6. Adjust as needed: Switch models based on task, not habit

  1. Default to the best model for tasks that matter — architecture, security review, complex debugging
  2. Downgrade for routine work — simple fixes, boilerplate, and formatting do not need Opus 4.8
  3. Use speed models for iteration — Composer 2.5 in Cursor for rapid trial-and-error cycles
  4. Route bulk work to Haiku 4.5 — subagents, codemods, and fan-out scans cost a fraction of Opus
  5. Monitor costs weekly — track which models provide the best ROI for your workflow
  6. Stay updated — model capabilities and pricing change frequently. Check the Updates page