Deep Reasoning: Extended Thinking and Effort Levels

You are debugging a race condition that only appears under load. The error logs show intermittent database connection timeouts, but only when two specific API endpoints are called simultaneously. You describe the problem to Claude and get a surface-level answer about adding retry logic. What you actually need is for Claude to reason through the connection pool lifecycle, transaction isolation levels, and request concurrency model — the kind of deep analysis that requires more than a quick response.

Extended thinking gives Claude the space to work through complex problems before answering, producing significantly better results on hard technical questions.

  • Understanding of when extended thinking actually helps (and when it does not)
  • How to toggle thinking per session and view the reasoning in verbose mode
  • Prompts optimized for extended thinking on architecture and debugging tasks
  • Configuration for controlling thinking depth via effort level and token budget

Extended thinking is enabled by default in Claude Code — Claude reasons through the problem step by step before producing a visible response. You control how deep this reasoning goes and whether it runs at all.

Toggle it per session — Press Option+T (macOS) or Alt+T (Windows/Linux) to turn thinking on or off for the current session. Set the global default with /config (saved as alwaysThinkingEnabled in ~/.claude/settings.json).

Tune the depth: On effort-based models such as Claude Opus 4.8, thinking depth is governed by the effort level (see below). On models that use a token budget instead, you can cap that budget.

You can see the reasoning: press Ctrl+O to toggle verbose mode, and Claude’s thinking appears as gray italic text. You are billed for all thinking tokens used, even though Claude 4 models show only a summarized version of the reasoning.

Extended thinking shines on problems with these characteristics:

  • Multiple interacting systems — Authentication flows, distributed transactions, event-driven architectures
  • Debugging without clear reproduction — Intermittent failures, race conditions, memory leaks
  • Architecture decisions with trade-offs — Choosing between approaches where the right answer depends on constraints
  • Security analysis — Finding vulnerabilities that require understanding data flow across components
  • Performance optimization — Identifying bottlenecks that span multiple layers

Extended thinking does NOT help much for:

  • Simple code generation (“write a function that sorts an array”)
  • Straightforward refactoring (“rename this variable”)
  • Questions with obvious answers (“what does this error message mean”)

The cost is higher token usage and slightly longer response times. Use it selectively.

Thinking is on by default. To toggle it for the current session, press the keyboard shortcut inside a Claude Code session:

  • macOS: Option+T
  • Windows / Linux: Alt+T

(Enabling Option-key shortcuts may require a one-time terminal configuration.) Press Ctrl+O to toggle verbose mode and watch the reasoning stream as gray italic text.

Set the global default from /config, or directly in settings.json:

{
  "alwaysThinkingEnabled": true
}
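
To confirm what the global default is currently set to, a small shell helper can read the key back out of the settings file. This is a minimal sketch, not part of Claude Code: the function name thinking_default is hypothetical, and the grep-style match assumes the key and its value sit on one line, as in the snippet above.

```shell
# Hypothetical helper: report the alwaysThinkingEnabled value in a settings file.
# Prints "true", "false", or "unset" (file missing or key absent).
# Assumption: the key and value are on the same line; this is not a JSON parser.
thinking_default() {
  file="${1:-$HOME/.claude/settings.json}"
  if [ ! -f "$file" ]; then
    echo "unset"
    return 0
  fi
  # -E keeps the alternation portable across GNU and BSD sed.
  value=$(sed -nE 's/.*"alwaysThinkingEnabled"[[:space:]]*:[[:space:]]*(true|false).*/\1/p' "$file" | head -n 1)
  echo "${value:-unset}"
}
```

Calling thinking_default with no argument checks ~/.claude/settings.json; passing a path checks that file instead. Keep in mind that "unset" only means the key is not in the file, and per the text above thinking is on by default in that case.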

For Claude Fable 5 and Claude Opus 4.8, the depth of thinking is controlled by effort level rather than a token budget. Set a persistent default with the CLAUDE_CODE_EFFORT_LEVEL environment variable, or pick a level for the current session with the /effort command (or the effort slider in the model picker):

# Persistent env-var values: low, medium, high (default), xhigh, max
export CLAUDE_CODE_EFFORT_LEVEL=high

| Effort Level   | How to Set                       | Best For                                                              |
|----------------|----------------------------------|-----------------------------------------------------------------------|
| low            | Env var or /effort               | Simple tasks, quick questions                                         |
| medium         | Env var or /effort               | Everyday development                                                  |
| high (default) | Env var or /effort               | Complex architecture, debugging                                       |
| xhigh          | Env var or /effort               | Advanced coding, extended agentic exploration                         |
| max            | /effort (session); also env var  | Genuinely hard problems an expert would need time on                  |
| ultracode      | /effort only (session)           | Large tasks: sends xhigh plus Dynamic Workflows (parallel subagents)  |

max and ultracode can both be picked from the /effort menu for the current session. Only ultracode is limited to that menu: the CLAUDE_CODE_EFFORT_LEVEL environment variable accepts low, medium, high, xhigh, and max, but not ultracode.
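
If you would rather raise or lower the effort for a single run than export it for the whole shell, the variable can be scoped to one command. The sketch below is illustrative only: is_env_effort_level and with_effort are hypothetical helper names, and it assumes the CLI is launched as claude. The accepted values are taken from the list above.

```shell
# Effort levels the environment variable accepts, per the list above.
# "ultracode" is deliberately absent: it is only reachable through /effort.
is_env_effort_level() {
  case "$1" in
    low|medium|high|xhigh|max) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical wrapper: validate the level, then run one command with
# CLAUDE_CODE_EFFORT_LEVEL set for that process only. The calling shell's
# environment is left untouched.
with_effort() {
  level="$1"
  shift
  if ! is_env_effort_level "$level"; then
    echo "unsupported CLAUDE_CODE_EFFORT_LEVEL: $level" >&2
    return 2
  fi
  env CLAUDE_CODE_EFFORT_LEVEL="$level" "$@"
}

# Usage (assumes the CLI command is `claude`):
#   with_effort xhigh claude
```

Validating before launch catches typos and the ultracode case early, rather than leaving it to the tool to decide what to do with a value the variable does not accept.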

For models other than Fable 5, Opus 4.8, or Opus 4.7, you can control the thinking budget directly:

# Default is 31,999 tokens
export MAX_THINKING_TOKENS=10000
# Disable thinking entirely
export MAX_THINKING_TOKENS=0
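
The same per-command scoping works for the token budget, which is handy when only one task needs a tighter cap. This is a sketch under assumptions: with_thinking_budget is a hypothetical helper, the CLI is assumed to be launched as claude, and the only facts taken from the text above are the variable name and that 0 disables thinking.

```shell
# Hypothetical wrapper: run one command with a specific thinking budget.
# Rejects anything that is not a non-negative integer; 0 disables thinking.
with_thinking_budget() {
  budget="$1"
  shift
  case "$budget" in
    ''|*[!0-9]*)
      echo "budget must be a non-negative integer: $budget" >&2
      return 2
      ;;
  esac
  # env scopes the variable to this one process.
  env MAX_THINKING_TOKENS="$budget" "$@"
}

# Usage (assumes the CLI command is `claude`):
#   with_thinking_budget 10000 claude   # capped budget for this run
#   with_thinking_budget 0 claude       # thinking disabled for this run
```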

The way you phrase your prompt significantly affects thinking quality. Give Claude the context and constraints that require deep reasoning.

Combining Extended Thinking with Plan Mode

The most powerful workflow for complex features combines Plan mode with extended thinking:

  1. Confirm thinking is on (it is by default; Option+T / Alt+T toggles it) and set CLAUDE_CODE_EFFORT_LEVEL=high or max
  2. Switch to plan mode (Shift+Tab to cycle modes, or the VS Code mode selector)
  3. Describe your feature or problem
  4. Claude reasons deeply about the approach, then presents a plan
  5. You review the plan and approve or refine

This forces Claude to spend its thinking budget on planning rather than rushing to implementation.

Here is what the experience looks like for a real debugging session (thinking is on by default, so you just describe the problem):

I'm seeing intermittent 504 Gateway Timeouts on our /api/orders
endpoint. It only happens during peak hours (2-4pm EST) and
affects about 3% of requests. Our monitoring shows:
- Database query time is normal (< 50ms)
- The timeout happens after the query completes
- Memory usage on the server stays flat
- The issue started after we deployed the new payment
integration last Tuesday
Read @src/pages/api/orders.ts and @src/lib/payments.ts
and think through what could be causing this.

With extended thinking enabled, Claude is more likely to:

  • Notice that the payment integration makes a synchronous HTTP call to an external API
  • Realize that the external API has variable response times during peak hours
  • Identify that the 504 comes from the gateway timeout, not the database
  • Suggest moving the payment verification to an async background job

Without extended thinking, Claude might give a more superficial answer about database connection pooling or caching.

Thinking takes too long and you get impatient — Lower the effort level to medium or reduce MAX_THINKING_TOKENS. Not every task needs deep reasoning. Reserve it for genuinely complex problems.

Claude’s thinking seems to go in circles — This can happen with extremely ambiguous problems. Provide more constraints or narrow the question: “Focus specifically on the database connection pooling behavior, not the entire request lifecycle.”

Token costs spike with extended thinking — Extended thinking uses significantly more tokens. If you are on API billing, use /cost to monitor session spend. Consider switching to Claude Sonnet 4.6 for simpler tasks, using Opus 4.8 (the default) for most hard problems, and reserving Fable 5 (/model fable) for the most demanding refactors and long-running tasks where peak intelligence matters most. See model comparison for pricing details.

Thinking is enabled but responses are not noticeably better — The problem might not benefit from extended thinking. Simple, well-defined tasks produce similar results with or without it. Save thinking for genuinely ambiguous, multi-factor problems.

With deep reasoning in your toolkit, connect external tools via MCP to give Claude access to your databases, issue trackers, and monitoring systems.