
Token Management

You have been using Cursor for a month and your usage dashboard shows you have burned through your allocation twice as fast as expected. A quick audit reveals the pattern: your Agent prompts routinely attach 8-10 files of context, your always-apply rules consume 3,000 tokens before you type a single word, and your long conversations lose focus after message 12, causing the agent to re-read the same files multiple times. You are paying for context that is not contributing to better output.

Token management is not about being stingy. It is about being precise: giving the AI exactly the context it needs and nothing more. This produces better results at lower cost.

This guide covers:

  • Techniques for reducing context overhead without sacrificing AI quality
  • Strategies for choosing the right model based on task complexity
  • Conversation management practices that keep token usage efficient
  • Cost estimation frameworks for team budgeting

Every Cursor interaction has a token budget. The budget is consumed by:

  1. System prompt and rules — Your always-apply rules, relevant glob-scoped rules, and Team Rules
  2. File context — Files you attach with @, files the agent reads during exploration
  3. Conversation history — Previous messages in the current chat
  4. Agent exploration — Files the agent reads while searching for relevant code

The context gauge in the prompt input shows how much of the budget is consumed. Hover over it to see which rules are active.
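The four budget components above can be sketched as a simple tally. This is a hypothetical illustration, not how Cursor computes the gauge internally; the category names mirror the list above and the token counts are made-up examples.

```typescript
// Illustrative sketch: summing the consumers of a 200k-token context window.
// The numbers are placeholders, not measured values.
interface BudgetItem {
  category: string;
  tokens: number;
}

function summarizeBudget(items: BudgetItem[], windowSize = 200_000): string {
  const used = items.reduce((sum, item) => sum + item.tokens, 0);
  const pct = ((used / windowSize) * 100).toFixed(1);
  return `${used} of ${windowSize} tokens used (${pct}%)`;
}

const budget: BudgetItem[] = [
  { category: "rules", tokens: 2_000 },
  { category: "attached files", tokens: 12_000 },
  { category: "conversation history", tokens: 25_000 },
  { category: "agent exploration", tokens: 40_000 },
];

console.log(summarizeBudget(budget)); // 79000 of 200000 tokens used (39.5%)
```

Even before the conversation grows, the fixed costs (rules, attached files) set the floor for every prompt.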

Most models in Cursor operate with a 200k token context window. That sounds like a lot until you realize:

  • A typical 500-line TypeScript file consumes approximately 3,000-5,000 tokens
  • An always-apply rule with 100 lines consumes approximately 500-1,000 tokens
  • A 10-message conversation history can consume 20,000-40,000 tokens
  • Agent exploration of a large codebase can consume 50,000+ tokens
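You can sanity-check figures like these with the common rule of thumb of roughly four characters per token. A minimal sketch (the ratio is an approximation and real tokenizer output varies by model):

```typescript
// Rough token estimate using the ~4-characters-per-token heuristic.
// Treat this as a ballpark only; actual tokenization is model-specific.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// A 500-line file at an assumed ~40 characters per line:
const fileText = "x".repeat(500 * 40);
console.log(estimateTokens(fileText)); // 5000
```

That lands at the top of the 3,000-5,000 range quoted above for a typical 500-line file.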

When the context window fills up, Cursor must summarize or drop information. This is when the agent starts “forgetting” things you mentioned earlier or files it already read.

Every always-apply rule is included in every single prompt. If you have five always-apply rules totaling 2,000 tokens, that is 2,000 tokens of budget consumed before the conversation even begins.

Most rules do not need to be always-apply:

  • Code style rules -> Glob-scoped to the relevant file types (*.ts, *.tsx)
  • API conventions -> Glob-scoped to the API directory (src/api/**/*.ts)
  • Feature implementation guides -> Agent-decided (include a description so the agent loads them when relevant)
  • Project overview -> This one can stay always-apply, but keep it concise (under 50 lines)
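As a sketch, a glob-scoped rule in Cursor's `.mdc` format might look like the following. The frontmatter fields follow Cursor's project-rule conventions; the paths and rule text are illustrative, not from a real project:

```
---
description: Conventions for API route handlers
globs: src/api/**/*.ts
alwaysApply: false
---
Follow the route-handler pattern in @src/routes/users.ts.
Validate request bodies before touching the database.
```

With `alwaysApply: false` and a `globs` pattern, this rule costs zero tokens except when a matching file is in context.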

Reference Files Instead of Inlining Content


Rules that copy example code inline waste tokens. Instead:

# Bad: Inlines the entire example (wastes tokens)
When creating API routes, follow this pattern:
[200 lines of example code]

# Good: References the file (loaded only when needed)
When creating API routes, follow the pattern in @src/routes/users.ts.

The most common source of wasted tokens is long conversations. After 6-8 exchanges, the conversation history consumes a significant portion of the context window, and the agent has to balance old context against new requests.

Start a new chat when:

  • You have finished one logical task and are starting another
  • The agent is repeating itself or referencing outdated information
  • You are shifting to a different area of the codebase
  • The context gauge is above 60% before you start typing
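The signals above can be combined into a quick mental checklist. As a toy sketch (the thresholds are the illustrative ones from this guide, not Cursor-enforced limits):

```typescript
// Hypothetical heuristic: should this prompt go in a fresh chat?
// Thresholds are illustrative, mirroring the guidance above.
function shouldStartNewChat(opts: {
  gaugePercent: number;  // context gauge reading before typing
  messageCount: number;  // exchanges so far in this chat
  switchedTask: boolean; // moving to a new logical task or codebase area
}): boolean {
  return opts.switchedTask || opts.gaugePercent > 60 || opts.messageCount > 8;
}

console.log(shouldStartNewChat({ gaugePercent: 45, messageCount: 10, switchedTask: false })); // true
console.log(shouldStartNewChat({ gaugePercent: 30, messageCount: 3, switchedTask: false }));  // false
```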

Instead of building up context over multiple messages, put everything relevant in the first message:

# Bad: Incremental context building (wastes 3x tokens)
Message 1: "Look at our auth module"
Message 2: "Now look at the user service too"
Message 3: "OK, now add rate limiting that works with both"

# Good: All context upfront (same result, fewer tokens)
"Add rate limiting to our API that integrates with:
- @src/middleware/auth.ts (authentication middleware)
- @src/services/user-service.ts (user service)
Follow the middleware pattern in auth.ts."

Different models have different costs and capabilities. Matching the model to the task saves tokens and money:

| Task | Recommended Model | Why |
| --- | --- | --- |
| Complex multi-file features | Claude Opus 4.6 | Best agentic performance, worth the cost for hard tasks |
| Everyday coding, bug fixes | Claude Sonnet 4.5 | Strong performance at lower cost |
| Quick inline edits | Any fast model | Inline edits are small; model quality matters less |
| Large codebase exploration | Gemini 3 Pro | Largest context window for exploring extensive code |
| Simple refactoring | Claude Sonnet 4.5 | Mechanical tasks do not need the most expensive model |

Switch models with Cmd/Ctrl+/ to cycle through available models, or Cmd/Ctrl+. to access the model picker.
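A team convention for model choice can be written down as a simple routing table. This is a hypothetical sketch of the mapping in the table above, not a Cursor feature; the task categories are invented labels:

```typescript
// Illustrative model-routing convention; model names mirror the table above.
type Task = "complex-feature" | "everyday-coding" | "inline-edit" | "exploration";

function pickModel(task: Task): string {
  switch (task) {
    case "complex-feature": return "Claude Opus 4.6";
    case "exploration":     return "Gemini 3 Pro";
    case "inline-edit":     return "any fast model";
    default:                return "Claude Sonnet 4.5";
  }
}

console.log(pickModel("everyday-coding")); // Claude Sonnet 4.5
```

Codifying the convention, even informally, keeps a team from defaulting to the most expensive model for mechanical work.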

Background Agents use MAX mode models exclusively and can accumulate costs quickly. Manage them by:

  • Breaking tasks into smaller pieces: A $12 complex task often produces worse results than three $4 focused tasks
  • Setting spending limits: Configure monthly limits in Cursor Settings
  • Being specific in task descriptions: Vague instructions cause the agent to explore more files, consuming more tokens
  • Starting with a plan: Use Plan mode locally first, then hand a specific plan to the Background Agent
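For team budgeting, a back-of-the-envelope estimate per task batch is often enough. A minimal sketch, assuming per-million-token pricing; the $3/$15 rates and token counts below are placeholders, so substitute your plan's actual rates:

```typescript
// Back-of-the-envelope cost estimate for a batch of agent tasks.
// Prices per million tokens are placeholders, not real plan rates.
interface TaskEstimate {
  inputTokens: number;
  outputTokens: number;
}

function estimateCostUSD(
  tasks: TaskEstimate[],
  inputPricePerM: number,
  outputPricePerM: number,
): number {
  return tasks.reduce(
    (total, t) =>
      total +
      (t.inputTokens / 1_000_000) * inputPricePerM +
      (t.outputTokens / 1_000_000) * outputPricePerM,
    0,
  );
}

// Three focused tasks at assumed $3 input / $15 output per million tokens:
const focused: TaskEstimate[] = Array(3).fill({ inputTokens: 150_000, outputTokens: 20_000 });
console.log(estimateCostUSD(focused, 3, 15).toFixed(2)); // 2.25
```

Running the same arithmetic on one sprawling task with heavy exploration usually shows why several focused tasks come out cheaper.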

Instead of attaching every file that might be relevant, attach only the files the agent actually needs:

# Over-contextualized (wastes tokens on irrelevant files)
"Add a delete endpoint @src/routes/users.ts @src/routes/posts.ts
@src/routes/comments.ts @src/models/user.ts @src/models/post.ts
@src/middleware/auth.ts @src/middleware/rate-limit.ts @src/lib/db.ts"

# Minimum viable context
"Add a DELETE /api/users/:id endpoint in @src/routes/users.ts.
Follow the same pattern as the existing PATCH endpoint in that file."

The agent can explore additional files if it needs them. Start with minimal context and let the agent request more.

Instead of attaching individual files, use @folder to give the agent a structural overview of a directory. This costs fewer tokens than attaching every file in the folder and gives the agent enough information to know which files to read in detail.

Agent produces lower quality output with less context. You may have removed context it actually needed. The goal is to remove irrelevant context, not all context. If output quality drops after context reduction, add back the specific files that the agent is getting wrong.

Context gauge fills up mid-conversation. Start a new chat. Cursor condenses long conversations, but condensation loses detail. A fresh chat with focused context produces better results than a condensed long conversation.

Background Agent costs are unpredictable. Track your spending in the Cursor dashboard. Start with small tasks to calibrate cost expectations before launching expensive multi-file operations.

Switching models changes output quality noticeably. This is expected. Less capable models make more mistakes on complex tasks. Use cheaper models for mechanical work and invest in the best model for work that requires understanding and judgment.