Tooling policy — standardized stack with an exception process

Scorecard question: What’s the tooling policy (Claude Code / Cursor / Codex / others)?

Max-score answer (3 pts): Standardized stack + exception process for “let me try a new tool”.

Why this matters in 2026

Shadow AI is the fastest-growing governance failure in engineering orgs right now. In mid-2026, Microsoft moved Agent 365 to general availability specifically because enterprise customers were losing track of which agents, seats, and models their developers were running. Almost every team has at least one engineer who bought Cursor Ultra on a personal card, another running Claude Code on an API key from a side gig, and a third pasting snippets into a free ChatGPT tab. None of that shows up in your SSO logs, DLP scans, audit reports, or bulk-plan invoices.

That is the audit hole, the billing leak, and the knowledge‑sharing hole rolled into one. You cannot enforce data residency on a tool you do not know exists, get enterprise pricing on a seat sitting on someone’s personal Stripe, or share CLAUDE.md templates with engineers using a tool your platform team has never opened.

But the failure mode on the other side is just as expensive. Several CTOs in late 2025 reacted to Shadow AI by picking one tool (usually “Copilot only, because Legal already cleared it”) and banning everything else. Six months later their best engineers had quit or were quietly breaking policy: a senior who has shipped a hundred PRs through Claude Code will not willingly go back to inline-suggestion autocomplete.

The win is the middle. A short list — two or three sanctioned tools with explicit rationale — plus a real, lightweight path for an engineer to say “I want to try X for 30 days” and get a yes or a no in a week. That is what scores 3 points, and it is what the rest of this page operationalizes.

What “max score” actually looks like

A maxed‑out policy has six visible artifacts. If any are missing, you have not gotten there.

A written 1–2 page tooling policy in your handbook, linked from onboarding, that names every approved AI coding tool, which use cases each is preferred for, and who owns the vendor relationship.
A preferred stack of 2–3 tools, not 1 and not 7. Typical 2026 shape: Cursor as IDE primary for most ICs, Claude Code as terminal/agent primary for senior + platform work, and one acceptable inline assistant (Copilot Business or Codex) for repos where the first two are weaker.
An exception process that fits on one page — a Notion/Linear template with a problem statement, the requested tool, 30‑day pilot success criteria, data classification, and a named approver. SLA: 5 business days.
A pilot pipeline of 2–4 tools currently in 30‑day evaluation, with owners and end dates, so engineers see what’s already being looked at before filing a new exception.
A quarterly review where the platform / DevEx lead republishes the preferred stack, drops what’s been outgrown, promotes pilot winners, and re‑justifies the choices.
Bulk billing aligned with the policy. Everyone on Claude Code is on Anthropic Teams; everyone on Cursor is on Cursor Teams; nobody is expensing a personal subscription (see Q3 · Team billing).

The crude self‑test: ask three random engineers “what AI tools are we allowed to use, and how do I try a new one?” If you get three matching answers in under 30 seconds each, you are at 3 points. If you get shrugs or different answers, you are at 1 or 2.
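The six artifacts can also be checked mechanically before you run the three-engineer test. The following is a minimal sketch, assuming you keep the policy's key facts in a small record. The `ToolingPolicy` fields, the `missing_artifacts` helper, and the 92-day review window are illustrative assumptions, not part of the scorecard.

```python
from dataclasses import dataclass, field
from typing import List, Optional

REVIEW_MAX_AGE_DAYS = 92  # assumption: "quarterly" means roughly every 90 days


@dataclass
class ToolingPolicy:
    """Hypothetical record of the policy's visible artifacts."""
    handbook_url: str            # written policy, linked from onboarding
    preferred_stack: List[str]   # sanctioned tools
    exception_approver: str      # one named person, not a committee
    exception_sla_days: int      # business days to a yes/no
    pilots_in_flight: List[str] = field(default_factory=list)
    last_review_days_ago: Optional[int] = None  # None: no review has happened
    seats_on_bulk_plans: bool = False


def missing_artifacts(p: ToolingPolicy) -> List[str]:
    """Return the artifacts that fall short of the max-score description."""
    gaps = []
    if not p.handbook_url:
        gaps.append("written policy")
    if not 2 <= len(p.preferred_stack) <= 3:
        gaps.append("preferred stack of 2-3 tools")
    if not p.exception_approver or p.exception_sla_days > 5:
        gaps.append("exception process with named approver and 5-day SLA")
    if not 2 <= len(p.pilots_in_flight) <= 4:
        gaps.append("pilot pipeline of 2-4 tools")
    if p.last_review_days_ago is None or p.last_review_days_ago > REVIEW_MAX_AGE_DAYS:
        gaps.append("quarterly review")
    if not p.seats_on_bulk_plans:
        gaps.append("bulk billing aligned with policy")
    return gaps
```

An empty list means all six artifacts are in place on paper. It says nothing about whether engineers actually know the policy, which is what the three-engineer test measures.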

Current landscape

Shadow AI is treated as distinct from Shadow IT because the blast radius is different. A rogue SaaS tool leaks the data you put into it; a rogue AI tool both leaks and modifies: it ingests code, suggests patches, and in 2026 runs commands and opens PRs. You take three concrete losses every quarter you let it run unmanaged:

Audit blind spots. SOC 2, ISO 27001, EU AI Act, and HIPAA controls are scoped to known systems. If 30% of your code touches tools you can’t enumerate, audit findings get ugly fast. Several enterprises in 2026 now maintain an “AI Bill of Materials” — every model, vendor, dataset, tool — to close this gap.
Billing leak. A Cursor Pro seat on a personal card is $20/month, and none of it counts toward your Team plan volume or discount. Ten engineers doing that is $2.4K/year (10 × $20 × 12) missing from the CFO’s line item, plus lost volume that would qualify you for enterprise pricing (see Q3 · Team billing).
Knowledge silos. If two engineers get 5× speedup with Claude Code and the rest don’t know, that is the most expensive silo in your org. Sanctioned tools mean shared CLAUDE.md, skills repos, MCP servers, and onboarding scripts. Shadow tools mean none of that compounds.
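The billing-leak arithmetic is simple enough to run against your own expense export. This is a small sketch, assuming you can reduce that export to one (tool, monthly price) pair per personal seat; the function name and input shape are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def annual_shadow_spend(subscriptions: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Sum the yearly cost of personal seats, grouped by tool.

    `subscriptions` holds one (tool, monthly_price_usd) pair per personal seat.
    """
    totals: Dict[str, float] = defaultdict(float)
    for tool, monthly_price_usd in subscriptions:
        totals[tool] += monthly_price_usd * 12
    return dict(totals)


# The example from the text: ten engineers on a $20/month personal seat.
ten_personal_seats = [("Cursor Pro", 20.0)] * 10
print(annual_shadow_spend(ten_personal_seats))  # {'Cursor Pro': 2400.0}
```

The result covers only the direct spend. The lost volume toward enterprise pricing is not captured here and has to be estimated separately.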

Forced single‑tool failure mode

The reflexive response — “we’ll just standardize on Copilot because Procurement already signed the BAA” — has produced visible attrition spikes over the last two years. Three patterns recur:

The senior leaves. A staff engineer who has spent six months internalizing Claude Code will not happily return to inline autocomplete. They leave for an org where the tool they’re good at is sanctioned. This is now a top reason cited in exit interviews for AI-native engineers.
The policy breaks in week 2. If your one approved tool isn’t the best for half the team’s actual work, engineers route around it. You now have Shadow AI and an enforcement headache.
You miss the next wave. The AI coding landscape has had a 90-day half-life since 2023. Cursor was niche in early 2024. Claude Code didn’t exist as an agent surface until early 2025. Codex relaunched in 2025. A single-tool policy all but guarantees you’re 6–12 months behind the frontier.

The “preferred + alternative” pattern (e.g. Cursor primary, Claude Code for terminal‑heavy)

What most max-score policies in 2026 converge on is a two-tool default with a clear split by workload, plus one or two narrowly scoped supporting tools:

Cursor primary for IDE‑centric work. Frontend, design‑heavy, refactor‑in‑file, autocomplete‑heavy work. Visual diff, inline edits, Agent Mode for medium tasks. Default for most ICs.
Claude Code primary for terminal/agent‑heavy work. Backend, infra, multi‑file refactors, long‑running plans, MCP‑driven workflows, agents driving git and PRs. Default for platform / SRE / senior IC work, and for anyone whose flow is in the terminal.
Codex CLI as the second‑opinion model. Lower‑frequency use, but explicitly approved for diff review, stuck‑agent rescue, and GPT‑5.6 reasoning on hard refactors. Route Terra for balanced work and Sol for the hardest pass. Most teams keep one paid OpenAI seat per pod for this.
Copilot Business for languages where the three tools above underperform. Some shops keep Copilot as the inline autocomplete in Visual Studio for .NET, or in JetBrains for Kotlin/Java, because the inline experience there is still ahead.

“Preferred” does not mean “mandatory”. An engineer can flip (Cursor primary today, Claude Code tomorrow) as long as both are on the sanctioned list. The list is what’s bounded; the choice is not.
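The “bounded list, free choice” rule is easy to encode if you want a bot or onboarding script to answer tooling questions. This is a hedged sketch: the workload keys and helper names are made up for illustration, and the tool names simply mirror the example split above.

```python
from typing import Optional

# Workload -> preferred tool, mirroring the example split described above.
# The workload keys are hypothetical labels, not an established taxonomy.
PREFERRED_BY_WORKLOAD = {
    "ide": "Cursor",
    "terminal_agent": "Claude Code",
    "second_opinion": "Codex CLI",
    "inline_fallback": "Copilot Business",
}

# The sanctioned list is what is bounded.
SANCTIONED = frozenset(PREFERRED_BY_WORKLOAD.values())


def preferred_tool(workload: str) -> Optional[str]:
    """Suggest a default tool for a workload; a suggestion, not a mandate."""
    return PREFERRED_BY_WORKLOAD.get(workload)


def next_step(tool: str) -> str:
    """Any sanctioned tool is fine for any workload; anything else needs an exception."""
    return "use it" if tool in SANCTIONED else "file an exception request"
```

Note that `next_step` never looks at the workload. An engineer doing IDE work with Claude Code is inside the policy, because only membership in the sanctioned list is enforced.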

Exception process template (1‑pager, 30‑day pilot, success criteria)

A good exception process is short enough that engineers actually use it. What’s working in 2026:

Form (≤1 page). Fields: tool, vendor, plan/tier, what the sanctioned tool can’t do, hypothesis (one sentence), data classification (prod code, customer data, PII?), pilot duration (default 30 days), success criteria (concrete: “I save 30 min/day on X” or “PRs from Y ship 2× faster”), proposed reviewer.
Approver. Named single person (your DevEx or AI tooling lead — see Q24 · AI tooling roadmap). Not a committee. SLA: 5 business days to yes / no / “talk to me”.
Pilot conditions. Sandbox repo or non‑prod data unless classification clears prod. License paid centrally during pilot. Engineer commits to a 15‑minute writeup at day 30 — what worked, what didn’t, add to sanctioned list yes/no.
Outcome. Day 30 writeup goes to platform. Either the tool joins the sanctioned list at the next quarterly review, or the exception expires and the engineer returns to the sanctioned stack. Either is fine; the worst case is no decision.

Copy-paste exception-request template (drop into Notion or a Linear issue; engineers fill it in, the named approver answers within 5 business days):

AI tool exception request

Tool / vendor / plan: e.g. Windsurf, Codeium, Pro

What our sanctioned stack can’t do here: the specific gap, one or two sentences

Hypothesis (one sentence): “Windsurf’s Cascade will cut my multi-file refactor time on the payments service in half”

Data classification it will touch: prod code / customer data / PII / none — pick one and justify

Pilot duration: 30 days (default)

Success criteria (measurable): “I save ≥30 min/day on X” or “PRs from Y ship 2× faster” — NOT “it feels good”

Pilot conditions: sandbox repo or non-prod data unless classification clears prod; license paid centrally

Proposed reviewer: a senior IC who will read the day-30 writeup

Day-30 commitment: 15-minute writeup — what worked, what didn’t, add to sanctioned list yes/no

The forcing function is the success criteria line. If the requester can’t state what success looks like before the pilot starts, the pilot will “pass” no matter what — and you are back to the seven-tool problem.

This is the lever that prevents the forced‑single‑tool failure mode. The signal to engineers: “We have opinions, and a path to change them.”
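If the form lives in a tracker with automation, two parts of the process are worth checking in code: the 5-business-day SLA and the “measurable success criteria” rule. The sketch below is one possible shape, not a definitive implementation. The `ExceptionRequest` fields follow the template above, but the class, the helper names, and the “must contain a number” heuristic are assumptions, and public holidays are ignored.

```python
import re
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List

# Classification options taken from the template above.
CLASSIFICATIONS = {"prod code", "customer data", "PII", "none"}


def add_business_days(start: date, days: int) -> date:
    """Advance by `days` weekdays (Mon-Fri). Holidays are not considered."""
    current = start
    remaining = days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # 0-4 are Monday through Friday
            remaining -= 1
    return current


@dataclass
class ExceptionRequest:
    tool: str
    gap: str                  # what the sanctioned stack can't do
    hypothesis: str
    data_classification: str
    success_criteria: str
    reviewer: str
    submitted: date
    pilot_days: int = 30

    def problems(self) -> List[str]:
        """Return reasons the form should be sent back before review."""
        issues = []
        if self.data_classification not in CLASSIFICATIONS:
            issues.append("data classification must be one of the listed options")
        # Crude heuristic (assumption): a measurable criterion contains a number.
        if not re.search(r"\d", self.success_criteria):
            issues.append("success criteria must be measurable")
        if not self.reviewer.strip():
            issues.append("a reviewer must be named")
        return issues

    def decision_due(self) -> date:
        """Date by which the approver owes a yes / no / 'talk to me'."""
        return add_business_days(self.submitted, 5)

    def writeup_due(self, pilot_start: date) -> date:
        """Date the day-30 writeup is due, counted from the pilot start."""
        return pilot_start + timedelta(days=self.pilot_days)
```

A request submitted on Monday 2 March 2026 is due a decision by Monday 9 March. A criterion like “it feels good” is rejected by `problems()`, while “I save ≥30 min/day on X” passes.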

Step‑by‑step: writing your tooling policy

Inventory what’s actually in use right now. Run an anonymous survey: “Which AI coding tools have you used for this codebase in the last 30 days?” with a free‑text “other” field. Cross‑check against your SSO logs, your IDE telemetry (if you have it), and any expense reports labelled “subscription” or “developer tools”. You almost certainly have 2–4 more tools in use than you think.
Sort the inventory by usage volume and value. Plot the tools two ways: how many engineers are using each, and which workloads they’re winning on. The top two or three almost always emerge clearly. Anything in the long tail is a candidate for either consolidation onto the top picks or formal exception.
Draft the preferred stack with rationale. Write 2–4 sentences per sanctioned tool: which workloads it’s preferred for, who the executive sponsor is, what the contract / Team plan looks like, what data classification it’s cleared for. Keep the whole thing under two pages. Long policies don’t get read.
Write the exception form before you publish the policy. This is the part teams skip and then regret. If your policy says “exceptions allowed” but there’s no form, no approver, and no SLA, engineers won’t bother — they’ll just install the tool quietly. Put the form in Notion or Linear, name the approver, set the 5‑day SLA, and link it from the policy in two places minimum.
Move everyone to the sanctioned bulk plans. Migrate personal subscriptions to Team / Business / Enterprise plans for each sanctioned tool. Anthropic Teams, Cursor Teams, ChatGPT Business, Copilot Business — whichever apply. Eat the migration friction now; it’s the single biggest billing win on the scorecard (see Q3 · Team billing).
Publish, announce, and run two open Q&A sessions. Post the policy in the handbook and #engineering. Run a 30‑minute Q&A in the same week and again two weeks later. Field the “why isn’t tool X on the list?” questions live — most of them have good answers (“we evaluated, it was 3rd best”) and a couple will reveal genuine gaps.
Stand up a quarterly review. Calendar 60 minutes every 90 days with the DevEx lead and 2–3 senior ICs. Agenda: what pilots ran this quarter, what should be added / removed from the sanctioned stack, what’s coming next quarter, what the budget shift looks like. Publish the output as a delta on the original policy doc — don’t rewrite it.
Track pilot outcomes in one place. Spreadsheet or Linear project, doesn’t matter. Columns: tool, requester, start date, end date, decision, rationale. This becomes your defensible audit trail for both Procurement (“why are we buying this?”) and Security (“what data did it touch?”). It also stops you re‑evaluating the same rejected tool every six months.
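The first two steps (inventory, then sort by usage) reduce to a count and a set difference. Here is a minimal sketch, assuming each survey response is a list of tool names and that you can export the set of tools visible in SSO. The function name and input shapes are illustrative.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple


def reconcile_inventory(
    survey_responses: Iterable[List[str]],
    sso_tools: Set[str],
) -> Tuple[List[Tuple[str, int]], List[str]]:
    """Rank tools by number of engineers and list the ones SSO cannot see.

    Each response is the list of tools one engineer reported using.
    """
    # Count each tool at most once per engineer.
    usage = Counter(tool for response in survey_responses for tool in set(response))
    ranked = usage.most_common()
    shadow = sorted(set(usage) - sso_tools)
    return ranked, shadow


responses = [
    ["Cursor", "Claude Code"],
    ["Cursor"],
    ["Cursor", "ChatGPT free"],
    ["Claude Code"],
]
ranked, shadow = reconcile_inventory(responses, sso_tools={"Cursor"})
print(ranked)  # [('Cursor', 3), ('Claude Code', 2), ('ChatGPT free', 1)]
print(shadow)  # ['ChatGPT free', 'Claude Code']
```

The top of `ranked` is your candidate preferred stack. Everything in `shadow` is either a migration onto a bulk plan or a candidate for the exception process. Tool names in free-text survey answers usually need normalizing first, which this sketch does not attempt.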

Common pitfalls

No review cadence. A policy written once and never re‑opened is dead inside 90 days. The market moves; your policy has to move with it. Calendar the quarterly review before you publish the policy, not after.
No exception process. “We have a sanctioned stack and that’s it” is a single‑tool policy with extra steps. Engineers will route around it. The exception path is the safety valve that makes the standardization actually hold.
Banning what’s already working. If half your senior engineers are on Claude Code and shipping faster than ever, do not write a policy that pushes them onto Copilot because the BAA was easier. Do the work to get Claude Code cleared (ask Anthropic which plan meets the controls your BAA requires) rather than making the worst tool the only sanctioned one.
Listing seven sanctioned tools. Seven is the same as zero for a developer trying to figure out what to install. Keep it to two or three. Add via exception, never via “and also we sanction…”.
Approving by committee. Five people on the exception approver list means nothing gets approved in a week. Name one person, accountable, with a deputy for vacation coverage. The committee shows up at the quarterly review, not in the inbox.
Pilots with no success criteria. “We tried it, felt good” is not a pilot result. Force the requester to write down what success looks like before the pilot starts. Otherwise everything passes and you end up with the seven‑tool problem.
Policy in a doc nobody reads. Link it from onboarding, from the AI tooling channel header, from the expense form, and from your CLAUDE.md templates. Aim for “every new hire reads it in week 1” as a hard target.

How to verify you’re there

A new engineer can answer “what AI coding tools are we allowed to use, and how do I try a new one?” in under 30 seconds from your handbook.
Your SSO / billing / DLP reports cover ≥95% of the actual AI coding tool usage in the org. Spot‑check by surveying ICs anonymously.
At least one engineer has filed an exception in the last quarter and gotten a yes/no within a week.
At least one tool has either entered or exited the sanctioned stack in the last 12 months — the policy is alive, not frozen.
Your bulk Team / Business / Enterprise plans cover ≥90% of seats for each sanctioned tool. Personal subscriptions reclaimed or grandfathered with a known sunset date.
The quarterly review actually happens. You can produce the last two review notes on request.
Your DevEx or AI tooling lead has a named role and time allocation for this work, not just “extra duty as assigned”.
A senior engineer who has been on the team for two years says, unprompted, that the policy “feels reasonable” and “doesn’t get in the way”.
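The numeric checks in this list (≥95% visibility, ≥90% bulk-plan seats, an exception answered within a week) can be tracked as a small scorecard. This is a sketch under stated assumptions: the function names and input shapes are hypothetical, and “within a week” is read as 7 calendar days.

```python
from typing import Dict, Optional, Tuple


def meets(part: int, total: int, pct: int) -> bool:
    """True when part/total is at least pct percent (integer math, no rounding)."""
    if total == 0:
        return True  # nothing to cover
    return part * 100 >= pct * total


def verify(
    visible_users: int,
    surveyed_users: int,
    seats: Dict[str, Tuple[int, int]],
    last_exception_days: Optional[int],
) -> Dict[str, bool]:
    """Evaluate the measurable checks.

    visible_users: AI-tool users that SSO / billing / DLP reports can see
    surveyed_users: AI-tool users found by the anonymous survey
    seats: tool -> (seats on bulk plan, total seats)
    last_exception_days: turnaround of the latest exception this quarter, or None
    """
    return {
        "visibility >= 95%": meets(visible_users, surveyed_users, 95),
        "bulk seats >= 90% per tool": all(
            meets(bulk, total, 90) for bulk, total in seats.values()
        ),
        "exception answered within a week": (
            last_exception_days is not None and last_exception_days <= 7
        ),
    }
```

The remaining checks (the review actually happening, a named role, a senior engineer calling the policy reasonable) are deliberately left out. They need a human answer, not a ratio.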