AI tooling roadmap — 6-12 month plan + dedicated lead

Scorecard question: Do you have a 6-12 month AI tooling roadmap? Max-score answer (3 pts): Roadmap + dedicated “AI tooling” lead/team with budget.

Why this matters in 2026

Without a roadmap, your organization’s AI position is whatever the loudest engineer wants on a Tuesday afternoon. One senior dev evangelises Claude Code, the next sprint a staff engineer pushes Cursor, a platform engineer quietly builds an MCP server on the side, finance asks why the Anthropic bill tripled, and security finds out about an internal model gateway only when a pen-tester points at it. None of that is malicious — it is the entirely predictable result of a market that ships breaking changes faster than any informal slack-and-vibes coordination can absorb. Claude Code went from launch to one of the most-used AI coding tools inside roughly a year. The MCP ecosystem went from “interesting Anthropic spec” to default integration surface in about twelve months. Strategic CTOs in 2026 either have an explicit, dated plan for which tools are coming in, which are being deprecated, who owns governance, and what the budget envelope looks like — or they have an explicit reason they don’t (the team is five people, or the org is mid-acquisition, or AI is genuinely not on the critical path this year).

The difference is observable. Teams with a roadmap can answer four questions in one breath: what are we rolling out this quarter, what are we deprecating, what are we experimenting with that may or may not promote, and who owns each of those decisions. Teams without a roadmap answer those questions with “well, it depends who you ask”. The same teams typically have three or four redundant AI coding tools (Cursor + Copilot + Windsurf seats sitting next to each other), no shared internal MCP catalogue, no rubric for promoting an experiment to a Tier-1 rollout, and an Anthropic bill that grew faster than headcount. The roadmap is not bureaucracy. It is the artefact that turns AI tooling from a hobby project into a budget line with an owner.

If you scored 0 or 1 on Q24, you almost certainly have one of three failure patterns: (a) a “roadmap” that is really a list of tools someone wants to try, with no dates, no budget, no exit criteria, and no owner; (b) an owner who is the CTO or the VP Eng “in their copious free time”, which in practice means nobody owns it; or (c) plenty of activity but no shared map — multiple smart engineers each running their own experiments on their own budgets without anyone aggregating what is working. Three points means you have a document, a named lead (or small team), a budget envelope they control, and a quarterly cadence to review what shipped, what got cut, and what the next horizon looks like.

What “max score” actually looks like

A real “max-score” answer here has two visible artefacts and one organizational reality behind them.

The first artefact is the roadmap itself — a single document (Notion, Linear initiative, Confluence page, the format does not matter) that any engineer in the org can find in under thirty seconds, with the following sections clearly laid out:

Now (this quarter). What is actively rolling out. Which models, which IDE seats, which MCP servers, which agent harnesses. Concrete owners and dates. This is execution.
Next (one to two quarters out). What is being prepared for rollout, with a planned promotion date and the criteria for “ready” (eval pass rate, security review complete, training delivered, budget approved). This is the queue.
Later (three to four quarters out). What is on the watchlist — promising tools, vendor categories, or capabilities the team is tracking but not yet committed to. This is the horizon.
Deprecating. Tools being wound down. End-of-life dates, replacement guidance, exception process for engineers who still need access. This is the cleanup.
Experiments. Time-boxed bets with explicit promotion or kill criteria. “Run Codex CLI alongside Claude Code for two weeks; promote if cost-per-merged-PR drops 15%, otherwise kill”. This is the funnel.
Budget envelope. The annualised number, broken down by category (seats, tokens, vendor contracts, headcount for the AI tooling lead/team, training, internal platform). This is what makes the roadmap a real thing rather than a wish list.
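
As a rough illustration, the six sections above can be captured in a small machine-readable structure that lives next to the prose document. This is a sketch only: the class and field names (`RoadmapItem`, `Roadmap`, `next_up`, `budget_envelope`) and every figure in the example are hypothetical, not a prescribed schema.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import date


@dataclass
class RoadmapItem:
    name: str
    owner: str         # a named person, not a team alias
    target_date: date  # rollout, promotion-review, or end-of-life date
    notes: str = ""    # e.g. readiness or promotion/kill criteria


@dataclass
class Roadmap:
    now: list[RoadmapItem] = field(default_factory=list)          # this quarter
    next_up: list[RoadmapItem] = field(default_factory=list)      # one to two quarters out
    later: list[RoadmapItem] = field(default_factory=list)        # watchlist
    deprecating: list[RoadmapItem] = field(default_factory=list)  # wind-downs
    experiments: list[RoadmapItem] = field(default_factory=list)  # time-boxed bets
    # Annualised budget in USD, keyed by category.
    budget_envelope: dict[str, int] = field(default_factory=dict)

    def annual_budget(self) -> int:
        """Total of all budget categories."""
        return sum(self.budget_envelope.values())


# Illustrative figures only; none of these numbers come from a real org.
roadmap = Roadmap(
    now=[RoadmapItem("CLI coding agent rollout", "J. Doe", date(2026, 3, 31))],
    deprecating=[RoadmapItem("Personal-plan seats", "J. Doe", date(2026, 6, 30))],
    budget_envelope={
        "seats": 120_000,
        "tokens": 80_000,
        "vendor_contracts": 40_000,
        "headcount": 300_000,
        "training": 20_000,
        "internal_platform": 60_000,
    },
)
print(roadmap.annual_budget())  # 620000
```

Keeping the budget as a per-category mapping rather than a single number mirrors the breakdown described above and makes it easy to see which category a new request would draw from.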

The second artefact is the AI tooling lead’s role definition — a one-page document naming the person (or small team), their headcount budget, their decision rights (what they can approve unilaterally, what needs the CTO, what needs the security team), and their KPIs. The role typically sits inside platform engineering and reports to the head of platform or directly to the CTO. The title varies — “AI Platform Engineer”, “Head of AI Tooling”, “Developer Experience Lead, AI” — but the contents are the same: own the roadmap, own the internal MCP catalogue, own the model gateway, own the eval harness, run the experiments, kill what does not work, communicate up to leadership and out to engineers.

The organizational reality behind both is that somebody’s job is to wake up Monday morning and think about AI tooling. Not in addition to their full-time job. As the job. That is the difference between “we have a Notion page” and “we have a strategy”. At ten engineers you can probably split this across two people 25% each. At fifty you need a full-time lead. At two hundred you need a small team — typically a lead, one or two platform engineers, and a fractional security partner. The headcount math is not gentle: a single senior platform engineer-for-AI in a Western market is $250-350K loaded, which feels expensive until you compare it to the cost of not having one (uncoordinated tool spend, security incidents, slow rollouts, brain drain to better-equipped competitors).

Current landscape (web-search-verified)

What goes on a 6-12 month roadmap (rollouts, deprecations, experiments)

The shape of a credible roadmap in 2026 is roughly five buckets, each with a concrete artefact:

Tier-1 rollouts (everyone uses these). The tools you have committed to org-wide. Typically: one IDE-attached AI (Cursor, Copilot, or Windsurf), one CLI-based coding agent (Claude Code, Codex, or a competitor), one chat surface (Claude or ChatGPT), one model gateway or routing layer (OpenRouter, internal LiteLLM, Portkey, Bedrock). The roadmap names the version, the rollout schedule, the training plan, and the success metric per tool.
Tier-2 selective rollouts (some teams use these). Tools that are paid for and supported but not universal, typically because they serve a specific workflow (Replit for prototyping, Devin or Factory for autonomous tasks, a vertical agent for a specific repo). The roadmap names who has access, why, and the criteria for graduating to Tier-1 or being cut.
Experiments (timeboxed bets). Two- to six-week bets with a single named owner, a budget cap, a promotion criterion, and a kill criterion. Most experiments should end in a clean kill; that is the point. Three out of four experiments failing is healthy; zero out of four is suspicious (you are not taking enough risk).
Internal platform investments. What you are building, not buying. Typically: internal MCP servers exposing your data sources and tools, a shared skills/agent rules library, an evals harness, an LLM gateway with logging and rate-limiting. These are platform-engineering projects with normal eng project management — not “we’ll get to it”.
Deprecations. Tools being wound down. End-of-life date, migration path, exception process. Every healthy roadmap has a deprecation column; if yours does not, you are accumulating tool debt.

The roadmap should also call out the explicit non-goals for the horizon — what you are not doing this year and why. “We are not building our own model” is a non-goal worth writing down. So is “we are not adopting autonomous coding agents this year” or “we are not buying Devin until eval results land”. Non-goals stop the loudest-engineer problem.
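
The experiments bucket is the easiest one to make mechanical. The sketch below shows one possible way to encode a timeboxed bet and its promote-or-kill decision using cost per merged PR, the metric used in the examples in this guide. The names (`Experiment`, `cost_per_merged_pr`, `decide`) are hypothetical, and treating a blown budget cap as an automatic kill is an assumption rather than something the scorecard prescribes.

```python
from dataclasses import dataclass


@dataclass
class Experiment:
    name: str
    owner: str
    budget_cap_usd: float
    timebox_weeks: int
    min_cost_drop_pct: float  # promotion threshold, e.g. 15.0

    def __post_init__(self) -> None:
        # The text describes two- to six-week bets.
        if not 2 <= self.timebox_weeks <= 6:
            raise ValueError("timebox should be two to six weeks")


def cost_per_merged_pr(ai_spend_usd: float, merged_prs: int) -> float:
    """AI tooling spend divided by merged PRs over the same period."""
    if merged_prs <= 0:
        raise ValueError("need at least one merged PR")
    return ai_spend_usd / merged_prs


def decide(exp: Experiment, baseline: float, trial: float, spent_usd: float) -> str:
    """Return 'promote' or 'kill' once the timebox has ended."""
    if baseline <= 0:
        raise ValueError("baseline cost must be positive")
    if spent_usd > exp.budget_cap_usd:
        return "kill"  # assumed rule: exceeding the cap ends the bet
    drop_pct = (baseline - trial) / baseline * 100
    return "promote" if drop_pct >= exp.min_cost_drop_pct else "kill"


# Illustrative numbers only.
exp = Experiment("Codex CLI vs Claude Code", "J. Doe", 5_000, 2, 15.0)
baseline = cost_per_merged_pr(4_000, 100)  # 40.0 USD per merged PR
trial = cost_per_merged_pr(3_200, 100)     # 32.0 USD per merged PR
print(decide(exp, baseline, trial, spent_usd=3_200))  # promote
```

There is deliberately no third outcome such as "extend": an experiment that can be extended indefinitely is the failure mode the kill criterion exists to prevent.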

The AI tooling lead role (responsibilities, headcount math)

The 2026 AI Platform Engineer or AI tooling lead role has converged on a recognisable shape across published job specs from large enterprises and AI-tooling vendors (Augment Code’s public job-spec template is a representative example). The core responsibilities cluster into four groups:

Platform engineering. Build and operate the internal AI platform: a model gateway with logging and routing, an internal MCP server catalogue, a shared skills / agent-rules library, evals infrastructure, prompt management. These are real software engineering projects, not Notion pages. Strong Python or TypeScript is non-negotiable; experience with CI/CD, Kubernetes, and one of AWS/GCP/Azure is the floor.
MCP expertise. Build, host, and secure MCP servers connecting AI agents to internal data, APIs, and services. This is now the load-bearing integration surface — every engineer in the org will route through MCP servers the lead is responsible for, so security and reliability matter as much as features.
Governance and guardrails. Prompt firewalls, content filters, audit logging, red-team harnesses, vendor risk assessments. The EU AI Act high-risk obligations are currently scheduled to take effect on 2 August 2026, although the European Commission’s Digital Omnibus proposal may defer the stand-alone Annex III high-risk obligations to December 2027, so track its adoption before betting your roadmap on the August 2026 date. NIST AI RMF alignment is appearing in enterprise security reviews. The lead is the person who knows what the org’s exposure is and what evidence the auditor will want.
Rollout and enablement. Run the experiments, train the org, write the docs, kill the tools that do not work, promote the ones that do. This is half developer-relations, half product management — and it is the part most published job specs underweight.

Headcount math for a credible role:

5-15 engineers. A 25-50% allocation from a senior platform engineer is usually enough. The roadmap exists, the budget is small, the decisions are mostly “which seat do we buy” plus “which two experiments do we run this quarter”.
15-50 engineers. One full-time lead. They write the roadmap, run the gateway, own the MCP catalogue, and run two to four experiments per quarter. They report into the head of platform or directly into the CTO.
50-200 engineers. Lead plus one or two platform engineers, plus a fractional security partner. The team owns the internal LLM gateway as a real platform service, runs internal evals against vendor models, maintains a real MCP catalogue with auth and rate-limiting, and runs a structured experiments programme.
200+ engineers. Small platform-for-AI team (4-6 people) with sub-specialisations: MCP infrastructure, evals/observability, governance/security partner, and a developer-experience role focused on enablement.

The single highest-leverage hire here is the first one: going from zero owners to one full-time owner. Subsequent hires bring diminishing returns until you cross fifty engineers. Hire the first one too late and the cost shows up as redundant tool spend and security incidents; hire them too early and they have nothing to operate.
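
The headcount bands above can be expressed as a simple lookup. This is a minimal sketch: the function name `recommended_staffing` is hypothetical, and because the published bands overlap at 15, 50, and 200 engineers, the code assumes each boundary belongs to the smaller band. Orgs below five engineers are not covered by the bands, so the sketch simply falls back to the smallest one.

```python
def recommended_staffing(engineers: int) -> str:
    """Map engineering headcount to the AI tooling staffing band."""
    if engineers < 1:
        raise ValueError("engineers must be a positive number")
    if engineers <= 15:
        return "25-50% of a senior platform engineer"
    if engineers <= 50:
        return "one full-time lead"
    if engineers <= 200:
        return "lead + 1-2 platform engineers + fractional security partner"
    return "platform-for-AI team of 4-6 people"


for size in (10, 50, 120, 400):
    print(size, "->", recommended_staffing(size))
```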

Quarterly review cadence

The roadmap is a living document, not a Q1 artefact that rots by Q3. The minimum cadence is:

Weekly inside the AI tooling team. Status on rollouts, experiments, deprecations. Same shape as any platform-engineering team weekly.
Monthly with engineering leadership. Thirty minutes. What shipped, what was killed, what is on the next quarter’s docket, what is blocked. Bring the cost numbers from Q4 · Cost visibility and the metrics from Q22 · AI metrics panel.
Quarterly review. Ninety minutes with the broader engineering org (or at least leads of every team). Reset the Now/Next/Later columns. Officially promote, demote, or kill anything in the experiment column. Update the budget envelope for the coming quarter. Publish the new doc and Slack-broadcast the changes so every engineer sees them.
Annually with finance and security. The full-year review. Reconcile actual vs budgeted spend, and review vendor contracts up for renewal, security reviews due, the training plan for the year, and headcount for the AI tooling team.

The act of running the cadence is the discipline. A roadmap that is reviewed quarterly stays useful; a roadmap that is “written once” is a wishlist by month four.

Example roadmap shape (Q1 baseline metrics, Q2 internal MCP, Q3 Tier 2, Q4 governance)

For a fifty-engineer org starting from a “scored 1, want to score 3” position, a plausible twelve-month roadmap looks like this:

Q1 — baseline and metrics. Hire the AI tooling lead. Stand up the metrics panel (Q22) so future decisions have data. Consolidate billing onto Team/Enterprise plans for the existing Tier-1 tools (Anthropic Team, Cursor Business, ChatGPT Business). Publish the v1 roadmap doc. First quarterly review at end of Q1 sets the actual numbers in the Now/Next/Later columns.
Q2 — internal MCP catalogue. Build the first three internal MCP servers exposing high-leverage internal data (the issue tracker, the design-system docs, the company knowledge base). Stand up a model gateway with logging. Roll out shared agent rules across the org (Q19 · Shared agent rules). Run the first structured experiment — typically a head-to-head of two coding agents — with documented promotion criteria.
Q3 — Tier-2 selective rollouts. Based on Q2 metrics, select one Tier-2 tool for a specific workflow (e.g. an autonomous-coding agent for a specific repo, a vertical agent for a specific domain). Pilot with one team for the quarter. Concurrently, retire the first deprecated tool (usually a personal-plan vendor superseded by the Team plan from Q1).
Q4 — governance and audit readiness. Complete the EU AI Act risk classification for the org’s use of AI tooling. Stand up the audit log pipeline. Run the first red-team / prompt-injection drill against the internal MCP servers. Negotiate next year’s vendor contracts based on actual usage data. Publish the v2 roadmap for the following year.

The specific dates and tools change company by company. The shape — baseline → platform → selective rollout → governance — is robust. Teams that try to do the Q4 work in Q1 usually flounder because they have no metrics to make decisions with; teams that defer governance past Q4 usually get caught by an audit they did not see coming.

Step-by-step: building a roadmap from scratch

Name the owner. Before any document, decide who is accountable. If you cannot name a single person whose job description includes “owns the AI tooling roadmap” by the end of next week, that is the only step that matters. Allocate at least 50% of their time. The CTO is not the owner of the operational roadmap (the CTO is the sponsor); the owner is someone whose primary day-job is platform engineering or developer experience.
Inventory the current state. List every AI tool currently in use, whether centrally billed or expensed personally. Cross-reference with the team-billing audit from Q3 · Team billing. Include shadow tools — the trial accounts, the personal-plan logins, the half-built MCP servers running on someone’s laptop. The first pass should produce surprises; budget two days for “wait, we pay for what?”.
Pick three Tier-1 tools and commit. Choose one IDE-attached AI, one CLI agent, one chat surface. Document the choice and the reasoning. This is the single most clarifying act of the whole exercise — most teams discover they have been paying for four IDEs and three CLI agents simultaneously. Communicate clearly: from date X, these three are supported; everything else is exception-only. Pair this with Q2 · Tooling policy.
Draft the Now/Next/Later columns. With the owner, in one ninety-minute session, draft the v0 roadmap. Now = what is rolling out this quarter. Next = the queue for the next two quarters. Later = the watchlist. Be ruthless about “Later”: if it has no defined “what would move this to Next” criterion, it does not belong on the roadmap at all.
Define the experiment funnel. Pick the first two to four experiments. For each, write: hypothesis, owner, budget cap, two-to-six-week timebox, promotion criterion, kill criterion. Add them to the roadmap. The experiment doc is the highest-leverage artefact in the whole roadmap — it is what stops “let’s just try X” from accumulating as ungoverned tool debt.
Set the budget envelope. Annualised. Broken into seats, tokens, vendor contracts, internal-platform headcount, training, and a contingency line. Get it signed off by finance. The lead controls anything inside the envelope; anything outside needs CTO sign-off. This is what makes the roadmap a real artefact: it has a number.
Build the metrics scaffolding. You cannot make data-driven roadmap decisions without data. Stand up the basics: cost per merged PR (Q4 · Cost visibility), adoption rate by team (Q1 · Team adoption rate), and a couple of throughput signals from Q22 · AI metrics panel. The metrics do not need to be perfect to be useful; they need to be visible and trended.
Publish and broadcast. Put the document somewhere every engineer can find it. Slack-announce the new roadmap. Hold a thirty-minute all-hands explaining the Now/Next/Later, the Tier-1 commitments, the experiment funnel, and where to file requests. Repeat the broadcast every quarter when the doc updates. Silent roadmaps die.
Set the cadence in the calendar. Recurring weekly inside the team, monthly with engineering leadership, quarterly with the broader org, annually with finance/security. Put them in the calendar in week one. Cadence that exists in someone’s head dies the first sprint it conflicts with a deadline.
Plan the first deprecation. Pick one tool to retire this quarter. Communicate the EOL date six weeks in advance. Provide migration guidance. Track usage drop-off; on the date, revoke seats. The first deprecation is harder than it looks — there is always one engineer who insists they cannot live without it — but it sets the precedent that this roadmap actually deprecates things. Without that precedent, the roadmap becomes a wishlist.
Review and rewrite at the end of Q1. The v0 roadmap will be wrong in places. That is fine. At the end of the first quarter, sit with the owner, look at what shipped, what slipped, what got cut, and what new arrivals (vendor releases, model launches, security advisories) need to be slotted in. Publish v1. The roadmap is now a real living document.
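
The budget-envelope step above comes down to one rule: the lead approves anything inside the envelope, and anything outside goes to the CTO. A minimal sketch of that routing follows. The function name `approver`, the per-category accounting, and the treatment of unbudgeted categories as "outside the envelope" are all assumptions made for illustration; the dollar figures are invented.

```python
from __future__ import annotations


def approver(
    request_usd: float,
    category: str,
    envelope: dict[str, float],
    spent: dict[str, float],
) -> str:
    """Return who signs off on a spend request: 'lead' or 'cto'."""
    if category not in envelope:
        return "cto"  # assumed: an unbudgeted category is outside the envelope
    remaining = envelope[category] - spent.get(category, 0.0)
    return "lead" if request_usd <= remaining else "cto"


# Illustrative annualised envelope and year-to-date spend, in USD.
envelope = {"seats": 120_000, "tokens": 80_000, "contingency": 15_000}
spent = {"seats": 119_900, "tokens": 30_000}

print(approver(200, "seats", envelope, spent))     # cto (only 100 left in seats)
print(approver(5_000, "tokens", envelope, spent))  # lead
```

Whether the check should be per category or against the envelope total is a policy choice; the per-category version is stricter and makes overspend visible earlier.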

Common pitfalls

Roadmap as wishlist. A bullet list of “tools we want to try” with no dates, no owners, no budget, and no kill criteria. This is the most common failure mode. The fix is to delete anything that does not have an owner and a date attached, even if it leaves the doc looking embarrassingly short. A short, real roadmap is a roadmap. A long, aspirational one is theatre.
No owner, or a fake owner. “The CTO owns it” or “the VP Eng owns it” usually means nobody owns it, because the CTO is full-time on twenty other things. The owner has to be someone whose week, not just whose title, includes the work. If you cannot point at a calendar showing the owner spending real hours per week on the roadmap, the roadmap is unowned.
No quarterly review. The doc is written in February, referenced once in March, and forgotten by May. Six months later the org has drifted to entirely new tools and the doc is a fossil. The cadence is the discipline; without it, the document is fiction.
No exit criteria for experiments. Pilots that started in Q1 are still running in Q4, neither promoted nor killed. Every experiment must have a date and a clear-eyed kill criterion at the moment it starts. “We will run Devin on repo X for six weeks; if cost-per-merged-PR is not 20% lower with quality equal, we kill it” is a good experiment definition. “We’re trying Devin” is not.
Budget envelope held by finance, not the lead. If the lead has to file a ticket for every $200 seat, they cannot move at the pace the market demands. The whole point of the envelope is to give the lead authority inside it. Without that, the role is decorative.
Confusing the roadmap with the procurement queue. The roadmap names which tools are committed to, deprecated, or being experimented with. It is not a vendor leaderboard. A new shiny tool launching does not automatically belong on the roadmap; it belongs on the watchlist until someone runs a structured experiment.
Skipping deprecations. Adding-only roadmaps are how orgs end up with eight overlapping AI tools and a bill that bears no relation to value delivered. Every roadmap should retire at least one tool per quarter, even if that means retiring something that “kind of worked”.
No connection to security and finance. A roadmap that does not show up in the annual security review and the annual budget review is a parallel-universe document. The audit log pipeline, the data residency commitments, the headcount line: all of those need to live in the roadmap and be visible to security and finance, not just engineering.
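
Two of the pitfalls above, the wishlist roadmap and the experiment that never ends, are easy to catch with a periodic lint over the roadmap data. The sketch below assumes roadmap entries are stored as plain dictionaries with hypothetical keys (`owner`, `date`, `status`, `end_date`); adapt the keys to whatever your document tool exports.

```python
from __future__ import annotations

from datetime import date


def prune_wishlist(items: list[dict]) -> list[dict]:
    """Keep only entries that have both an owner and a date."""
    return [i for i in items if i.get("owner") and i.get("date")]


def overdue_experiments(experiments: list[dict], today: date) -> list[str]:
    """Names of experiments still running after their timebox ended."""
    return [
        e["name"]
        for e in experiments
        if e["status"] == "running" and e["end_date"] < today
    ]


# Illustrative data only.
items = [
    {"name": "Try tool X"},  # no owner, no date: wishlist entry
    {"name": "Gateway rollout", "owner": "J. Doe", "date": date(2026, 3, 31)},
]
experiments = [
    {"name": "Agent pilot", "status": "running", "end_date": date(2026, 2, 14)},
    {"name": "IDE trial", "status": "killed", "end_date": date(2026, 2, 14)},
]

print([i["name"] for i in prune_wishlist(items)])         # ['Gateway rollout']
print(overdue_experiments(experiments, date(2026, 11, 1)))  # ['Agent pilot']
```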

How to verify you’re there

A document exists, every engineer in the org can find it in under thirty seconds, and it has been updated within the last ninety days.
A single named human (or small team) is accountable for the roadmap, has at least half their week officially allocated to it, and can describe the current Now/Next/Later columns from memory.
The budget envelope for AI tooling is a real number, signed off by finance, broken down by seats / tokens / contracts / headcount / training, and the lead has authority inside it.
The roadmap contains at least one explicit deprecation with an end-of-life date in the next quarter.
At least one experiment running this quarter has a written kill criterion that everyone involved agrees would actually kill it.
The internal MCP server catalogue (see Q14 · Internal MCP servers) is on the roadmap as a tracked platform investment, not a side project.
The cadence is in the calendar (weekly inside the team, monthly with leadership, quarterly with the broader org, annually with finance/security) and the meetings actually happen.
The last quarterly review produced concrete decisions — promote A, kill B, defer C — and those decisions are reflected in the current state of the doc.
A new engineer joining tomorrow reads the roadmap in ten minutes and knows exactly what they should and should not be installing on their machine.
If asked by the board “what is our AI tooling strategy for the next year”, the CTO points at the document, not at a deck reverse-engineered the night before.
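
Several of the checks above are objective enough to automate as a recurring self-audit. The following sketch covers four of them. The function name `failed_checks` and the dictionary keys are hypothetical, and "in the next quarter" is approximated as "within the next 90 days", which is an assumption you may want to replace with real calendar-quarter logic.

```python
from __future__ import annotations

from datetime import date, timedelta


def failed_checks(doc: dict, today: date, horizon_days: int = 90) -> list[str]:
    """Return a description of each verification check that fails."""
    failures = []
    if today - doc["last_updated"] > timedelta(days=90):
        failures.append("roadmap not updated in the last ninety days")
    if doc["owner_allocation"] < 0.5:
        failures.append("owner has less than half their week allocated")
    horizon = today + timedelta(days=horizon_days)
    if not any(today <= eol <= horizon for eol in doc["deprecation_eol_dates"]):
        failures.append("no deprecation with an end-of-life date in the next quarter")
    if not any(e.get("kill_criterion") for e in doc["experiments"]):
        failures.append("no running experiment has a written kill criterion")
    return failures


# Illustrative data only.
doc = {
    "last_updated": date(2026, 1, 10),
    "owner_allocation": 0.5,
    "deprecation_eol_dates": [date(2026, 6, 30)],
    "experiments": [{"name": "Agent pilot", "kill_criterion": ""}],
}
for failure in failed_checks(doc, today=date(2026, 5, 1)):
    print("FAIL:", failure)
```

The remaining checks (whether meetings actually happen, whether a new engineer can read the document in ten minutes) need a human answer and are better left to the quarterly review.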