Your CTO read a blog post about “10x with AI” and now wants a migration plan by Friday. The problem: every case study you find is a vendor testimonial with a suspiciously round number and no reproducible workflow attached. You can’t take “$2M saved” into a planning meeting — you need to know what the team actually typed into the agent.
This article walks through the one publicly documented, verifiable set of internal Claude Code rollouts (Anthropic’s own teams), pulls out the patterns that made them work, and turns each into a prompt you can run on day one. Where the workflow differs across Cursor, Claude Code, and Codex, we show all three.
A clear-eyed read of the one fully documented internal AI-coding rollout, with a citation you can paste into a planning doc
Three copy-paste prompts that reproduce the patterns those teams used: codebase onboarding, batch generation against a data file, and building a throwaway notebook into a persistent tool
A simple, honest ROI model you can fill in with your numbers — not invented ones
A “When This Breaks” list of the failure modes that quietly kill pilots
The Documented Rollout: Anthropic’s Internal Teams
Anthropic published an account of how its own internal teams, technical and non-technical, use Claude Code. These write-ups are useful precisely because they describe the interaction, not just the outcome.
Data Infrastructure: debugging Kubernetes from a dashboard screenshot
During a production incident, the Data Infrastructure team pasted a Kubernetes dashboard screenshot into Claude Code. The agent diagnosed pod IP address exhaustion and produced the exact commands to create a new IP pool, saving roughly 20 minutes mid-outage. The reusable lesson isn’t “AI saves 20 minutes”; it’s that a multimodal agent can turn an opaque dashboard into a runbook step, so on-call engineers without deep networking expertise can act.
Growth Marketing: a Figma plugin generating 100 ad variations
A single-person performance marketing function built two things: an agentic workflow that ingests a CSV of ad performance, and a Figma plugin that generates up to 100 ad variations by swapping headlines and descriptions. Together they collapse hours of manual variant production. The transferable pattern is “loop the agent over a data file to produce structured output,” and it is the same whether the output is ad copy, test fixtures, or migration stubs.
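The batch-against-a-file pattern can be sketched as a small script that turns each CSV row into one prompt. This is a minimal illustration, not the marketing team's actual workflow: the column names `headline` and `ctr`, the template wording, and the `build_prompts` helper are all hypothetical. It only builds the prompts; sending them to an agent is left out.

```python
import csv
import io

# Hypothetical template. The wording and the 30-character limit are
# placeholders, not details taken from the documented workflow.
TEMPLATE = (
    "Here is one row from our ad performance file:\n"
    "{row}\n"
    "Write 3 alternative headlines under 30 characters. "
    "List any assumptions you made about the columns."
)


def build_prompts(csv_text: str, required: tuple[str, ...]) -> list[str]:
    """Return one prompt per CSV row; fail fast if a required column is absent."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = [c for c in required if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"CSV is missing required columns: {missing}")
    prompts = []
    for row in reader:
        # Render the row as key=value pairs so the agent sees real column names.
        rendered = ", ".join(f"{k}={v}" for k, v in row.items())
        prompts.append(TEMPLATE.format(row=rendered))
    return prompts


# Hypothetical sample data, for illustration only.
sample = "headline,ctr\nShip faster,0.021\nDebug less,0.034\n"
prompts = build_prompts(sample, required=("headline", "ctr"))
print(len(prompts))
print(prompts[0])
```

Checking the header before looping means a renamed or missing column stops the batch immediately instead of producing a hundred outputs built on a wrong guess.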
Data Science: from throwaway notebook to a tool that lasts
Data scientists used Claude Code to build complex visualization dashboards without writing JavaScript themselves, shifting from disposable notebooks to persistent, shareable TypeScript apps. The pattern: describe the thing you keep re-deriving by hand, and let the agent stand up a small app around it.
The three reusable moves above — diagnose-from-context, batch-against-a-file, build-a-persistent-tool — are tool-agnostic. What differs is how you point the agent at your project. Here’s the day-one onboarding step in each tool.
Cursor
Open the repo in Cursor and use Agent mode (the agent reads the workspace and .cursor/rules automatically). To capture conventions so every future session starts grounded, generate a rules file:
Read the project structure, package.json, and the three largest source
files. Draft a .cursor/rules/project.mdc that captures our stack, our
directory layout, our test command, and the two or three conventions a new
contributor would get wrong. Keep it under 40 lines.
Review the diff in Cursor’s inline view before accepting — rules files drift, so keep them short and reviewed.
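One way to keep a rules file short is to check its length mechanically during review. The sketch below is an assumption about how you might do that: the `context_file_problems` helper is hypothetical, and the 40-line default simply mirrors the limit in the prompt above.

```python
def context_file_problems(text: str, max_lines: int = 40) -> list[str]:
    """Return human-readable problems with a context file's contents."""
    problems = []
    lines = text.splitlines()
    if not text.strip():
        problems.append("file is empty")
    if len(lines) > max_lines:
        problems.append(f"{len(lines)} lines, budget is {max_lines}")
    return problems


# Hypothetical usage: read the file yourself and pass the text in.
too_long = "rule\n" * 41
print(context_file_problems(too_long))
```

The function takes text rather than a path so the same check can run against any context file, whichever tool reads it.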
Claude Code
From the repo root, run claude and let it inventory the codebase, then commit the context file it generates:
Explore this repository and write a CLAUDE.md at the root. Capture: the
stack and runtime, how to install and run tests, the build/deploy command,
and the conventions a new engineer most often gets wrong here. Keep it tight
— bullet points, no prose padding.
For a large monolith, ask Claude Code to use sub-agents so it can read more of the tree in parallel without blowing the context window. Run headless in CI later with claude -p --output-format json once the workflow is proven.
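A headless run can be wrapped in a few lines of Python for a CI job. This sketch assumes `claude` is on the PATH and that stdout is a single JSON document. It deliberately does not assume any particular field names in that JSON, and the helper names and the timeout value are hypothetical.

```python
import json
import subprocess


def build_headless_command(prompt: str) -> list[str]:
    """Argument list for a non-interactive run, using only the flags named above."""
    return ["claude", "-p", prompt, "--output-format", "json"]


def run_headless(prompt: str, timeout_s: int = 600) -> dict:
    """Run the CLI once and parse stdout as JSON.

    Assumption: the CLI is installed and authenticated in the CI environment.
    check=True makes a non-zero exit raise, so failures are not silently ignored.
    """
    proc = subprocess.run(
        build_headless_command(prompt),
        capture_output=True,
        text=True,
        timeout=timeout_s,
        check=True,
    )
    return json.loads(proc.stdout)


print(build_headless_command("Summarize the failing tests. Do not run anything."))
```

Passing the command as a list rather than a shell string avoids quoting problems when the prompt contains quotes or newlines.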
Codex
Codex reads an AGENTS.md for project context. In the Codex CLI (or the IDE extension), run:
Read this repo and write an AGENTS.md at the root describing the stack, the
test and build commands, and the conventions a new contributor would miss.
Then list three independent tasks from our backlog that are safe to run in
parallel as separate Codex Cloud tasks.
The parallel-task suggestion plays to Codex’s strength: fan those out as async Cloud tasks and review the diffs as they land, rather than babysitting one session.
You will be asked to justify the spend. Resist the urge to borrow someone else’s headline figure. Instead, build the smallest defensible model and measure against it. The table below is an illustrative template — every input is a placeholder for a number you measure on your own team, not a reported result.
Illustrative 12-month model — replace every figure with measured data
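The same kind of model can be written as a few lines of arithmetic. This is a sketch under stated assumptions: the input names, the benefit and cost formulas, and the default of 46 working weeks are one possible framing rather than the article's own table, and the numbers in the usage example are placeholders to be replaced with measured values.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RoiInputs:
    engineers: int               # people using the tool
    hours_saved_per_week: float  # measured per engineer, not estimated
    loaded_hourly_cost: float    # fully loaded cost of one engineer-hour
    seat_cost_per_month: float   # licence or usage cost per engineer
    working_weeks: int = 46      # placeholder; adjust for your calendar


def twelve_month_roi(x: RoiInputs) -> dict:
    """Benefit = time saved priced at loaded cost; cost = 12 months of seats."""
    benefit = (
        x.engineers * x.hours_saved_per_week * x.working_weeks * x.loaded_hourly_cost
    )
    cost = x.engineers * x.seat_cost_per_month * 12
    net = benefit - cost
    roi_pct: Optional[float] = (100 * net / cost) if cost else None
    return {"benefit": benefit, "cost": cost, "net": net, "roi_pct": roi_pct}


# Placeholder inputs only; none of these are reported results.
result = twelve_month_roi(
    RoiInputs(engineers=5, hours_saved_per_week=2.0,
              loaded_hourly_cost=100.0, seat_cost_per_month=50.0)
)
print(result)
```

The model is deliberately small: the only input that carries real uncertainty is `hours_saved_per_week`, which is the number a skeptic will ask you to defend.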
Pick one painful, measurable task. Not “make us faster” — something like “time from PR opened to first review comment” or “hours to add a CRUD endpoint with tests.” Record a baseline for two weeks before introducing the tool.
Run one of the prompts above on that exact task with two or three volunteers. Keep the prompt in a shared file so everyone runs the same thing and you can compare.
Capture before/after on the one metric you chose. A spreadsheet with dates, the task, and the measured number beats any vendor case study.
Write down the failure modes you hit (see below) and how you recovered. This is the most valuable artifact — it’s what makes the rollout repeatable instead of a fluke.
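The before/after comparison in the steps above can be sketched as a single function over two lists of measurements. The `compare` helper and its `min_samples` threshold are hypothetical; the threshold of 10 is an arbitrary placeholder, not a statistical recommendation. The sample values are made up for illustration.

```python
from statistics import median


def compare(baseline: list[float], pilot: list[float], min_samples: int = 10) -> dict:
    """Summarize one metric before and after, and flag thin sample sizes."""
    if not baseline or not pilot:
        raise ValueError("need at least one measurement in each period")
    base_med = median(baseline)
    pilot_med = median(pilot)
    # Negative change means the metric went down (good, if lower is better).
    change_pct = 100 * (pilot_med - base_med) / base_med if base_med else None
    return {
        "n_baseline": len(baseline),
        "n_pilot": len(pilot),
        "median_baseline": base_med,
        "median_pilot": pilot_med,
        "change_pct": change_pct,
        "enough_samples": min(len(baseline), len(pilot)) >= min_samples,
    }


# Made-up hours per task, for illustration only.
summary = compare(baseline=[4.0, 5.0, 6.0], pilot=[3.0, 3.0, 4.5])
print(summary)
```

Reporting the sample counts alongside the change keeps the claim honest: a large improvement over three samples is a reason to keep measuring, not a result.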
When This Breaks
The agent confidently produces plausible-but-wrong commands. The dashboard-debugging story works because a human verified each kubectl step before running it. Never let an agent execute infra changes unattended during an incident: have it propose, then verify each step yourself. Your prompts should explicitly say “do not run anything.”
Batch generation drifts from your real schema. When you loop over a CSV or data file, the agent will quietly invent column names or assume types. Ask it explicitly to list the assumptions it made, and spot-check a few generated outputs against the source.
The “persistent tool” becomes unmaintained shadow IT. The notebook-to-app pattern is powerful and dangerous — you can end up with five one-off React apps nobody owns. Decide upfront whether the tool graduates into a real repo with tests or stays a personal script.
Your ROI number doesn’t survive the first skeptic. If you can’t answer “measured how, over how many samples?”, you don’t have a result yet — you have a vibe. Go back to a smaller, measured claim.
Context files rot. A CLAUDE.md, AGENTS.md, or .cursor/rules that drifts from reality makes the agent worse, because it now confidently follows stale conventions. Treat them as code: review changes, keep them short, delete what’s no longer true.
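The schema-drift failure mode in the list above is one that can be caught mechanically. The sketch below compares the keys of generated records against the source CSV header. The `schema_drift` helper and the sample column names are hypothetical, and it checks names only, not types or values.

```python
import csv
import io


def schema_drift(source_csv: str, generated: list[dict]) -> list[str]:
    """List records whose keys do not match the source CSV header exactly."""
    header = set(next(csv.reader(io.StringIO(source_csv)), []))
    problems = []
    for i, record in enumerate(generated):
        invented = sorted(set(record) - header)
        missing = sorted(header - set(record))
        if invented:
            problems.append(f"record {i}: invented columns {invented}")
        if missing:
            problems.append(f"record {i}: missing columns {missing}")
    return problems


# Made-up data: the second record uses a column name the source never had.
source = "headline,description\nShip faster,Deploy in minutes\n"
outputs = [
    {"headline": "Ship sooner", "description": "Deploy in minutes"},
    {"headline": "Debug less", "desc": "Fewer late nights"},
]
for problem in schema_drift(source, outputs):
    print(problem)
```

A check like this does not replace spot-checking the content, but it turns the most common silent failure into a loud one.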