Success Stories & Case Studies

Your CTO read a blog post about “10x with AI” and now wants a migration plan by Friday. The problem: every case study you find is a vendor testimonial with a suspiciously round number and no reproducible workflow attached. You can’t take “$2M saved” into a planning meeting — you need to know what the team actually typed into the agent.

This article walks through the one publicly documented, verifiable set of internal Claude Code rollouts (Anthropic’s own teams), pulls out the patterns that made them work, and turns each into a prompt you can run on day one. Where the workflow differs across Cursor, Claude Code, and Codex, we show all three.

What You’ll Walk Away With

A clear-eyed read of the one fully documented internal AI-coding rollout, with a citation you can paste into a planning doc
Three copy-paste prompts that reproduce the patterns those teams used (diagnosing an incident from a screenshot, batch generation against a data file, and turning a throwaway notebook into a persistent tool), plus a day-one onboarding prompt for each of Cursor, Claude Code, and Codex
A simple, honest ROI model you can fill in with your numbers — not invented ones
A “When This Breaks” list of the failure modes that quietly kill pilots

The Documented Rollout: Anthropic’s Internal Teams

Anthropic published an account of how its own teams use Claude Code, titled “How Anthropic teams use Claude Code.” These stories are useful precisely because they describe the interaction, not just the outcome.

Data Infrastructure: debugging from a screenshot

During a production incident, the Data Infrastructure team pasted a Kubernetes dashboard screenshot into Claude Code. The agent diagnosed pod IP address exhaustion and produced the exact commands to create a new IP pool — saving roughly 20 minutes mid-outage. The reusable lesson isn’t “AI saves 20 minutes”; it’s that a multimodal agent can turn an opaque dashboard into a runbook step, so on-call engineers without deep networking expertise can act.

Paste this into Claude Code (or Cursor’s agent with an image attached) alongside a screenshot of the failing dashboard:

Here's a screenshot of our Kubernetes cluster dashboard during an active
incident. Pods are stuck Pending. Walk through the most likely root causes
in priority order. For the top candidate, give me the exact kubectl commands
to confirm it and the exact commands to remediate. Assume EKS, Calico CNI,
and that I have cluster-admin. Do not run anything — output the commands and
tell me what each one verifies before I run it.

Growth Marketing: a Figma plugin generating 100 ad variations

A single-person performance marketing function built two things: an agentic workflow that ingests a CSV of ad performance data, and a Figma plugin that generates up to 100 ad variations by swapping headlines and descriptions, collapsing hours of manual variant production. The transferable pattern is “loop the agent over a data file to produce structured output,” and it works the same way whether the output is ad copy, test fixtures, or migration stubs.

Point your agent at an actual file in the repo rather than describing it:

Read ./data/products.csv. For each row, generate a Jest test that asserts the
pricing function in src/pricing.ts returns the expected total for that row's
quantity and tier. Use describe/it blocks named after the SKU. Write them to
tests/pricing.generated.test.ts. After writing, run the suite. If a test
fails, do not change the expected value or the pricing function to make it
pass. List the failing rows in a comment at the top of the file so I can
check whether the CSV or the code is wrong.
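
For a sense of what the agent's output should look like, here is a minimal TypeScript sketch of the same loop done deterministically: parse the rows, then emit one Jest block per row. The column names (sku, quantity, tier, expected_total) and the calculateTotal export are hypothetical stand-ins, not the real shape of ./data/products.csv or src/pricing.ts, and the parser assumes no quoted fields.

```typescript
// Hypothetical row shape; swap in the real header of ./data/products.csv.
interface PricingRow {
  sku: string;
  quantity: number;
  tier: string;
  expectedTotal: number;
}

const REQUIRED_COLUMNS = ["sku", "quantity", "tier", "expected_total"];

// Minimal parser: header row, comma separators, no quoted fields (assumption).
function parseRows(csv: string): PricingRow[] {
  const lines = csv.trim().split(/\r?\n/);
  const header = (lines[0] ?? "").split(",").map((h) => h.trim());
  for (const name of REQUIRED_COLUMNS) {
    if (header.indexOf(name) === -1) {
      throw new Error("Missing column: " + name);
    }
  }
  return lines.slice(1).map((line) => {
    const cells = line.split(",").map((c) => c.trim());
    const cell = (name: string): string => cells[header.indexOf(name)] ?? "";
    return {
      sku: cell("sku"),
      quantity: Number(cell("quantity")),
      tier: cell("tier"),
      expectedTotal: Number(cell("expected_total")),
    };
  });
}

// Emits Jest source text with one describe block per row, named after the SKU.
// calculateTotal is a hypothetical export of src/pricing.ts.
function generateTestSource(rows: PricingRow[]): string {
  const blocks = rows.map((row) => {
    const title = "totals " + row.quantity + " units at tier " + row.tier;
    const call =
      "calculateTotal(" + row.quantity + ", " + JSON.stringify(row.tier) + ")";
    return [
      "describe(" + JSON.stringify(row.sku) + ", () => {",
      "  it(" + JSON.stringify(title) + ", () => {",
      "    expect(" + call + ").toBeCloseTo(" + row.expectedTotal + ", 2);",
      "  });",
      "});",
    ].join("\n");
  });
  const importLine = 'import { calculateTotal } from "../src/pricing";';
  return [importLine, ...blocks].join("\n\n") + "\n";
}
```

Comparing the agent's generated file against output like this is a quick way to spot invented columns or silently changed expected values.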

Data Science: from throwaway notebook to a tool that lasts

Data scientists used Claude Code to build complex visualization dashboards without writing JavaScript themselves, shifting from disposable notebooks to persistent, shareable TypeScript apps. The pattern: describe the thing you keep re-deriving by hand, and let the agent stand up a small app around it.

I have an analysis I re-run by hand every week (pasted below). Turn it into a
small persistent tool: a minimal React + Vite app that loads the CSV from
a file input, renders the same three charts I describe, and lets me filter by
date range. Use Recharts. Keep it to one component plus a data-loading hook.
Explain any data-shape assumptions you had to make so I can correct them.

[paste your current notebook / pandas script here]
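
The data-shape assumptions that prompt asks about are worth pinning down in types before any chart code exists. Below is a minimal TypeScript sketch of the date-range filter, assuming each row carries a zero-padded ISO 8601 date string (YYYY-MM-DD) and one numeric value. Both field names are hypothetical.

```typescript
// Hypothetical row shape for the weekly analysis CSV.
interface MetricRow {
  date: string; // ISO 8601 calendar date such as "2024-01-15" (assumption)
  value: number;
}

const ISO_DATE = /^\d{4}-\d{2}-\d{2}$/;

// Returns rows whose date falls within [start, end], inclusive.
// Zero-padded ISO dates sort lexicographically, so string comparison suffices.
// Rows whose date does not match the assumed format are dropped, not repaired.
function filterByDateRange(
  rows: MetricRow[],
  start: string,
  end: string
): MetricRow[] {
  for (const bound of [start, end]) {
    if (!ISO_DATE.test(bound)) {
      throw new Error("Expected YYYY-MM-DD, got: " + bound);
    }
  }
  if (start > end) {
    throw new Error("start must not be after end");
  }
  return rows.filter(
    (row) => ISO_DATE.test(row.date) && row.date >= start && row.date <= end
  );
}
```

Keeping the filter a pure function, separate from the component, makes it easy to check the agent's version against a few known rows.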

The Same Patterns Across All Three Tools

The three reusable moves above — diagnose-from-context, batch-against-a-file, build-a-persistent-tool — are tool-agnostic. What differs is how you point the agent at your project. Here’s the day-one onboarding step in each tool.

Open the repo in Cursor and use Agent mode (the agent reads the workspace and .cursor/rules automatically). To capture conventions so every future session starts grounded, generate a rules file:

Read the project structure, package.json, and the three largest source
files. Draft a .cursor/rules/project.mdc that captures our stack, our
directory layout, our test command, and the two or three conventions a new
contributor would get wrong. Keep it under 40 lines.

Review the diff in Cursor’s inline view before accepting — rules files drift, so keep them short and reviewed.

From the repo root, run claude and let it inventory the codebase, then commit the context file it generates:

Explore this repository and write a CLAUDE.md at the root. Capture: the
stack and runtime, how to install and run tests, the build/deploy command,
and the conventions a new engineer most often gets wrong here. Keep it tight
— bullet points, no prose padding.

For a large monolith, ask Claude Code to use sub-agents so it can read more of the tree in parallel without blowing the context window. Run headless in CI later with claude -p --output-format json once the workflow is proven.

Codex reads an AGENTS.md for project context. In the Codex CLI (or the IDE extension), run:

Read this repo and write an AGENTS.md at the root describing the stack, the
test and build commands, and the conventions a new contributor would miss.
Then list three independent tasks from our backlog that are safe to run in
parallel as separate Codex Cloud tasks.

The parallel-task suggestion plays to Codex’s strength: fan those out as async Cloud tasks and review the diffs as they land, rather than babysitting one session.

A Realistic ROI Model (Fill In Your Own Numbers)

You will be asked to justify the spend. Resist the urge to borrow someone else’s headline figure. Instead, build the smallest defensible model and measure against it. The table below is an illustrative template — every input is a placeholder for a number you measure on your own team, not a reported result.

Illustrative 12-month model — replace every figure with measured data

Input	Where to get it	Example placeholder
Loaded hourly cost per developer	Finance / fully-loaded comp	your number
Hours/week on tasks you’ll delegate	One week of time-tracking	your number
Share of that time realistically saved (%)	Your 2-week pilot, measured	your number, rounded down
Tool subscription per seat / month	Vendor pricing page	your number
Onboarding/training hours per dev	Your rollout plan	your number
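
One way to combine those inputs into a single 12-month figure is sketched below in TypeScript. The formula is an assumption of this sketch rather than something the table prescribes. Time saved is entered as a fraction (0 to 1) of the delegated hours, and team size and working weeks per year are two extra inputs the table does not list. The function has no defaults, so every value has to come from your own measurements.

```typescript
interface RoiInputs {
  developers: number;            // team size (extra input, not in the table)
  workingWeeksPerYear: number;   // extra input, not in the table
  loadedHourlyCost: number;      // per developer
  delegatedHoursPerWeek: number; // per developer
  fractionSaved: number;         // 0 to 1, measured in the pilot
  seatCostPerMonth: number;      // per developer
  onboardingHoursPerDev: number;
}

interface RoiResult {
  hoursSaved: number;
  grossValue: number;
  totalCost: number;
  netValue: number;
}

// Twelve-month model: value of hours saved minus subscription and onboarding.
function twelveMonthRoi(input: RoiInputs): RoiResult {
  if (input.fractionSaved < 0 || input.fractionSaved > 1) {
    throw new RangeError("fractionSaved must be between 0 and 1");
  }
  const hoursSaved =
    input.developers *
    input.workingWeeksPerYear *
    input.delegatedHoursPerWeek *
    input.fractionSaved;
  const grossValue = hoursSaved * input.loadedHourlyCost;
  const subscriptionCost = input.developers * input.seatCostPerMonth * 12;
  const onboardingCost =
    input.developers * input.onboardingHoursPerDev * input.loadedHourlyCost;
  const totalCost = subscriptionCost + onboardingCost;
  return { hoursSaved, grossValue, totalCost, netValue: grossValue - totalCost };
}
```

Because every term is visible, a skeptic can challenge one input at a time instead of the headline number.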

How to Run a Pilot That Produces Real Evidence

Pick one painful, measurable task. Not “make us faster” — something like “time from PR opened to first review comment” or “hours to add a CRUD endpoint with tests.” Record a baseline for two weeks before introducing the tool.
Run one of the prompts above on that exact task with two or three volunteers. Keep the prompt in a shared file so everyone runs the same thing and you can compare.
Capture before/after on the one metric you chose. A spreadsheet with dates, the task, and the measured number beats any vendor case study.
Write down the failure modes you hit (see below) and how you recovered. This is the most valuable artifact — it’s what makes the rollout repeatable instead of a fluke.
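
The before/after comparison from the third step can be as small as this TypeScript sketch. It reports medians rather than means so that one outlier week does not dominate. That is a choice made here, not something the steps above require. It also returns the sample counts, so the result can answer “over how many samples?”

```typescript
// Median of a non-empty list of measurements (hours, minutes, or any one unit).
function median(values: number[]): number {
  if (values.length === 0) {
    throw new Error("median needs at least one sample");
  }
  const sorted = values.slice().sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}

interface PilotSummary {
  baselineMedian: number;
  pilotMedian: number;
  percentChange: number; // negative means the metric went down
  baselineSamples: number;
  pilotSamples: number;
}

// Compares the baseline period with the pilot period on a single metric.
function summarizePilot(baseline: number[], pilot: number[]): PilotSummary {
  const baselineMedian = median(baseline);
  const pilotMedian = median(pilot);
  if (baselineMedian === 0) {
    throw new Error("baseline median is zero; percent change is undefined");
  }
  return {
    baselineMedian,
    pilotMedian,
    percentChange: ((pilotMedian - baselineMedian) / baselineMedian) * 100,
    baselineSamples: baseline.length,
    pilotSamples: pilot.length,
  };
}
```

With only a handful of samples on either side, treat the percent change as a direction rather than a result.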

When This Breaks

The agent confidently produces plausible-but-wrong commands. The dashboard-debugging story works because a human verified each kubectl step before running it. Never let an agent execute infra changes unattended during an incident — have it propose, you verify. Your prompts should explicitly say “do not run anything.”
Batch generation drifts from your real schema. When you loop over a CSV or data file, the agent will quietly invent column names or assume types. Always ask it to list the assumptions it made, as the notebook-to-tool prompt above does, and spot-check a few generated outputs against the source.
The “persistent tool” becomes unmaintained shadow IT. The notebook-to-app pattern is powerful and dangerous — you can end up with five one-off React apps nobody owns. Decide upfront whether the tool graduates into a real repo with tests or stays a personal script.
Your ROI number doesn’t survive the first skeptic. If you can’t answer “measured how, over how many samples?”, you don’t have a result yet — you have a vibe. Go back to a smaller, measured claim.
Context files rot. A CLAUDE.md, AGENTS.md, or .cursor/rules that drifts from reality makes the agent worse, because it now confidently follows stale conventions. Treat them as code: review changes, keep them short, delete what’s no longer true.

What’s Next

Team Migration Strategies

Turn a successful pilot into an organization-wide rollout with phased adoption and champions.

Project Conversion Playbook

The technical steps to make an existing repo AI-ready: context files, rules, and CI integration.

Workflow Transformation

Reimagine delegation and review loops so the productivity gains compound instead of fizzling.

Migration Strategy Center

Assessment frameworks, tool selection, and every migration path in one place.
