
Performance Optimization with Codex

Your API’s p95 latency just crossed 2 seconds and the monitoring dashboard is all red. The product team says “fix the performance,” but the slow requests touch five services, three databases, and a caching layer. Profiling locally shows different bottlenecks than production because the data volume is one-thousandth of real traffic. You need to find the actual bottleneck, optimize it, and prove the fix works before deploying. Codex can profile, optimize, and benchmark — and with cloud tasks and best-of-N attempts, it can explore multiple optimization strategies in parallel.

  • Prompts for identifying and profiling performance bottlenecks across the stack
  • A cloud task workflow using --attempts for parallel optimization exploration
  • Techniques for benchmarking optimizations with reproducible results
  • An automation recipe for weekly performance regression detection

Step 1: Diagnose the Bottleneck in the CLI

Start in the CLI with a diagnostic prompt. Give Codex the symptoms and let it investigate.
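
For example, a prompt along these lines works well (the endpoint matches the example used throughout this guide; substitute your own symptoms and routes):

codex "GET /api/dashboard has a p95 latency over 2 seconds in production.
Investigate the request path for this endpoint: the route handler, the services it calls,
and the database queries they issue. List the most likely bottlenecks in order of suspicion
(N+1 queries, missing indexes, blocking work on the event loop, missing caching) and say
what evidence would confirm each one. Do not change any code yet."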

Step 2: Profile with Production-Scale Data in the Cloud

For accurate profiling, use a cloud environment with production-scale test data. Cloud tasks can install profiling tools, generate load, and report results.

codex cloud exec --env perf-test "Profile the GET /api/dashboard endpoint:
1. Seed the database with 100K users and 12 months of order data using scripts/seed-reports.ts
2. Start the application server under node --prof so CPU profiling covers the load test
3. Use autocannon to send 100 requests to GET /api/dashboard with a valid auth token
4. Stop the server, then process the profile: node --prof-process isolate-*.log > profile.txt
5. Analyze the profile output and identify:
- Functions consuming the most CPU time
- Database query execution times
- Any blocking operations in the event loop
Report the findings with specific function names, file paths, and millisecond breakdowns."

Step 3: Explore Optimizations with Best-of-N


The best part of cloud tasks for performance work is the --attempts flag. Codex generates multiple independent solutions and you pick the best one. Each attempt explores a different optimization strategy.

codex cloud exec --env perf-test --attempts 3 "The GET /api/dashboard endpoint is slow because of:
1. N+1 query pattern in the order summary aggregation
2. Missing database index on orders.user_id + orders.created_at
3. No caching for data that changes at most once per day
Optimize the endpoint to achieve p95 latency under 500ms. You may:
- Rewrite queries to eliminate N+1 patterns
- Add database indexes
- Implement a caching layer with appropriate TTL
- Parallelize independent data fetches with Promise.all
After optimization, benchmark with autocannon (100 concurrent connections, 10 seconds) and report the before/after latency comparison.
Run the full test suite after optimization to verify no regressions."

With three attempts, Codex might try three different strategies: one focusing on query optimization, one on caching, and one on a combination. Compare the benchmark results across attempts and pick the approach that gives the best improvement with the least complexity.

Step 4: Apply and Benchmark the Winner Locally

After selecting the best optimization from the cloud attempts, apply it locally and run your own benchmarks:

In a worktree thread:

Apply the winning optimization approach from the cloud task. Specifically:
1. Replace the N+1 query in src/services/dashboard.ts with a single aggregation query
2. Add the composite index on orders(user_id, created_at)
3. Add a 5-minute cache with Redis for the dashboard summary data
4. Parallelize the three independent data fetches with Promise.all
After implementation:
- Run the test suite to verify correctness
- Use the integrated terminal to run: npm run benchmark -- --endpoint /api/dashboard
- Report the latency improvement
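
The applied changes end up looking roughly like the sketch below. This is an illustration rather than Codex output: the getDashboard function, the db query helper, the table columns, and the Redis client setup are assumed names, and the exact aggregation depends on your schema and ORM.

// Hypothetical sketch of the optimized dashboard service (names are illustrative).
import Redis from "ioredis";
import { db } from "./db"; // assumed query helper that returns a Promise of rows

const redis = new Redis();
const CACHE_TTL_SECONDS = 5 * 60; // the 5-minute TTL from the prompt above

export async function getDashboard(userId: string) {
  const cacheKey = `dashboard:${userId}`;

  // 1. Serve from Redis when the summary is already cached.
  const cached = await redis.get(cacheKey);
  if (cached) return JSON.parse(cached);

  // 2. One aggregation query replaces the former per-order N+1 loop.
  //    Relies on the composite index on orders(user_id, created_at).
  const orderSummaryQuery = db.query(
    `SELECT date_trunc('month', created_at) AS month,
            count(*)                        AS orders,
            sum(total_cents)                AS revenue_cents
       FROM orders
      WHERE user_id = $1
        AND created_at >= now() - interval '12 months'
      GROUP BY 1
      ORDER BY 1`,
    [userId],
  );

  // 3. The three independent fetches run concurrently instead of sequentially.
  const [orderSummary, profile, unreadCount] = await Promise.all([
    orderSummaryQuery,
    db.query(`SELECT name, plan FROM users WHERE id = $1`, [userId]),
    db.query(`SELECT count(*) AS unread FROM notifications WHERE user_id = $1 AND read = false`, [userId]),
  ]);

  const dashboard = { orderSummary, profile, unreadCount };

  // 4. Populate the cache for subsequent requests.
  await redis.set(cacheKey, JSON.stringify(dashboard), "EX", CACHE_TTL_SECONDS);
  return dashboard;
}

The structure mirrors the prompt: one aggregation query instead of a query per order, a Redis read-through cache with a short TTL, and Promise.all for the independent fetches.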

Step 5: Automate Performance Regression Detection


Set up a weekly automation to catch performance regressions before they reach production:
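
For example, you can schedule the following cloud task from cron or your CI scheduler. The prompt reuses the perf-test environment and seed script from earlier; benchmarks/baseline.json is a placeholder for wherever you record your baseline numbers:

codex cloud exec --env perf-test "Run the weekly performance regression check:
1. Seed the database with the standard dataset using scripts/seed-reports.ts
2. Start the application server and benchmark GET /api/dashboard with autocannon (100 concurrent connections, 10 seconds)
3. Compare the measured p95 latency against the baseline recorded in benchmarks/baseline.json
4. If p95 has regressed by more than 10%, list the commits from the past week that touched the dashboard code path and flag the likely cause
5. Report the current p95, the baseline, and the delta"

However you trigger it, run the check against the same seeded dataset each week so the numbers stay comparable.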

Local profiling does not match production. This is the biggest trap in performance work. Your local database has 1,000 rows; production has 50 million. A query that scans 1,000 rows in 5ms will take roughly 250 seconds scanning 50 million. Always test with production-scale data in cloud environments. Include “seed the database with at least 100K rows before benchmarking” in your prompts.

Caching fixes the symptom but introduces stale data bugs. Adding a cache improves latency but creates consistency issues. Tell Codex: “If you add caching, also implement cache invalidation for the specific scenarios that change the underlying data. Include tests that verify cache invalidation works.”
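
A minimal sketch of that invalidation, reusing the hypothetical cache key from the earlier sketch:

// Hypothetical invalidation hook: drop the cached dashboard whenever the
// underlying order data changes (names mirror the earlier sketch).
import Redis from "ioredis";

const redis = new Redis();

export async function invalidateDashboardCache(userId: string): Promise<void> {
  await redis.del(`dashboard:${userId}`);
}

// Call this from every write path that touches dashboard data (order creation,
// updates, refunds), and add a test that writes an order, requests the dashboard,
// and asserts the response reflects the new order rather than the cached value.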

Best-of-N attempts optimize different things. With three attempts, you might get three valid but incompatible optimizations. They cannot all be merged. Pick one approach, or ask Codex to combine the best elements: “Take the query optimization from attempt 1, the caching strategy from attempt 3, and combine them into a single implementation.”

Benchmark results are inconsistent. If the dev machine is doing other work, benchmark numbers are noisy. Include warmup iterations and multiple runs in your benchmark script. Tell Codex: “Run the benchmark 5 times and report the median p95, not a single run.”
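
A small wrapper makes that repeatable. The sketch below assumes the npm run benchmark script referenced earlier prints a JSON object with a p95 field as its last line of output; adjust the parsing to whatever your script actually emits:

// Hypothetical wrapper: run the benchmark five times and report the median p95.
import { execSync } from "node:child_process";

function runOnce(): number {
  const out = execSync("npm run benchmark --silent -- --endpoint /api/dashboard", {
    encoding: "utf8",
  });
  const lastLine = out.trim().split("\n").at(-1) ?? "{}";
  return JSON.parse(lastLine).p95; // assumed output shape, e.g. {"p95": 412}
}

const runs = Array.from({ length: 5 }, runOnce).sort((a, b) => a - b);
console.log(`median p95 over 5 runs: ${runs[2]} ms`);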