
Performance Optimization with Codex

Your API’s p95 latency just crossed 2 seconds and the monitoring dashboard is all red. The product team says “fix the performance,” but the slow requests touch five services, three databases, and a caching layer. Profiling locally shows different bottlenecks than production because the data volume is one-thousandth of real traffic. You need to find the actual bottleneck, optimize it, and prove the fix works before deploying. Codex can profile, optimize, and benchmark — and with cloud tasks and best-of-N attempts, it can explore multiple optimization strategies in parallel.

  • Prompts for identifying and profiling performance bottlenecks across the stack
  • A cloud task workflow using --attempts for parallel optimization exploration
  • Techniques for benchmarking optimizations with reproducible results
  • An automation recipe for weekly performance regression detection

Step 1: Diagnose the Bottleneck in the CLI

Start in the CLI with a diagnostic prompt. Give Codex the symptoms and let it investigate.
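
For example, a prompt along these lines works well (the endpoint matches the example used throughout this guide; substitute your own symptoms and routes):

codex "GET /api/dashboard has a p95 latency over 2 seconds in production.
Investigate the request path for this endpoint: the route handler, the services it calls,
and the database queries they issue. List the most likely bottlenecks in order of suspicion
(N+1 queries, missing indexes, blocking work on the event loop, missing caching) and say
what evidence would confirm each one. Do not change any code yet."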

Step 2: Profile with Production-Scale Data in the Cloud

For accurate profiling, use a cloud environment with production-scale test data. Cloud tasks can install profiling tools, generate load, and report results.

codex cloud exec --env perf-test "Profile the GET /api/dashboard endpoint:
1. Seed the database with 100K users and 12 months of order data using scripts/seed-reports.ts
2. Start the application server under node --prof so CPU profiling covers the load test
3. Use autocannon to send 100 requests to GET /api/dashboard with a valid auth token
4. Stop the server, then process the profile: node --prof-process isolate-*.log > profile.txt
5. Analyze the profile output and identify:
- Functions consuming the most CPU time
- Database query execution times
- Any blocking operations in the event loop
Report the findings with specific function names, file paths, and millisecond breakdowns."

Step 3: Explore Optimizations with Best-of-N


The best part of cloud tasks for performance work is the --attempts flag. Codex generates multiple independent solutions and you pick the best one. Each attempt explores a different optimization strategy.

codex cloud exec --env perf-test --attempts 3 "The GET /api/dashboard endpoint is slow because of:
1. N+1 query pattern in the order summary aggregation
2. Missing database index on orders.user_id + orders.created_at
3. No caching for data that changes at most once per day
Optimize the endpoint to achieve p95 latency under 500ms. You may:
- Rewrite queries to eliminate N+1 patterns
- Add database indexes
- Implement a caching layer with appropriate TTL
- Parallelize independent data fetches with Promise.all
After optimization, benchmark with autocannon (100 concurrent connections, 10 seconds) and report the before/after latency comparison.
Run the full test suite after optimization to verify no regressions."

With three attempts, Codex might try three different strategies: one focusing on query optimization, one on caching, and one on a combination. Compare the benchmark results across attempts and pick the approach that gives the best improvement with the least complexity.

Step 4: Apply and Benchmark the Winner Locally

After selecting the best optimization from the cloud attempts, apply it locally and run your own benchmarks:

In a worktree thread:

Apply the winning optimization approach from the cloud task. Specifically:
1. Replace the N+1 query in src/services/dashboard.ts with a single aggregation query
2. Add the composite index on orders(user_id, created_at)
3. Add a 5-minute cache with Redis for the dashboard summary data
4. Parallelize the three independent data fetches with Promise.all
After implementation:
- Run the test suite to verify correctness
- Use the integrated terminal to run: npm run benchmark -- --endpoint /api/dashboard
- Report the latency improvement
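
The applied changes end up looking roughly like the sketch below. This is an illustration rather than Codex output: the getDashboard function, the db query helper, the table columns, and the Redis client setup are assumed names, and the exact aggregation depends on your schema and ORM.

// Hypothetical sketch of the optimized dashboard service (names are illustrative).
import Redis from "ioredis";
import { db } from "./db"; // assumed query helper that returns a Promise of rows

const redis = new Redis();
const CACHE_TTL_SECONDS = 5 * 60; // the 5-minute TTL from the prompt above

export async function getDashboard(userId: string) {
  const cacheKey = `dashboard:${userId}`;

  // 1. Serve from Redis when the summary is already cached.
  const cached = await redis.get(cacheKey);
  if (cached) return JSON.parse(cached);

  // 2. One aggregation query replaces the former per-order N+1 loop.
  //    Relies on the composite index on orders(user_id, created_at).
  const orderSummaryQuery = db.query(
    `SELECT date_trunc('month', created_at) AS month,
            count(*)                        AS orders,
            sum(total_cents)                AS revenue_cents
       FROM orders
      WHERE user_id = $1
        AND created_at >= now() - interval '12 months'
      GROUP BY 1
      ORDER BY 1`,
    [userId],
  );

  // 3. The three independent fetches run concurrently instead of sequentially.
  const [orderSummary, profile, unreadCount] = await Promise.all([
    orderSummaryQuery,
    db.query(`SELECT name, plan FROM users WHERE id = $1`, [userId]),
    db.query(`SELECT count(*) AS unread FROM notifications WHERE user_id = $1 AND read = false`, [userId]),
  ]);

  const dashboard = { orderSummary, profile, unreadCount };

  // 4. Populate the cache for subsequent requests.
  await redis.set(cacheKey, JSON.stringify(dashboard), "EX", CACHE_TTL_SECONDS);
  return dashboard;
}

The structure mirrors the prompt: one aggregation query instead of a query per order, a Redis read-through cache with a short TTL, and Promise.all for the independent fetches.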

Step 5: Automate Performance Regression Detection


Set up a weekly automation to catch performance regressions before they reach production:
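
For example, you can schedule the following cloud task from cron or your CI scheduler. The prompt reuses the perf-test environment and seed script from earlier; benchmarks/baseline.json is a placeholder for wherever you record your baseline numbers:

codex cloud exec --env perf-test "Run the weekly performance regression check:
1. Seed the database with the standard dataset using scripts/seed-reports.ts
2. Start the application server and benchmark GET /api/dashboard with autocannon (100 concurrent connections, 10 seconds)
3. Compare the measured p95 latency against the baseline recorded in benchmarks/baseline.json
4. If p95 has regressed by more than 10%, list the commits from the past week that touched the dashboard code path and flag the likely cause
5. Report the current p95, the baseline, and the delta"

However you trigger it, run the check against the same seeded dataset each week so the numbers stay comparable.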

Local profiling does not match production. This is the biggest trap in performance work. Your local database has 1,000 rows; production has 50 million. A query that scans 1,000 rows in 5ms will take roughly 250 seconds scanning 50 million. Always test with production-scale data in cloud environments. Include “seed the database with at least 100K rows before benchmarking” in your prompts.

Caching fixes the symptom but introduces stale data bugs. Adding a cache improves latency but creates consistency issues. Tell Codex: “If you add caching, also implement cache invalidation for the specific scenarios that change the underlying data. Include tests that verify cache invalidation works.”
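
A minimal sketch of that invalidation, reusing the hypothetical cache key from the earlier sketch:

// Hypothetical invalidation hook: drop the cached dashboard whenever the
// underlying order data changes (names mirror the earlier sketch).
import Redis from "ioredis";

const redis = new Redis();

export async function invalidateDashboardCache(userId: string): Promise<void> {
  await redis.del(`dashboard:${userId}`);
}

// Call this from every write path that touches dashboard data (order creation,
// updates, refunds), and add a test that writes an order, requests the dashboard,
// and asserts the response reflects the new order rather than the cached value.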

Best-of-N attempts optimize different things. With three attempts, you might get three valid but incompatible optimizations. They cannot all be merged. Pick one approach, or ask Codex to combine the best elements: “Take the query optimization from attempt 1, the caching strategy from attempt 3, and combine them into a single implementation.”

Benchmark results are inconsistent. If the dev machine is doing other work, benchmark numbers are noisy. Include warmup iterations and multiple runs in your benchmark script. Tell Codex: “Run the benchmark 5 times and report the median p95, not a single run.”
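
A small wrapper makes that repeatable. The sketch below assumes the npm run benchmark script referenced earlier prints a JSON object with a p95 field as its last line of output; adjust the parsing to whatever your script actually emits:

// Hypothetical wrapper: run the benchmark five times and report the median p95.
import { execSync } from "node:child_process";

function runOnce(): number {
  const out = execSync("npm run benchmark --silent -- --endpoint /api/dashboard", {
    encoding: "utf8",
  });
  const lastLine = out.trim().split("\n").at(-1) ?? "{}";
  return JSON.parse(lastLine).p95; // assumed output shape, e.g. {"p95": 412}
}

const runs = Array.from({ length: 5 }, runOnce).sort((a, b) => a - b);
console.log(`median p95 over 5 runs: ${runs[2]} ms`);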