Load, Stress, and Benchmark Testing
Your API handles 500 requests per second in staging and everyone celebrates. Then Black Friday hits, traffic spikes to 3,000 rps, and the database connection pool is exhausted within minutes. The response-time graph looks like a hockey stick, and your CEO is watching the uptime dashboard. Performance testing is not optional for production systems, and AI makes building comprehensive performance test suites dramatically easier.
What You’ll Walk Away With
- k6 and Artillery load test generation from AI prompts
- Stress testing patterns that find your system’s breaking point safely
- Continuous benchmarking in CI that catches performance regressions
- AI-assisted analysis of performance bottlenecks from test results
- Realistic traffic pattern simulation for your specific use case
Load Test Generation
Generate a k6 load test for our checkout API:
Scenario: Simulate a flash sale with ramping traffic
- Ramp from 0 to 100 virtual users over 2 minutes
- Hold at 100 VUs for 5 minutes (steady state)
- Spike to 500 VUs for 1 minute (flash sale moment)
- Return to 100 VUs for 2 minutes (recovery)
- Ramp down to 0 over 1 minute

API calls per virtual user iteration:
1. POST /api/auth/login (use test credentials from env)
2. GET /api/products?category=sale (browse sale items)
3. POST /api/cart/items (add random product)
4. POST /api/checkout (complete purchase with test payment)

Thresholds:
- p95 response time < 500ms during steady state
- p99 response time < 2000ms during spike
- Error rate < 1% at all times
- Checkout success rate > 99%

Save to /tests/performance/checkout-load.k6.js

claude "Create a comprehensive k6 performance test suite:
1. /tests/performance/checkout-load.k6.js - Checkout flow load test
   - Ramping traffic pattern: 0 -> 100 -> 500 -> 100 -> 0 VUs
   - Realistic user journey (login, browse, cart, checkout)
   - SLA thresholds for response time and error rate
2. /tests/performance/api-stress.k6.js - API endpoint stress test
   - Test each critical endpoint individually
   - Find the breaking point (ramp until errors > 5%)
   - Report max throughput per endpoint
3. /tests/performance/helpers/auth.js - Shared auth helper
   - Login and cache tokens
   - Token refresh handling
4. package.json scripts:
   - test:perf:load - Run load tests
   - test:perf:stress - Run stress tests
   - test:perf:smoke - Quick 30-second smoke test
Include realistic test data generation for each scenario."

Create a performance testing suite for this project:
1. Analyze the API routes to identify critical endpoints
2. Generate k6 load tests for the top 5 most important flows
3. Create stress tests that find breaking points
4. Add performance smoke tests for CI integration
5. Create a PR with the test suite and documentation
Include realistic traffic patterns based on typical SaaS usage.

Stress Testing: Finding the Breaking Point
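The stress prompt asks the AI to ramp until errors exceed 5% and to report max throughput. A minimal sketch of what that can look like, assuming k6's `stages` option and the object form of a threshold with `abortOnFail` and `delayAbortEval`. The doubling step sizes and the names `stressOptions` and `findBreakingPoint` are illustrative assumptions, not output from any generated suite. The object is written as a plain constant so it runs anywhere; in a k6 script it would be exported as `options`.

```javascript
// Hypothetical stress profile: keep doubling the VU count until the error budget is blown.
// In a real k6 script: `export const options = stressOptions;`
const stressOptions = {
  stages: [
    { duration: '2m', target: 100 },
    { duration: '2m', target: 200 },
    { duration: '2m', target: 400 },
    { duration: '2m', target: 800 },
    { duration: '2m', target: 1600 },
    { duration: '1m', target: 0 }, // ramp down if nothing broke
  ],
  thresholds: {
    // Fail and abort the run once more than 5% of requests fail.
    // delayAbortEval lets some samples accumulate before an abort can trigger.
    http_req_failed: [{ threshold: 'rate<=0.05', abortOnFail: true, delayAbortEval: '30s' }],
  },
};

// Given per-step measurements (VUs, requests per second, error rate as a fraction),
// return the first step past the error budget and the best throughput seen before it.
function findBreakingPoint(steps, maxErrorRate = 0.05) {
  let maxHealthyRps = 0;
  for (const step of steps) {
    if (step.errorRate > maxErrorRate) {
      return { breakingVus: step.vus, maxHealthyRps };
    }
    maxHealthyRps = Math.max(maxHealthyRps, step.rps);
  }
  // Never crossed the budget in the tested range: raise the top stage and rerun.
  return { breakingVus: null, maxHealthyRps };
}
```

One design note: k6 evaluates thresholds against the metric aggregated since the start of the run, so the cumulative error rate includes the healthy early stages and the abort fires somewhat after the instantaneous error rate crosses 5%. Running each load level as its own short run and feeding the per-level numbers to a helper like `findBreakingPoint` avoids that lag, at the cost of more runs.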
Continuous Benchmarking in CI
Catching Performance Regressions Automatically
Load and stress runs are long and CPU-hungry, so where you run them matters as much as the script. Each tool has a natural home for k6 jobs:
In Cursor, author and debug the k6 scripts locally in Agent mode against a staging URL, then commit the workflow file. This is where you iterate on thresholds and scenarios; you do not want long stress runs blocking the editor, so keep the in-editor runs to the 30-second smoke test.
In Claude Code, run the smoke benchmark headlessly as a PR gate: claude -p "run k6 run tests/performance/smoke.k6.js, compare p95 against perf-baseline.json, and fail if it regressed more than 20%" inside the GitHub Actions job. Pair it with a PostToolUse hook so the comparison comment is posted automatically.
With Codex, offload the long load and stress runs to Codex Cloud or a scheduled automation so they execute on cloud hardware, not the PR runner, then have it open a PR (or comment) with the results table. This keeps multi-minute runs off the critical CI path while still gating merges on the cloud result.
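The PR gate described for the smoke benchmark boils down to one comparison. A minimal sketch, assuming a hypothetical `perf-baseline.json` shaped like `{"p95": 310}` (milliseconds) and a current p95 taken from the k6 run; the 20% tolerance comes from the prompt, while `checkRegression` is an illustrative name.

```javascript
// Compare the current p95 latency against a stored baseline.
// Assumption: the baseline file is a hypothetical JSON object like {"p95": 310} (milliseconds).
function checkRegression(baselineJson, currentP95Ms, tolerance = 0.2) {
  const baseline = JSON.parse(baselineJson);
  if (typeof baseline.p95 !== 'number' || baseline.p95 <= 0) {
    throw new Error('baseline must contain a positive numeric "p95"');
  }
  // Relative change: 0.25 means 25% slower than baseline, negative means faster.
  const change = (currentP95Ms - baseline.p95) / baseline.p95;
  return {
    baselineMs: baseline.p95,
    currentMs: currentP95Ms,
    changePct: Math.round(change * 1000) / 10, // percentage with one decimal place
    regressed: change > tolerance,
  };
}

// In a CI step (Node): if (checkRegression(baselineText, p95).regressed) process.exit(1);
```

Keeping the tolerance as a parameter makes it easy to tighten the gate later, or to widen it for endpoints known to be noisy.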
Analyzing Performance Results with AI
After running load tests, AI tools can help interpret the results.
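Interpretation tends to be more reliable when the AI gets a compact digest rather than pages of raw output. A minimal sketch, assuming the object k6 passes to a `handleSummary(data)` callback, where each entry in `data.metrics` carries a `values` map (keys such as `avg` and `p(95)` for trend metrics, `rate` for rate metrics). `summarizeForPrompt` is a hypothetical helper name.

```javascript
// Condense a k6 end-of-test summary into a small Markdown table for an AI prompt.
// Assumed input shape (as passed to handleSummary): data.metrics[name].values,
// e.g. { avg, med, 'p(95)' } for trend metrics or { rate } for rate metrics.
function summarizeForPrompt(data, metricNames) {
  const ms = (v) => (typeof v === 'number' ? v.toFixed(1) : '-');
  const pct = (v) => (typeof v === 'number' ? `${(v * 100).toFixed(2)}%` : '-');
  const rows = ['| metric | avg (ms) | p(95) (ms) | rate |', '| --- | --- | --- | --- |'];
  for (const name of metricNames) {
    const metric = data.metrics[name];
    if (!metric) continue; // this run did not emit the metric
    const v = metric.values;
    rows.push(`| ${name} | ${ms(v.avg)} | ${ms(v['p(95)'])} | ${pct(v.rate)} |`);
  }
  return rows.join('\n');
}
```

Inside a k6 script, `handleSummary(data)` can return an object mapping output file names to contents, so something like `{ 'summary.md': summarizeForPrompt(data, ['http_req_duration', 'http_req_failed']) }` would write the digest next to the run for pasting into a prompt.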
Database Performance Testing
When This Breaks
“Load tests pass locally but the production system is slower.” Your local environment does not match production. Run performance tests against a staging environment that mirrors production infrastructure (same database size, same network latency, same connection limits). Never use local databases for load testing.
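A cheap safeguard is to make the test refuse a local target outright. A minimal sketch, assuming the target URL arrives through an environment variable (k6 exposes these via `__ENV`, set with `k6 run -e BASE_URL=...`); `BASE_URL` and `resolveBaseUrl` are illustrative names. It cannot tell which database sits behind the URL, but it does stop the most common mistake of pointing the run at a local dev server.

```javascript
// Resolve the load-test target from environment variables and refuse local hosts,
// so results come from staging infrastructure rather than a developer machine.
// In a k6 script you would pass k6's __ENV object; here `env` is any plain object.
function resolveBaseUrl(env) {
  const raw = env.BASE_URL;
  if (!raw) throw new Error('BASE_URL is not set; point it at a staging environment');
  // Capture the host: a bracketed IPv6 literal, or everything up to a port, path, query, or fragment.
  const match = /^https?:\/\/(\[[^\]]+\]|[^\/:?#]+)/i.exec(raw);
  if (!match) throw new Error(`BASE_URL must be an http(s) URL, got "${raw}"`);
  const host = match[1].toLowerCase();
  const localHosts = ['localhost', '127.0.0.1', '0.0.0.0', '[::1]'];
  if (localHosts.includes(host)) {
    throw new Error(`refusing to load test local host ${host}`);
  }
  return raw.replace(/\/+$/, ''); // strip trailing slashes so paths like /api/products append cleanly
}
```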
“Tests give inconsistent results between runs.” Performance tests are inherently noisy. Run each scenario three times and use the median. Establish acceptable variance bands (plus or minus 15%). Fail only when the median exceeds the threshold, not individual runs.
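The median-of-three rule is easy to encode so CI applies it consistently. A minimal sketch, assuming one p95 value (ms) is collected per run; in this sketch the plus or minus 15% band is measured around the median, and `evaluateRuns` is a hypothetical helper name.

```javascript
// Median of a list of numbers (average of the middle pair for even lengths).
function median(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 0 ? (sorted[mid - 1] + sorted[mid]) / 2 : sorted[mid];
}

// Decide pass/fail from repeated runs of the same scenario.
function evaluateRuns(p95s, thresholdMs, band = 0.15) {
  if (p95s.length < 3) throw new Error('run each scenario at least three times');
  const med = median(p95s);
  // Runs outside median plus or minus the band are flagged as noisy but never fail the check alone.
  const outliers = p95s.filter((v) => Math.abs(v - med) > band * med);
  return { median: med, outliers, pass: med <= thresholdMs };
}
```

For example, p95 values of 410, 520, and 430 ms against a 500 ms threshold give a median of 430 and pass, even though one run exceeded 500; the 520 ms run is reported as an outlier for investigation.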
“We cannot run load tests in CI because they take too long.” Use a tiered approach: smoke tests (30 seconds) on every PR, load tests (10 minutes) nightly, full stress tests weekly. The smoke test catches the obvious regressions; the longer tests catch the subtle ones.
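The tiers can live in a single lookup that each CI job consults. A minimal sketch: the trigger names (`pull_request`, `nightly`, `weekly`) and the `selectTier`/`k6Command` helpers are hypothetical, the script paths reuse the file names from the prompts earlier in this page, and the durations follow the tiering described here (no stress duration is given, so it is left unset).

```javascript
// Hypothetical mapping from CI trigger to performance-test tier.
// Cadence follows the tiered approach: smoke on every PR, load nightly, stress weekly.
const PERF_TIERS = {
  pull_request: { tier: 'smoke', script: 'tests/performance/smoke.k6.js', approxDuration: '30s' },
  nightly: { tier: 'load', script: 'tests/performance/checkout-load.k6.js', approxDuration: '10m' },
  // Stress runs until the breaking point, so there is no fixed duration.
  weekly: { tier: 'stress', script: 'tests/performance/api-stress.k6.js', approxDuration: null },
};

function selectTier(trigger) {
  const entry = PERF_TIERS[trigger];
  if (!entry) throw new Error(`no performance tier configured for trigger "${trigger}"`);
  return entry;
}

// Build the command a CI step would execute for this trigger.
function k6Command(trigger) {
  return `k6 run ${selectTier(trigger).script}`;
}
```

Failing loudly on an unknown trigger is deliberate: a new workflow should not silently run the slow tier on a PR.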
“The AI generated load tests that do not match real traffic patterns.” Give the AI your actual traffic data. Export a sample from your analytics: “Our traffic peaks at 2pm EST, 60% of requests are GET /api/products, and the average user session makes 12 API calls over 8 minutes.”
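Numbers like these translate directly into a request mix and pacing. A minimal sketch: only the 60% `GET /api/products` share and the 12-calls-over-8-minutes session come from the example; `GET /api/cart` and all the other weights are hypothetical placeholders to replace with real analytics. Twelve calls over eight minutes works out to 480 s / 12 = 40 s of think time per call on average.

```javascript
// Weighted request mix. Only the 60% GET /api/products share comes from the example;
// the remaining entries and weights are hypothetical placeholders.
const TRAFFIC_MIX = [
  { weight: 0.6, method: 'GET', path: '/api/products' },
  { weight: 0.25, method: 'GET', path: '/api/cart' },        // hypothetical endpoint
  { weight: 0.1, method: 'POST', path: '/api/cart/items' },  // weight hypothetical
  { weight: 0.05, method: 'POST', path: '/api/checkout' },   // weight hypothetical
];

// Session shape from the example: 12 calls over 8 minutes.
const CALLS_PER_SESSION = 12;
const SESSION_SECONDS = 8 * 60;
const AVG_THINK_TIME_S = SESSION_SECONDS / CALLS_PER_SESSION; // 40 seconds

// Pick a request according to the weights; `rand` is injectable for deterministic tests.
function pickRequest(mix, rand = Math.random) {
  const total = mix.reduce((sum, r) => sum + r.weight, 0);
  let roll = rand() * total;
  for (const req of mix) {
    roll -= req.weight;
    if (roll < 0) return req;
  }
  return mix[mix.length - 1]; // guard against floating-point leftovers
}
```

In a k6 script, each iteration could issue the chosen request with `http.request(req.method, baseUrl + req.path)` (POST endpoints would also need a body) and then call `sleep()` with a randomized value around `AVG_THINK_TIME_S`, so virtual users pace themselves like real sessions instead of hammering the API back to back.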