Debugging Patterns

Your auth endpoint fails roughly 30% of logins in production. The stack trace points at a line that looks fine, the failure vanishes the moment you attach a debugger, and your PM wants an ETA. This is the kind of bug that eats an afternoon. AI does not magically know the answer either — but used systematically, it turns a frustrating hunt into a tight loop of instrument, reproduce, correlate, fix.

  • A reusable prompt for instrumenting intermittent failures with strategic logging
  • A workflow for turning a raw production stack trace into a defensive fix
  • A copy-paste prompt for writing a failing race-condition test before you fix the bug
  • A cross-service tracing recipe that correlates logs by request ID
  • The failure modes of AI-assisted debugging — and how to keep it honest

The highest-leverage pattern: let the AI instrument the suspect path, reproduce under load, then feed the logs back for correlation. The workflow is the same in all three tools — only the surface changes (Cursor edits the file in the editor, Claude Code runs headless over the repo, Codex runs in the TUI or Cloud).

  1. Describe the problem precisely. Vague input produces vague logging. Give it the symptom, the frequency, and what you have ruled out.

  2. Let the tool add the logging. The same prompt drives each tool:

    In Cursor, for example, open auth.js, select the validateToken function, and run the prompt in Agent mode (Cmd/Ctrl+I). Review the diff inline before accepting. The tool adds targeted, structured logging:

    async function validateToken(token) {
      console.log('[AUTH] validation started', {
        tokenLength: token?.length,
        at: Date.now(),
      });
      try {
        const decoded = jwt.verify(token, SECRET);
        const msToExpiry = decoded.exp * 1000 - Date.now();
        console.log('[AUTH] decoded', { userId: decoded.userId, msToExpiry });
        if (msToExpiry < 60_000) {
          console.warn('[AUTH] token expiring soon', { msToExpiry });
        }
        return decoded;
      } catch (error) {
        console.error('[AUTH] validation failed', {
          message: error.message,
          iat: jwt.decode(token)?.iat,
        });
        throw error;
      }
    }
  3. Reproduce under load to surface the timing-dependent failure.

    # Hammer the path so the intermittent failure actually fires
    for i in $(seq 1 100); do npm test -- --grep "authentication"; done 2>&1 | tee debug.log
  4. Feed the logs back and ask for correlation, not a guess.

    A good response narrows to a testable cause — e.g. “failures cluster when validation latency pushes msToExpiry negative; the token expires mid-request. Secondary signal: iat skew of ~4s between two hosts.” Now you fix clock-skew tolerance and pre-emptive refresh against evidence, not a hunch.
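
Before acting on the AI's correlation, it can help to compute a basic statistic from the same logs yourself. The sketch below assumes the logs were emitted as one JSON object per line with an `event` field and, on failures, a `message` field. That format and the `summarizeFailures` name are illustrative assumptions; the console.log calls above do not produce JSON lines as written.

```javascript
// Assumed (hypothetical) log format, one JSON object per line, e.g.
// {"event":"validation failed","message":"jwt expired"}
function summarizeFailures(logText) {
  const byMessage = new Map();
  let total = 0;
  let failed = 0;

  for (const line of logText.split('\n')) {
    if (!line.trim()) continue;
    let entry;
    try {
      entry = JSON.parse(line);
    } catch {
      continue; // skip non-JSON noise such as test-runner output
    }
    if (!entry || typeof entry !== 'object') continue;

    if (entry.event === 'validation started') total += 1;
    if (entry.event === 'validation failed') {
      failed += 1;
      byMessage.set(entry.message, (byMessage.get(entry.message) ?? 0) + 1);
    }
  }

  return {
    total,
    failed,
    failureRate: total ? failed / total : 0,
    byMessage: Object.fromEntries(byMessage),
  };
}
```

If the failure rate here disagrees with what the AI reports, treat the AI's explanation as unverified and ask it to show which log lines it counted.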

Pattern 2: Production Stack-Trace to Defensive Fix

A raw stack trace from Sentry or your logs is the single richest input you can hand an AI — it pins the file, line, and call chain. The job is to turn it into a fix that handles the edge case without papering over the real cause.

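When you paste the trace, ask the AI to separate conditions that should never happen from conditions that are expected sometimes. That distinction is what stops it from silencing a real bug. A good response handles the two cases differently: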

async processOrder(userId, orderData) {
  // "Should never happen" -> fail loudly
  if (!userId) throw new ValidationError('User ID required');
  const user = await this.getUser(userId);
  if (!user) throw new NotFoundError(`User ${userId} not found`);

  // "Expected sometimes" -> degrade gracefully
  if (!user.stripeCustomer?.id) {
    logger.warn('User missing Stripe customer; creating one', { userId });
    user.stripeCustomer = await this.createStripeCustomer(user);
  }
  return this.createOrder(user, orderData);
}
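
The method above throws `ValidationError` and `NotFoundError` without showing where they come from. A minimal sketch of how such classes might be defined follows. The `statusCode` values and the `toResponse` helper are assumptions for illustration, not part of the original code.

```javascript
// Hypothetical definitions; the original does not show these classes.
class ValidationError extends Error {
  constructor(message) {
    super(message);
    this.name = 'ValidationError';
    this.statusCode = 400; // assumed HTTP mapping
  }
}

class NotFoundError extends Error {
  constructor(message) {
    super(message);
    this.name = 'NotFoundError';
    this.statusCode = 404; // assumed HTTP mapping
  }
}

// Known failure kinds keep their message; anything else stays opaque
// so an unexpected bug is not mistaken for a handled case.
function toResponse(error) {
  if (error instanceof ValidationError || error instanceof NotFoundError) {
    return { status: error.statusCode, body: { error: error.message } };
  }
  return { status: 500, body: { error: 'Internal error' } };
}
```

Keeping the "fail loudly" errors as distinct types lets callers and alerting rules tell them apart from the gracefully handled path.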

When heap usage climbs without bound, lead with real tooling, then let the AI interpret the evidence. Capture a heap snapshot with Node’s built-in inspector (run node --inspect and use the Memory tab in Chrome DevTools, or start the process with node --heapsnapshot-signal=SIGUSR2 and send it that signal), or profile allocations with clinic.js (clinic heapprofiler). 0x produces CPU flame graphs, which help when the growth comes with a hot code path. Hand the AI the retained-size breakdown, not a vague “it’s leaking”.
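
Before reaching for a full snapshot, a cheap first check is to sample heap usage over time and confirm that it really grows. The sketch below uses Node's `process.memoryUsage()`. The function names and the growth heuristic are assumptions for illustration, not a standard leak metric.

```javascript
// Sample heapUsed at a fixed interval. Returns a stop function that
// clears the timer and hands back the collected samples.
function startHeapSampler(intervalMs = 1000) {
  const samples = [];
  const timer = setInterval(() => {
    samples.push({ at: Date.now(), heapUsed: process.memoryUsage().heapUsed });
  }, intervalMs);
  timer.unref(); // do not keep the process alive just for sampling
  return () => {
    clearInterval(timer);
    return samples;
  };
}

// Crude heuristic (an assumption): report growth when the last sample
// exceeds the first by more than thresholdBytes.
function heapGrewBy(samples, thresholdBytes) {
  if (samples.length < 2) return false;
  const first = samples[0].heapUsed;
  const last = samples[samples.length - 1].heapUsed;
  return last - first > thresholdBytes;
}
```

Garbage collection makes individual samples noisy, so a run of several minutes under steady load gives a more trustworthy trend than a handful of readings.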

The usual culprit the AI will surface is an unbounded collection — listeners, timers, or cache entries added but never removed. The fix is a cleanup path that callers actually invoke:

addListener(event, callback) {
  const listener = { event, callback };
  this.listeners.push(listener);
  // Hand back an unsubscribe so callers can release the reference
  return () => {
    const i = this.listeners.indexOf(listener);
    if (i > -1) this.listeners.splice(i, 1);
  };
}
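
The cleanup path only helps if callers keep the returned function and call it. The sketch below wraps the method in a hypothetical `EventHub` class (the class name and `emit` method are assumptions added for illustration) and shows the listener being released.

```javascript
// Hypothetical host class for the addListener method shown above.
class EventHub {
  constructor() {
    this.listeners = [];
  }

  addListener(event, callback) {
    const listener = { event, callback };
    this.listeners.push(listener);
    return () => {
      const i = this.listeners.indexOf(listener);
      if (i > -1) this.listeners.splice(i, 1);
    };
  }

  emit(event, payload) {
    // Iterate over a copy so a callback may unsubscribe during emit.
    for (const listener of [...this.listeners]) {
      if (listener.event === event) listener.callback(payload);
    }
  }
}

const hub = new EventHub();
const seen = [];
const unsubscribe = hub.addListener('tick', (n) => seen.push(n));

hub.emit('tick', 1); // delivered
unsubscribe();       // releases the reference held by the hub
hub.emit('tick', 2); // not delivered
```

Calling `unsubscribe` a second time is harmless, because `indexOf` returns -1 once the listener is gone.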

Identify and fix timing-related bugs. The strongest move is to make the AI write a failing test that reproduces the race before it touches the fix — otherwise you cannot tell whether the fix worked.

// AI writes the failing test first
describe('Payment Processing Race Conditions', () => {
  it('should handle concurrent submissions', async () => {
    const service = new PaymentService();
    const userId = 'test-user';
    const paymentData = { amount: 100, currency: 'USD' };

    // Simulate rapid clicks
    const promises = Array(5).fill(null).map(() =>
      service.processPayment(userId, paymentData)
    );
    const results = await Promise.allSettled(promises);

    // Only one should succeed
    const successful = results.filter(r => r.status === 'fulfilled');
    expect(successful).toHaveLength(1);

    // Others should be rejected with the in-flight conflict error
    const rejected = results.filter(r => r.status === 'rejected');
    expect(rejected).toHaveLength(4);
    rejected.forEach(r => {
      expect(r.reason.message).toContain('Payment already processing');
    });
  });
});

// AI suggests an in-flight guard (single process only; chargeCard is defined elsewhere)
class ConflictError extends Error {}

class PaymentService {
  constructor() {
    this.processingPayments = new Map();
  }

  async processPayment(userId, paymentData) {
    // Marker for this attempt; the guard itself is keyed by userId
    const attemptId = `${userId}-${Date.now()}`;

    // Check and mark happen synchronously, before the first await
    if (this.processingPayments.has(userId)) {
      throw new ConflictError('Payment already processing');
    }
    this.processingPayments.set(userId, attemptId);

    try {
      return await this.chargeCard(paymentData);
    } finally {
      // Always clean up, even when the charge fails
      this.processingPayments.delete(userId);
    }
  }
}
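
The test only reproduces the race if the charge call actually yields to the event loop, so that all five submissions overlap. The following sketch shows one way to make that deterministic by injecting a slow stub. The constructor injection, the `slowCharge` stub, and `runRace` are assumptions for illustration and differ slightly from the class above.

```javascript
class ConflictError extends Error {}

class PaymentService {
  constructor(chargeCard) {
    this.chargeCard = chargeCard; // injected so the test controls timing
    this.inFlight = new Set();
  }

  async processPayment(userId, paymentData) {
    if (this.inFlight.has(userId)) {
      throw new ConflictError('Payment already processing');
    }
    this.inFlight.add(userId);
    try {
      return await this.chargeCard(paymentData);
    } finally {
      this.inFlight.delete(userId);
    }
  }
}

// Stub that resolves on a later tick, so concurrent calls overlap.
const slowCharge = (data) =>
  new Promise((resolve) => setTimeout(() => resolve({ ok: true, ...data }), 20));

async function runRace() {
  const service = new PaymentService(slowCharge);
  const results = await Promise.allSettled(
    Array.from({ length: 5 }, () =>
      service.processPayment('test-user', { amount: 100, currency: 'USD' })
    )
  );
  return {
    fulfilled: results.filter((r) => r.status === 'fulfilled').length,
    rejected: results.filter((r) => r.status === 'rejected').length,
  };
}
```

An in-memory guard like this covers a single process only. With several instances behind a load balancer, the same check would need to live in shared storage or rely on the payment provider's own idempotency support.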

When a failure spans services, the signal lives in the correlation, not in any single log file. The recipe: stamp a request ID at the edge, gather logs from every hop, and let the AI rebuild the timeline.

# Gather logs from every hop (adjust to your platform's log command)
kubectl logs -l app=user-service --since=1h > user-service.log
kubectl logs -l app=payment-service --since=1h > payment-service.log
# Hand them to the AI for timeline reconstruction
claude "Correlate user-service.log and payment-service.log by requestId and trace the failed payment flow for abc-123"

A reconstructed timeline turns “something is slow” into “Payment Service blocked 5s waiting for the DB, then the pool exhausted and the call chain cascaded” — a root cause you can act on. For richer traces, wire up OpenTelemetry spans rather than parsing text logs; the same correlation prompt works on exported span data.
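
The correlation step can also be done mechanically, which gives you something to check the AI's timeline against. The sketch below assumes each service logs one JSON object per line with `ts` (ISO 8601), `requestId`, and `msg` fields. That format and the `buildTimeline` name are assumptions, so adapt the parsing to whatever your services emit.

```javascript
// Merge logs from several services into one timeline for a request.
// logsByService: { serviceName: rawLogText }
function buildTimeline(logsByService, requestId) {
  const events = [];

  for (const [service, text] of Object.entries(logsByService)) {
    for (const line of text.split('\n')) {
      if (!line.trim()) continue;
      let entry;
      try {
        entry = JSON.parse(line);
      } catch {
        continue; // skip lines that are not JSON
      }
      if (!entry || entry.requestId !== requestId) continue;
      events.push({ service, at: Date.parse(entry.ts), msg: entry.msg });
    }
  }

  events.sort((a, b) => a.at - b.at);

  // gapMs is the delay since the previous event; large gaps mark stalls.
  return events.map((e, i) => ({
    ...e,
    gapMs: i === 0 ? 0 : e.at - events[i - 1].at,
  }));
}
```

Timestamps from different hosts are only comparable if their clocks agree, so read small gaps between services with caution.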

AI-assisted debugging fails in specific, recognizable ways. Knowing them is what separates a real investigation from a confident-sounding dead end.