Logging and Monitoring Setup in Cursor
It is 2 AM and your pager goes off. The alert says “high error rate on orders API.” You open your monitoring dashboard and see a spike in 500 errors, but the logs just say “Internal Server Error” with no stack trace, no request ID, no context about which endpoint or which user was affected. You SSH into the production server, tail the logs, and find thousands of lines of unstructured text mixed with debug output that someone forgot to remove. Thirty minutes in, you still do not know what is broken.
This is the cost of skipping observability. Structured logging, application metrics, and distributed tracing are the difference between a 5-minute diagnosis and a 2-hour scramble. Cursor Agent can generate your entire observability stack because the patterns are well-defined: structured loggers, metric collectors, trace propagation, and alert rules all follow standard schemas that the AI generates reliably.
What You’ll Walk Away With
- A structured logging setup with correlation IDs and request context
- Application metrics collection with Prometheus-compatible instrumentation
- Distributed tracing configuration for multi-service architectures
- Alert rules that trigger on meaningful conditions, not noise
- Copy-paste prompts for generating each observability layer
Structured Logging
The foundation of observability is structured logging. Every log line should be machine-parseable JSON with consistent fields, so you can search, filter, and aggregate across your entire system.
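In a Node service you would typically reach for a library like pino, but the core idea is small enough to sketch without dependencies. This is a minimal illustration of what a generated logger might look like: JSON output, consistent fields, and child loggers that carry context. The class and field names here are illustrative, not a specific library's API.

```typescript
// Minimal sketch of a structured JSON logger with child-context support.
// In practice you would use a library like pino; names are illustrative.
type LogContext = Record<string, unknown>;

class Logger {
  constructor(private context: LogContext = {}) {}

  // child() returns a new logger whose context is merged into every line
  child(extra: LogContext): Logger {
    return new Logger({ ...this.context, ...extra });
  }

  private emit(level: string, msg: string, fields: LogContext = {}): string {
    const line = JSON.stringify({
      level,
      time: new Date().toISOString(),
      msg,
      ...this.context,
      ...fields,
    });
    console.log(line);
    return line;
  }

  info(msg: string, fields?: LogContext): string {
    return this.emit("info", msg, fields);
  }

  error(msg: string, err?: Error): string {
    return this.emit(
      "error",
      msg,
      err ? { err: { message: err.message, stack: err.stack } } : {},
    );
  }
}

const logger = new Logger({ service: "orders-api" });
```

A call like `logger.child({ user_id: "u_1" }).info("login")` then emits one JSON line containing both `service` and `user_id`, which is what makes aggregation queries possible.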
After generation, verify that the logger produces the right output:
Test the logger by pasting this into Agent mode:
"Write a quick test that imports our logger, creates a child loggerwith user_id context, logs an info message and an error with a stack trace.Show me what the JSON output looks like for both development and production modes."Adding Context to Every Log
The most valuable logging improvement is adding business context. When you can search for all logs related to order ord_abc123, debugging becomes dramatically faster.
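The stage timing the prompt below asks for (started_at, duration_ms per stage) can be captured with a small wrapper. This is a hypothetical sketch: the `log` helper stands in for a real structured logger, and the field names mirror the prompt.

```typescript
// Sketch: wrap each order-processing stage so every log line carries
// order_id, started_at, and duration_ms. log() stands in for a real logger.
const log = (fields: Record<string, unknown>): void => {
  console.log(JSON.stringify(fields));
};

async function timedStage<T>(
  order_id: string,
  stage: string,
  fn: () => Promise<T>,
): Promise<T> {
  const started_at = new Date().toISOString();
  const t0 = Date.now();
  try {
    const result = await fn();
    log({
      level: "info",
      order_id,
      stage,
      started_at,
      duration_ms: Date.now() - t0,
      status: "completed",
    });
    return result;
  } catch (err) {
    // On failure, include the full error object with its stack trace
    log({
      level: "error",
      order_id,
      stage,
      started_at,
      duration_ms: Date.now() - t0,
      status: "failed",
      err: err instanceof Error ? { message: err.message, stack: err.stack } : err,
    });
    throw err;
  }
}
```

Each stage of order processing (inventory check, payment, confirmation) then runs through `timedStage`, so every log line is filterable by order_id and carries its own duration.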
@src/lib/logger.ts @src/services/orders.ts
Add contextual logging to the orders service:
1. When processing an order, create a child logger with:
   - order_id
   - customer_id
   - total_amount
   - payment_method
2. Log at each stage of order processing:
   - Order received (info)
   - Inventory check started/completed (debug)
   - Payment initiated/completed/failed (info/error)
   - Order confirmed/cancelled (info)
3. On error, include the full error object with stack trace
4. Include timing for each stage (started_at, duration_ms)
Every log line from order processing should be filterable by order_id.
Application Metrics
Metrics tell you what is happening across your system in aggregate. While logs show individual events, metrics show trends: request rates, error rates, latency distributions, and resource utilization.
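In Node the standard instrumentation library is prom-client, but the underlying model is worth seeing by hand. This sketch shows a labeled counter and the Prometheus text exposition format a `/metrics` endpoint serves; it is a simplified illustration, not prom-client's actual API.

```typescript
// Minimal sketch of a Prometheus-style counter with low-cardinality labels.
// A real service would use the prom-client package; this shows the model.
class Counter {
  private series = new Map<string, number>();
  constructor(public name: string, public help: string) {}

  inc(labels: Record<string, string>, value = 1): void {
    // Each distinct label combination becomes one time series,
    // so label values must stay low-cardinality (no user_id, no raw URLs).
    const key = Object.entries(labels)
      .sort(([a], [b]) => a.localeCompare(b))
      .map(([k, v]) => `${k}="${v}"`)
      .join(",");
    this.series.set(key, (this.series.get(key) ?? 0) + value);
  }

  // Render in the Prometheus text exposition format served at /metrics
  expose(): string {
    const lines = [
      `# HELP ${this.name} ${this.help}`,
      `# TYPE ${this.name} counter`,
    ];
    for (const [key, val] of this.series) {
      lines.push(`${this.name}{${key}} ${val}`);
    }
    return lines.join("\n");
  }
}

const httpRequests = new Counter("http_requests_total", "Total HTTP requests");
// Note the route is the endpoint *pattern*, not the concrete URL
httpRequests.inc({ method: "GET", route: "/orders/:id", status: "200" });
```

The key design point is in `inc`: labels are serialized into a stable key, so `{method="GET",route="/orders/:id",status="200"}` is one series no matter how many distinct orders pass through it.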
Distributed Tracing
For microservices architectures, distributed tracing connects a single user request across multiple services. A trace shows the complete journey: API gateway to auth service to orders service to payment service and back.
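The mechanism that stitches services together is the W3C traceparent header, which OpenTelemetry's auto-instrumentation normally reads and writes for you. To make the prompt below less of a black box, here is a hand-rolled sketch of that propagation: same trace id across hops, a fresh span id per hop. The helper names are illustrative.

```typescript
// Sketch of W3C trace-context propagation (normally handled by
// OpenTelemetry auto-instrumentation). traceparent format:
//   version-traceid-spanid-flags
//   e.g. 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
interface TraceContext {
  traceId: string; // 32 hex chars, shared by every span in the trace
  spanId: string;  // 16 hex chars, unique per span
}

function parseTraceparent(header: string): TraceContext | null {
  const m = /^00-([0-9a-f]{32})-([0-9a-f]{16})-[0-9a-f]{2}$/.exec(header);
  return m ? { traceId: m[1], spanId: m[2] } : null;
}

const randomHex = (bytes: number): string =>
  Array.from({ length: bytes * 2 }, () =>
    Math.floor(Math.random() * 16).toString(16),
  ).join("");

// For each outgoing call: keep the trace id, mint a new span id.
// If there is no incoming context, start a brand-new trace.
function childTraceparent(incoming: string | undefined): string {
  const parent = incoming ? parseTraceparent(incoming) : null;
  const traceId = parent?.traceId ?? randomHex(16);
  return `00-${traceId}-${randomHex(8)}-01`;
}
```

If any service in the chain fails to forward this header, the trace breaks at that hop, which is exactly the "incomplete traces" failure mode described later in this section.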
@src/lib/logger.ts @src/middleware
Set up OpenTelemetry distributed tracing at src/lib/tracing.ts:
1. Configure the OpenTelemetry SDK with:
   - OTLP exporter (configurable endpoint via OTEL_EXPORTER_OTLP_ENDPOINT)
   - Service name from SERVICE_NAME env var
   - Auto-instrumentation for: HTTP, Express, PostgreSQL, Redis
   - Batch span processor with 5-second flush interval
2. Create trace context propagation middleware:
   - Extract trace context from incoming W3C traceparent header
   - Create a new span for each incoming request
   - Add span attributes: http.method, http.url, http.status_code
   - Propagate trace context to outgoing HTTP requests
3. Create helper functions for custom spans:
   - startSpan(name, attributes) -> span
   - withSpan(name, fn) -> wraps a function in a span
   - addSpanEvent(name, attributes) -> adds event to current span
4. Connect tracing with our logger:
   - Include trace_id and span_id in every log line
   - This lets us correlate logs with traces in our observability platform
In development, export traces to console. In production, export to an OTLP collector.
Alerting Rules
Metrics and traces are useless if nobody looks at them. Alerts bridge the gap between data collection and incident response. The key is alerting on symptoms (user-facing impact), not causes (high CPU).
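For reference, one rule of the kind the prompt below asks for might look like this in Prometheus alerting rule format. The metric name (`http_requests_total`) and the runbook URL are assumptions for illustration, not values from this project.

```yaml
groups:
  - name: api-health
    rules:
      - alert: HighErrorRate
        # Share of 5xx responses over the last 5 minutes, as a symptom-based
        # signal (user-facing impact) rather than a cause like CPU usage.
        expr: |
          sum(rate(http_requests_total{status=~"5.."}[5m]))
            / sum(rate(http_requests_total[5m])) > 0.01
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Error rate above 1% on the API"
          description: "More than 1% of requests returned 5xx for 5 minutes."
          runbook_url: "https://runbooks.example.com/api-error-rate"  # placeholder
```

The `for: 5m` clause is what keeps a momentary blip from paging anyone: the condition must hold continuously before the alert fires.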
Create alerting rule configurations at monitoring/alerts/:
1. monitoring/alerts/api.yml - API health alerts:
   - Error rate > 1% for 5 minutes (warning)
   - Error rate > 5% for 2 minutes (critical)
   - p99 latency > 2 seconds for 5 minutes (warning)
   - p99 latency > 5 seconds for 2 minutes (critical)
   - Zero requests for 1 minute (critical - service probably down)
2. monitoring/alerts/business.yml - Business metric alerts:
   - Order creation rate drops > 50% compared to same time yesterday (warning)
   - Payment failure rate > 10% for 10 minutes (critical)
   - Zero orders for 15 minutes during business hours (critical)
3. monitoring/alerts/infrastructure.yml - Resource alerts:
   - Memory usage > 85% for 10 minutes (warning)
   - Disk usage > 90% (critical)
   - Pod restart count > 3 in 10 minutes (warning)
Use Prometheus alerting rule format. Include runbook URLs in annotations for each alert. Each alert must have: summary, description, severity, and runbook_url.
Dashboard Generation
Once you have metrics and traces, you need dashboards to visualize them. Cursor can generate dashboard configurations as well.
When This Breaks
Logs are too verbose in production. If Agent generates debug-level logging everywhere, your log storage costs will spike. Set the default log level to info in production and debug only in development. Use the LOG_LEVEL environment variable to change it without redeploying.
Metrics have too many label combinations (cardinality explosion). If Agent uses user_id or request_id as a metric label, you will create millions of time series and your metrics storage will crash. Metric labels should be low-cardinality: HTTP method, status code, endpoint pattern (not the full URL with path parameters). Review every metric label Agent generates.
Traces are incomplete. If only some services have tracing configured, traces will have gaps. The trace context (traceparent header) must be propagated through every service in the request path. Ask Agent: “Verify that every outgoing HTTP call in our service includes the W3C traceparent header from the current trace context.”
Alerts fire too often (alert fatigue). Start with higher thresholds and only tighten them after you have baseline data. For new deployments, use “recording rules” to compute baselines for a week before enabling alerts.
Correlation IDs are missing. If your logs do not have correlation IDs, you cannot trace a request across services. The middleware from the first prompt generates request IDs, but you also need to propagate them in outgoing requests. Ask Agent: “Update our HTTP client to include the x-correlation-id header from async local storage in every outgoing request.”
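The propagation step that prompt asks for relies on Node's AsyncLocalStorage, which is standard-library. This is a minimal sketch under assumed names (`correlationMiddleware`, `outgoingHeaders`): the middleware stores the id for the lifetime of a request, and the HTTP-client side reads it back for outgoing calls.

```typescript
import { AsyncLocalStorage } from "node:async_hooks";
import { randomUUID } from "node:crypto";

// Sketch of correlation-id propagation via AsyncLocalStorage.
// Function names are illustrative, not from a specific framework.
const correlationStore = new AsyncLocalStorage<string>();

// Express-style middleware: reuse an incoming id or mint a new one,
// then run the rest of the request inside that async context.
function correlationMiddleware(
  req: { headers: Record<string, string | undefined> },
  _res: unknown,
  next: () => void,
): void {
  const id = req.headers["x-correlation-id"] ?? randomUUID();
  correlationStore.run(id, next);
}

// Called by the HTTP client: attach the current correlation id
// (if any) to every outgoing request.
function outgoingHeaders(): Record<string, string> {
  const id = correlationStore.getStore();
  return id ? { "x-correlation-id": id } : {};
}
```

Because `AsyncLocalStorage.run` scopes the id to everything awaited inside the handler, the HTTP client never needs the request object passed down explicitly.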