SQL Optimization Patterns

Master SQL query optimization with AI-powered tools and database MCP servers, transforming slow queries into high-performance database operations.

AI-Powered EXPLAIN Analysis

The Pattern: Use AI with database MCP servers to interpret complex execution plans and identify bottlenecks.

-- PRD: Query Performance Analysis
-- Plan: Use database MCP for schema context
-- First, connect to database MCP
"Connect to PostgreSQL MCP and get schema information for the relevant tables"
-- Then analyze the query
"Using the schema context, analyze this query execution plan and identify performance issues:
EXPLAIN (ANALYZE, BUFFERS, VERBOSE)
SELECT c.customer_name,
       COUNT(o.order_id) AS order_count,
       SUM(oi.quantity * oi.unit_price) AS total_spent
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
JOIN order_items oi ON o.order_id = oi.order_id
WHERE o.order_date >= '2024-01-01'
GROUP BY c.customer_id, c.customer_name
ORDER BY total_spent DESC
LIMIT 100;"

AI Analysis Provides:

  • Identification of table scans vs index scans
  • Buffer hit ratios and memory usage
  • Join method recommendations
  • Missing index suggestions
  • Query rewrite opportunities
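The scan-vs-index distinction above can also be checked programmatically. A minimal sketch, using SQLite via Python's `sqlite3` for portability (so the plan text differs from PostgreSQL's `EXPLAIN ANALYZE` output, but the before/after contrast is the same):

```python
import sqlite3

def plan_steps(conn, sql):
    # the last column of each EXPLAIN QUERY PLAN row is a readable step description
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, "
             "customer_id INT, order_date TEXT)")

query = "SELECT * FROM orders WHERE order_date >= '2024-01-01'"
before = plan_steps(conn, query)   # no usable index: full table scan
conn.execute("CREATE INDEX idx_orders_date ON orders(order_date)")
after = plan_steps(conn, query)    # range search on the new index
print(before, after)
```

Capturing the plan text before and after each index change is exactly the baseline an AI assistant (or a benchmark script) needs to confirm a recommendation actually changed the access path.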
// PRD: Index Optimization Strategy
// Plan: Analyze slow queries and create optimal indexes
// Use database MCP for analysis
"Connect to PostgreSQL MCP and analyze table statistics"
// Use Cursor's Agent mode with database context
"@schema @query_stats Based on the database analysis, recommend indexes:
Todo:
- [ ] Analyze read/write ratio
- [ ] Identify missing indexes
- [ ] Suggest composite indexes
- [ ] Estimate performance impact
- [ ] Provide CREATE INDEX statements
- [ ] Test index effectiveness"

Replace Subqueries with Window Functions

-- Ask AI to optimize this pattern
"Convert this correlated subquery to use window functions:
-- Slow version with subquery
SELECT
    customer_id,
    order_date,
    order_amount,
    (SELECT SUM(order_amount)
     FROM orders o2
     WHERE o2.customer_id = o1.customer_id
       AND o2.order_date <= o1.order_date) AS running_total
FROM orders o1
-- AI generates optimized version:
SELECT
    customer_id,
    order_date,
    order_amount,
    SUM(order_amount) OVER (
        PARTITION BY customer_id
        ORDER BY order_date
        ROWS UNBOUNDED PRECEDING
    ) AS running_total
FROM orders"

Performance Impact: On large datasets, the window-function version often runs 5-10x faster than the correlated subquery, because it computes the running total in a single ordered pass instead of re-scanning orders for every row.
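Before trusting any rewrite, verify that the two forms return identical results. A small equivalence check, sketched with SQLite (which supports window functions since 3.25) and a tiny hypothetical dataset:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id INT, order_date TEXT, order_amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", [
    (1, "2024-01-01", 100.0), (1, "2024-01-05", 50.0),
    (1, "2024-02-01", 25.0),  (2, "2024-01-03", 80.0),
])

# slow version: re-scans orders once per output row
slow = conn.execute("""
    SELECT customer_id, order_date, order_amount,
           (SELECT SUM(order_amount) FROM orders o2
             WHERE o2.customer_id = o1.customer_id
               AND o2.order_date <= o1.order_date) AS running_total
    FROM orders o1 ORDER BY customer_id, order_date
""").fetchall()

# fast version: one ordered pass per partition
fast = conn.execute("""
    SELECT customer_id, order_date, order_amount,
           SUM(order_amount) OVER (
               PARTITION BY customer_id ORDER BY order_date
               ROWS UNBOUNDED PRECEDING) AS running_total
    FROM orders ORDER BY customer_id, order_date
""").fetchall()
print(slow == fast, fast)
```

Running both against a sample of production data and diffing row-by-row is a cheap guard against subtle frame or tie-breaking differences.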

Common Table Expressions (CTEs) Optimization

  1. Identify Repeated Subqueries

    "Find repeated subqueries in this complex query and
    refactor using CTEs for better readability and performance"
  2. Materialize When Beneficial

    -- AI suggests when to use MATERIALIZED hint
    WITH customer_metrics AS MATERIALIZED (
        SELECT customer_id,
               COUNT(*) AS order_count,
               SUM(amount) AS total_spent
        FROM orders
        GROUP BY customer_id
    )
    -- Rest of query uses the materialized CTE
  3. Recursive CTE Patterns

    "Generate a recursive CTE for hierarchical data:
    - Employee org chart traversal
    - Category tree navigation
    - Bill of materials explosion
    With proper termination conditions"

PostgreSQL-Specific Techniques

-- PRD: PostgreSQL Performance Tuning
-- Connect to PostgreSQL MCP for analysis
"Use PostgreSQL MCP to:
1. Analyze current table statistics
2. Check index bloat
3. Review vacuum settings
4. Identify slow queries"
-- Request PostgreSQL-specific optimizations
"Based on the MCP analysis, optimize for PostgreSQL 15:
- Use partial indexes where applicable
- Consider GIN/GiST for full-text search
- Implement proper vacuuming strategy
- Use BRIN indexes for time-series data"
-- AI generates:
-- Partial index for active customers
CREATE INDEX idx_active_customers
ON customers(customer_id)
WHERE status = 'active';
-- GIN index for JSON search
CREATE INDEX idx_product_attributes
ON products USING gin(attributes);
-- BRIN index for time-series
CREATE INDEX idx_events_timestamp
ON events USING brin(created_at);
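Partial indexes are not PostgreSQL-only; SQLite supports the same `WHERE` clause on `CREATE INDEX`, which makes the behavior easy to demonstrate from Python. A sketch showing that the planner uses the partial index only when the query's predicate implies the index condition:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id INT, status TEXT)")
conn.execute("CREATE INDEX idx_active_customers ON customers(customer_id) "
             "WHERE status = 'active'")

def plan(sql):
    return [r[-1] for r in conn.execute("EXPLAIN QUERY PLAN " + sql)]

# predicate implies the index condition -> partial index is usable
with_status = plan("SELECT * FROM customers "
                   "WHERE status = 'active' AND customer_id = 7")
# without the status filter the partial index cannot be used
without_status = plan("SELECT * FROM customers WHERE customer_id = 7")
print(with_status, without_status)
```

The payoff in PostgreSQL is the same: the partial index is smaller and cheaper to maintain than a full index, but only queries that repeat the predicate benefit from it.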

MySQL Optimization Techniques

-- PRD: MySQL Performance Enhancement
-- Connect to MySQL MCP
"Connect to MySQL MCP server and analyze:
1. Current index usage
2. Statement digest statistics from performance_schema (the query cache was removed in MySQL 8.0)
3. Buffer pool efficiency
4. Slow query patterns"
-- MySQL-specific optimization request
"Using MySQL MCP insights, optimize for MySQL 8.0:
- Design covering indexes based on actual queries
- Optimize JOIN buffer usage
- Implement partition pruning
- Consider index hints when optimizer stats are stale"
-- AI suggests:
-- Covering index to avoid table access
CREATE INDEX idx_covering
ON orders(customer_id, order_date, status, amount);
-- Force index usage when optimizer is wrong
SELECT /*+ INDEX(orders idx_date_status) */ *
FROM orders
WHERE order_date > '2024-01-01'
  AND status = 'completed';
-- Ask AI to optimize join order
"Reorder these joins for optimal performance based on:
- Table sizes (customers: 1M, orders: 10M, items: 50M)
- Join selectivity
- Available indexes"
-- AI rewrites from:
SELECT * FROM items i
JOIN orders o ON i.order_id = o.order_id
JOIN customers c ON o.customer_id = c.customer_id
WHERE c.country = 'US'
-- To optimized version:
SELECT * FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
JOIN items i ON o.order_id = i.order_id
WHERE c.country = 'US'
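The covering-index idea from `idx_covering` above can be observed directly: when every selected column is present in the index, the engine never touches the base table. A sketch using SQLite, whose plan output labels this case explicitly:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id INT, order_date TEXT, "
             "status TEXT, amount REAL)")
conn.execute("CREATE INDEX idx_covering "
             "ON orders(customer_id, order_date, status, amount)")

# every selected column lives in the index, so the table itself is never read
steps = [r[-1] for r in conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT order_date, status, amount FROM orders WHERE customer_id = 42")]
print(steps)
```

In MySQL the equivalent signal is `Using index` in the `Extra` column of `EXPLAIN` output; if it disappears after adding a column to the SELECT list, the index no longer covers the query.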

Smart Aggregation Patterns

-- Request aggregation optimization
"Optimize this aggregation query that processes 100M rows:
- Pre-aggregate where possible
- Use approximate functions if acceptable
- Implement incremental aggregation"
-- AI suggests multiple approaches:
-- 1. Pre-aggregation with materialized view
CREATE MATERIALIZED VIEW daily_sales_summary AS
SELECT
    DATE(order_date) AS sale_date,
    product_id,
    SUM(quantity) AS total_quantity,
    SUM(amount) AS total_amount,
    COUNT(*) AS order_count
FROM orders
GROUP BY DATE(order_date), product_id;
-- 2. Approximate distinct counts for very large datasets
--    (APPROX_COUNT_DISTINCT is available in BigQuery, Snowflake, and SQL Server;
--    PostgreSQL needs an extension such as postgresql-hll)
SELECT
    product_category,
    APPROX_COUNT_DISTINCT(customer_id) AS unique_customers
FROM orders
GROUP BY product_category;
-- 3. Incremental aggregation pattern
INSERT INTO hourly_metrics
SELECT
    DATE_TRUNC('hour', created_at) AS hour,
    COUNT(*) AS event_count,
    AVG(response_time) AS avg_response
FROM events
WHERE created_at >= COALESCE(
    (SELECT MAX(hour) FROM hourly_metrics),
    '2024-01-01'::timestamp
)
GROUP BY DATE_TRUNC('hour', created_at);
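The incremental pattern above can be exercised end to end. A sketch with SQLite (no `DATE_TRUNC`, so `strftime` truncates to the hour) and three hypothetical events:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (created_at TEXT, response_time REAL);
    CREATE TABLE hourly_metrics (hour TEXT, event_count INT, avg_response REAL);
""")
conn.executemany("INSERT INTO events VALUES (?, ?)", [
    ("2024-01-01 10:15:00", 120.0),
    ("2024-01-01 10:45:00", 80.0),
    ("2024-01-01 11:05:00", 200.0),
])

# aggregate only rows at or after the newest hour already summarized
conn.execute("""
    INSERT INTO hourly_metrics
    SELECT strftime('%Y-%m-%d %H:00:00', created_at) AS hour,
           COUNT(*) AS event_count,
           AVG(response_time) AS avg_response
    FROM events
    WHERE created_at >= COALESCE((SELECT MAX(hour) FROM hourly_metrics),
                                 '2024-01-01 00:00:00')
    GROUP BY 1
""")
rows = conn.execute("SELECT * FROM hourly_metrics ORDER BY hour").fetchall()
print(rows)
```

One caveat worth noting: the `>=` boundary re-aggregates the newest hour on every run, so a production job should delete and rebuild that boundary hour first to stay idempotent.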

Systematic Performance Testing

-- PRD: Query Performance Testing Framework
-- Plan: Build automated benchmarking system
-- Use database MCP for metrics
"Connect to database MCP and create performance baseline"
-- Create benchmarking framework with AI
"Using database MCP, generate a query performance testing framework:
Todo:
- [ ] Capture baseline metrics from MCP
- [ ] Test with different data volumes
- [ ] Compare optimization strategies
- [ ] Generate performance reports
- [ ] Set up continuous monitoring
- [ ] Create alerting thresholds"
-- AI creates comprehensive testing suite:
CREATE TABLE query_benchmarks (
    benchmark_id SERIAL PRIMARY KEY,
    query_name TEXT,
    query_hash TEXT,
    execution_time_ms NUMERIC,
    rows_returned BIGINT,
    buffers_hit BIGINT,
    buffers_read BIGINT,
    tested_at TIMESTAMP DEFAULT NOW()
);
-- Benchmarking function
CREATE FUNCTION benchmark_query(
    p_query_name TEXT,
    p_query TEXT
) RETURNS TABLE (
    execution_time_ms NUMERIC,
    rows_returned BIGINT
) AS $$
DECLARE
    v_start_time TIMESTAMP;
    v_end_time TIMESTAMP;
    v_row_count BIGINT;
BEGIN
    v_start_time := clock_timestamp();
    EXECUTE p_query;
    GET DIAGNOSTICS v_row_count = ROW_COUNT;
    v_end_time := clock_timestamp();
    INSERT INTO query_benchmarks (
        query_name,
        query_hash,
        execution_time_ms,
        rows_returned
    ) VALUES (
        p_query_name,
        MD5(p_query),
        -- EPOCH, not MILLISECONDS: EXTRACT(MILLISECONDS ...) returns only the
        -- seconds field of the interval, truncating anything over a minute
        EXTRACT(EPOCH FROM v_end_time - v_start_time) * 1000,
        v_row_count
    );
    RETURN QUERY
    SELECT
        EXTRACT(EPOCH FROM v_end_time - v_start_time) * 1000,
        v_row_count;
END;
$$ LANGUAGE plpgsql;
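The same harness is often easier to drive from application code, where wall-clock timing and logging live outside the database. A minimal Python sketch of the benchmarking loop, using SQLite in place of the PostgreSQL server and a hypothetical `benchmark_query` helper mirroring the plpgsql function:

```python
import hashlib
import sqlite3
import time

def benchmark_query(conn, name, sql):
    # time one execution and log it with a hash of the statement text
    start = time.perf_counter()
    rows = conn.execute(sql).fetchall()
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    conn.execute(
        "INSERT INTO query_benchmarks "
        "(query_name, query_hash, execution_time_ms, rows_returned) "
        "VALUES (?, ?, ?, ?)",
        (name, hashlib.md5(sql.encode()).hexdigest(), elapsed_ms, len(rows)))
    return elapsed_ms, len(rows)

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE query_benchmarks (
    benchmark_id INTEGER PRIMARY KEY,
    query_name TEXT,
    query_hash TEXT,
    execution_time_ms REAL,
    rows_returned INT,
    tested_at TEXT DEFAULT CURRENT_TIMESTAMP)""")
conn.execute("CREATE TABLE t AS SELECT 1 AS x UNION ALL SELECT 2 UNION ALL SELECT 3")
elapsed, n_rows = benchmark_query(conn, "scan_t", "SELECT * FROM t")
print(elapsed, n_rows)
```

Hashing the query text lets repeated runs of the same statement be grouped when generating trend reports, regardless of how the query was named.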

Strategic Denormalization

-- Ask AI for denormalization recommendations
"Analyze this normalized schema and suggest denormalization for:
- Read-heavy workloads (95% reads)
- Common join patterns
- Acceptable data redundancy
- Update frequency considerations"
-- AI recommends:
-- 1. Denormalized summary table
CREATE TABLE customer_order_summary AS
SELECT
    c.customer_id,
    c.customer_name,
    c.email,
    COUNT(o.order_id) AS lifetime_orders,
    SUM(o.total_amount) AS lifetime_value,
    MAX(o.order_date) AS last_order_date,
    AVG(o.total_amount) AS avg_order_value
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.customer_name, c.email;
-- 2. Triggers to maintain consistency
CREATE TRIGGER update_customer_summary
AFTER INSERT OR UPDATE OR DELETE ON orders
FOR EACH ROW
EXECUTE FUNCTION update_customer_summary_func();
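The trigger-maintained summary can be demonstrated end to end. A sketch in SQLite (whose triggers inline the body rather than calling a function like `update_customer_summary_func`), covering only the INSERT path and a simplified two-column summary:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (customer_id INT, total_amount REAL);
    CREATE TABLE customer_order_summary (
        customer_id INT PRIMARY KEY,
        lifetime_orders INT,
        lifetime_value REAL);

    -- keep the denormalized summary in sync on every order insert
    CREATE TRIGGER update_customer_summary AFTER INSERT ON orders
    BEGIN
        INSERT OR IGNORE INTO customer_order_summary
            VALUES (NEW.customer_id, 0, 0.0);
        UPDATE customer_order_summary
           SET lifetime_orders = lifetime_orders + 1,
               lifetime_value  = lifetime_value + NEW.total_amount
         WHERE customer_id = NEW.customer_id;
    END;
""")
conn.execute("INSERT INTO orders VALUES (7, 100.0)")
conn.execute("INSERT INTO orders VALUES (7, 50.0)")
summary = conn.execute("SELECT * FROM customer_order_summary").fetchone()
print(summary)
```

A production version also needs UPDATE and DELETE triggers, which is exactly the redundancy cost the read-heavy (95% reads) assumption is meant to justify.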
Table Partitioning

  1. Identify Partitioning Candidates

    "Analyze our tables and recommend partitioning strategies:
    - Tables over 100GB
    - Clear access patterns
    - Time-based or range-based queries"
  2. Implement Partitioning

    -- AI generates partitioning scheme
    CREATE TABLE orders_partitioned (
    LIKE orders INCLUDING ALL
    ) PARTITION BY RANGE (order_date);
    -- Create monthly partitions
    CREATE TABLE orders_2024_01
    PARTITION OF orders_partitioned
    FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
  3. Partition Maintenance

    "Create automated partition management:
    - Auto-create future partitions
    - Archive old partitions
    - Update partition statistics"
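The "auto-create future partitions" step reduces to generating date-bounded DDL on a schedule. A small sketch of a generator for monthly PostgreSQL partitions; the table name and start date are placeholders:

```python
from datetime import date

def monthly_partition_ddl(table, start, months):
    """Generate CREATE TABLE ... PARTITION OF statements for consecutive months."""
    stmts = []
    y, m = start.year, start.month
    for _ in range(months):
        # upper bound is exclusive: the first day of the following month
        ny, nm = (y + 1, 1) if m == 12 else (y, m + 1)
        stmts.append(
            f"CREATE TABLE {table}_{y}_{m:02d} PARTITION OF {table} "
            f"FOR VALUES FROM ('{y}-{m:02d}-01') TO ('{ny}-{nm:02d}-01');")
        y, m = ny, nm
    return stmts

ddl = monthly_partition_ddl("orders_partitioned", date(2024, 11, 1), 3)
print("\n".join(ddl))
```

A cron job (or pg_partman, which packages this pattern) would run the generator ahead of time and execute any statements for partitions that don't exist yet.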

Intelligent Caching Strategies

-- Request caching recommendations
"Design a caching strategy for our application:
- Identify cacheable queries
- Set appropriate TTLs
- Implement cache invalidation
- Monitor cache hit rates"
-- AI provides comprehensive solution:
-- 1. Redis caching layer
-- 2. Query result caching
-- 3. Prepared statement optimization
-- 4. Connection pooling configuration
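The TTL-plus-invalidation idea can be sketched in a few lines of application code. This toy cache stands in for a Redis layer; the table-name substring matching used for invalidation is deliberately naive (a real system would tag cache entries with the tables they read):

```python
import time

class QueryCache:
    """Tiny TTL cache for query results with explicit invalidation."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.store = {}              # sql -> (expires_at, rows)
        self.hits = self.misses = 0

    def get(self, sql, run):
        entry = self.store.get(sql)
        if entry and entry[0] > time.monotonic():
            self.hits += 1
            return entry[1]          # fresh cached result
        self.misses += 1
        rows = run(sql)              # fall through to the database
        self.store[sql] = (time.monotonic() + self.ttl, rows)
        return rows

    def invalidate(self, table):
        # drop every cached result that mentions the mutated table
        self.store = {q: v for q, v in self.store.items() if table not in q}

cache = QueryCache(ttl_seconds=60)
fake_db = lambda sql: [("row",)]     # stand-in for a real query executor
cache.get("SELECT * FROM orders", fake_db)
cache.get("SELECT * FROM orders", fake_db)   # served from cache
cache.invalidate("orders")                   # a write to orders flushes it
print(cache.hits, cache.misses, len(cache.store))
```

Tracking `hits` and `misses` gives you the cache hit rate the monitoring step asks for; a persistently low rate usually means the TTL is too short or the queries aren't repeatable enough to cache.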

Slow Query Detection

# Use database MCP for monitoring
"Connect to PostgreSQL MCP for query monitoring"
"Using MCP query stats, create automated slow query detection:
- [ ] Log queries over 1 second
- [ ] Identify query patterns
- [ ] Alert on degradation
- [ ] Suggest optimizations based on EXPLAIN"
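On the application side, slow-query detection is a thin timing wrapper around query execution. A sketch with SQLite and an in-memory log list standing in for pg_stat_statements or a log shipper; the zero threshold in the demo just forces the first query to be "slow":

```python
import sqlite3
import time

def run_logged(conn, sql, threshold_ms, slow_log):
    # execute, and record any statement slower than the threshold
    start = time.perf_counter()
    rows = conn.execute(sql).fetchall()
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    if elapsed_ms > threshold_ms:
        slow_log.append((sql, elapsed_ms))
    return rows

conn = sqlite3.connect(":memory:")
log = []
run_logged(conn, "SELECT 1", threshold_ms=0.0, slow_log=log)     # always logged
run_logged(conn, "SELECT 2", threshold_ms=1000.0, slow_log=log)  # fast: skipped
print(log)
```

In PostgreSQL itself the equivalent switch is `log_min_duration_statement = 1000`, which logs any statement over one second without touching application code.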

Index Usage Analysis

"Analyze index usage:
- Find unused indexes
- Identify missing indexes
- Calculate index bloat
- Recommend maintenance"

Statistics Updates

"Automate statistics maintenance:
- Update table statistics
- Analyze query patterns
- Vacuum scheduling
- Monitor table growth"

Performance Trending

"Track performance over time:
- Query execution trends
- Resource utilization
- Growth projections
- Capacity planning"

Cloud Database Cost Optimization

-- PRD: Database Cost Optimization
-- Plan: Reduce cloud database costs by 40%
-- Use cloud and database MCPs
"Connect to AWS MCP and PostgreSQL MCP to analyze:
1. Current RDS instance usage
2. Storage growth patterns
3. Query resource consumption"
-- Optimize for cloud database costs
"Based on MCP data, optimize for cost:
Todo:
- [ ] Identify expensive queries consuming IOPS
- [ ] Recommend reserved instances based on usage
- [ ] Suggest storage tier optimization
- [ ] Implement data archival policies
- [ ] Set up cost alerts"
-- AI provides cost-saving strategies:
-- 1. Archive old data to cheaper storage
-- 2. Use read replicas for analytics
-- 3. Implement connection pooling
-- 4. Schedule heavy queries during off-peak

Advanced SQL Optimization

Ready to push further? Explore:

  • Parallel Query Execution: Leverage multi-core processing
  • Columnar Storage: For analytical workloads
  • In-Memory Optimization: For ultra-fast queries
  • AI-Driven Auto-Tuning: Self-optimizing databases
  • Distributed SQL: Scaling across multiple nodes

Use AI assistants to guide implementation of these advanced techniques while maintaining query correctness and data integrity.