
Database Design

Your startup is building a multi-tenant SaaS platform for project management. You need to design a database that handles: user organizations, projects, tasks, time tracking, file attachments, real-time comments, and activity feeds. The system must scale to thousands of organizations with millions of tasks while maintaining sub-100ms query performance.

By completing this lesson, you’ll master:

  • AI-assisted database schema design
  • Choosing between SQL and NoSQL strategically
  • Implementing multi-tenancy patterns
  • Optimizing for read/write patterns
  • Creating efficient indexes and constraints
  • Migration strategies and versioning

Prerequisites:

  • Basic understanding of databases (SQL/NoSQL)
  • Completed previous lessons
  • PostgreSQL or MySQL installed
  • Optional: Redis for caching examples
  • Optional: Database MCP servers (PostgreSQL, SQLite, Redis)

Design a database that:

  • Supports multi-tenant isolation
  • Handles complex relationships efficiently
  • Scales to millions of records
  • Maintains ACID compliance where needed
  • Provides real-time query performance
  • Includes audit trails and soft deletes
  1. Define Data Requirements

Start with Ask mode to clarify requirements:

"Help me design a database for a project management SaaS:
- Multi-tenant (B2B)
- Organizations have multiple users
- Projects contain tasks with subtasks
- Time tracking per task
- File attachments (S3 URLs)
- Real-time comments
- Activity feeds
- Full audit trail
What questions should I answer first?"
  2. Analyze Access Patterns
"Based on these features, analyze the access patterns:
- User dashboard (projects, recent tasks, notifications)
- Project view (tasks, milestones, team members)
- Task details (comments, attachments, time logs)
- Reports (time by project, user productivity)
- Search (across all entities)
Identify read/write ratios and hot paths"
  3. Choose Database Strategy

Switch to Agent mode:

"Compare database options for this use case:
- PostgreSQL with JSONB
- MySQL with proper indexing
- MongoDB for flexibility
- Hybrid approach (PostgreSQL + Redis)
Consider: ACID needs, query complexity, scaling strategy"

Phase 1.5: Database MCP Servers for Rapid Development

  1. Available Database MCP Servers

Configure database connections via MCP:

~/.cursor/mcp.json
{
  "mcpServers": {
    "postgres-dev": {
      "command": "npx",
      "args": [
        "-y",
        "@modelcontextprotocol/server-postgres",
        "postgresql://user:pass@localhost:5432/projectdb"
      ]
    },
    "sqlite-local": {
      "command": "npx",
      "args": [
        "-y",
        "@modelcontextprotocol/server-sqlite",
        "./data/local.db"
      ]
    },
    "redis-cache": {
      "command": "npx",
      "args": ["-y", "@redis/mcp-redis"],
      "env": {
        "REDIS_URL": "redis://localhost:6379"
      }
    }
  }
}
  2. Schema Design with MCP
// With PostgreSQL MCP active:
"Using PostgreSQL MCP, analyze the current schema and:
- Show all tables and relationships
- Identify missing indexes
- Suggest normalization improvements
- Check for anti-patterns"
// Direct schema creation:
"Create the organizations and users tables with:
- UUID primary keys
- Proper constraints
- Audit columns (created_at, updated_at)
- Soft delete support"
  3. Advanced Database MCP Usage
"Using PostgreSQL MCP:
- List all foreign key relationships
- Find tables without primary keys
- Identify unused indexes
- Show table sizes and row counts"
  4. Redis MCP for Caching
"Using Redis MCP, implement caching for:
- User session data (SET/GET with TTL)
- Dashboard query results (with invalidation)
- Real-time presence tracking (using Redis Sets)
- Rate limiting counters (using INCR with EXPIRE)"
// Example Redis operations via MCP:
"Set up a rate limiter using Redis:
- Use INCR for request counting
- Set 60-second expiration
- Return current count
- Handle race conditions"
  5. Vector Database Integration

For AI-powered features:

{
  "mcpServers": {
    "chroma": {
      "command": "npx",
      "args": ["-y", "chroma-mcp"],
      "env": {
        "CHROMA_URL": "http://localhost:8000"
      }
    }
  }
}
"Using Chroma MCP, implement:
- Semantic search for tasks
- Similar project recommendations
- Content-based filtering
- Embedding storage for ML features"

Security Considerations:

  • Use read-only credentials for analysis
  • Never expose production credentials
  • Rotate MCP database passwords regularly
  • Use connection string encryption

Performance Tips:

  • Limit result sets with MCP queries
  • Use connection pooling settings
  • Monitor MCP query execution time
  • Cache frequent MCP operations
  1. Design Core Schema
"Design the core database schema with:
- Proper normalization (3NF where appropriate)
- Multi-tenancy strategy (shared schema vs separate)
- Soft delete implementation
- Audit fields (created_by, updated_at, etc.)
- UUID vs serial IDs
Create SQL DDL statements"

Example schema:

-- Organizations (tenants)
CREATE TABLE organizations (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  name VARCHAR(255) NOT NULL,
  slug VARCHAR(100) UNIQUE NOT NULL,
  plan VARCHAR(50) DEFAULT 'free',
  settings JSONB DEFAULT '{}',
  created_at TIMESTAMPTZ DEFAULT NOW(),
  updated_at TIMESTAMPTZ DEFAULT NOW(),
  deleted_at TIMESTAMPTZ
);

-- Users
CREATE TABLE users (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  email VARCHAR(255) UNIQUE NOT NULL,
  name VARCHAR(255) NOT NULL,
  password_hash VARCHAR(255),
  settings JSONB DEFAULT '{}',
  created_at TIMESTAMPTZ DEFAULT NOW(),
  updated_at TIMESTAMPTZ DEFAULT NOW(),
  deleted_at TIMESTAMPTZ
);

-- Organization membership
CREATE TABLE organization_members (
  organization_id UUID REFERENCES organizations(id),
  user_id UUID REFERENCES users(id),
  role VARCHAR(50) NOT NULL DEFAULT 'member',
  joined_at TIMESTAMPTZ DEFAULT NOW(),
  PRIMARY KEY (organization_id, user_id)
);

-- Projects
CREATE TABLE projects (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  organization_id UUID REFERENCES organizations(id) NOT NULL,
  name VARCHAR(255) NOT NULL,
  description TEXT,
  status VARCHAR(50) DEFAULT 'active',
  settings JSONB DEFAULT '{}',
  created_by UUID REFERENCES users(id),
  created_at TIMESTAMPTZ DEFAULT NOW(),
  updated_at TIMESTAMPTZ DEFAULT NOW(),
  deleted_at TIMESTAMPTZ
);

-- Composite index for tenant isolation
-- (PostgreSQL has no inline INDEX clause; indexes are created separately)
CREATE INDEX idx_org_projects ON projects (organization_id, status, created_at DESC);

-- Tasks with hierarchical structure
CREATE TABLE tasks (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  organization_id UUID REFERENCES organizations(id) NOT NULL,
  project_id UUID REFERENCES projects(id) NOT NULL,
  parent_task_id UUID REFERENCES tasks(id),
  title VARCHAR(500) NOT NULL,
  description TEXT,
  status VARCHAR(50) DEFAULT 'todo',
  priority INTEGER DEFAULT 0,
  due_date DATE,
  assignee_id UUID REFERENCES users(id),
  estimated_hours DECIMAL(5,2),
  metadata JSONB DEFAULT '{}',
  created_by UUID REFERENCES users(id),
  created_at TIMESTAMPTZ DEFAULT NOW(),
  updated_at TIMESTAMPTZ DEFAULT NOW(),
  deleted_at TIMESTAMPTZ
);

-- Indexes for common queries
CREATE INDEX idx_org_tasks ON tasks (organization_id, project_id, status);
CREATE INDEX idx_assignee_tasks ON tasks (assignee_id, status, due_date);
CREATE INDEX idx_parent_tasks ON tasks (parent_task_id);
  2. Design Supporting Tables
@schema.sql
"Add supporting tables for:
- Time tracking entries
- Comments (polymorphic)
- Attachments (polymorphic)
- Activity feed
- Notifications
Include proper indexes and constraints"
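
A minimal sketch of two of these tables, assuming the column layout below (the reporting index later in this lesson expects a time_entries table with organization_id, user_id, date, and hours):

-- Time tracking entries
CREATE TABLE time_entries (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  organization_id UUID REFERENCES organizations(id) NOT NULL,
  task_id UUID REFERENCES tasks(id) NOT NULL,
  user_id UUID REFERENCES users(id) NOT NULL,
  date DATE NOT NULL,
  hours DECIMAL(5,2) NOT NULL CHECK (hours > 0),
  note TEXT, -- assumed column; adjust to your requirements
  created_at TIMESTAMPTZ DEFAULT NOW()
);

-- Polymorphic attachments (S3 URLs), same pattern as comments
CREATE TABLE attachments (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  organization_id UUID REFERENCES organizations(id) NOT NULL,
  attachable_type VARCHAR(50) NOT NULL, -- 'task', 'project', 'comment'
  attachable_id UUID NOT NULL,
  file_url TEXT NOT NULL, -- S3 URL
  uploaded_by UUID REFERENCES users(id),
  created_at TIMESTAMPTZ DEFAULT NOW(),
  deleted_at TIMESTAMPTZ
);

CREATE INDEX idx_attachments_target
  ON attachments (attachable_type, attachable_id);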
  3. Implement Advanced Features
"Implement advanced database features:
- Row Level Security for multi-tenancy
- Materialized views for dashboards
- Full-text search configuration
- Trigger-based audit logging
- Partitioning strategy for large tables"
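
As a concrete illustration, a materialized view for per-project dashboard counts might look like this (the view name and aggregated columns are assumptions, not prescribed by the lesson):

-- Precomputed dashboard stats, refreshed on a schedule
CREATE MATERIALIZED VIEW project_stats AS
SELECT
  p.organization_id,
  p.id AS project_id,
  COUNT(t.id) FILTER (WHERE t.status != 'done') AS open_tasks,
  COUNT(t.id) FILTER (WHERE t.status = 'done') AS done_tasks,
  MAX(t.updated_at) AS last_activity
FROM projects p
LEFT JOIN tasks t ON t.project_id = p.id AND t.deleted_at IS NULL
WHERE p.deleted_at IS NULL
GROUP BY p.organization_id, p.id;

-- A unique index lets REFRESH ... CONCURRENTLY run without blocking readers
CREATE UNIQUE INDEX idx_project_stats ON project_stats (project_id);
REFRESH MATERIALIZED VIEW CONCURRENTLY project_stats;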
  1. Create Optimal Indexes
@schema.sql
"Analyze query patterns and create indexes:
- Covering indexes for hot queries
- Partial indexes for filtered queries
- GIN indexes for JSONB search
- Composite indexes for sorts
- Consider index maintenance cost"

Example indexing strategy:

-- Dashboard queries (most frequent)
CREATE INDEX idx_user_recent_tasks
  ON tasks (assignee_id, updated_at DESC)
  WHERE deleted_at IS NULL AND status != 'done';

-- Project task list with filters
CREATE INDEX idx_project_tasks_filtered
  ON tasks (project_id, status, priority DESC, created_at DESC)
  WHERE deleted_at IS NULL;

-- Full-text search on tasks
CREATE INDEX idx_tasks_search
  ON tasks USING gin(
    to_tsvector('english', title || ' ' || COALESCE(description, ''))
  );

-- JSONB metadata search
CREATE INDEX idx_tasks_metadata ON tasks USING gin(metadata);

-- Time tracking aggregations
CREATE INDEX idx_time_entries_reporting
  ON time_entries (organization_id, user_id, date)
  INCLUDE (hours);
  2. Implement Caching Strategy
"Design caching strategy with Redis:
- Cache warming for dashboards
- Invalidation patterns
- Query result caching
- Session storage
- Real-time presence data
Create implementation plan"
  3. Query Optimization
// Method 1: Using Database MCP
"Using PostgreSQL MCP, optimize this dashboard query:
SELECT * FROM tasks WHERE assignee_id = $1 AND status = 'active'
- Run EXPLAIN ANALYZE
- Suggest better indexes
- Show query cost reduction"
// Method 2: Manual optimization
@queries/dashboard.sql
"Optimize these critical queries:
- User dashboard (projects + recent tasks)
- Project overview (stats + team + progress)
- Task search with filters
- Time report aggregations
Use EXPLAIN ANALYZE and suggest improvements"
// MCP can directly test optimizations:
"Using PostgreSQL MCP:
1. Create the suggested index
2. Re-run EXPLAIN ANALYZE
3. Compare before/after performance
4. Rollback if not improved"
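
The same loop in plain SQL, sketched with a placeholder UUID (the index name and WHERE clause are assumptions modeled on the partial indexes above):

-- Before: inspect the plan
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM tasks
WHERE assignee_id = '00000000-0000-0000-0000-000000000000'::uuid
  AND status = 'active';

-- If the plan shows a sequential scan, try a matching partial index
CREATE INDEX idx_tasks_assignee_active
  ON tasks (assignee_id)
  WHERE status = 'active' AND deleted_at IS NULL;

-- After: re-run the EXPLAIN and compare actual time and buffers read;
-- DROP INDEX idx_tasks_assignee_active; -- roll back if it didn't help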
  1. Implement Row Level Security
"Implement PostgreSQL RLS for multi-tenancy:
- Policies for each table
- Performance implications
- Testing strategies
- Bypass for admin operations
Ensure complete tenant isolation"

Example RLS implementation:

-- Enable RLS
ALTER TABLE projects ENABLE ROW LEVEL SECURITY;
ALTER TABLE tasks ENABLE ROW LEVEL SECURITY;

-- Create policies
CREATE POLICY tenant_isolation_projects ON projects
  FOR ALL
  TO application_user
  USING (organization_id = current_setting('app.current_org_id')::UUID);

CREATE POLICY tenant_isolation_tasks ON tasks
  FOR ALL
  TO application_user
  USING (organization_id = current_setting('app.current_org_id')::UUID);

-- Function to set current tenant
-- (the third argument to set_config makes the setting transaction-local)
CREATE OR REPLACE FUNCTION set_current_tenant(org_id UUID)
RETURNS void AS $$
BEGIN
  PERFORM set_config('app.current_org_id', org_id::text, true);
END;
$$ LANGUAGE plpgsql;
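
A quick way to verify isolation: because the setting is transaction-local, run the check inside a transaction, connected as application_user rather than a superuser (superusers bypass RLS). The org IDs below are placeholders:

BEGIN;
SELECT set_current_tenant('11111111-1111-1111-1111-111111111111');
SELECT count(*) FROM projects; -- only this org's projects are visible

SELECT set_current_tenant('22222222-2222-2222-2222-222222222222');
SELECT count(*) FROM projects; -- the count changes with the tenant
ROLLBACK;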
  2. Design Tenant Isolation
"Create complete tenant isolation strategy:
- Connection pooling per tenant
- Query monitoring and limits
- Storage quotas
- Backup strategies
- Data export capabilities"
  3. Performance Monitoring
"Set up performance monitoring:
- Slow query logging
- Index usage statistics
- Table bloat monitoring
- Connection pool metrics
- Tenant resource usage"
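
PostgreSQL's statistics views cover several of these out of the box; a sketch (pg_stat_statements must be enabled, and the mean_exec_time column assumes PostgreSQL 13+):

-- Indexes that have never been scanned (removal candidates)
SELECT schemaname, relname, indexrelname
FROM pg_stat_user_indexes
WHERE idx_scan = 0;

-- Slowest queries on average
SELECT query, calls, mean_exec_time
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;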
  1. Create Migration System
"Set up database migration system:
- Version control for schema
- Rollback capabilities
- Zero-downtime migrations
- Data migration strategies
- Testing migrations"

Example migration:

migrations/001_add_task_dependencies.ts
// Knex-style migration; 'Database' is the project's DB handle type
export const up = async (db: Database) => {
  await db.schema.createTable('task_dependencies', (table) => {
    table.uuid('id').primary().defaultTo(db.raw('gen_random_uuid()'));
    table.uuid('task_id').references('id').inTable('tasks').notNullable();
    table.uuid('depends_on_task_id').references('id').inTable('tasks').notNullable();
    table.enum('dependency_type', ['blocks', 'relates_to', 'duplicates']);
    table.timestamps(true, true);
    table.unique(['task_id', 'depends_on_task_id']);
    table.index(['depends_on_task_id']);
  });

  // Add check constraint to prevent self-dependencies
  await db.raw(`
    ALTER TABLE task_dependencies
    ADD CONSTRAINT no_self_dependency
    CHECK (task_id != depends_on_task_id)
  `);
};

export const down = async (db: Database) => {
  await db.schema.dropTable('task_dependencies');
};
  2. Plan Schema Evolution
"Plan for future schema changes:
- Adding custom fields per tenant
- Archival strategies
- Sharding preparation
- Read replica setup
- Schema versioning"
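
One common pattern for tenant-specific custom fields, sketched on the assumption that values live in the existing tasks.metadata JSONB column (the table and column names here are illustrative):

-- Field definitions per tenant; no per-tenant DDL required
CREATE TABLE custom_field_definitions (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  organization_id UUID REFERENCES organizations(id) NOT NULL,
  entity_type VARCHAR(50) NOT NULL, -- e.g. 'task'
  field_key VARCHAR(100) NOT NULL,
  field_type VARCHAR(20) NOT NULL, -- 'text', 'number', 'date', ...
  UNIQUE (organization_id, entity_type, field_key)
);

-- Values are stored in tasks.metadata, e.g. metadata->'custom'->>'contract_number',
-- and remain searchable through the GIN index on metadata created earlier.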
  3. Create Test Data
"Generate realistic test data:
- Multiple organizations
- Varying data volumes
- Edge cases (deep nesting, etc.)
- Performance test scenarios
- Data integrity verification"
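
generate_series makes volume testing cheap; a sketch that seeds a thousand synthetic tasks per project (titles and statuses are made up):

INSERT INTO tasks (organization_id, project_id, title, status, priority)
SELECT
  p.organization_id,
  p.id,
  'Task ' || g,
  (ARRAY['todo', 'in_progress', 'done'])[1 + g % 3],
  g % 5
FROM projects p
CROSS JOIN generate_series(1, 1000) AS g;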

Technique 1: MCP-Driven Database Development


Leverage MCP for rapid iteration:

// Real-time schema exploration
"Using PostgreSQL MCP:
- Show me the current schema
- Find all tables with > 1M rows
- List indexes not used in 30 days"
// Direct query execution
"Using PostgreSQL MCP, run this query:
SELECT table_name, pg_size_pretty(pg_total_relation_size(table_name::regclass))
FROM information_schema.tables
WHERE table_schema = 'public'
ORDER BY pg_total_relation_size(table_name::regclass) DESC"
// Performance analysis
"Analyze this slow query with PostgreSQL MCP:
- Show execution plan
- Identify bottlenecks
- Suggest optimizations"

Use proven patterns:

-- Polymorphic associations for flexibility
CREATE TABLE comments (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  commentable_type VARCHAR(50), -- 'task', 'project'
  commentable_id UUID,
  content TEXT
);

-- Partial indexes for each type (created separately in PostgreSQL)
CREATE INDEX idx_task_comments ON comments (commentable_id)
  WHERE commentable_type = 'task';

-- Hierarchical data with ltree
CREATE EXTENSION IF NOT EXISTS ltree;

CREATE TABLE categories (
  id SERIAL PRIMARY KEY,
  path ltree NOT NULL,
  name VARCHAR(255)
);

CREATE INDEX idx_category_path ON categories USING gist(path);
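
For reference, typical ltree lookups against that table (the label paths are examples):

-- All descendants of 'engineering.backend' (<@ tests "is a descendant of")
SELECT * FROM categories WHERE path <@ 'engineering.backend';

-- Direct children only: lquery '*{1}' matches exactly one extra label
SELECT * FROM categories WHERE path ~ 'engineering.backend.*{1}';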

Optimize complex queries:

-- Use CTEs for readability and performance
WITH user_projects AS (
  SELECT p.* FROM projects p
  JOIN organization_members om ON p.organization_id = om.organization_id
  WHERE om.user_id = $1 AND p.deleted_at IS NULL
),
recent_tasks AS (
  SELECT t.* FROM tasks t
  WHERE t.project_id IN (SELECT id FROM user_projects)
    AND t.updated_at > NOW() - INTERVAL '7 days'
  ORDER BY t.updated_at DESC
  LIMIT 10
)
SELECT * FROM recent_tasks;

Leverage JSONB wisely:

-- Settings with defaults and validation
CREATE TABLE tenant_settings (
  organization_id UUID PRIMARY KEY,
  settings JSONB NOT NULL DEFAULT '{
    "features": {
      "timeTracking": true,
      "customFields": false
    },
    "limits": {
      "maxProjects": 10,
      "maxUsersPerProject": 50
    }
  }',
  CONSTRAINT valid_settings CHECK (
    settings ? 'features' AND
    settings ? 'limits'
  )
);

-- Query JSONB efficiently
CREATE INDEX idx_premium_tenants ON tenant_settings
  ((settings->'features'->>'customFields'))
  WHERE (settings->'features'->>'customFields')::boolean = true;

Problem: Loading related data in loops

Solution:

// Bad: N+1 queries
const projects = await db.query('SELECT * FROM projects');
for (const project of projects) {
  project.tasks = await db.query(
    'SELECT * FROM tasks WHERE project_id = $1',
    [project.id]
  );
}

// Good: single query with join
const projectsWithTasks = await db.query(`
  SELECT p.*,
    COALESCE(
      json_agg(t.*) FILTER (WHERE t.id IS NOT NULL),
      '[]'
    ) as tasks
  FROM projects p
  LEFT JOIN tasks t ON t.project_id = p.id
  GROUP BY p.id
`);

Take your database design further:

  1. Advanced Features

    • Implement event sourcing for audit trail
    • Add real-time subscriptions with LISTEN/NOTIFY (sketched after this list)
    • Create materialized views for analytics
    • Implement database-level search with pg_trgm
  2. Scaling Strategies

    • Design sharding strategy by organization
    • Implement read replicas with lag monitoring
    • Add connection pooling with PgBouncer
    • Create archival process for old data
  3. Performance Tuning

    • Implement query result caching
    • Add database-level rate limiting
    • Create custom aggregation functions
    • Optimize for specific workloads
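
Two of the advanced features above, sketched (channel, trigger, and index names are illustrative): LISTEN/NOTIFY pushes task changes to subscribed connections, and pg_trgm enables fuzzy search on titles.

-- Notify listeners on the 'task_changes' channel whenever a task changes
CREATE OR REPLACE FUNCTION notify_task_change() RETURNS trigger AS $$
BEGIN
  PERFORM pg_notify('task_changes',
    json_build_object('id', NEW.id, 'status', NEW.status)::text);
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER task_change_notify
AFTER INSERT OR UPDATE ON tasks
FOR EACH ROW EXECUTE FUNCTION notify_task_change();

-- Trigram index for fuzzy matching; ILIKE '%plan%' and similarity()
-- queries can now use this index
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX idx_tasks_title_trgm ON tasks USING gin (title gin_trgm_ops);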

Your database design is successful when:

  • ✅ All queries execute in under 100ms at scale
  • ✅ Zero data inconsistencies
  • ✅ Complete tenant isolation verified
  • ✅ Migrations run without downtime
  • ✅ 99.9% uptime maintained
  • ✅ Backup/restore tested and under 5min
  • ✅ Monitoring catches issues proactively
  • ✅ Schema supports future features

Teams using these patterns achieve:

  • 10x improvement in query performance
  • 90% reduction in database-related bugs
  • Zero tenant data leaks
  • 5-minute recovery time objective

Before going to production:

  • Schema peer reviewed
  • Indexes analyzed with production data
  • Multi-tenancy isolation tested
  • Backup strategy implemented
  • Monitoring and alerts configured
  • Migration rollback tested
  • Performance benchmarked
  • Security audit completed
  1. Design for Queries: Know your access patterns before designing
  2. Plan for Scale: Design assuming 100x growth
  3. Index Strategically: Too few or too many both hurt
  4. Test with Real Data: Development data hides issues
  5. Monitor Everything: You can’t optimize what you don’t measure
  • Requirements analysis: 2 hours
  • Initial schema design: 3 hours
  • Optimization and indexes: 2 hours
  • Multi-tenancy setup: 2 hours
  • Testing and refinement: 3 hours
  • Total: ~12 hours (vs 40+ hours the traditional way)

You’ve mastered database design. Ready for more?

Graph Databases

Explore Neo4j for complex relationships

Time Series

Design for IoT and metrics data

Event Sourcing

Implement event-driven architecture

Continue to Frontend UI from Design →