6 min read
Databases & Caching
Relational Databases (SQL)
Core Concepts
sql-- DDL: Define schema
CREATE TABLE users (
id SERIAL PRIMARY KEY,
email VARCHAR(255) UNIQUE NOT NULL,
name VARCHAR(100) NOT NULL,
role VARCHAR(20) NOT NULL DEFAULT 'user',
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE TABLE posts (
id SERIAL PRIMARY KEY,
user_id INTEGER NOT NULL REFERENCES users(id) ON DELETE CASCADE,
title TEXT NOT NULL,
body TEXT,
published BOOLEAN NOT NULL DEFAULT FALSE,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE INDEX idx_posts_user_id ON posts(user_id);
CREATE INDEX idx_posts_created_at ON posts(created_at DESC);Querying
sql-- SELECT with filtering
SELECT id, name, email FROM users
WHERE role = 'admin' AND created_at > '2024-01-01'
ORDER BY created_at DESC
LIMIT 20 OFFSET 40;
-- JOINs
SELECT u.name, COUNT(p.id) AS post_count
FROM users u
LEFT JOIN posts p ON p.user_id = u.id
WHERE p.published = TRUE
GROUP BY u.id, u.name
HAVING COUNT(p.id) > 5
ORDER BY post_count DESC;
-- INNER JOIN — only rows that match both sides
-- LEFT JOIN — all left rows, NULLs for unmatched right
-- RIGHT JOIN — all right rows, NULLs for unmatched left
-- FULL OUTER JOIN — all rows from both, NULLs where no match
-- Subqueries
SELECT * FROM users
WHERE id IN (SELECT user_id FROM posts WHERE published = TRUE);
-- CTEs (Common Table Expressions)
WITH active_users AS (
SELECT user_id FROM posts
WHERE created_at > NOW() - INTERVAL '30 days'
GROUP BY user_id
)
SELECT u.name FROM users u
JOIN active_users au ON au.user_id = u.id;
-- Upsert
INSERT INTO settings (user_id, theme)
VALUES (1, 'dark')
ON CONFLICT (user_id) DO UPDATE SET theme = EXCLUDED.theme;Indexing
Indexes speed up reads but slow down writes (index must be updated).
sql-- B-tree (default) — range queries, equality, ORDER BY
CREATE INDEX idx_email ON users(email);
CREATE INDEX idx_created ON posts(created_at DESC);
-- Composite index — order matters (leftmost prefix rule)
CREATE INDEX idx_status_date ON posts(published, created_at DESC);
-- Useful for: WHERE published = TRUE ORDER BY created_at
-- Not useful for: WHERE created_at > ... (published not in WHERE)
-- Partial index — smaller, faster for filtered queries
CREATE INDEX idx_active_users ON users(email) WHERE active = TRUE;
-- Covering index — query satisfied entirely from index (no heap fetch)
CREATE INDEX idx_cover ON posts(user_id) INCLUDE (title, created_at);When indexes help / hurt
| Scenario | Use index? |
|---|---|
WHERE email = ? (high cardinality) |
✅ |
WHERE status IN ('A', 'B') (low cardinality) |
❌ — full scan often faster |
ORDER BY created_at DESC LIMIT 10 |
✅ |
WHERE LOWER(email) = ? |
Only with functional index |
| Small table (< ~1000 rows) | ❌ — sequential scan faster |
EXPLAIN ANALYZE — always check the query plan:
sqlEXPLAIN ANALYZE SELECT * FROM posts WHERE user_id = 1;
-- Look for: Seq Scan (bad on large tables), Index Scan (good), Nested LoopTransactions & ACID
sqlBEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;
-- On error
ROLLBACK;
-- Savepoints
BEGIN;
SAVEPOINT sp1;
UPDATE ...;
ROLLBACK TO SAVEPOINT sp1; -- undo just this part
COMMIT;ACID properties:
- Atomicity — all or nothing; partial changes are rolled back
- Consistency — constraints (FK, unique, check) enforced
- Isolation — concurrent transactions don't interfere
- Durability — committed data survives crashes (WAL)
Isolation Levels
READ UNCOMMITTED — can read uncommitted data (dirty read) — rarely used
READ COMMITTED — default in Postgres — only see committed data
REPEATABLE READ — same row returns same value within transaction
SERIALIZABLE — full isolation, as if transactions run sequentially| Problem | READ COMMITTED | REPEATABLE READ | SERIALIZABLE |
|---|---|---|---|
| Dirty read | ✅ prevented | ✅ | ✅ |
| Non-repeatable read | ❌ possible | ✅ prevented | ✅ |
| Phantom read | ❌ possible | ❌ | ✅ prevented |
NoSQL — Document (MongoDB)
js// Collections are like tables, documents are like rows
// Schema-free — documents in same collection can differ
// Insert
await db.collection('users').insertOne({ name: 'Alice', role: 'admin' })
await db.collection('users').insertMany([...])
// Find
await db.collection('users').findOne({ email: 'alice@example.com' })
await db.collection('users').find({ role: 'admin' }).sort({ name: 1 }).limit(10).toArray()
// Update
await db.collection('users').updateOne(
{ _id: id },
{ $set: { role: 'moderator' }, $push: { tags: 'verified' } }
)
// Delete
await db.collection('users').deleteOne({ _id: id })
// Aggregation pipeline
await db.collection('orders').aggregate([
{ $match: { status: 'completed' } },
{ $group: { _id: '$userId', total: { $sum: '$amount' } } },
{ $sort: { total: -1 } },
{ $limit: 10 },
]).toArray()SQL vs NoSQL Decision
| Criteria | SQL | NoSQL (Document) |
|---|---|---|
| Schema | Rigid, enforced | Flexible |
| Relationships | Strong FK constraints | Embed or reference manually |
| Transactions | Full ACID | Limited (MongoDB has multi-doc since 4.0) |
| Scalability | Vertical + read replicas | Horizontal sharding |
| Query complexity | Powerful (JOINs, CTEs) | Limited aggregations |
| Use cases | Financial, ERP, complex queries | Catalogs, user profiles, logs |
Redis — In-Memory Cache
jsimport { createClient } from 'redis'
const redis = createClient({ url: process.env.REDIS_URL })
await redis.connect()
// Strings — cache API responses
await redis.set('user:1', JSON.stringify(user), { EX: 3600 }) // TTL 1h
const cached = JSON.parse(await redis.get('user:1') ?? 'null')
// Cache-aside pattern
async function getUser(id) {
const key = `user:${id}`
const cached = await redis.get(key)
if (cached) return JSON.parse(cached)
const user = await db.findUser(id)
await redis.set(key, JSON.stringify(user), { EX: 3600 })
return user
}
// Invalidation
await redis.del('user:1')
await redis.del(...userKeys) // bulk delete
// Counters (atomic)
await redis.incr('api:calls:today')
await redis.incrBy('score:user:1', 10)
// Sets — unique members
await redis.sAdd('online_users', userId)
await redis.sRem('online_users', userId)
await redis.sMembers('online_users')
// Sorted sets — leaderboards
await redis.zAdd('leaderboard', [{ score: 1500, value: 'user:1' }])
await redis.zRangeWithScores('leaderboard', 0, 9, { REV: true }) // top 10
// Lists — queues
await redis.lPush('job_queue', JSON.stringify(job)) // enqueue
await redis.rPop('job_queue') // dequeue (FIFO)
// Hash — structured objects
await redis.hSet('session:abc', { userId: '1', role: 'admin' })
await redis.hGet('session:abc', 'userId')
await redis.hGetAll('session:abc')
await redis.expire('session:abc', 86400)Caching Strategies
Cache-aside (Lazy loading)
Read: check cache → miss → fetch DB → populate cache → return
Write: update DB → invalidate cache
Pro: Only cache what's requested, cache failures are recoverable
Con: First request always slow; potential stale data
Write-through
Write: update cache + DB together (synchronously)
Pro: Cache always fresh
Con: Every write goes to cache (wasted for rarely-read data)
Write-behind (Write-back)
Write: update cache → async batch write to DB
Pro: Low write latency
Con: Data loss risk if cache fails before flush
Read-through
Delegate cache population to the caching layer (cache fetches DB on miss)
Pro: Application code simpler
Con: Cache library must know about DBConnection Pooling
js// Postgres with pg
import { Pool } from 'pg'
const pool = new Pool({
connectionString: process.env.DATABASE_URL,
max: 20, // max connections in pool
idleTimeoutMillis: 30_000,
connectionTimeoutMillis: 2_000,
})
// Always release connections back to pool
const client = await pool.connect()
try {
const result = await client.query('SELECT * FROM users WHERE id = $1', [id])
return result.rows[0]
} finally {
client.release() // critical — always release
}
// Or use pool.query directly (auto-release)
const result = await pool.query('SELECT * FROM users WHERE id = $1', [id])
// With Prisma — pool is managed automatically
// Configure via connection string: ?connection_limit=20&pool_timeout=10Database Design Principles
Normalization (reduce redundancy):
1NF — atomic values, no repeating groups
2NF — no partial dependencies on composite key
3NF — no transitive dependencies (every non-key column depends only on PK)
Denormalization — intentional redundancy for read performance
- Store computed totals (order_total)
- Duplicate user name on posts for fast reads
- Use materialized views
N+1 Problem — avoid in ORMs
// ❌ N+1: 1 query for users + N queries for each user's posts
const users = await User.findAll()
for (const user of users) {
user.posts = await Post.findAll({ where: { userId: user.id } })
}
// ✅ Eager load with JOIN
const users = await prisma.user.findMany({ include: { posts: true } })[prev·next]