Designing Data-Intensive Applications

The Core Framework

Reliability: Systems must keep working correctly when things go wrong (hardware faults, software bugs, human error). Prefer tolerating faults over trying to prevent every one of them.
Scalability: Describe load with parameters (requests/sec, fan-out ratio), measure performance with percentiles (p50, p99, p999), then choose scaling strategies.
Maintainability: Operability (easy to run), simplicity (manageable complexity via good abstractions), evolvability (easy to change).
Trade-offs are inescapable: Every choice involves tension — read vs. write speed, consistency vs. availability, correctness vs. performance.
Principles over tools: The same patterns (append-only logs, sorted merge, quorum voting) recur across storage, replication, batch, and stream processing.
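
To make the percentile advice under Scalability concrete, here is a minimal sketch of how p50, p99, and p999 compare with the arithmetic mean. The nearest-rank percentile() helper and the response-time sample are hypothetical, chosen only for illustration; production systems usually rely on a metrics library or an approximate histogram instead.

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("samples must be non-empty")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]

# Hypothetical response times in milliseconds:
# 98 fast requests plus two slow outliers.
response_times_ms = [20] * 98 + [1500, 3000]

mean_ms = sum(response_times_ms) / len(response_times_ms)
p50_ms = percentile(response_times_ms, 50)
p99_ms = percentile(response_times_ms, 99)
p999_ms = percentile(response_times_ms, 99.9)

print(f"mean={mean_ms:.1f} p50={p50_ms} p99={p99_ms} p999={p999_ms}")
```

With this sample the mean is about 65 ms, which describes neither the typical request (20 ms) nor the slowest ones (1500 ms and above). The percentiles report both directly.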

Quick Lookup

Situation	Do This	Avoid This
Choosing a storage engine	Match to workload: LSM-trees for write-heavy, B-trees for read-heavy	Assuming one engine fits all workloads
Schema design	Use document model for self-contained documents; relational for highly interconnected data	Calling document DBs "schemaless" — it's schema-on-read
Replication strategy	Start with single-leader; use multi-leader only for multi-datacenter or offline clients	Multi-leader for single-datacenter (adds conflict complexity for little gain)
Transaction isolation	Know exactly what your DB provides; don't trust "ACID" labels	Assuming "serializable" means the same across databases (Oracle's "serializable" is actually snapshot isolation)
Distributed locks	Always use fencing tokens; never rely on timeouts alone	Trusting a lock lease without fencing — process pauses can outlast leases
Data integration	Use CDC with a single source of truth; derive all other views	Dual writes to multiple systems (race conditions, partial failures)
Performance measurement	Use percentiles (p99, p999), not averages	Arithmetic mean of response times (hides tail latency)
Handling failures	Keep inputs immutable and outputs rebuildable, so recovery is just rerunning the derivation	Mutable state that can't be rebuilt from source
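
The "Distributed locks" row can be illustrated with a small sketch of fencing tokens. LockService and FencedStorage are hypothetical in-memory stand-ins, not the API of any real lock service or database. The sketch assumes the lock service issues a strictly increasing token with every lease and that the storage layer checks the token on every write.

```python
class LockService:
    """Hypothetical lock service: every granted lease carries a
    strictly increasing fencing token."""

    def __init__(self):
        self._next_token = 0

    def acquire(self):
        self._next_token += 1
        return self._next_token


class FencedStorage:
    """Hypothetical storage: rejects any write whose token is older
    than the newest token it has already accepted."""

    def __init__(self):
        self._highest_token = 0
        self.value = None

    def write(self, token, value):
        if token < self._highest_token:
            return False  # stale lease holder, write is fenced off
        self._highest_token = token
        self.value = value
        return True


locks = LockService()
storage = FencedStorage()

token_a = locks.acquire()  # client A gets the lease (token 1)
# Client A now pauses (for example, a long GC pause) and its lease expires.
token_b = locks.acquire()  # client B gets the lease (token 2)

accepted_b = storage.write(token_b, "from B")  # accepted
accepted_a = storage.write(token_a, "from A")  # A wakes up late: rejected

print(accepted_b, accepted_a, storage.value)
```

The check lives in the storage layer because client A cannot know that its lease expired while it was paused. A timeout alone would let its late write overwrite B's data.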

The Key Insight

"Technology is a powerful force in our society... But they can also be used for good: to make underrepresented people's voices heard, to create opportunities for everyone, and to avert disasters." — Martin Kleppmann, Dedication
