Library
Designing Data-Intensive Applications · 14 of 14
Designing Data-Intensive Applications
AI Software Development CRITICAL

Transactions

transactions ACID isolation-levels write-skew serializability

Key Principle

Transactions group reads and writes into a unit that fully commits or fully aborts, simplifying the programming model by absorbing concurrency and fault-handling complexity. But "ACID compliant" is a marketing term with no guaranteed specific behavior (Ch. 7). Isolation levels form a hierarchy -- read committed, snapshot isolation, serializability -- each preventing additional anomalies at increasing performance cost. The practical consequence: weak isolation silently shifts correctness responsibility to the application developer.

Why This Matters

Weak isolation bugs have caused substantial financial losses, regulatory investigations, and data corruption. "Use an ACID database if you're handling financial data!" misses the point -- even popular relational systems use weak isolation by default (Ch. 7). Developers must understand which anomalies their chosen isolation level leaves unhandled rather than trusting labels. The SQL standard predates snapshot isolation, so vendors map levels inconsistently: PostgreSQL and MySQL call snapshot isolation "repeatable read," Oracle calls it "serializable," IBM DB2 means something else entirely. "As a result, nobody really knows what repeatable read means" (Ch. 7).

Good Examples

  • Write skew -- on-call doctors: Two doctors simultaneously check that at least two are on call, then each drops off. Neither transaction touches the same row, so snapshot isolation detects no conflict, but the invariant (minimum staffing) is violated. The same pattern applies to room bookings, username claims, and double-spending (Ch. 7).
  • SSI in PostgreSQL 9.1: Serializable Snapshot Isolation layers conflict detection on top of snapshot isolation. It reuses index-range lock infrastructure from 2PL but as non-blocking "tripwires" -- notifying transactions that their read premises may be stale rather than blocking them. Read-only queries need no locks, and abort decisions are deferred to commit time (Ch. 7).
  • VoltDB serial execution: Runs all transactions on one thread, achieving serializability by construction. Feasible because RAM became cheap enough to keep active data in memory and OLTP transactions are inherently short. But cross-partition transactions collapse throughput to ~1,000 writes/sec (Ch. 7).

Counterpoints

  • Conflating atomicity with concurrency: ACID atomicity means abortability (on fault, all writes are discarded), not multi-threaded atomicity. The "A" is about safe retries, not isolation (Ch. 7).
  • Trusting single-object atomicity as "transactions": Storage engines universally provide single-object atomicity via WAL and per-object locks. Vendors marketing this as "lightweight transactions" is misleading -- real transactions group multiple operations on multiple objects (Ch. 7).
  • Retrying aborted transactions naively: Five failure modes undermine retries: duplicate execution when commit succeeded but ack was lost, overload amplification without backoff, retrying permanent errors, non-transactional side effects (sending email), and client crash losing in-flight data. Many ORMs (Rails, Django) do not retry at all (Ch. 7).

Key Quotes

"Transactions are not a law of nature; they were created with a purpose, namely to simplify the programming model." -- Martin Kleppmann, Chapter 7

"Rather than blindly relying on tools, we need to develop a good understanding of the kinds of concurrency problems that exist, and how to prevent them." -- Martin Kleppmann, Chapter 7

"Writers don't just block other writers; they also block readers and vice versa." -- Martin Kleppmann, Chapter 7 (on 2PL)

Rules of Thumb

  • Verify your database's actual isolation behavior; do not trust the label
  • Use serializable isolation (SSI, 2PL, or serial execution) when write skew would violate business invariants
  • Prefer atomic write operations (UPDATE ... SET value = value + 1) over read-modify-write cycles when expressible
  • Use SELECT FOR UPDATE only when lockable rows exist; it fails on phantoms (no rows to lock)
  • Keep read-write transactions short under SSI to minimize abort rates; long read-only transactions are fine
  • In replicated databases, use commutative operations or CRDTs instead of locks -- locks assume a single authoritative copy
  • Treat materializing conflicts as a last resort; it leaks concurrency control into the data model

Related References

  • Replication - transactions remedy consistency anomalies from replication lag; LWW in replicated DBs silently loses updates
  • Partitioning - serial execution converts concurrency into a partitioning problem; cross-partition transactions are expensive
  • Distributed Systems Problems - partial failures and clock unreliability complicate distributed transactions; 2PC is distinct from 2PL