Build data systems that stay fast, correct, and scalable, no matter how quickly your business grows. Designing Data-Intensive Applications shows developers, data engineers, and architects how to turn messy, high-volume data into dependable, real-time products and insights. Instead of locking you into a single stack, this book teaches proven principles, patterns, and trade-offs you can apply with any modern tooling.
What you'll learn
Core data models and when to use them: relational, key-value, document, columnar, and graph
Storage engines and indexing trade-offs that impact write/read latency and cost
Replication, sharding, and rebalancing strategies that minimize hotspots and downtime (see the consistent-hashing sketch after this list)
Consistency models (eventual, causal, strong) and how to choose them pragmatically
Transactions, idempotency, retries, and saga-style workflows that survive failures (see the retry sketch after this list)
Streaming vs. batch: designing pipelines, watermarks, windowing, and backpressure
Change Data Capture (CDC), event logs, and outbox patterns for reliable integration (see the outbox sketch after this list)
Data contracts, schema evolution, and lineage to keep teams moving without breakage
Caching, queues, and rate-limiting patterns for smooth, predictable performance
Observability essentials: metrics, tracing, and alerting for data-rich systems in production
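To give a flavor of the sharding and rebalancing material, here is a minimal consistent-hashing sketch in Python. The node names, virtual-node count, and hash choice are illustrative assumptions, not code from the book: each key maps to the nearest node clockwise on a hash ring, so adding or removing a node moves only the keys in that node's arc rather than reshuffling everything.

```python
import bisect
import hashlib

class HashRing:
    """Map keys to nodes on a hash ring; virtual nodes spread each node's load."""

    def __init__(self, nodes, vnodes=100):
        self._ring = []  # sorted (hash, node) pairs
        for node in nodes:
            for i in range(vnodes):
                self._ring.append((self._hash(f"{node}#{i}"), node))
        self._ring.sort()
        self._hashes = [h for h, _ in self._ring]

    @staticmethod
    def _hash(key: str) -> int:
        return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

    def node_for(self, key: str) -> str:
        # First virtual node clockwise of the key's hash, wrapping at the end.
        idx = bisect.bisect(self._hashes, self._hash(key)) % len(self._hashes)
        return self._ring[idx][1]

ring = HashRing(["node-a", "node-b", "node-c"])
print(ring.node_for("user:42"))  # deterministic for a given ring, e.g. 'node-b'
```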
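In the same spirit, here is a small sketch of idempotent retries with exponential backoff and full jitter, so a request retried after a timeout cannot be applied twice. The `charge` handler and its in-memory key store are hypothetical stand-ins for a real service:

```python
import random
import time
import uuid

class TransientError(Exception):
    """Raised for failures that are safe to retry (timeouts, 5xx, etc.)."""

def with_retries(fn, attempts=5, base_delay=0.1):
    for attempt in range(attempts):
        try:
            return fn()
        except TransientError:
            if attempt == attempts - 1:
                raise  # out of attempts; surface the failure
            # Full jitter: sleep a random slice of the backoff window so
            # retrying clients don't stampede the recovering service.
            time.sleep(random.uniform(0, base_delay * 2 ** attempt))

_processed = set()  # server-side record of idempotency keys already applied

def charge(key: str, amount: int) -> str:
    """Hypothetical handler: replaying the same key is a harmless no-op."""
    if key in _processed:
        return "duplicate-ignored"
    _processed.add(key)
    return "charged"

key = str(uuid.uuid4())  # one key per logical request, reused across retries
print(with_retries(lambda: charge(key, 100)))  # 'charged'
print(with_retries(lambda: charge(key, 100)))  # 'duplicate-ignored'
```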
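Finally, a compact transactional-outbox sketch using SQLite. The table layout and relay loop are illustrative assumptions; in production the relay would publish to a broker such as Kafka, and consumers must tolerate at-least-once delivery:

```python
import json
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT);
    CREATE TABLE outbox (id INTEGER PRIMARY KEY AUTOINCREMENT,
                         topic TEXT, payload TEXT, published INTEGER DEFAULT 0);
""")

def place_order(order_id: int) -> None:
    # One local transaction: the business row and its event commit together,
    # so an event is never lost and never emitted for a rolled-back write.
    with db:
        db.execute("INSERT INTO orders VALUES (?, ?)", (order_id, "placed"))
        db.execute(
            "INSERT INTO outbox (topic, payload) VALUES (?, ?)",
            ("orders", json.dumps({"order_id": order_id, "status": "placed"})),
        )

def relay_once(publish) -> None:
    # A relay polls unpublished rows, publishes them, then marks them done.
    rows = db.execute(
        "SELECT id, topic, payload FROM outbox WHERE published = 0"
    ).fetchall()
    for row_id, topic, payload in rows:
        publish(topic, payload)
        db.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))
    db.commit()

place_order(42)
relay_once(lambda topic, msg: print(f"publish to {topic}: {msg}"))
```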
Who it's for
Data engineers and platform teams building pipelines and storage layers
Backend engineers and SREs who operate mission-critical services
Architects and tech leads who make consistency, latency, and cost trade-offs every day
Why this book
Technology-agnostic patterns you can apply with PostgreSQL, Kafka, Spark/Flink, cloud object stores, NoSQL databases, and more
Clear diagrams and step-by-step reasoning, so you understand not just how to do something, but why it works
Practical guidance for real systems: partial failures, retries, replays, and schema drift
By the end, you'll have a toolkit of mental models and implementation patterns to design systems that scale with your data and your ambitions.