Complete System Design Interview Preparation For 2-7 Years of Experience [2026]

    A complete system design reference for developers, covering scalability, databases, load balancers, caching, Kafka, CDN, microservices, and distributed systems. Built for engineers preparing for system design interviews or designing production-grade systems from scratch.

    Rohan Saxena

    February 01, 2026

    19 min read

    Introduction#

    There are no prerequisites for reading this one-shot blog.

    If you know how to build a basic backend, understand what an API is, and have used a database at least once, you are good to go.

    This article is designed to be a single source of truth for system design fundamentals. You should be able to read this and then directly start attempting system design interview problems or designing real-world systems with confidence.

    Most system design content online focuses heavily on theory. This cheatsheet focuses on:

    • Practical intuition
    • Real-world usage
    • How things break at scale
    • Why certain design decisions exist

    Think of this as notes shared by a friend who has already made mistakes so you do not have to repeat all of them.

    What This One-Shot Blog Covers#

    This is a long read by design. Roughly 60 minutes if read properly.

    You will learn:

    • Why system design matters
    • What servers actually are
    • Latency vs throughput
    • Scaling fundamentals
    • Auto scaling
    • Back-of-the-envelope estimation

    Later parts will go much deeper into databases, caching, distributed systems, messaging, consistency, and interview strategy.


    Why Study System Design?#

    Most developers start with the simplest possible architecture: a client talking to a single server, which talks to a single database.

    This architecture works perfectly for:

    • College projects
    • Hackathons
    • Early MVPs
    • Side projects with limited users

    Now imagine this same system when:

    • Thousands of users hit it simultaneously
    • Database queries increase massively
    • One server goes down
    • Traffic spikes unexpectedly

    Suddenly:

    • APIs slow down
    • Requests start failing
    • The database becomes the bottleneck
    • The system crashes at the worst possible time

    System design is about preparing for growth and failure.

    It answers questions like:

    • How do we scale without rewriting everything?
    • How do we handle failures gracefully?
    • How do we make systems reliable under load?
    • How do real companies handle millions of users?

    If you do not design for scale early, scale will force redesign later.

    What Is a Server?#

    A server is simply a computer running your application.

    There is nothing special about it.

    When you run a backend locally on:

    http://localhost:8080

    You are already running a server.

    Breaking it down#

    • localhost resolves to the IP address 127.0.0.1
    • This IP points to your own machine
    • 8080 is the port number
    • The port tells the operating system which application should receive the request

    Your laptop can run multiple applications at the same time. Ports help the OS route requests to the correct application.

    What happens when you visit a website?#

    When you type:

    https://example.com

    Behind the scenes:

    1. DNS resolves example.com to an IP address
    2. Your browser sends a request to that IP
    3. The request hits a server on a specific port
    4. The server routes it to the correct application
    5. The application processes the request and sends a response

    Remembering IP addresses is hard, so humans use domain names. Machines still talk using IP addresses.
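
    The name-to-IP step is easy to observe from code. A minimal sketch using Python's standard library (resolving localhost here, since a real lookup needs network access):

```python
import socket

# Resolve a hostname to an IP address, the same DNS step a browser performs.
ip = socket.gethostbyname("localhost")
print(ip)  # 127.0.0.1 on virtually every system
```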

    How Do We Deploy an Application?#

    Locally, your application runs on your laptop.

    To make it accessible to others:

    • You need a public IP address
    • You need a machine that stays online
    • You need to expose your application to the internet

    Managing physical servers is painful. That is why cloud providers exist.

    Cloud providers give you:

    • Virtual machines
    • Public IP addresses
    • Networking
    • Security tools

    In AWS, a virtual machine is called an EC2 instance.

    Uploading your application code from your laptop to a cloud server is called deployment.

    Latency and Throughput#

    These two terms appear everywhere in system design discussions. They measure different things and both matter.

    Latency#

    Latency is the time taken for a single request to go from the client to the server and back.

    It is usually measured in milliseconds.

    Examples:

    • API responds in 120 ms
    • Webpage loads in 300 ms

    Lower latency means:

    • Faster responses
    • Better user experience

    Higher latency means:

    • Slower apps
    • Frustrated users
    • More retries and refreshes

    Round Trip Time (RTT)#

    RTT is the total time taken for:

    • Request to reach the server
    • Response to come back

    RTT is often used interchangeably with latency.

    Throughput#

    Throughput is the number of requests a system can handle per second.

    It is measured as:

    • Requests per second (RPS)
    • Transactions per second (TPS)

    A system can have:

    • Low latency but low throughput
    • High throughput but high latency

    The ideal system has:

    • Low latency
    • High throughput

    In reality, you usually balance between the two.

    Simple Analogy#

    | Concept    | Real World Example                     |
    | ---------- | -------------------------------------- |
    | Latency    | Time taken by one car to travel a road |
    | Throughput | Number of cars that can pass per hour  |
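
    The two metrics can be measured independently. A self-contained sketch, using time.sleep to simulate roughly 1 ms of request work:

```python
import time

def handle_request():
    time.sleep(0.001)  # simulated ~1 ms of work per request

# Latency: time for one request to complete.
start = time.perf_counter()
handle_request()
latency_ms = (time.perf_counter() - start) * 1000

# Throughput: requests completed per second (sequential here; real
# servers raise throughput with concurrency, not faster handlers).
count, t0 = 0, time.perf_counter()
while time.perf_counter() - t0 < 0.1:   # 100 ms measurement window
    handle_request()
    count += 1
throughput_rps = count / 0.1

print(f"latency ~ {latency_ms:.1f} ms, throughput ~ {throughput_rps:.0f} req/s")
```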

    Scaling Basics#

    Scaling means handling more load.

    Load can be:

    • More users
    • More requests
    • More data

    There are only two ways to scale a system.

    Vertical Scaling#

    Vertical scaling means increasing the power of a single machine.

    Examples:

    • More CPU cores
    • More RAM
    • Larger disk

    Pros#

    • Simple to implement
    • No architectural changes required

    Cons#

    • Has a hard upper limit
    • Expensive
    • Single point of failure

    Vertical scaling is usually the first step because it is easy.

    Horizontal Scaling#

    Horizontal scaling means adding more machines and distributing the load.

    Instead of one powerful server:

    • Use many smaller servers
    • Spread traffic evenly

    Clients cannot decide which server to talk to. That is where load balancers come in.

    Load Balancer (High Level)#

    A load balancer:

    • Acts as a single entry point
    • Receives all incoming requests
    • Forwards them to backend servers

    Clients only know the load balancer. Servers can be added or removed freely.

    Horizontal scaling is the most common approach in real-world systems.

    Auto Scaling#

    Traffic is rarely constant.

    Some days:

    • Low usage

    Some days:

    • Traffic spikes suddenly

    Running maximum servers all the time is expensive and wasteful.

    Auto scaling solves this.

    How Auto Scaling Works#

    • Monitor server metrics like CPU or memory
    • Define thresholds
    • Add servers when load increases
    • Remove servers when load decreases

    This allows:

    • Cost efficiency
    • High availability
    • Automatic recovery

    Auto scaling should always be combined with monitoring and alerts.
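
    The monitor-threshold-adjust loop above reduces to a small decision function. A sketch; the thresholds and limits are illustrative, not defaults of any cloud provider:

```python
def desired_servers(current, cpu_percent,
                    scale_up_at=70.0, scale_down_at=30.0,
                    min_servers=2, max_servers=20):
    """Threshold-based scaling decision (illustrative thresholds)."""
    if cpu_percent > scale_up_at:
        return min(current + 1, max_servers)   # add a server under load
    if cpu_percent < scale_down_at:
        return max(current - 1, min_servers)   # remove one when idle
    return current                             # within normal range

print(desired_servers(4, 85.0))  # 5: load high, add a server
print(desired_servers(4, 20.0))  # 3: load low, remove one
print(desired_servers(2, 10.0))  # 2: already at the minimum
```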

    Back-of-the-Envelope Estimation#

    In system design interviews, you are expected to estimate scale.

    Exact numbers are not required. Reasonable approximations are.

    Spend around 5 minutes on this in interviews.

    Memory and Storage Reference Table#

    | Unit | Approximate Size        | Power of 10 |
    | ---- | ----------------------- | ----------- |
    | 1 KB | 1,000 bytes             | 10³         |
    | 1 MB | 1,000,000 bytes         | 10⁶         |
    | 1 GB | 1,000,000,000 bytes     | 10⁹         |
    | 1 TB | 1,000,000,000,000 bytes | 10¹²        |


    What We Usually Estimate#

    1. Load estimation
    2. Storage estimation
    3. Resource estimation

    Example: Load Estimation#

    Assume:

    • 50 million daily active users
    • Each user performs 10 actions per day

    Total actions per day:

    50,000,000 x 10 = 500,000,000 actions/day

    Approximate requests per second:

    500,000,000 / 86,400 ≈ 5,800 requests/sec

    Example: Storage Estimation#

    Assume:

    • Each record is 1 KB
    • 500 million records per day

    Daily storage required:

    500 million x 1 KB = 500 GB per day

    Monthly storage:

    500 GB x 30 ≈ 15 TB

    Example: Resource Estimation#

    Assume:

    • 6,000 requests per second
    • Each request takes 10 ms of CPU time

    Total CPU time per second:

    6,000 x 10 ms = 60,000 ms

    One CPU core provides 1,000 ms of processing time per second, so:

    60,000 / 1,000 = 60 CPU cores

    If one server has 4 cores:

    60 / 4 = 15 servers
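
    The three estimates above are simple enough to check in a few lines:

```python
SECONDS_PER_DAY = 86_400

# Load: 50M daily active users x 10 actions/day
actions_per_day = 50_000_000 * 10
rps = actions_per_day / SECONDS_PER_DAY          # ~5,787 req/s

# Storage: 500M records/day x 1 KB each
daily_gb = 500_000_000 * 1_000 / 1e9             # 500 GB/day
monthly_tb = daily_gb * 30 / 1_000               # 15 TB/month

# Resources: 6,000 req/s x 10 ms of CPU each
cpu_ms_per_sec = 6_000 * 10                      # 60,000 ms of CPU per second
cores = cpu_ms_per_sec / 1_000                   # 60 cores needed
servers = cores / 4                              # 15 servers at 4 cores each

print(round(rps), daily_gb, monthly_tb, cores, servers)
```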

    CAP Theorem#

    CAP theorem is one of the most important ideas in system design.

    It explains why distributed systems are forced to make trade-offs.

    CAP stands for:

    • Consistency
    • Availability
    • Partition Tolerance

    This discussion always assumes we are talking about distributed systems.

    What Is a Distributed System?#

    A distributed system is one where:

    • Data is stored across multiple machines
    • Requests are served by multiple nodes
    • Systems communicate over a network

    We use distributed systems because:

    • One machine is not enough at scale
    • We want fault tolerance
    • We want lower latency by serving users from nearby locations

    Each individual machine in a distributed system is called a node.

    The Three Properties Explained#

    Consistency#

    Consistency means:

    • Every read returns the most recent write
    • All nodes see the same data at the same time

    If data is updated on one node, that update must be reflected on all nodes before reads are allowed.

    Example:

    • User updates their profile name
    • Any request from any node should show the updated name immediately

    Availability#

    Availability means:

    • Every request receives a response
    • The system continues to function even if some nodes fail

    The response does not need to reflect the most recent write, but the system should not refuse requests.

    Example:

    • One database replica crashes
    • Other replicas continue serving traffic

    Partition Tolerance#

    Partition tolerance means:

    • The system continues to operate even if network communication breaks between nodes

    Network failures are unavoidable in real systems. Because of this, partition tolerance is mandatory in distributed systems.


    The Core Idea of CAP Theorem#

    In a distributed system, you can only guarantee two out of three properties at the same time.

    Possible combinations:

    • CP (Consistency + Partition Tolerance)
    • AP (Availability + Partition Tolerance)

    Impossible combination:

    • CAP (all three at once)

    Why Can We Not Have All Three?#

    Assume three nodes:

    • Node A
    • Node B
    • Node C

    Now assume a network partition occurs:

    • Node B loses connection with A and C

    If we choose Availability:#

    • Node B continues serving requests
    • Node A and C continue serving requests
    • Data may become inconsistent

    Result:

    • Availability achieved
    • Consistency sacrificed

    If we choose Consistency:#

    • Stop serving requests until network is restored
    • Prevent inconsistent writes

    Result:

    • Consistency achieved
    • Availability sacrificed

    There is no way to avoid this trade-off.


    Why Not Choose CA?#

    CA assumes:

    • No network failures

    In real-world distributed systems:

    • Network partitions will happen
    • Latency spikes will happen
    • Nodes will temporarily disconnect

    That is why real systems always choose between CP or AP.


    When to Choose CP vs AP#

    Choose CP When:#

    • Data correctness is critical
    • Inconsistent data is unacceptable

    Examples:

    • Banking systems
    • Payment systems
    • Stock trading platforms
    • Account balances

    Choose AP When:#

    • High availability is more important than strict consistency
    • Slightly stale data is acceptable

    Examples:

    • Social media feeds
    • Like counts
    • Comments
    • View counts

    Database Scaling#

    Initially, applications use a single database server.

    As traffic increases:

    • Queries slow down
    • CPU usage increases
    • Disk IO becomes a bottleneck

    We scale databases step by step, not all at once.

    Over-engineering early is just as bad as under-engineering.


    Indexing#

    Without indexes:

    • Database scans every row
    • Time complexity is O(N)

    With indexes:

    • Database uses a data structure like B-trees
    • Time complexity becomes O(log N)

    Indexes are created on columns that are frequently queried.

    Key Points About Indexing#

    • Improves read performance
    • Slightly slows down writes
    • Uses extra storage
    • Should be added thoughtfully

    You rarely regret adding the right index.
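
    You can watch the query planner switch from a full scan to an index search using SQLite, which ships with Python; the table and column names here are invented for the demo:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, email TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [(i, f"user{i}@example.com") for i in range(1000)])

query = "EXPLAIN QUERY PLAN SELECT * FROM users WHERE email = ?"

# Without an index, SQLite scans every row: O(N).
print(conn.execute(query, ("user500@example.com",)).fetchone()[3])  # e.g. "SCAN users"

# Index the frequently queried column; the planner now does a B-tree search: O(log N).
conn.execute("CREATE INDEX idx_users_email ON users(email)")
plan = conn.execute(query, ("user500@example.com",)).fetchone()
print(plan[3])  # e.g. "SEARCH users USING INDEX idx_users_email (email=?)"
```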


    Partitioning#

    Partitioning means:

    • Breaking a large table into smaller tables
    • All partitions live on the same database server

    Each partition has:

    • Its own index
    • Smaller data size
    • Faster queries

    The database engine decides which partition to query.

    Partitioning improves performance without changing application logic.


    Master-Slave Architecture#

    Used when:

    • Reads heavily outnumber writes
    • One database cannot handle read traffic

    How It Works#

    • One master handles all writes
    • Multiple slaves handle reads
    • Data is replicated from master to slaves

    Reads are distributed. Writes remain centralized.


    Multi-Master Architecture#

    Used when:

    • Write traffic becomes very high
    • One master cannot handle all writes

    Multiple masters:

    • Handle writes independently
    • Periodically sync data

    Challenges#

    • Write conflicts
    • Data reconciliation
    • Complex business logic

    Multi-master setups are powerful but risky.


    Database Sharding#

    Sharding means:

    • Splitting data across multiple database servers
    • Each server holds a subset of the data

    Each server is called a shard.

    Sharding is done using a sharding key.



    Sharding Strategies#

    Range-Based Sharding#

    Data split by ranges.

    Example:

    • Users 1 to 1,000 on shard 1
    • Users 1,001 to 2,000 on shard 2

    Pros:

    • Simple logic

    Cons:

    • Uneven load
    • Hot shards

    Hash-Based Sharding#

    Shard determined by hash function.

    Example: hash(user_id) % number_of_shards

    Pros:

    • Even data distribution

    Cons:

    • Hard to rebalance
    • Adding shards is complex
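
    The rebalancing cost is easy to demonstrate. A sketch using Python's built-in hash, which is the identity for small integers; a real system would use a stable hash such as MurmurHash:

```python
def shard_for(user_id, num_shards):
    # hash(user_id) % number_of_shards, as described above.
    return hash(user_id) % num_shards

placement_4 = {uid: shard_for(uid, 4) for uid in range(8)}
placement_5 = {uid: shard_for(uid, 5) for uid in range(8)}

# Adding a fifth shard remaps half of these keys, which is why plain
# modulo sharding is hard to rebalance (consistent hashing addresses this).
moved = sum(placement_4[u] != placement_5[u] for u in range(8))
print(f"{moved} of 8 keys moved after adding one shard")
```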

    Geographic Sharding#

    Data split by region.

    Example:

    • Asia users on one shard
    • Europe users on another shard

    Pros:

    • Low latency

    Cons:

    • Traffic imbalance

    Directory-Based Sharding#

    A lookup table maps keys to shards.

    Pros:

    • Flexible
    • Easy to reassign shards

    Cons:

    • Lookup table becomes a bottleneck

    Disadvantages of Sharding#

    • Complex application logic
    • Expensive joins
    • Hard consistency guarantees
    • Difficult rebalancing

    Avoid sharding until absolutely necessary.


    Summary of Database Scaling#

    Follow this order:

    1. Vertical scaling
    2. Indexing
    3. Partitioning
    4. Read replicas
    5. Sharding

    Scale only when needed.


    SQL vs NoSQL Databases#

    Choosing the right database matters more than choosing the popular one.


    SQL Databases#

    Characteristics:

    • Structured schema
    • Tables and rows
    • Strong consistency
    • ACID guarantees

    Examples:

    • MySQL
    • PostgreSQL
    • Oracle

    Use SQL when:

    • Data integrity matters
    • Complex joins are required
    • Transactions are critical

    NoSQL Databases#

    Characteristics:

    • Flexible schema
    • Horizontal scaling
    • High availability
    • Eventual consistency

    Types:

    | Type      | Example   | Use Case             |
    | --------- | --------- | -------------------- |
    | Document  | MongoDB   | JSON-like data       |
    | Key-Value | Redis     | Caching              |
    | Column    | Cassandra | Large-scale writes   |
    | Graph     | Neo4j     | Relationship queries |

    Scaling Differences#

    | Feature     | SQL       | NoSQL      |
    | ----------- | --------- | ---------- |
    | Scaling     | Vertical  | Horizontal |
    | Schema      | Fixed     | Flexible   |
    | Consistency | Strong    | Eventual   |
    | Joins       | Supported | Limited    |

    When to Use Which#

    Use SQL when:

    • Financial data
    • User accounts
    • Orders and payments

    Use NoSQL when:

    • High traffic
    • Large datasets
    • Real-time systems
    • Low latency requirements

    Many production systems use both together.


    End of Part 2#

    In this part, you learned:

    • CAP theorem
    • Distributed system trade-offs
    • Database scaling strategies
    • SQL vs NoSQL decision making

    Next part will cover:

    • Microservices
    • Load balancer deep dive
    • Caching and Redis

    Microservices#

    Before microservices became popular, most applications were built as monoliths.

    Understanding the difference is critical.


    Monolith Architecture#

    In a monolith:

    • Entire backend is one single application
    • All features live in the same codebase
    • Deployed as one unit

    Example for an e-commerce app:

    • User management
    • Product listing
    • Orders
    • Payments

    All of this exists in one backend service.

    Advantages of Monolith#

    • Simple to build initially
    • Easy to debug at small scale
    • Faster development for small teams

    Disadvantages of Monolith#

    • Hard to scale individual features
    • One crash can bring down everything
    • Codebase becomes messy over time
    • Deployments become risky

    Most startups begin with a monolith. That is normal and often the right choice.


    Microservice Architecture#

    Microservices break a large application into small, independent services.

    Each service:

    • Has its own codebase
    • Has its own database (usually)
    • Can be deployed independently

    Example microservices for e-commerce:

    • User Service
    • Product Service
    • Order Service
    • Payment Service

    Why Do We Use Microservices?#

    Independent Scaling#

    If Product Service gets more traffic:

    • Scale only Product Service
    • No need to scale entire system

    Independent Deployments#

    • Deploy Order Service without touching Payment Service
    • Fewer production risks

    Fault Isolation#

    • Payment Service crash does not kill User Service
    • System degrades gracefully

    Technology Flexibility#

    • One service in Node.js
    • Another in Go
    • Another in Java

    Each team can choose what fits best.


    When Should You Use Microservices?#

    Do not start with microservices blindly.

    Microservices make sense when:

    • Team size increases
    • Multiple teams work independently
    • Application grows large
    • Deployment frequency increases
    • You want to avoid single point of failure

    Small teams with early-stage products should usually start with a monolith.


    How Clients Communicate in Microservices#

    Each microservice runs on:

    • Different machines
    • Different IP addresses
    • Different ports

    Clients should not manage this complexity.

    So we introduce an API Gateway.


    API Gateway#

    API Gateway:

    • Acts as a single entry point
    • Routes requests to correct microservices
    • Hides internal architecture from clients

    The client sends requests such as /login, /products, and /orders to the gateway.

    Gateway:

    • Sends /login to User Service
    • Sends /products to Product Service
    • Sends /orders to Order Service

    Additional Responsibilities of API Gateway#

    • Authentication and authorization
    • Rate limiting
    • Request validation
    • Response aggregation
    • Caching

    API Gateway simplifies client logic and centralizes cross-cutting concerns.
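
    At its core, the routing part of a gateway is a prefix table. A minimal sketch; the service hostnames and ports are invented for illustration:

```python
# Hypothetical route table: path prefix -> internal service address.
# Clients only ever see the gateway; these upstream hosts are internal.
ROUTES = {
    "/login":    "http://user-service:8001",
    "/products": "http://product-service:8002",
    "/orders":   "http://order-service:8003",
}

def route(path):
    for prefix, upstream in ROUTES.items():
        if path.startswith(prefix):
            return upstream
    raise LookupError(f"no service for {path}")

print(route("/products/45"))  # http://product-service:8002
```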


    Load Balancer Deep Dive#

    Whenever you horizontally scale services, you need a load balancer.

    Clients should not know:

    • How many servers exist
    • Which server is healthy
    • Which server is busy

    Why Load Balancer Is Required#

    Without a load balancer:

    • Clients must know all server IPs
    • Clients must decide routing
    • Failures become client problems

    With a load balancer:

    • Single entry point
    • Automatic traffic distribution
    • Health checks
    • Failover handling

    Load Balancer Algorithms#

    Round Robin#

    Requests are sent to servers sequentially.

    Example:

    • Request 1 to Server A
    • Request 2 to Server B
    • Request 3 to Server C
    • Repeat

    Pros:

    • Simple
    • Fair distribution

    Cons:

    • Ignores server load
    • Assumes equal capacity

    Weighted Round Robin#

    Each server has a weight.

    Servers with higher weight:

    • Receive more requests

    Useful when:

    • Servers have different hardware capacity

    Least Connections#

    Traffic is routed to:

    • Server with fewest active connections

    Pros:

    • Dynamic
    • Adapts to real-time load

    Cons:

    • Not ideal when requests vary greatly in duration

    Hash-Based Routing#

    Routing based on:

    • Client IP
    • User ID
    • Session ID

    Ensures:

    • Same user hits same server

    Useful for:

    • Session persistence
    • Stateful applications
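
    The first two algorithms above are each a few lines. A sketch:

```python
import itertools

class RoundRobin:
    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)  # A, B, C, A, B, C, ...
    def pick(self):
        return next(self._cycle)

class LeastConnections:
    def __init__(self, servers):
        self.active = {s: 0 for s in servers}   # in-flight request counts
    def pick(self):
        # Route to the server with the fewest active connections.
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server
    def done(self, server):
        self.active[server] -= 1                # call when a request finishes

rr = RoundRobin(["A", "B", "C"])
picks = [rr.pick() for _ in range(4)]
print(picks)  # ['A', 'B', 'C', 'A']

lc = LeastConnections(["A", "B"])
first = lc.pick()    # 'A' (both idle; ties break by insertion order)
second = lc.pick()   # 'B' (A now has one active connection)
```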

    Caching#

    Caching is one of the highest impact optimizations in system design.

    Caching stores:

    • Frequently accessed data
    • In fast storage
    • For quick retrieval

    Why Caching Is Needed#

    Database access is slow compared to memory.

    Example:

    • Database fetch takes 500 ms
    • Processing takes 100 ms
    • Total response time is 600 ms

    With caching:

    • Cached response served in 50 ms

    Huge improvement with minimal effort.


    Where Can We Cache?#

    Client Side Cache#

    • Browser cache
    • Mobile app cache

    Used for:

    • Static files
    • Images
    • Scripts

    Server Side Cache#

    • Stored on backend servers
    • Often in memory

    Examples:

    • Redis
    • Memcached

    CDN Cache#

    • Used for static content
    • Geographically distributed

    Examples:

    • Images
    • Videos
    • Stylesheets

    Application Level Cache#

    • Inside application code
    • Stores computed results
    • Avoids repeated computation

    Cache Invalidation#

    Caching is easy. Invalidation is hard.

    Whenever data changes:

    • Cached value must be updated or removed

    Common strategies:

    • Time based expiration
    • Manual eviction
    • Write-through cache

    Poor invalidation leads to stale data bugs.


    Redis Introduction#

    Redis is an in-memory data store.

    In-memory means:

    • Data stored in RAM
    • Extremely fast read and write

    Redis is commonly used for:

    • Caching
    • Rate limiting
    • Pub-sub
    • Session storage

    Why Not Store Everything in Redis?#

    RAM is expensive and limited.

    Databases:

    • Store data on disk
    • Handle large volumes
    • Persist data reliably

    Redis:

    • Fast but volatile
    • Data may be lost if not persisted

    Redis complements databases. It does not replace them.


    Redis Data Model#

    Redis stores data as key-value pairs.

    Key naming convention examples:

    • user:1
    • user:1:email
    • product:45

    Colon separation helps readability and grouping.


    Common Redis Data Types#

    String#

    • Simple key-value
    • Most commonly used

    Commands:

    • SET
    • GET
    • MGET

    List#

    Ordered collection.

    Commands:

    • LPUSH
    • RPUSH
    • LPOP
    • RPOP

    Use cases:

    • Queues
    • Stacks

    Set#

    • Unique values
    • No duplicates

    Use cases:

    • Unique visitors
    • Tags

    Hash#

    • Key-value pairs inside a key
    • Similar to an object

    Use cases:

    • User profiles
    • Configurations

    Redis TTL (Time To Live)#

    Keys can have expiration time.

    Example:

    • Cache valid for 24 hours
    • Automatically deleted after expiry

    TTL helps:

    • Avoid stale data
    • Control memory usage

    Cache Hit and Cache Miss#

    • Cache hit: Data found in cache
    • Cache miss: Data not found in cache

    Flow:

    1. Check cache
    2. If hit, return data
    3. If miss, fetch from database
    4. Store in cache
    5. Return response
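
    The five-step flow above is the classic cache-aside pattern. A sketch with plain dictionaries standing in for Redis and the database:

```python
import time

DB = {"user:1": {"name": "Rohan"}}     # stands in for the real database
cache = {}                             # stands in for Redis
TTL_SECONDS = 24 * 3600
db_reads = 0                           # counts trips to the "database"

def get(key):
    global db_reads
    entry = cache.get(key)
    if entry and entry["expires"] > time.time():
        return entry["value"]                      # cache hit
    db_reads += 1
    value = DB.get(key)                            # cache miss: go to the DB
    cache[key] = {"value": value,
                  "expires": time.time() + TTL_SECONDS}
    return value

get("user:1")       # miss: fetched from the database, then cached
get("user:1")       # hit: served from cache, no DB read
print(db_reads)     # 1
```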

    Write-Through vs Read-Through Cache#

    Read-Through#

    • Cache populated on read
    • Cache miss triggers database read

    Write-Through#

    • Data written to cache and database together
    • Ensures cache consistency

    Choice depends on use case.


    End of Part 3#

    In this part, you learned:

    • Monolith vs microservices
    • API gateway
    • Load balancer internals
    • Caching strategies
    • Redis fundamentals

    Next part will cover:

    • Blob storage
    • CDN
    • Message brokers
    • Kafka
    • Pub-sub systems

    Blob Storage#

    Databases are great at storing:

    • Text
    • Numbers
    • Structured records

    Databases are terrible at storing:

    • Images
    • Videos
    • PDFs
    • Audio files

    Trying to store large binary files inside databases leads to:

    • Slow queries
    • Huge backups
    • Scaling nightmares

    That is why blob storage exists.


    What Is a Blob?#

    Blob stands for Binary Large Object.

    Any file such as:

    • Image
    • Video
    • PDF
    • ZIP
    • Audio

    Can be represented as a sequence of binary data. That binary data is called a blob.


    Why We Do Not Store Blobs in Databases#

    Problems with storing blobs in databases:

    • Large row sizes
    • Slow reads and writes
    • Difficult scaling
    • Expensive storage

    Databases are optimized for structured data, not large files.


    Blob Storage Services#

    Blob storage is usually a managed service.

    Examples:

    • AWS S3
    • Cloudflare R2
    • Google Cloud Storage
    • Azure Blob Storage

    These services handle:

    • Scaling
    • Durability
    • Replication
    • Availability

    You treat them like a black box.


    How Blob Storage Is Used#

    Typical flow:

    1. Client uploads file
    2. Backend generates upload credentials
    3. File stored in blob storage
    4. Database stores file URL or metadata
    5. Client accesses file via URL

    Database stores references, not the file itself.


    AWS S3 Basics#

    You can think of S3 like Google Drive for applications.

    Key properties:

    • Extremely cheap storage
    • Virtually unlimited size
    • High durability
    • Global availability

    S3 is ideal for:

    • Images
    • Videos
    • Static website files
    • Backups
    • Logs

    Important S3 Concepts#

    | Concept | Meaning             |
    | ------- | ------------------- |
    | Bucket  | Container for files |
    | Object  | A file stored in S3 |
    | Key     | Path of the file    |
    | Region  | Physical location   |

    S3 is designed for 99.999999999 percent (eleven nines) durability, achieved by replicating data internally.


    Content Delivery Network (CDN)#

    Serving files directly from blob storage works. Serving them fast worldwide does not.

    CDN solves this problem.


    What Is a CDN?#

    A CDN is a network of servers distributed across the world.

    These servers:

    • Cache static content
    • Serve users from nearby locations
    • Reduce latency

    Examples:

    • AWS CloudFront
    • Cloudflare CDN
    • Akamai

    Why CDN Is Needed#

    Assume:

    • Files stored in India
    • User requests from USA

    Without CDN:

    • High latency
    • Slow loading
    • Poor experience

    With CDN:

    • File cached near user
    • Faster response
    • Reduced load on origin server

    How CDN Works#

    1. User requests a file
    2. Request goes to nearest edge server
    3. If file exists in cache, return it
    4. If not, fetch from origin
    5. Cache it at edge
    6. Serve future requests locally
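
    The edge-cache flow above can be simulated in a few lines; ORIGIN stands in for the origin server such as S3:

```python
ORIGIN = {"/logo.png": b"<image bytes>"}   # stands in for the origin (e.g. S3)
edge_cache = {}                            # one edge server's local cache

def serve(path):
    if path in edge_cache:
        return edge_cache[path], "HIT"     # served locally, low latency
    content = ORIGIN[path]                 # slow round trip to the origin
    edge_cache[path] = content             # cache at the edge for next time
    return content, "MISS"

print(serve("/logo.png")[1])  # MISS
print(serve("/logo.png")[1])  # HIT
```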


    CDN Terminology#

    | Term          | Description              |
    | ------------- | ------------------------ |
    | Edge Server   | CDN server near user     |
    | Origin Server | Original source like S3  |
    | TTL           | Cache expiry duration    |
    | Cache Hit     | File served from CDN     |
    | Cache Miss    | File fetched from origin |

    Message Broker#

    Not all tasks should be handled synchronously.

    Some tasks:

    • Take too long
    • Are non-critical
    • Should not block user response

    That is where message brokers are used.


    Synchronous vs Asynchronous Processing#

    Synchronous#

    • Client waits for response
    • Used for quick operations
    • Risk of timeouts for long tasks

    Asynchronous#

    • Client gets immediate acknowledgement
    • Task processed in background
    • User notified later if needed

    Why Message Brokers Are Used#

    Message brokers help with:

    • Decoupling services
    • Reliability
    • Retry mechanisms
    • Load buffering

    Producer and consumer do not need to know about each other.


    Message Broker Components#

    | Role     | Description        |
    | -------- | ------------------ |
    | Producer | Sends messages     |
    | Broker   | Stores messages    |
    | Consumer | Processes messages |

    Types of Message Brokers#

    There are two major types:

    1. Message Queues
    2. Message Streams

    They solve different problems.


    Message Queue#

    In a message queue:

    • One message is processed by one consumer
    • Message is deleted after processing

    Examples:

    • RabbitMQ
    • AWS SQS

    When to Use Message Queues#

    Use message queues when:

    • One consumer should handle a task
    • Task must not be duplicated
    • Order matters less

    Examples:

    • Email sending
    • PDF generation
    • Image resizing
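
    The queue semantics (one message, one consumer, gone after processing) can be sketched with Python's standard library queue standing in for RabbitMQ or SQS:

```python
import queue
import threading

jobs = queue.Queue()     # the broker
processed = []

def worker():
    # The consumer: each message is handled exactly once, then acknowledged.
    while True:
        job = jobs.get()             # blocks until a message arrives
        if job is None:              # sentinel value: shut down
            break
        processed.append(f"sent email: {job}")
        jobs.task_done()             # message removed after processing

t = threading.Thread(target=worker)
t.start()

# The producer (e.g. an API handler) returns immediately after enqueueing.
jobs.put("welcome@example.com")
jobs.put("receipt@example.com")

jobs.join()      # wait until the queue is drained
jobs.put(None)
t.join()
print(processed)
```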

    Message Stream#

    In a message stream:

    • One message can be consumed by many consumers
    • Messages are not deleted immediately
    • Consumers track their own progress

    Examples:

    • Apache Kafka
    • AWS Kinesis

    When to Use Message Streams#

    Use message streams when:

    • Same event must trigger multiple actions
    • High throughput is required
    • Event history is valuable

    Examples:

    • Activity logs
    • Analytics
    • Event sourcing

    Apache Kafka Overview#

    Kafka is a distributed message streaming platform.

    It is designed for:

    • High throughput
    • Fault tolerance
    • Scalability

    Kafka can handle millions of messages per second.


    Kafka Core Concepts#

    | Concept   | Meaning              |
    | --------- | -------------------- |
    | Producer  | Sends messages       |
    | Consumer  | Reads messages       |
    | Broker    | Kafka server         |
    | Topic     | Message category     |
    | Partition | Subdivision of topic |


    Kafka Topics and Partitions#

    Topics are split into partitions.

    Partitions allow:

    • Parallel processing
    • Horizontal scaling

    Each partition:

    • Is ordered
    • Can be processed by only one consumer per group

    Consumer Groups#

    Consumers belong to groups.

    Rules:

    • One partition assigned to one consumer per group
    • Multiple groups can read same topic
    • Enables parallelism
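
    A sketch of the assignment rule; Kafka's real assignors (range, round-robin, sticky) are more elaborate, but the idea is the same:

```python
def assign_partitions(partitions, consumers):
    """Spread partitions across one consumer group, round-robin style."""
    assignment = {c: [] for c in consumers}
    for p in range(partitions):
        assignment[consumers[p % len(consumers)]].append(p)
    return assignment

# 6 partitions, 3 consumers in one group: 2 partitions each.
print(assign_partitions(6, ["c1", "c2", "c3"]))

# A consumer leaves: on rebalance, the remaining consumers take over.
print(assign_partitions(6, ["c1", "c2"]))
```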

    Kafka Rebalancing#

    When:

    • Consumer joins
    • Consumer leaves
    • Partition count changes

    Kafka automatically reassigns partitions. No manual intervention required.


    Kafka Use Case Example#

    Assume:

    • Ride-sharing app
    • Driver location updated every 2 seconds

    Instead of writing directly to database:

    • Write updates to Kafka
    • Batch write to database later

    This reduces database load drastically.


    Realtime PubSub#

    PubSub stands for Publish Subscribe.

    In PubSub:

    • Messages are pushed immediately
    • No storage by default
    • Real-time delivery

    PubSub vs Message Broker#

    | Feature  | Message Broker  | PubSub           |
    | -------- | --------------- | ---------------- |
    | Storage  | Yes             | No               |
    | Delivery | Pull based      | Push based       |
    | Latency  | Medium          | Very low         |
    | Use Case | Background jobs | Realtime updates |

    Redis PubSub#

    Redis supports PubSub.

    Used for:

    • Realtime chat
    • Notifications
    • Live updates

    PubSub Example#

    In a chat application:

    • Clients connected to different servers
    • Messages published to Redis channel
    • All subscribed servers receive the message
    • Servers forward message to connected clients

    This enables realtime communication across servers.
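
    The fan-out pattern can be sketched in-process. This is a toy stand-in for Redis channels (real code would use redis-py's `publish`/`subscribe`); the class and server names are invented for illustration.

```python
# In-process sketch of the Redis PubSub fan-out pattern: several chat
# servers subscribe to one channel, and a published message reaches
# the clients connected to every server.

class Channel:
    def __init__(self):
        self.subscribers = []

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def publish(self, message):
        for cb in self.subscribers:
            cb(message)

class ChatServer:
    def __init__(self, name, channel):
        self.name = name
        self.clients = []  # websocket connections in a real server
        channel.subscribe(self.on_message)

    def on_message(self, message):
        for client in self.clients:
            client.append((self.name, message))

room = Channel()
server_a, server_b = ChatServer("A", room), ChatServer("B", room)

alice, bob = [], []
server_a.clients.append(alice)  # Alice is connected to server A
server_b.clients.append(bob)    # Bob is connected to server B

room.publish("hello everyone")
print(alice)  # [('A', 'hello everyone')]
print(bob)    # [('B', 'hello everyone')]
```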


    Limitations of PubSub#

    • No message persistence
    • If a consumer is offline, the message is lost
    • Not suitable for critical workflows

    PubSub is for speed, not reliability.


    End of Part 4#

    In this part, you learned:

    • Blob storage and S3
    • CDN concepts
    • Message brokers
    • Message queues vs streams
    • Kafka internals
    • Realtime PubSub

    Next part will cover:

    • Event-driven architecture
    • Distributed systems
    • Leader election
    • Consistency deep dive
    • Consistent hashing
    • Data redundancy
    • Proxy
    • How to solve any system design problem

    Event-Driven Architecture (EDA)#

    Event-driven architecture is a way of designing systems where:

    • Services react to events
    • Components are loosely coupled
    • Workflows are asynchronous by default

    Instead of calling services directly, systems communicate by emitting events.


    Why Do We Need Event-Driven Architecture?#

    Consider an e-commerce application.

    When an order is placed:

    • Order service creates the order
    • Payment service processes payment
    • Inventory service updates stock
    • Notification service sends confirmation
    • Analytics service records the event

    If these are synchronous calls:

    • One failure breaks everything
    • Latency increases
    • Services become tightly coupled

    EDA solves this.


    How EDA Works#

    1. An event occurs
    2. Event is published to an event bus
    3. Multiple consumers listen to the event
    4. Each consumer reacts independently

    The producer does not know who consumes the event.
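
    The four steps above can be sketched with a minimal in-memory event bus. This is an illustration of the pattern, not a production event bus; in practice the bus would be Kafka or a similar broker.

```python
# Minimal in-memory event bus illustrating EDA: the producer emits an
# event, and consumers registered for that event type react
# independently. The producer never calls the consumers directly.

from collections import defaultdict

class EventBus:
    def __init__(self):
        self.handlers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self.handlers[event_type].append(handler)

    def publish(self, event_type, payload):
        for handler in self.handlers[event_type]:
            handler(payload)  # each consumer reacts on its own

bus = EventBus()
log = []

bus.subscribe("order_created", lambda e: log.append(f"charge {e['order_id']}"))
bus.subscribe("order_created", lambda e: log.append(f"reserve stock {e['order_id']}"))
bus.subscribe("order_created", lambda e: log.append(f"email user {e['user_id']}"))

# The order service only publishes; it does not know who listens.
bus.publish("order_created", {"order_id": "o-1", "user_id": "u-9"})
print(log)  # ['charge o-1', 'reserve stock o-1', 'email user u-9']
```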


    Event Notification vs Event-Carried State#

    Event Notification#

    Event contains minimal information.

    Example: order_created { order_id }

    Consumers fetch required data separately.

    Pros:

    • Lightweight events

    Cons:

    • Extra database calls

    Event-Carried State Transfer#

    Event carries full data.

    Example: order_created { order_id, user_id, amount, items }

    Pros:

    • No extra fetch required

    Cons:

    • Larger events
    • Schema evolution complexity

    Distributed Systems#

    A distributed system consists of:

    • Multiple nodes
    • Network communication
    • Shared responsibility

    Distributed systems exist because:

    • One machine is not enough
    • We need fault tolerance
    • We want scalability

    Problems in Distributed Systems#

    Distributed systems introduce challenges:

    • Partial failures
    • Network latency
    • Clock synchronization
    • Data consistency

    Designing distributed systems is about handling failures gracefully.


    Leader Election#

    In many distributed systems:

    • One node must act as leader
    • Leader coordinates work
    • Followers execute tasks

    Leader election decides:

    • Which node becomes leader
    • How leadership changes on failure

    Why Leader Election Is Needed#

    Examples:

    • Distributed locks
    • Scheduled jobs
    • Master coordination
    • Metadata management

    Without a leader:

    • Duplicate work
    • Conflicts
    • Inconsistent state

    How Leader Election Works (High Level)#

    Common approaches:

    • Using distributed coordination systems
    • Heartbeats to detect failure
    • Automatic re-election

    Popular tools:

    • ZooKeeper
    • etcd
    • Consul

    When leader fails:

    • New leader is elected automatically
    • System recovers without downtime
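
    A toy version of this flow: the live node with the lowest id wins, which is the core idea behind bully-style elections. Real systems delegate this to ZooKeeper, etcd, or a Raft implementation; this sketch only illustrates automatic re-election when heartbeats stop.

```python
# Toy leader election: among the nodes still sending heartbeats,
# the one with the lowest id becomes leader.

def elect_leader(nodes, alive):
    """Return the lowest-id node that is still alive, or None."""
    candidates = sorted(n for n in nodes if n in alive)
    return candidates[0] if candidates else None

nodes = ["node-1", "node-2", "node-3"]
alive = {"node-1", "node-2", "node-3"}

print(elect_leader(nodes, alive))  # node-1

# Heartbeats stop arriving from node-1, so it is considered dead.
alive.discard("node-1")
print(elect_leader(nodes, alive))  # node-2 takes over automatically
```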

    Consistency Deep Dive#

    Consistency defines how up-to-date data appears across nodes.

    There are multiple consistency models.


    Strong Consistency#

    Strong consistency guarantees:

    • Reads always return latest write
    • No stale data

    Used when correctness matters.

    Examples:

    • Banking balances
    • Financial ledgers
    • Inventory counts for purchases

    Trade-off:

    • Higher latency
    • Lower availability during failures

    Eventual Consistency#

    Eventual consistency guarantees:

    • Data will become consistent over time
    • Temporary inconsistency is allowed

    Used when availability matters more.

    Examples:

    • Social media likes
    • Feed updates
    • View counts

    When to Choose Which#

    • Payments: strong consistency
    • Orders: strong consistency
    • Social feeds: eventual consistency
    • Analytics: eventual consistency

    Achieving Strong Consistency#

    Common techniques:

    • Distributed locks
    • Synchronous replication
    • Two-phase commit
    • Leader-based writes

    These approaches reduce availability but ensure correctness.


    Achieving Eventual Consistency#

    Common techniques:

    • Asynchronous replication
    • Conflict resolution
    • Versioning
    • Last-write-wins strategy

    Eventual consistency trades accuracy for speed and availability.


    Consistent Hashing#

    Consistent hashing is used to:

    • Distribute data evenly
    • Reduce data movement when nodes change

    Used in:

    • Caching systems
    • Load balancers
    • Distributed databases

    Problem With Normal Hashing#

    Normal hashing: hash(key) % number_of_nodes

    When nodes change:

    • Almost all keys are remapped
    • Cache becomes useless
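
    This is easy to measure. The sketch below hashes 10,000 made-up keys, scales from 4 nodes to 5, and counts how many keys land on a different node; with plain modulo hashing, roughly 80% move.

```python
# Show why `hash(key) % N` breaks caches: adding one node remaps
# most keys to a different server.

def node_for(key, num_nodes):
    return hash(key) % num_nodes

keys = [f"user:{i}" for i in range(10_000)]

before = {k: node_for(k, 4) for k in keys}
after = {k: node_for(k, 5) for k in keys}  # scale from 4 to 5 nodes

moved = sum(1 for k in keys if before[k] != after[k])
print(f"{moved / len(keys):.0%} of keys moved")  # typically about 80%
```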

    How Consistent Hashing Solves This#

    • Nodes placed on a hash ring
    • Keys mapped to nearest node clockwise
    • Only a small subset of keys move when nodes change

    This makes scaling efficient.


    Virtual Nodes#

    To avoid uneven distribution:

    • Each physical node has multiple virtual nodes
    • Improves balance
    • Reduces hotspots

    Most real implementations use virtual nodes.
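
    A ring with virtual nodes can be sketched as follows. This is a simplified implementation for illustration (node names and vnode count are arbitrary); it uses md5 so placement is stable across runs, unlike Python's built-in `hash()`.

```python
# Consistent hash ring with virtual nodes: adding one node moves only
# the keys that now fall into the new node's slice of the ring.

import bisect
import hashlib

class HashRing:
    def __init__(self, nodes, vnodes=100):
        self.ring = []  # sorted list of (position, node)
        for node in nodes:
            self.add_node(node, vnodes)

    def _pos(self, key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add_node(self, node, vnodes=100):
        for v in range(vnodes):
            self.ring.append((self._pos(f"{node}#{v}"), node))
        self.ring.sort()

    def get_node(self, key):
        # Walk clockwise: first vnode at or after the key's position.
        i = bisect.bisect(self.ring, (self._pos(key),))
        return self.ring[i % len(self.ring)][1]

ring = HashRing(["cache-a", "cache-b", "cache-c"])
keys = [f"user:{i}" for i in range(10_000)]
before = {k: ring.get_node(k) for k in keys}

ring.add_node("cache-d")  # scale out by one node
after = {k: ring.get_node(k) for k in keys}

moved = sum(1 for k in keys if before[k] != after[k])
print(f"{moved / len(keys):.0%} of keys moved")  # only ~25%, not ~80%
```

    Compare this with the modulo-hashing experiment: only the slice claimed by the new node moves, which is why caches stay mostly warm when the cluster scales.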


    Data Redundancy and Data Recovery#

    Failures are inevitable.

    Disks fail. Servers crash. Regions go down.

    Data redundancy protects against data loss.


    Why Data Redundancy Is Needed#

    • Hardware failure
    • Human error
    • Natural disasters
    • Software bugs

    If data exists only in one place, it will be lost eventually.


    Common Redundancy Techniques#

    Replication#

    • Multiple copies of data
    • Stored across nodes or regions

    Backups#

    • Periodic snapshots
    • Stored separately
    • Used for recovery

    Continuous Backup#

    • Near real-time replication
    • Minimal data loss

    Recovery Strategies#

    • Point-in-time recovery: restore data to a specific moment
    • Failover: switch traffic to a replica
    • Disaster recovery: recover from a region-level failure

    Always test backups. Untested backups do not exist.


    Proxy#

    A proxy acts as an intermediary between:

    • Client and server
    • Server and backend services

    Forward Proxy#

    Used on client side.

    Examples:

    • Corporate proxies
    • VPNs
    • Content filtering

    The client knows the proxy exists.


    Reverse Proxy#

    Used on server side.

    Examples:

    • Nginx
    • HAProxy
    • Cloudflare

    The client does not know which backend server handles the request.


    Why Reverse Proxies Are Used#

    • Load balancing
    • Security
    • SSL termination
    • Caching
    • Request routing

    Reverse proxies are foundational in modern architectures.


    How to Solve Any System Design Problem#

    This is the most important section.


    Step 1: Clarify Requirements#

    Ask questions:

    • Scale
    • Read vs write ratio
    • Latency expectations
    • Consistency needs

    Never assume.


    Step 2: Estimate Scale#

    • Users
    • Requests per second
    • Storage growth

    Use back-of-the-envelope estimation.
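
    A worked example, for a hypothetical app with 10 million daily active users making 20 requests each per day (all numbers are assumptions chosen for the arithmetic):

```python
# Back-of-the-envelope estimate: daily traffic -> requests per second.

daily_users = 10_000_000
requests_per_user = 20
seconds_per_day = 86_400

total_requests = daily_users * requests_per_user  # 200M requests/day
avg_rps = total_requests / seconds_per_day
peak_rps = avg_rps * 3  # common rule of thumb: peak is 2-3x average

print(round(avg_rps))   # 2315
print(round(peak_rps))  # 6944
```

    Round numbers are fine here; interviewers want the order of magnitude, not three significant figures.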


    Step 3: High-Level Design#

    • Clients
    • APIs
    • Databases
    • Caches
    • Message brokers

    Draw boxes first, not tables.


    Step 4: Identify Bottlenecks#

    Ask:

    • What fails first?
    • What does not scale?
    • What is expensive?

    Step 5: Apply Scaling Techniques#

    • Caching
    • Replication
    • Sharding
    • Load balancing
    • Async processing

    Step 6: Discuss Trade-offs#

    Every design has trade-offs.

    Talk about:

    • Consistency vs availability
    • Cost vs performance
    • Simplicity vs scalability

    Interviewers care about reasoning, not perfection.


    Final Thoughts#

    System design is not about drawing fancy diagrams.

    It is about:

    • Thinking clearly
    • Making reasonable trade-offs
    • Designing for failure
    • Keeping systems simple

    Everyone learns system design by building systems that break.

    Build. Observe. Fix. Repeat.
