Load Balancing in System Design Explained with Real-World Examples

Load Balancing is one of the most important concepts in System Design interviews and real-world applications.

If you’ve ever wondered:

  • What is a Load Balancer?
  • Why do we need Load Balancing?
  • What problem does it solve?
  • What happens without it?

This guide answers each of these questions in simple language.


Imagine a Restaurant

Suppose there is only one cashier at a restaurant.

Customer 1 → Cashier
Customer 2 → Waiting
Customer 3 → Waiting
Customer 4 → Waiting

As more customers arrive:

  • Queue becomes longer
  • Waiting time increases
  • Cashier becomes overloaded

Now imagine the restaurant adds 5 cashiers.

Customer 1 → Cashier 1
Customer 2 → Cashier 2
Customer 3 → Cashier 3
Customer 4 → Cashier 4
Customer 5 → Cashier 5

Everyone gets served faster.

This is exactly what Load Balancing does.


What is Load Balancing?

Load Balancing is:

The process of distributing incoming traffic across multiple servers instead of sending everything to a single server.

Instead of:

Users → Server 1

We do:

   Users  
     ↓
Load Balancer
     ↓ 
┌─────────┐ 
↓    ↓    ↓
S1   S2   S3

The Load Balancer decides which server should handle each request.


Why Do We Need Load Balancing?

Imagine your website gets:

10 users per day

One server may be enough.

But suddenly:

100,000 users visit

One server might:

  • Become slow
  • Crash
  • Stop responding

Load balancing prevents this problem.


What Happens Without Load Balancing?

Without load balancing:

Users → Server 1

Problems:

❌ Server Overload

Too many requests hit one server.


❌ Slow Response Time

Users wait longer.


❌ Single Point of Failure

If server crashes:

Server Down = Application Down

Entire application becomes unavailable.


❌ Poor Scalability

A single server can only be upgraded so far, so it cannot keep pace with traffic growth.


Benefits of Load Balancing

1. Better Performance

Traffic is distributed evenly.

1000 Requests
333 → Server 1
333 → Server 2
334 → Server 3

No single server becomes overloaded.


2. High Availability

If one server fails:

Load Balancer → Server 1 ❌ (skipped)
Load Balancer → Server 2
Load Balancer → Server 3

The load balancer stops sending traffic to the failed server and routes requests to the remaining healthy ones.

Application remains available.
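
The failover behaviour above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular load balancer is implemented: the `healthy` table and the `pick_healthy` function are hypothetical names, and a real load balancer would fill the table in from periodic health checks.

```python
from itertools import cycle

# Hypothetical health table; S1 is marked as failed for this example.
healthy = {"S1": False, "S2": True, "S3": True}

def pick_healthy(rotation, healthy):
    """Return the next server in the rotation that is marked healthy."""
    for _ in range(len(healthy)):
        server = next(rotation)
        if healthy[server]:
            return server
    raise RuntimeError("no healthy servers available")

rotation = cycle(healthy)  # iterates over S1, S2, S3, S1, ...
picks = [pick_healthy(rotation, healthy) for _ in range(4)]
print(picks)  # ['S2', 'S3', 'S2', 'S3']
```

The failed server stays in the rotation but is skipped, so it can start receiving traffic again as soon as its entry flips back to healthy.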


3. Scalability

Need more capacity?

Simply add more servers.

S1
S2
S3
S4
S5

Once the new servers are registered with the load balancer, it starts sending traffic to them.


4. Better User Experience

Users get:

  • Faster pages
  • Better reliability
  • Less downtime

Real-World Example

Imagine:

Amazon Sale

Millions of users visit simultaneously.

Without load balancing:

One Server → Crash

With load balancing:

  Users  
    ↓
Load Balancer
    ↓
100+ Servers

Traffic is distributed safely.


How Does a Load Balancer Work?

Step-by-step:

Step 1

User opens:

www.example.com

Step 2

Request reaches:

Load Balancer

Step 3

Load Balancer checks:

  • Which server is free?
  • Which server is healthy?
  • Which server has fewer requests?

Step 4

Request is forwarded.

  User 
   ↓
Load Balancer 
   ↓
Server 2

Step 5

Server responds.

Server 2 → Load Balancer → User
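
Steps 3 to 5 can be sketched as one small function. This is only a toy model under stated assumptions: the `servers` table, its numbers, and the `backend` and `handle_request` functions are all made up for illustration, and the "forwarding" is a plain function call rather than a network hop.

```python
# Hypothetical server state; the numbers are made up for illustration.
servers = {
    "Server 1": {"healthy": True, "active": 8},
    "Server 2": {"healthy": True, "active": 2},
    "Server 3": {"healthy": False, "active": 0},
}

def backend(server, path):
    """Stand-in for the real server: returns a fake response body."""
    return f"{server} served {path}"

def handle_request(path):
    # Step 3: keep only healthy servers, then prefer the least busy one.
    candidates = [name for name, s in servers.items() if s["healthy"]]
    if not candidates:
        return "503 Service Unavailable"
    target = min(candidates, key=lambda name: servers[name]["active"])
    # Step 4: forward the request and count it as in flight.
    servers[target]["active"] += 1
    try:
        response = backend(target, path)
    finally:
        servers[target]["active"] -= 1
    # Step 5: the response travels back to the user via the load balancer.
    return response

print(handle_request("/"))  # Server 2 served /
```

Server 3 has the fewest active requests but is unhealthy, so it is never considered.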

Load Balancing Algorithms

A Load Balancer needs rules to decide where traffic goes.


1. Round Robin

The simplest and one of the most widely used algorithms.

Requests are distributed one by one.

Request 1 → S1
Request 2 → S2
Request 3 → S3
Request 4 → S1
Request 5 → S2
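
A minimal Python sketch of the same rotation, using the server names from the example. The variable names are illustrative only.

```python
from itertools import cycle

servers = ["S1", "S2", "S3"]
next_server = cycle(servers)  # endless S1, S2, S3, S1, ...

# Hand out the first five requests in order.
assignments = [(f"Request {i}", next(next_server)) for i in range(1, 6)]
for request, server in assignments:
    print(request, "->", server)
```

After S3 the rotation wraps around, so Request 4 lands on S1 again.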

2. Least Connections

Send traffic to the server with the fewest active connections.

Example:

S1 → 200 connections
S2 → 50 connections
S3 → 100 connections

Next request goes to:

S2

because it has the fewest active connections.
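
In code, the choice is just a minimum over the current connection counts. This sketch reuses the numbers from the example; the `connections` table and the `least_connections` function are hypothetical names.

```python
# Active connection counts from the example above.
connections = {"S1": 200, "S2": 50, "S3": 100}

def least_connections(connections):
    """Return the server that currently has the fewest active connections."""
    return min(connections, key=connections.get)

chosen = least_connections(connections)
connections[chosen] += 1  # the new request is now an active connection
print(chosen)  # S2
```

A real load balancer would also decrease the count when a connection closes, so the table keeps reflecting the live load.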


3. Weighted Round Robin

Powerful servers get more traffic.

Example:

S1 = Weight 5
S2 = Weight 2

S1 receives 5 out of every 7 requests, and S2 receives the other 2.
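
One naive way to get this split is to repeat each server in the rotation as many times as its weight. The sketch below assumes the weights from the example; real load balancers usually interleave the picks more smoothly instead of sending five requests in a row to S1.

```python
from collections import Counter
from itertools import cycle

weights = {"S1": 5, "S2": 2}

# Naive scheme: each server appears in the schedule once per unit of weight.
schedule = [server for server, w in weights.items() for _ in range(w)]
rotation = cycle(schedule)

# 70 requests are exactly 10 full passes over the 7-slot schedule.
counts = Counter(next(rotation) for _ in range(70))
print(counts)  # Counter({'S1': 50, 'S2': 20})
```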


4. IP Hash

Requests from the same client IP address always go to the same server, as long as the set of servers does not change.

Useful for:

  • Shopping carts
  • User sessions
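
A minimal sketch of the idea: hash the client IP and use the result to index into the server list. The `server_for` function and the sample IP address are illustrative assumptions. `hashlib` is used instead of Python's built-in `hash()`, because the built-in hash of a string changes between program runs.

```python
import hashlib

servers = ["S1", "S2", "S3"]

def server_for(ip, servers):
    """Map a client IP to a server; the same IP always gives the same server."""
    digest = hashlib.sha256(ip.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]

first = server_for("203.0.113.7", servers)
again = server_for("203.0.113.7", servers)
print(first == again)  # True
```

With this simple modulo scheme, adding or removing a server changes the mapping for many clients, which is one reason sessions are often kept in a shared store as well.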

Types of Load Balancers


Hardware Load Balancer

Physical device.

Examples:

  • F5 BIG-IP

Usually expensive.


Software Load Balancer

Runs as software.

Examples:

  • NGINX
  • HAProxy
  • Traefik

These are widely used today.


Cloud Load Balancer

Managed by cloud providers.

Examples:

  • AWS Elastic Load Balancing (ELB)
  • Azure Load Balancer
  • Google Cloud Load Balancing

Very popular.


Where Is Load Balancing Used?

Almost everywhere.

Websites

Google
Facebook
Amazon
Netflix

Banking Applications

To handle millions of transactions.


E-commerce Platforms

For handling traffic spikes during sales.


APIs

To distribute API requests.


Load Balancing in Kubernetes

In Kubernetes:

Ingress → Service → Pods

A Service load-balances traffic across the Pods behind it.

Example:

Pod 1
Pod 2
Pod 3

Traffic gets distributed among all pods.


Real System Design Architecture

                       Users
                         ↓
                   Load Balancer
                         ↓
              ┌──────────┼──────────┐
              ↓          ↓          ↓
           Server1    Server2    Server3
              ↓          ↓          ↓
              └──────────┼──────────┘
                         ↓
                      Database

This architecture is commonly used in production systems.


Interview Definition

If asked in an interview:

Load Balancing is a technique used to distribute incoming requests across multiple servers to improve performance, scalability, availability, and fault tolerance.


🏁 Final Summary

Load Balancing is like having multiple cashiers in a restaurant instead of one.

Without Load Balancing:

❌ Slow system
❌ Server crashes
❌ Downtime
❌ Poor user experience

With Load Balancing:

✅ Faster response times
✅ Better availability
✅ Easy scalability
✅ Improved reliability


💡 Simple One-Line Definition

A Load Balancer acts like a traffic police officer that intelligently distributes user requests across multiple servers so that no single server becomes overloaded.

How WhatsApp Notifications Work Internally — A System Design Perspective

Have you ever wondered how WhatsApp notifies you instantly when someone sends a message — even when your app is closed? Let’s peel back the layers and look at how WhatsApp’s notification system actually works from a system design point of view.


🧠 The Big Picture

When you receive a new WhatsApp message like

“Ravi: Hey Raju!”

that notification travels through a complex, highly optimized system involving encryption, real-time messaging, and push infrastructure.

Let’s break it down step by step 👇


⚙️ 1. Message Creation — The Journey Begins

When Ravi sends a message to Raju:

  • The WhatsApp client on Ravi’s phone encrypts the message using end-to-end encryption (E2EE).
  • The encrypted payload is sent to WhatsApp’s Message Server through a persistent socket connection.

At this point, the message is unreadable to anyone — even WhatsApp itself.


📡 2. Message Routing — Finding the Recipient

The Message Server receives Ravi’s encrypted message and determines that it needs to reach Raju’s device.

  • If Raju is online, the message is sent immediately via a persistent connection (using XMPP or WebSockets).
  • If Raju is offline, WhatsApp stores the encrypted message temporarily in its Storage Service until Raju reconnects.
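
The online/offline decision can be sketched as follows. This is a toy model, not WhatsApp's actual code: `online_users`, `offline_store`, `delivered`, `route`, and `on_reconnect` are hypothetical names, and in-memory structures stand in for the presence and storage services.

```python
from collections import defaultdict

online_users = {"ravi"}            # hypothetical presence data
offline_store = defaultdict(list)  # recipient -> queued encrypted messages
delivered = []                     # stand-in for pushes over live sockets

def route(recipient, ciphertext):
    """Deliver immediately if the recipient is online, otherwise store."""
    if recipient in online_users:
        delivered.append((recipient, ciphertext))
        return "delivered"
    offline_store[recipient].append(ciphertext)
    return "stored"

def on_reconnect(recipient):
    """Flush everything queued for a recipient who just came back online."""
    online_users.add(recipient)
    pending = offline_store.pop(recipient, [])
    for ciphertext in pending:
        delivered.append((recipient, ciphertext))
    return len(pending)

print(route("raju", b"\x8f\x02..."))  # stored
print(on_reconnect("raju"))           # 1
```

The server only ever handles the encrypted bytes; it never needs to decrypt them to route or store a message.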

📬 3. Triggering a Notification — When the App Is Closed

If Raju’s app is in the background or not connected, WhatsApp triggers a push notification.

Here’s how it works:

  1. The Notification Service in WhatsApp’s backend detects that Raju is offline.
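  2. It prepares a lightweight message payload, something like:

       {
         "to": "<Raju_Device_Token>",
         "notification": { "title": "WhatsApp", "body": "Ravi: Hey Raju!" },
         "data": { "message_id": "abc123", "chat_id": "ravi_123" }
       }

     The payload is illustrative. Since the message is end-to-end encrypted, the server cannot read its text, so in practice the visible text has to be filled in on the device after decryption.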
  3. It sends this payload to Firebase Cloud Messaging (FCM) for Android or Apple Push Notification Service (APNS) for iPhone.
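
The three steps above can be sketched as a small function that picks the push provider by platform. Everything here is a hypothetical illustration: `build_push` and its envelope format are made up and are not the real FCM or APNS request formats. The sketch sends only identifiers, on the assumption that the server cannot read the end-to-end encrypted text.

```python
def build_push(platform, device_token, message_id, chat_id):
    """Build a provider-neutral push request (hypothetical envelope format)."""
    providers = {"android": "FCM", "ios": "APNS"}
    if platform not in providers:
        raise ValueError(f"unsupported platform: {platform}")
    return {
        "provider": providers[platform],
        "to": device_token,
        # Identifiers only; the app uses them to fetch the encrypted message.
        "data": {"message_id": message_id, "chat_id": chat_id},
    }

push = build_push("android", "<Raju_Device_Token>", "abc123", "ravi_123")
print(push["provider"])  # FCM
```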

☁️ 4. Push Infrastructure — Google & Apple Step In

Once WhatsApp sends the notification request:

  • FCM/APNS looks up the device token (a unique ID representing Raju’s phone).
  • They route the message through their global notification delivery networks.
  • Raju’s phone receives it instantly — even if the app isn’t open.

This is possible because FCM and APNS maintain their own persistent channels with your device at the OS level.


🔔 5. Notification Delivery on Device

When Raju’s phone receives the notification:

  • The OS wakes up WhatsApp’s background receiver.
  • A notification appears on the lock screen or notification tray: “Ravi: Hey Raju!”

When Raju taps it:

  • WhatsApp opens and establishes its secure socket connection to the server.
  • The actual encrypted message is fetched and decrypted locally using Raju’s private key.

At this point, the two ticks ✅ (message delivered) appear on Ravi’s chat screen.
When Raju reads it, WhatsApp sends a “read receipt” (blue ticks 💙) back.
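
The sent, delivered, and read states behave like a small state machine that only moves forward. The sketch below is one hedged way to model it; the `Status` enum and `DeliveryTracker` class are hypothetical names, not WhatsApp's actual implementation.

```python
from enum import IntEnum

class Status(IntEnum):
    SENT = 1       # one tick
    DELIVERED = 2  # two ticks
    READ = 3       # blue ticks

class DeliveryTracker:
    """Keeps the furthest status reached by each message."""

    def __init__(self):
        self._status = {}

    def update(self, message_id, status):
        current = self._status.get(message_id, Status.SENT)
        # Receipts can arrive late or out of order, so never move backwards.
        self._status[message_id] = max(current, status)
        return self._status[message_id]

tracker = DeliveryTracker()
tracker.update("abc123", Status.SENT)
tracker.update("abc123", Status.READ)
print(tracker.update("abc123", Status.DELIVERED).name)  # READ
```

A late "delivered" receipt does not downgrade a message that has already been read.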


🔐 6. End-to-End Encryption (E2EE)

One of WhatsApp’s strongest features is end-to-end encryption.

  • Messages are encrypted on Ravi’s device.
  • Stored in encrypted form on WhatsApp’s server (if needed).
  • Decrypted only on Raju’s device.

Even WhatsApp’s servers can’t see the content — they only know that a message exists and who it’s meant for.


🧩 7. Behind-the-Scenes Components

Here’s what’s happening under the hood:

Component               Role
Message Service         Handles message routing between users
Notification Service    Sends push notifications via FCM/APNS
Encryption Service      Encrypts and decrypts message payloads
Storage Service         Stores encrypted messages for offline users
Delivery Tracker        Tracks sent, delivered, and read states
User Presence Service   Determines whether users are online or offline

🧱 8. System Architecture Overview

Conceptually, it looks like this:

Ravi's Phone ──► (Encrypt Message) ──► Message Server ──► (Route + Store) ──► Notification Service ──► (Push Notification) ──► FCM/APNS ──► Raju's Phone


⚡ 9. Why This Design Works So Well

Goal                 How WhatsApp Achieves It
Real-time delivery   Persistent sockets (XMPP/WebSockets)
Offline support      Stored encrypted messages + push
Security             End-to-end encryption keys per user
Scalability          Microservices + distributed queues
Low latency          Global servers + efficient routing
Battery efficiency   OS-managed push (via FCM/APNS)

🚀 10. Key Takeaways for System Design Interviews

If you’re designing a notification system like WhatsApp’s, remember these principles:

  1. Decouple message delivery and notification logic (use queues).
  2. Use push services (FCM/APNS) for offline or background delivery.
  3. Maintain persistent connections for real-time chat.
  4. Ensure reliability with retry, message persistence, and acknowledgment mechanisms.
  5. Design for privacy with end-to-end encryption and minimal metadata storage.

🏁 Final Thoughts

WhatsApp’s notification system is a perfect blend of real-time communication, security, and scalability.
As a system architect, understanding how these layers interact — from event queues to encryption — is crucial for designing any large-scale, user-facing application.