Replication, Sharding & Database Indexing Explained

Hey Readers.

Spotted: a database quietly handling millions of users, billions of requests, and petabytes of data while making it all look effortless.

A user uploads a photo in Mumbai.

Someone likes it in London.

A friend comments from New York.

Another person shares it in Tokyo.

Within seconds, all of those actions are reflected across the world.

To the average user, it feels almost magical. You tap a button, refresh a page, and somehow everything is instantly updated.

But behind that seamless experience lies one of the biggest challenges in software engineering: data doesn't scale as easily as applications do.

When a startup is small, life is simple. One application server. One database. A few thousand users. Everything works beautifully.

Then the product becomes successful.

Thousands of users become millions.

Gigabytes become terabytes.

Requests arrive faster than ever before.

And suddenly the database that once handled everything effortlessly becomes the bottleneck slowing down the entire system.

This is the moment every successful company eventually encounters.

Netflix faced it.

Instagram faced it.

Amazon faced it.

Every large-scale platform eventually reaches a point where a single database server can no longer keep up.

Engineers then face a series of difficult questions.

How do we serve millions of read requests?

How do we handle increasing write traffic?

How do we search through enormous datasets efficiently?

How do we add new servers without moving everything around?

The answers to those questions have shaped some of the most important concepts in system design.

Today we're diving into replication, sharding, indexing, consistent hashing, and the framework engineers use to think about scaling systems.

Because the biggest secret in system design is that technologies rarely exist because engineers wanted something fancy.

They exist because growth broke the old solution.

Replication: Why One Database Is Never Enough

Imagine you're building a social media platform.

Every second, users are opening profiles, scrolling through feeds, reading comments, viewing messages, and searching for content.

Notice something interesting?

Most users are reading data.

Only a small percentage are actually creating new content.

This means that read traffic often massively outweighs write traffic.

For example:

50,000 reads/sec
2,000 writes/sec

Now imagine all 52,000 requests hitting a single database.

It won't stay happy for long.

Eventually response times increase, CPU usage spikes, and the database becomes overloaded.

Engineers quickly realize something important:

Why should one machine handle every read request?

This is where replication enters the picture.

Replication means maintaining multiple copies of the same database across different machines.

Instead of one database handling everything, we create replicas containing identical copies of the data.

A common architecture looks like this:

        Primary
      (Read + Write)
            ↓
    ┌───────┴───────┐
 Replica 1      Replica 2
  (Read)          (Read)

The Primary database handles writes.

Whenever data changes, those changes are propagated to the replicas.

Read requests can then be distributed among the replicas.

Instead of one machine serving thousands of reads per second, the workload is shared across multiple machines.
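Here's a minimal sketch of that routing idea, assuming a simple round-robin policy. The class name `ReplicatedDatabase` and the node names are made up for illustration; real systems usually do this in a proxy or a database driver.

```python
import itertools


class ReplicatedDatabase:
    """Toy router: writes go to the primary, reads rotate across replicas."""

    def __init__(self, replica_names):
        self.primary = "primary"
        self.replica_names = list(replica_names)
        # Round-robin iterator over the replicas (one simple balancing policy).
        self._next_replica = itertools.cycle(self.replica_names)

    def route(self, operation):
        """Return the name of the node that should serve this operation."""
        if operation == "write":
            return self.primary
        if operation == "read":
            return next(self._next_replica)
        raise ValueError(f"unknown operation: {operation!r}")


db = ReplicatedDatabase(["replica-1", "replica-2"])
routes = [db.route(op) for op in ["read", "write", "read", "read"]]
print(routes)  # ['replica-1', 'primary', 'replica-2', 'replica-1']
```

With two replicas and the 50,000 reads/sec from the example, an even split would leave each replica with about 25,000 reads/sec.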

The benefits are enormous.

First, replication dramatically improves read scalability.

More replicas mean more machines capable of serving requests.

Second, replication improves availability.

If one replica fails, another can continue serving reads. If the Primary fails, a replica can be promoted to take its place.

Third, replication improves fault tolerance because multiple copies of data exist across different servers.

At first glance, replication sounds perfect.

But distributed systems love introducing tradeoffs.

The biggest challenge is replication lag.

Data isn't teleported from the Primary database to replicas.

It takes time.

Sometimes milliseconds.

Sometimes seconds.

Sometimes longer during heavy load.

Imagine changing your profile picture.

The Primary database updates immediately.

You refresh your profile.

Your request gets routed to a replica.

That replica hasn't received the update yet.

You still see the old profile picture.

Nothing is broken.

The system is simply experiencing replication lag.

This phenomenon is known as a stale read.

The system remains available, but the data isn't perfectly up-to-date.
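The profile picture scenario can be reproduced with a tiny in-memory model. This is only a sketch: the `Primary` and `Replica` classes are hypothetical, and "replication" here is just a replica replaying the primary's change log whenever `catch_up` is called.

```python
class Primary:
    def __init__(self):
        self.data = {}
        self.log = []  # ordered list of (key, value) changes

    def write(self, key, value):
        self.data[key] = value
        self.log.append((key, value))


class Replica:
    def __init__(self):
        self.data = {}
        self.applied = 0  # number of log entries applied so far

    def catch_up(self, primary):
        """Apply every change this replica has not seen yet."""
        for key, value in primary.log[self.applied:]:
            self.data[key] = value
        self.applied = len(primary.log)

    def read(self, key):
        return self.data.get(key)


primary = Primary()
replica = Replica()

primary.write("avatar", "old.png")
replica.catch_up(primary)

primary.write("avatar", "new.png")  # the user changes their picture
stale = replica.read("avatar")      # the replica has not caught up yet
replica.catch_up(primary)           # replication finishes
fresh = replica.read("avatar")
print(stale, fresh)  # old.png new.png
```

The gap between the second write and the second `catch_up` call is the replication lag.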

And that's one of the most important lessons in system design:

Improving one thing often means sacrificing something else.

Replication improves scalability and availability, but it can introduce temporary inconsistencies.

Key Takeaways

Replication creates multiple copies of data.
Reads can be distributed across replicas.
Availability and fault tolerance improve.
Replication lag may cause stale reads.
Replication is primarily used to scale reads.

Sharding: When Replication Stops Helping

Replication solves the read problem.

But eventually another challenge appears.

Write traffic.

Imagine Instagram growing from one million users to one billion users.

Every second users are:

Uploading photos
Posting comments
Following accounts
Sending messages
Updating profiles

All writes still go through the Primary database.

Eventually the Primary becomes overwhelmed.

No matter how many replicas exist, every write still targets the same machine.

This is where sharding becomes necessary.

Sharding means splitting data across multiple databases.

Instead of storing all users in one giant database, data is divided into smaller pieces called shards.

For example:

Users 1 to 1,000,000          → Shard A
Users 1,000,001 to 2,000,000  → Shard B
Users 2,000,001 to 3,000,000  → Shard C

Each shard contains only a subset of the total data.
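A range-based lookup like this can be sketched in a few lines. The shard names and the `shard_for` helper are illustrative, and the sketch assumes each range includes its upper bound.

```python
import bisect

# Inclusive upper bound of each range, in ascending order (illustrative values).
UPPER_BOUNDS = [1_000_000, 2_000_000, 3_000_000]
SHARDS = ["shard-a", "shard-b", "shard-c"]


def shard_for(user_id):
    """Return the shard whose ID range contains user_id."""
    if user_id < 1 or user_id > UPPER_BOUNDS[-1]:
        raise ValueError(f"user_id {user_id} is outside every shard range")
    # bisect_left finds the first upper bound that is >= user_id.
    return SHARDS[bisect.bisect_left(UPPER_BOUNDS, user_id)]


print(shard_for(1))          # shard-a
print(shard_for(1_000_000))  # shard-a
print(shard_for(1_000_001))  # shard-b
print(shard_for(2_500_000))  # shard-c
```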

Think of a library.

Imagine storing every book in a single building.

Eventually that building becomes too crowded.

Instead, books are distributed across multiple branches.

Each branch contains part of the collection.

Together they form the complete library.

That's exactly what sharding does.

The biggest benefit is write scalability.

Instead of one database handling every write, writes are distributed across multiple databases.

Storage scalability improves as well.

Data no longer needs to fit on a single machine.

As the application grows, new shards can be added.

Performance also improves because each database works with a smaller dataset.

However, sharding introduces significant complexity.

Suppose a query requires information stored across multiple shards.

The system must now coordinate between different databases.

This is known as a cross-shard query.
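One common way to answer such a query is scatter-gather: ask every shard for its local answer, then merge the partial results. The sketch below uses made-up sample data and a hypothetical `top_posts` function to find the most-liked posts across three shards.

```python
# Each shard holds posts for a subset of users (made-up sample data).
shards = {
    "shard-a": [{"user_id": 1, "likes": 40}, {"user_id": 2, "likes": 75}],
    "shard-b": [{"user_id": 1_500_000, "likes": 90}],
    "shard-c": [{"user_id": 2_500_000, "likes": 10}, {"user_id": 2_600_000, "likes": 60}],
}


def top_posts(shards, limit):
    """Scatter the query to every shard, then merge the partial results."""
    partial_results = []
    for rows in shards.values():
        # Each shard returns only its own local top `limit` rows.
        local_top = sorted(rows, key=lambda r: r["likes"], reverse=True)[:limit]
        partial_results.extend(local_top)
    # The coordinator merges the partial results and applies the limit again.
    return sorted(partial_results, key=lambda r: r["likes"], reverse=True)[:limit]


result = top_posts(shards, limit=2)
print([row["likes"] for row in result])  # [90, 75]
```

Every shard has to answer before the coordinator can respond, so the slowest shard sets the latency of the whole query.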

Another challenge is rebalancing.

Imagine adding a new shard.

Data often needs to be moved between existing shards.

That migration process can be expensive and risky.

Then there's the problem of hot shards.

Imagine one shard contains data for celebrity users.

Millions of requests target that shard every day.

Meanwhile other shards remain relatively idle.

One server becomes overloaded while others sit unused.

This imbalance is known as a hot shard.
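A quick way to spot a hot shard is to count how much of the traffic each shard receives. The request log and the user-to-shard placement below are entirely hypothetical numbers, chosen to make the skew obvious.

```python
from collections import Counter

# Hypothetical request log: each entry is the user whose data was requested.
requests = ["celebrity"] * 9_000 + ["alice"] * 400 + ["bob"] * 350 + ["carol"] * 250

# Hypothetical placement of users on shards.
placement = {
    "celebrity": "shard-a",
    "alice": "shard-b",
    "bob": "shard-c",
    "carol": "shard-b",
}

load = Counter(placement[user] for user in requests)
total = sum(load.values())
share = {shard: count / total for shard, count in load.items()}
hottest, hottest_share = max(share.items(), key=lambda item: item[1])
print(hottest, hottest_share)  # shard-a 0.9
```

Here one shard absorbs 90% of the traffic even though it holds data for just one of the four users.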

Sharding solves scalability problems, but it introduces operational challenges that engineers must carefully manage.

Key Takeaways

Replication scales reads.
Sharding scales writes and storage.
Sharding distributes data across multiple databases.
Cross-shard queries and hot shards are common challenges.
Most internet-scale systems eventually use sharding.

Database Indexing: The Difference Between Milliseconds and Minutes

Imagine searching for a word in a 1,000-page textbook.

Without an index, you'd start from page one and keep flipping until you found it.

Now imagine doing that every time you wanted information.

Painful.

Databases face the same challenge.

Suppose a database contains 100 million users.

A user logs in using their email address.

How does the database find that record?

Without an index, the database performs a Full Table Scan.

It checks row after row after row until it finds a match.

Complexity:

O(N)

As the dataset grows, performance deteriorates.

Now imagine the same search using an index.

Instead of examining every row, the database can jump directly to the correct location.

Most relational databases achieve this using data structures called B-Trees.

Searching becomes:

O(log N)
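To get a feel for the gap, here's a rough count of the worst-case steps for 100 million rows. It models the index as a binary search, which is a simplification: real B-Trees have a much larger fan-out per node, so they need even fewer levels.

```python
import math

rows = 100_000_000

# Worst case for a full table scan: look at every row.
scan_steps = rows

# A binary search halves the remaining candidates at each step.
binary_steps = math.ceil(math.log2(rows))

print(scan_steps, binary_steps)  # 100000000 27
```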

The difference is enormous.

Searching through 100 rows isn't difficult.

Searching through 100 million rows without an index can be painfully slow.

Indexes are commonly created on columns such as:

user_id
email
username
created_at

These columns are frequently used in queries.
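You can watch a database switch from scanning to index lookup with SQLite, which ships with Python. This is a sketch with an assumed `users` table and a made-up index name `idx_users_email`; the exact wording of the plan text varies between SQLite versions, so it's printed rather than hard-coded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user_id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO users (email) VALUES (?)",
    [(f"user{i}@example.com",) for i in range(10_000)],
)

query = "SELECT user_id FROM users WHERE email = ?"
params = ("user9999@example.com",)


def plan(conn):
    """Return SQLite's textual query plan for the email lookup."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + query, params).fetchall()
    # The last column of each row holds the human-readable plan step.
    return " ".join(row[-1] for row in rows)


plan_before = plan(conn)  # no index on email yet: expect a table scan
conn.execute("CREATE INDEX idx_users_email ON users (email)")
plan_after = plan(conn)   # now the planner can search the index instead

print(plan_before)
print(plan_after)
```

The first plan should mention a scan of `users`, and the second should mention a search using `idx_users_email`.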

However, indexes aren't free.

Every time data changes, indexes must also be updated.

This means more indexes generally improve read performance while slowing down writes.

Engineers must constantly balance these tradeoffs.

Too few indexes make reads slow.

Too many indexes make writes expensive.

The best systems carefully choose which columns truly deserve indexing.

Key Takeaways

Indexes dramatically improve search performance.
Most relational databases use B-Tree indexes by default.
Reads become significantly faster.
Indexes consume additional storage.
Excessive indexing can slow writes.

Consistent Hashing: Scaling Without Chaos

Now let's discuss one of the smartest ideas in distributed systems.

Consistent Hashing.

To understand why it exists, we first need to understand the problem it solves.

Imagine distributing users across three servers.

A simple strategy might be:

hash(user_id) % 3

This determines which server stores a particular user's data.

Everything works perfectly.

Until you add another server.

Now the formula becomes:

hash(user_id) % 4

Suddenly roughly three out of four users get assigned to a different server, because a hash keeps its slot only when hash % 3 and hash % 4 happen to agree.

Data movement becomes enormous.

Caches become invalid.

Traffic spikes occur.

The entire system becomes unstable.
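The size of the reshuffle is easy to measure. The sketch below hashes 10,000 made-up user IDs and counts how many land on a different server when the modulus changes from 3 to 4. The `stable_hash` helper is an assumption for the demo; it uses SHA-256 because Python's built-in `hash()` is randomized per process for strings.

```python
import hashlib


def stable_hash(key):
    """Deterministic 64-bit hash derived from SHA-256."""
    digest = hashlib.sha256(str(key).encode()).digest()
    return int.from_bytes(digest[:8], "big")


user_ids = range(10_000)
moved = sum(
    1 for uid in user_ids
    if stable_hash(uid) % 3 != stable_hash(uid) % 4
)
moved_fraction = moved / len(user_ids)
print(f"{moved_fraction:.0%} of users change servers")
```

For evenly spread hashes the expected answer is about 75%: only 3 of the 12 possible remainders modulo 12 give the same result under % 3 and % 4.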

Engineers hate this.

Consistent Hashing solves the problem.

Instead of placing servers in a list, servers are placed on a logical ring.

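Keys are also mapped onto the ring.

Each key belongs to the first server found when moving clockwise from the key's position.

When a new server is added, only the keys that fall between it and the previous server on the ring need to move.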

Most data remains untouched.

This dramatically reduces redistribution costs.
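Here's a minimal ring sketch to make that concrete. The `HashRing` class and server names are hypothetical, and it places each server at several points on the ring (often called virtual nodes), a common refinement that goes beyond the basic description above and evens out the load.

```python
import bisect
import hashlib


def stable_hash(key):
    """Deterministic 64-bit hash derived from SHA-256."""
    digest = hashlib.sha256(str(key).encode()).digest()
    return int.from_bytes(digest[:8], "big")


class HashRing:
    def __init__(self, servers, vnodes=100):
        self.vnodes = vnodes
        self._points = []  # sorted list of (position, server)
        for server in servers:
            self.add(server)

    def add(self, server):
        """Place `vnodes` points for this server on the ring."""
        for i in range(self.vnodes):
            bisect.insort(self._points, (stable_hash(f"{server}#{i}"), server))

    def lookup(self, key):
        """Return the server owning the first point clockwise from the key."""
        position = stable_hash(key)
        # (position,) sorts just before any (position, server) entry.
        index = bisect.bisect_left(self._points, (position,))
        # Past the last point, wrap around to the start of the ring.
        return self._points[index % len(self._points)][1]


keys = [f"user-{i}" for i in range(10_000)]
ring = HashRing(["server-a", "server-b", "server-c"])
before = {key: ring.lookup(key) for key in keys}

ring.add("server-d")
after = {key: ring.lookup(key) for key in keys}

moved = [key for key in keys if before[key] != after[key]]
moved_fraction = len(moved) / len(keys)
print(f"{moved_fraction:.0%} of keys moved")
print(all(after[key] == "server-d" for key in moved))  # True
```

Going from three servers to four, you'd expect roughly a quarter of the keys to move, and every key that moves lands on the new server. Nothing gets shuffled between the existing ones.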

Imagine moving:

2% of data

instead of:

90% of data

The difference is enormous.

Consistent Hashing is widely used in:

Amazon Dynamo-style key-value stores
Cassandra
Distributed caches
Load balancing systems

Whenever infrastructure needs to grow dynamically, consistent hashing becomes incredibly valuable.

Key Takeaways

Traditional hashing causes large-scale reshuffling.
Consistent Hashing minimizes data movement.
Adding servers becomes easier and safer.
It is widely used in distributed systems.

How Engineers Actually Approach System Design

After learning all these technologies, it's tempting to immediately jump into solutions.

Experienced engineers don't do that.

They start with questions.

Before discussing databases, caches, or load balancers, they first understand the problem.

What should the system do?

How many users will it serve?

How much data will it generate?

What level of reliability is required?

Only after understanding the requirements does architecture begin.

A common thought process looks like this:

1. Clarify Requirements
↓
2. Estimate Scale
↓
3. Design High-Level Architecture
↓
4. Design Database
↓
5. Identify Bottlenecks
↓
6. Add Scaling Components
↓
7. Discuss Tradeoffs
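Step 2 is mostly back-of-the-envelope arithmetic. The sketch below reuses the 50,000 reads/sec and 2,000 writes/sec from the replication example; the 1 KB per write is purely an assumption for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60

reads_per_sec = 50_000   # from the replication example
writes_per_sec = 2_000   # from the replication example
bytes_per_write = 1_000  # assumption: about 1 KB stored per write

reads_per_day = reads_per_sec * SECONDS_PER_DAY
writes_per_day = writes_per_sec * SECONDS_PER_DAY
new_data_gb_per_day = writes_per_day * bytes_per_write / 1e9

print(reads_per_day)        # 4320000000
print(writes_per_day)       # 172800000
print(new_data_gb_per_day)  # 172.8
```

Numbers like these tell you early whether one machine is plausible or whether replicas and shards belong in the design from day one.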

Notice something important.

Technology comes later.

Understanding the problem comes first.

A good system design is not about using the most advanced technology.

It's about choosing the simplest solution that satisfies the requirements.

Final Thoughts

Today's lesson revealed an important truth.

Most database technologies exist because a previous solution stopped working.

Replication exists because reads became overwhelming.

Sharding exists because writes became overwhelming.

Indexes exist because searches became overwhelming.

Consistent Hashing exists because scaling infrastructure became overwhelming.

Every concept in system design is really just a response to growth.

And that's what makes system design so fascinating.

It's not a collection of technologies.

It's a collection of solutions to problems that only appear when millions of users start showing up.

The technologies matter.

But understanding the problem that created them matters even more.

XOXO.

System Design Diaries #5: The Secrets Databases Don't Want You to Know
