System Design Fundamentals: Day 1

Hey Readers.

Spotted: millions of users posting tweets, streaming videos, sending messages, and somehow the internet doesn't completely fall apart.

What's their secret?

No, it's not luck. It's system design.

Today we're pulling back the curtain on the fundamentals every engineer should know before designing systems that can survive real-world traffic.

What Are Interviewers Actually Looking For?

When you're in a system design interview, nobody expects you to reinvent Google.

They're evaluating how you think.

Can you gather requirements? Analyze tradeoffs? Design APIs? Build reliable and scalable systems? Communicate your ideas clearly?

The best candidates don't jump straight into architecture diagrams. They start by asking questions.

Functional Requirements

What should the system do?

For a social media platform:

Users can post tweets.
Users can follow or unfollow others.
Users can view timelines.
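
To make these three requirements concrete, here is a minimal sketch of how they might map onto an API. Everything in it is an assumption for illustration: the names SocialService, post_tweet, follow, unfollow, and timeline are hypothetical, and the data lives in memory instead of a database.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import count


@dataclass
class Tweet:
    tweet_id: int
    author: str
    text: str


class SocialService:
    """Toy in-memory service (hypothetical); a real one would sit behind an API and a database."""

    def __init__(self) -> None:
        self._ids = count(1)
        self._tweets: list[Tweet] = []
        self._following: dict[str, set[str]] = defaultdict(set)

    def post_tweet(self, author: str, text: str) -> Tweet:
        # Requirement 1: users can post tweets.
        tweet = Tweet(next(self._ids), author, text)
        self._tweets.append(tweet)
        return tweet

    def follow(self, follower: str, followee: str) -> None:
        # Requirement 2: users can follow others.
        self._following[follower].add(followee)

    def unfollow(self, follower: str, followee: str) -> None:
        # Requirement 2: users can unfollow others (no error if not following).
        self._following[follower].discard(followee)

    def timeline(self, user: str, limit: int = 10) -> list[Tweet]:
        # Requirement 3: newest tweets first, only from accounts the user follows.
        followed = self._following.get(user, set())
        matching = [t for t in reversed(self._tweets) if t.author in followed]
        return matching[:limit]
```

Scanning every tweet to build a timeline is fine for a sketch and hopeless at scale, which is exactly the kind of tradeoff an interviewer wants you to notice.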

Non-Functional Requirements

How should the system behave?

Examples:

99.99% uptime (high availability)
Support for 100 million users
Response times under 200 ms
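
Numbers like these become useful once you turn them into load estimates. The sketch below is a back-of-envelope calculation only: the 10 requests per user per day and the 3x peak factor are made-up assumptions, not figures from any real platform.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_rps(users: int, requests_per_user_per_day: float) -> float:
    """Average requests per second if traffic were spread evenly over the day."""
    return users * requests_per_user_per_day / SECONDS_PER_DAY


def peak_rps(average: float, peak_factor: float) -> float:
    """Rough peak load, assuming peaks are a fixed multiple of the average."""
    return average * peak_factor


# Assumed inputs: 100 million users, 10 requests each per day, 3x peak factor.
avg = average_rps(100_000_000, 10)
print(f"average: {avg:,.0f} rps, peak: {peak_rps(avg, 3):,.0f} rps")
```

Change the assumptions and the answer changes with them, which is why asking about usage patterns comes before drawing boxes.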

Before designing the solution, understand the problem.

A shocking number of engineers skip this step.

The Hierarchy of Speed

Not all storage is created equal.

Some memories are VIP guests. Others are waiting outside the club.

Component	Relative Speed	Typical Size	Persistent?
CPU Registers	Fastest	Tiny	No
L1 Cache	Very fast	KBs	No
L2 Cache	Fast	KBs to MBs	No
L3 Cache	Moderately fast	MBs	No
RAM	Slower	GBs	No
SSD	Slow	GBs to TBs	Yes
HDD	Slowest	TBs	Yes

L1 Cache

The closest memory to the CPU.

Located inside each CPU core
Extremely fast
Stores frequently used instructions and data

L2 Cache

A little larger, a little slower.

Usually dedicated to an individual CPU core
Checked when data is not found in L1
Reduces trips to slower memory

L3 Cache

The shared gossip hub.

Shared among CPU cores
Helps cores access common data
Faster than RAM but slower than L1 and L2

RAM

The system's working memory.

Stores currently running programs
Much larger than cache
Volatile memory (data disappears when power is lost)

Cache Hits vs Cache Misses

A cache hit means the CPU finds the data in the cache level it checks, so it gets it almost immediately.

A cache miss means it must travel down the hierarchy looking for it.

And just like searching for a missing group project member five minutes before submission, that takes time.
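
Hardware caches are built in silicon, but the hit and miss behavior is easy to imitate in software. The sketch below is an analogy, not how a CPU works internally: LRUCache is a hypothetical class that keeps a small number of items, evicts the least recently used one when full, and counts hits and misses against a slower backing store (here just a dictionary).

```python
from collections import OrderedDict


class LRUCache:
    """Small least-recently-used cache in front of a slower backing store."""

    def __init__(self, capacity: int, backing_store: dict) -> None:
        self.capacity = capacity
        self._backing = backing_store
        self._items: OrderedDict = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self._items:
            # Cache hit: serve from the cache and mark the key as recently used.
            self.hits += 1
            self._items.move_to_end(key)
            return self._items[key]
        # Cache miss: go to the slower store, then keep a copy for next time.
        self.misses += 1
        value = self._backing[key]
        self._items[key] = value
        if len(self._items) > self.capacity:
            # Evict the least recently used entry to stay within capacity.
            self._items.popitem(last=False)
        return value


slow_store = {n: n * n for n in range(100)}
cache = LRUCache(capacity=2, backing_store=slow_store)
for key in [1, 2, 1, 3, 2]:
    cache.get(key)
print(cache.hits, cache.misses)  # 1 4
```

Only the second request for key 1 is a hit. Key 2 gets evicted when key 3 arrives, so asking for it again is another miss.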

Speed Order

Registers → L1 Cache → L2 Cache → L3 Cache → RAM → SSD → HDD

This is why caching is such a big deal in large-scale systems.

The same principle applies at every level: keep frequently accessed data in the fastest storage that can hold it, and fewer requests have to pay the latency of the slower layers.

What Does a Real Production System Look Like?

A simplified request flow:

User → DNS → Load Balancer → Application Servers → Cache → Database

Each component has a role:

DNS resolves the domain name to an IP address.
Load Balancer distributes traffic.
Application Servers handle business logic.
Cache serves frequently requested data quickly.
Database stores persistent information.
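
The last two hops of that flow often follow a pattern commonly called cache-aside: the application checks the cache first and only falls back to the database on a miss. Here is a minimal sketch of that read path. UserProfileService and get_profile are hypothetical names, and plain dictionaries stand in for a real cache and a real database.

```python
from typing import Optional


class UserProfileService:
    """Application-server logic that reads through a cache before the database."""

    def __init__(self, database: dict[str, str]) -> None:
        self._database = database          # stand-in for a persistent database
        self._cache: dict[str, str] = {}   # stand-in for an in-memory cache
        self.database_reads = 0

    def get_profile(self, user_id: str) -> Optional[str]:
        # 1. Try the cache first.
        if user_id in self._cache:
            return self._cache[user_id]
        # 2. On a miss, read from the database.
        self.database_reads += 1
        profile = self._database.get(user_id)
        # 3. Store the result so the next request skips the database.
        if profile is not None:
            self._cache[user_id] = profile
        return profile


service = UserProfileService({"u1": "Alice"})
service.get_profile("u1")
service.get_profile("u1")
print(service.database_reads)  # 1
```

The sketch ignores the hard parts, such as expiring entries and keeping the cache in sync when the database changes.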

Simple on paper.

Wildly complicated at scale.

The Unsung Heroes of Production

Building software is one thing.

Keeping it alive is another.

CI/CD

Continuous Integration automatically builds and tests every change. Continuous Deployment automatically releases the changes that pass.

Popular tools:

Jenkins
GitHub Actions
GitLab CI

Because manually deploying code at 2 AM is not a personality trait.

Logging

Logs record system events.

Examples:

User logged in
Payment failed
Server crashed

Popular tools:

ELK Stack
Datadog
Splunk
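
Before any of those tools can help, the application has to emit logs. A minimal sketch using Python's standard logging module is below. The charge function, the "payments" logger name, and the key=value message style are assumptions for illustration, not a required format.

```python
import logging

# Timestamped, leveled log lines written to standard error.
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s %(message)s",
)
logger = logging.getLogger("payments")


def charge(user_id: str, amount_cents: int) -> bool:
    """Hypothetical payment step that logs what happened and why."""
    logger.info("charge requested user_id=%s amount_cents=%d", user_id, amount_cents)
    if amount_cents <= 0:
        logger.error("payment failed user_id=%s reason=invalid_amount", user_id)
        return False
    logger.info("payment succeeded user_id=%s", user_id)
    return True


charge("u1", 500)
charge("u1", 0)
```

Consistent fields such as user_id make logs far easier to search once they land in a central tool.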

Monitoring

Monitoring tracks system health.

Metrics include:

CPU usage
Memory usage
Error rate
Request latency

Popular tools:

Prometheus
Grafana
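
Two of the metrics listed above, error rate and request latency, can be computed from raw request data in a few lines. This is a rough sketch under stated assumptions: it counts only 5xx status codes as errors and uses the nearest-rank method for percentiles. Monitoring tools make their own choices on both.

```python
import math


def error_rate(status_codes: list[int]) -> float:
    """Fraction of requests that ended in a server error (5xx)."""
    if not status_codes:
        return 0.0
    errors = sum(1 for code in status_codes if code >= 500)
    return errors / len(status_codes)


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("percentile of an empty list is undefined")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


latencies_ms = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
print(error_rate([200, 200, 500, 200]))  # 0.25
print(percentile(latencies_ms, 95))      # 100
```

Percentiles matter because an average can look healthy while the slowest requests are painful.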

Alerting

When something goes wrong, somebody needs to know.

Tools like Slack and PagerDuty notify engineers when critical thresholds are crossed.

Because servers rarely break during business hours.
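
A threshold rule can be as simple as the sketch below. The should_alert function and its parameters are hypothetical. It fires only when the last few samples all exceed the threshold, a common way to avoid paging someone over a single blip.

```python
def should_alert(error_rates: list[float], threshold: float, consecutive: int) -> bool:
    """True when the most recent `consecutive` samples are all above the threshold."""
    if consecutive <= 0 or len(error_rates) < consecutive:
        return False
    recent = error_rates[-consecutive:]
    return all(rate > threshold for rate in recent)


# Three samples in a row above 5% errors: time to notify someone.
print(should_alert([0.01, 0.06, 0.07, 0.08], threshold=0.05, consecutive=3))  # True
```

In a real setup the result would feed a notification tool instead of a print statement.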

The Core Pillars of System Design

Scalability

Can the system handle growth?

If 1,000 users become 1,000,000 users tomorrow, does the system survive?

A scalable system grows without requiring a complete redesign.

Reliability

Can users trust the system?

Reliable systems continue working correctly even when things fail.

Availability

Can users access the service when they need it?

Availability is usually measured as:

Availability = Uptime / Total Time

Typical targets:

Availability	Downtime per Year
99%	3.65 days
99.9%	8.76 hours
99.99%	~52.6 minutes
99.999%	~5.3 minutes
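
The downtime figures follow directly from the availability formula. Here is a small sketch of the arithmetic, assuming a 365-day year. The function name is made up for illustration.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years


def downtime_hours_per_year(availability_percent: float) -> float:
    """Allowed downtime per year for a given availability target."""
    return (1 - availability_percent / 100) * HOURS_PER_YEAR


for target in (99, 99.9, 99.99, 99.999):
    hours = downtime_hours_per_year(target)
    print(f"{target}% -> {hours:.3f} hours ({hours * 60:.1f} minutes)")
```

Each extra nine cuts the downtime budget by a factor of ten.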

Why not 100%?

Because perfection is expensive.

Sometimes impossibly expensive.

Maintainability

Future engineers should be able to understand and modify the system.

If nobody understands your architecture six months later, you've created a puzzle, not a product.

The CAP Theorem Drama

Every distributed system eventually faces a difficult choice.

The CAP theorem states that a distributed system can guarantee at most two of the following three properties at the same time:

Consistency (C)

Every read returns the most recent write, so every node appears to hold the same data.

Availability (A)

Every request to a working node receives a non-error response, even if that response is not the latest data.

Partition Tolerance (P)

The system continues operating even when network failures drop or delay messages between nodes.

Since network failures are unavoidable, Partition Tolerance is non-negotiable.

That leaves a choice during a partition: consistency or availability.

CP Systems

Consistency + Partition Tolerance

Availability may suffer.

Examples:

Banking systems
Payment systems

Would you rather see an error message or lose money?

Exactly.

AP Systems

Availability + Partition Tolerance

Strong consistency is sacrificed.

Examples:

Social media platforms
Messaging systems

You might briefly see outdated data, but the system remains available.
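
The difference is easier to see in a toy simulation. The sketch below is a deliberately simplified model, not a real replication protocol. TwoReplicaStore and Unavailable are hypothetical names, there are only two replicas, and the CP mode refuses every request during a partition, whereas real CP systems typically keep serving from a majority of nodes.

```python
class Unavailable(Exception):
    """Raised when the store refuses a request to stay consistent."""


class TwoReplicaStore:
    """Two replicas of one value, with a switchable network partition."""

    def __init__(self, mode: str, initial: str = "v0") -> None:
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.replicas = [initial, initial]
        self.partitioned = False

    def write(self, replica: int, value: str) -> None:
        if not self.partitioned:
            # Healthy network: the write reaches both replicas.
            self.replicas = [value, value]
        elif self.mode == "CP":
            # CP: refuse the write instead of letting the replicas diverge.
            raise Unavailable("cannot replicate during a partition")
        else:
            # AP: accept the write locally; the other replica is now stale.
            self.replicas[replica] = value

    def read(self, replica: int) -> str:
        if self.partitioned and self.mode == "CP":
            raise Unavailable("cannot confirm the latest value during a partition")
        return self.replicas[replica]


ap = TwoReplicaStore("AP")
ap.partitioned = True
ap.write(0, "new")
print(ap.read(0), ap.read(1))  # new v0
```

The AP store keeps answering but the second replica serves outdated data. The same steps against a store created with "CP" would raise Unavailable on the write.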

Final Thoughts

System design isn't about memorizing diagrams.

It's about understanding tradeoffs.

Every decision improves one thing while sacrificing another.

More consistency may reduce availability.

More throughput may increase latency.

More reliability may increase complexity.

The best engineers aren't the ones with the fanciest architecture.

They're the ones who understand what tradeoffs they're making and why.

And just like every good secret in the Coding World, every large-scale system is hiding a thousand design decisions beneath the surface.

XOXO.

System Design Diaries #1: Building the Foundations
