Interview Craft

A Calm Approach To The System Design Interview

The Mythic Intel Team · Sep 23, 2025 · 8 min read

The system design interview rewards a calm, repeatable path more than raw brilliance. The interviewer hands you something open-ended ("design a URL shortener," "design a news feed," "design a rate limiter") and watches how you move from a vague prompt to a defensible architecture. Panic makes you jump straight to databases and message queues before you know what you are building. A structured approach keeps you in control: clarify the requirements, size the problem, sketch the API and data model, draw the high-level design, go deep on one component, then talk through bottlenecks and trade-offs. This article walks through that sequence.

Modern system design interviews mostly happen over video with a shared diagramming tool rather than a physical whiteboard, and interviewers now expect you to be proactive about scale rather than waiting to be pushed. They also increasingly ask AI-era variants, like the serving layer behind a large model or a real-time recommendation pipeline, but the underlying method does not change.

Clarify requirements first

Do not start designing until you know what you are designing. Spend the first few minutes turning the one-line prompt into a real spec, and split it into two buckets.

Functional requirements: what the system must do. For a URL shortener: create a short link, redirect to the original, maybe support custom aliases and expiry. Write these down. They define the API later.
Non-functional requirements: how the system must behave. Scale, latency, availability, consistency, durability. Is this read-heavy or write-heavy? How many users? Can a redirect tolerate being a few seconds stale, or must it be exact?

Ask whether you should optimize for read or write traffic, because that single answer reshapes the whole design. A URL shortener is overwhelmingly read-heavy, which pushes you toward aggressive caching. Stating these explicitly also tells the interviewer you scope before you build, which is itself a senior signal.

Estimate the scale

Before drawing boxes, do quick back-of-the-envelope math. You are not chasing precision. You are establishing the order of magnitude so your design decisions have a reason behind them.

Work out rough numbers for requests per second (split into reads and writes), storage growth per year, and bandwidth. For a shortener handling, say, 100 million new links a month, that is roughly 40 writes per second (100 million spread over about 2.6 million seconds in a month). If reads outnumber writes 100 to 1, you are serving around 4,000 reads per second, and that asymmetry is why a cache and read replicas appear in your design. At a few hundred bytes per record, 1.2 billion new records a year add up to hundreds of gigabytes annually, and that figure tells you whether a single database fits or whether you will need to shard. The estimate is the justification for everything that follows.
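The arithmetic above fits in a few lines of Python. The inputs (100 million links a month, a 100:1 read/write ratio, 500 bytes per record) are illustrative assumptions rather than measurements; swap in whatever numbers the interviewer gives you.

```python
# Back-of-the-envelope sizing for a URL shortener.
# Every input below is an illustrative assumption, not a measured figure.
SECONDS_PER_MONTH = 30 * 24 * 3600        # about 2.6 million seconds

new_links_per_month = 100_000_000         # assumed write volume
read_write_ratio = 100                    # assumed: 100 redirects per new link
bytes_per_record = 500                    # assumed: code + long URL + timestamps

writes_per_sec = new_links_per_month / SECONDS_PER_MONTH
reads_per_sec = writes_per_sec * read_write_ratio
storage_per_year_gb = new_links_per_month * 12 * bytes_per_record / 1e9

print(f"writes/s ~ {writes_per_sec:,.0f}")
print(f"reads/s  ~ {reads_per_sec:,.0f}")
print(f"storage  ~ {storage_per_year_gb:,.0f} GB/year")
```

With these assumptions it prints about 39 writes per second, about 3,858 reads per second, and 600 GB of new data per year. Only the orders of magnitude matter: the read rate justifies a cache, and the multi-year storage total is what you weigh against a single database's capacity.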

API and data model

Now translate the functional requirements into a concrete interface. A couple of endpoints is enough: POST /urls to create a short link, GET /{shortCode} to redirect. Name the inputs and outputs. This keeps the conversation grounded in something real instead of hand-waving.
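To make the interface concrete, here is a minimal sketch of the two endpoints as a plain Python dispatcher with no web framework. The `handle` function, the `Response` type, the `longUrl` field name, and the in-memory dict standing in for the datastore are all hypothetical, and the counter-based code generator is only a placeholder for the real scheme.

```python
import itertools
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Response:
    status: int
    headers: dict = field(default_factory=dict)
    body: str = ""


# Hypothetical in-memory stand-in for the real datastore: short code -> long URL.
_store: dict = {}
# Placeholder code generator: a simple counter rendered in hex.
_next_id = itertools.count(1)


def handle(method: str, path: str, body: Optional[dict] = None) -> Response:
    """Dispatch the two endpoints: POST /urls and GET /{shortCode}."""
    if method == "POST" and path == "/urls":
        long_url = (body or {}).get("longUrl")
        if not long_url:
            return Response(400, body="longUrl is required")
        code = format(next(_next_id), "x")
        _store[code] = long_url
        return Response(201, body=code)  # the body carries the new short code

    if method == "GET" and len(path) > 1 and path.startswith("/") and "/" not in path[1:]:
        target = _store.get(path[1:])
        if target is None:
            return Response(404, body="unknown short code")
        # 302 keeps each click flowing through the service (useful for analytics).
        return Response(302, headers={"Location": target})

    return Response(404, body="no such route")
```

Returning 302 means every redirect passes back through the service, which keeps click analytics possible; a 301 lets browsers cache the redirect and cuts load, at the cost of not seeing those repeat clicks. Either is defensible if you say why.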

Then the data model. For the shortener, a table keyed on the short code, with the long URL, creation time, and expiry. The access pattern here is a lookup by short code, which means that column must be indexed so the redirect is a fast point query rather than a full scan. Indexing is the difference between a millisecond lookup and a table scan at scale, so call it out. Mention how you generate the short code too: a counter encoded in base62, or a hash, with collision handling if you hash.
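A minimal Python sketch of the counter-plus-base62 scheme and the record shape follows. The alphabet ordering, the `UrlRecord` fields, the `create` helper, and the dict standing in for the table's primary-key index are illustrative assumptions; a real system would draw the counter from a database sequence or another central ID allocator rather than take it from the caller.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

# 62 symbols: digits, lowercase, uppercase (the ordering is an arbitrary choice).
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
BASE = len(ALPHABET)  # 62


def encode_base62(n: int) -> str:
    """Turn a non-negative counter value into a short code."""
    if n < 0:
        raise ValueError("counter must be non-negative")
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, rem = divmod(n, BASE)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))


def decode_base62(code: str) -> int:
    """Inverse of encode_base62."""
    n = 0
    for ch in code:
        n = n * BASE + ALPHABET.index(ch)
    return n


@dataclass
class UrlRecord:
    short_code: str                        # primary key: redirects are point lookups on it
    long_url: str
    created_at: datetime
    expires_at: Optional[datetime] = None  # None means the link never expires


# A dict keyed on short_code plays the role of the indexed table here.
table: dict = {}


def create(counter_value: int, long_url: str) -> UrlRecord:
    code = encode_base62(counter_value)
    record = UrlRecord(code, long_url, datetime.now(timezone.utc))
    table[code] = record
    return record
```

Seven base62 characters cover 62^7, about 3.5 trillion codes, far more than the roughly 1.2 billion links a year from the estimate. One trade-off worth naming: sequential counters make codes guessable, which matters if links should not be enumerable.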

High-level design

Sketch the request flow end to end, and place the standard building blocks where they earn their keep.

Load balancer in front of a stateless application tier, so you can scale horizontally by adding servers and distribute traffic across them.
Caching for the hot path. Since reads dominate, a cache in front of the database (a read-through cache with LRU eviction, holding the popular short codes) absorbs most redirect traffic and keeps the database from being the bottleneck.
Database for the source of truth, with replication for read scaling and availability: a primary for writes, replicas for reads.
A content delivery network if you serve static assets or want edge proximity, and a message queue if you have asynchronous work like analytics on each click, so the redirect stays fast and the counting happens out of band.

Draw it, narrate the path of a single request through it, and keep it simple before you add anything fancy.
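The read path can be sketched as a read-through cache with LRU eviction sitting in front of the database. This is a single-process Python illustration, not a production cache; in practice an external cache such as Redis or Memcached usually plays this role. The `ReadThroughLRU` class and its `load` callback are hypothetical names, with `load` standing in for the database point query by short code.

```python
from collections import OrderedDict
from typing import Callable, Optional


class ReadThroughLRU:
    """On a hit, serve from memory; on a miss, load from the backing store and cache it."""

    def __init__(self, capacity: int, load: Callable[[str], Optional[str]]):
        self.capacity = capacity
        self.load = load              # stand-in for the database lookup by short code
        self.entries = OrderedDict()  # short code -> long URL, oldest first
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> Optional[str]:
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self.entries[key]
        self.misses += 1
        value = self.load(key)             # cache miss: fall through to the database
        if value is not None:              # unknown codes are not cached
            self.entries[key] = value
            if len(self.entries) > self.capacity:
                self.entries.popitem(last=False)  # evict the least recently used entry
        return value


db = {"a": "https://a.example", "b": "https://b.example", "c": "https://c.example"}
cache = ReadThroughLRU(capacity=2, load=db.get)
cache.get("a")  # miss, loaded from db
cache.get("a")  # hit
```

The hit and miss counters give you the cache hit rate you would watch in production. Skipping unknown codes is one choice; caching a short-lived negative entry instead would shield the database from repeated lookups of bad codes.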

Deep-dive on one component

The interviewer will usually pick a component and ask you to go deeper; if they don't, offer to. This is where you show depth rather than breadth. A common choice is the database layer at scale.

When one database can no longer hold the data or the traffic, you shard: partition the data across multiple machines, for example by a hash of the short code, so each shard owns a slice of the keyspace. Discuss how you route a request to the right shard and what happens when you add shards (consistent hashing limits how much data moves). For the cache, talk eviction policy and what happens on a cache miss. Whatever component you pick, trace its failure modes, not just its happy path.
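Routing with consistent hashing can be sketched in a few dozen lines. Each shard is placed on a hash ring at many points (virtual nodes), and a key belongs to the first shard point clockwise from the key's own hash. The shard names, the use of SHA-256, and 100 virtual nodes per shard are assumptions made for illustration.

```python
import bisect
import hashlib


def ring_hash(key: str) -> int:
    # Stable 64-bit position; Python's built-in hash() is randomized per process.
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class ConsistentHashRing:
    def __init__(self, shards: list, vnodes: int = 100):
        self.vnodes = vnodes
        self.points: list = []  # sorted ring positions
        self.owner: dict = {}   # ring position -> shard name
        for shard in shards:
            self.add_shard(shard)

    def add_shard(self, shard: str) -> None:
        for i in range(self.vnodes):
            point = ring_hash(f"{shard}#{i}")
            self.owner[point] = shard
            bisect.insort(self.points, point)

    def shard_for(self, key: str) -> str:
        # First ring position strictly after the key's hash, wrapping past the end.
        idx = bisect.bisect_right(self.points, ring_hash(key)) % len(self.points)
        return self.owner[self.points[idx]]


ring = ConsistentHashRing(["shard-0", "shard-1", "shard-2", "shard-3"])
keys = [f"code{i}" for i in range(10_000)]
before = {k: ring.shard_for(k) for k in keys}
ring.add_shard("shard-4")
moved = [k for k in keys if ring.shard_for(k) != before[k]]
print(f"{len(moved) / len(keys):.0%} of keys moved")
```

Going from four to five shards, roughly a fifth of the keys should move, and every one of them moves onto the new shard. With naive `hash(key) % N` routing, the same change would relocate about 80% of the keys.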

Bottlenecks and trade-offs

Close by naming where it breaks and what you traded away, because no design is free. This is also where distributed systems theory earns its place.

The CAP theorem, formulated by Eric Brewer, says a distributed data store cannot simultaneously guarantee consistency, availability, and partition tolerance. Since network partitions are a fact of life in any distributed system, partition tolerance is effectively mandatory, which means the real choice during a partition is between consistency (every read sees the latest write) and availability (every request gets a response, possibly stale). For a URL shortener, slightly stale redirects are fine, so you lean toward availability. For a banking ledger, you do not.

PACELC extends this with the part CAP leaves out. It says that during a partition (P) you choose between availability and consistency (A or C), but else (E), when the system is running normally with no partition, you still choose between latency and consistency (L or C). That captures the everyday trade-off: strong consistency usually costs latency, because the system waits to confirm writes are durable across replicas. Google Spanner is usually classified as PC/EC (it favors consistency in both cases), while Dynamo-style stores such as Cassandra and Riak are, in their default configurations, PA/EL. Naming this trade-off out loud tells the interviewer you understand that "eventually consistent" is a deliberate choice, not an accident.
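The "else, latency versus consistency" half of PACELC comes down to how many replicas a write waits for. This sketch assumes a write is acknowledged once a chosen number of replicas confirm it, so its latency is the k-th fastest acknowledgment. The function name and the replica latencies are made up for illustration.

```python
def commit_latency_ms(ack_latencies_ms: list, acks_required: int) -> float:
    """A write completes once `acks_required` replicas have confirmed it,
    so it waits for the k-th fastest acknowledgment."""
    if not 1 <= acks_required <= len(ack_latencies_ms):
        raise ValueError("acks_required must be between 1 and the replica count")
    return sorted(ack_latencies_ms)[acks_required - 1]


# Hypothetical acknowledgment times (ms): one nearby replica, two remote ones.
acks = [2.0, 35.0, 80.0]

print(f"wait for 1 replica: {commit_latency_ms(acks, 1)} ms")  # 2.0
print(f"wait for 2 of 3:    {commit_latency_ms(acks, 2)} ms")  # 35.0
print(f"wait for all 3:     {commit_latency_ms(acks, 3)} ms")  # 80.0
```

Acknowledging after the nearby replica is the EL choice: fast, but a read served by a remote replica can briefly miss the write. Waiting for more replicas (paired with reads that consult enough of them) is how systems buy stronger consistency, and the slower acknowledgment is the price, even when nothing is partitioned.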

Wrap up by stating the bottleneck you would watch (likely the database or the cache hit rate), how you would monitor it, and what you would change first under more load.

The hard part of system design is not knowing the components; it is talking through them coherently under time pressure while someone interrupts you. That fluency only comes from saying the path out loud, so practice narrating a full design from start to finish against a clock instead of just reading reference architectures.
