
System Design Evolution: Building a URL Shortener from MVP to Planet-Scale

Updated • 8 min read

Developer from India.

"Every simple system starts as a toy and evolves into infrastructure. The art lies in knowing when to evolve."


🧭 Why This Article Exists

Most system design discussions about URL shorteners stop at "put a DB behind an API."
This piece goes the whole way, from a single-binary MVP to Meta-level, multi-region infrastructure, showing how and why each evolution happens.

It's meant to be both:

  • A reference document for system design interviews

  • A guide for engineers who actually want to build a scalable URL shortener in Go (TDD + production-grade)


📑 Table of Contents

  1. Stage 1 – MVP (Proof of Concept)

  2. Stage 2 – Single-Instance Production

  3. Stage 3 – Database Persistence

  4. Stage 4 – Caching Layer

  5. Stage 5 – Distributed System

  6. Stage 6 – Enterprise Scale

  7. Architectural Decisions

  8. SLOs and Failure Design

  9. Interview Framework: How to Use This

  10. Key Takeaways

  11. Further Reading


🧩 Stage 1 – MVP (Proof of Concept)

| Aspect | Value |
| --- | --- |
| Deployment | Single binary |
| Storage | Go map + RWMutex |
| Scalability | ~1K requests/sec |
| Cost | <$50/mo |
| Reliability | Data lost on restart ❌ |

Example (Go)

type URLShortener struct {
  mu   sync.RWMutex
  data map[string]string
}

✅ Fast, tiny, deployable in minutes
❌ No persistence, single point of failure

When to stop here:
Hackathon projects, demos, coding challenges.


โš™๏ธ Stage 2 โ€” Single-Instance Production

New Concepts

  • Graceful shutdown

  • Environment-based config (BASE_URL, PORT)

  • URL canonicalization

  • Collision retry loop (insert-if-absent)

✅ Production-grade hygiene
❌ Still no durability or scale


๐Ÿ—„๏ธ Stage 3 โ€” Database Persistence

Architecture

Schema

CREATE TABLE urls (
  key CHAR(8) PRIMARY KEY,
  long_url TEXT NOT NULL,
  created_at TIMESTAMP DEFAULT NOW(),
  click_count INT DEFAULT 0
);
CREATE INDEX idx_long_url ON urls(long_url);

Why PostgreSQL?

  • Easy to reason about consistency

  • Native unique constraints for idempotency

  • Works well with Go pgxpool

  • Trivial to scale reads via replicas

| Metric | Value |
| --- | --- |
| Latency | 20–50 ms per read |
| Throughput | ~1K req/s |
| Data durability | ✅ |
| Horizontal scaling | 🚧 Needs more |

✅ Durable and correct
❌ All reads hit the DB
❌ Writes limited by one primary


⚡ Stage 4 – Add a Caching Layer

Architecture

Read path

Write path

Cache Policy

  • LRU: cache only the top 1–5% of URLs

  • TTL: 24 hours

  • Hit rate: 80–90% (Zipfian workloads)

| Metric | Value |
| --- | --- |
| Avg latency | ~5 ms |
| Peak throughput | 100K req/s |
| Cache memory | A few GB |
| DB load reduction | ~90% |

✅ Orders-of-magnitude faster
✅ DB load drops drastically
❌ Needs a cache invalidation strategy
❌ Cache consistency issues possible


๐ŸŒ Stage 5 โ€” Distributed System

Sharded, Multi-Region Design

Key generation

  • 8-char Base62 key = 62⁸ ≈ 2.18 × 10¹⁴ combinations

  • First 2 chars = shard prefix (3,844 shards)

  • Predictable, uniform distribution

Replication Model

| Role | Responsibility |
| --- | --- |
| Primary | Writes |
| Replicas | Reads |
| Sync replication | Strong consistency |
| Async cross-region | DR + read latency |

Performance

  • ~500K req/s reads

  • p99 latency ≈ 20 ms

  • Availability ≈ 99.99%

✅ Scale by adding shards
✅ Regional isolation
❌ Operational overhead (failover, backups)


๐Ÿข Stage 6 โ€” Enterprise / Meta-Scale

Global Topology

Supporting Infrastructure

  • Redis Cluster: 300 shards (multi-region replication)

  • Postgres Shards: 3,844 per region × 3 replicas

  • Eventing: Kafka for click streams

  • Observability: Prometheus, Grafana, ELK, Jaeger

  • Disaster Recovery: RPO < 10 s, RTO < 1 min

Numbers (illustrative)

| Metric | Value |
| --- | --- |
| Read throughput | 5M req/s |
| Write throughput | 100K req/s |
| p99 latency | 20 ms |
| Cache hit rate | 85% |
| Availability | 99.999% |
| Annual volume | 150T requests/year |

✅ Planet-scale durability
✅ Real-time analytics
✅ Regional failover
❌ Huge operational complexity


🧮 Architectural Decisions

| Decision | Choice | Reasoning |
| --- | --- | --- |
| Key length | 8 chars | ~218T unique keys, negligible collision probability |
| Sharding | Prefix (first 2 chars) | Uniform distribution, easy routing |
| Cache strategy | Hot 1–5%, LRU + 24 h TTL | High hit rate, bounded memory |
| Metrics | In-memory counters + periodic DB flush | ~500 ns updates, durability via batching |
| Storage API | Interface-based (`Store` interface) | Enables swapping memory ↔ Redis ↔ Postgres |
| Security | Validate URLs, block private IPs, rate-limit writes, sign aliases | Prevents SSRF, spam, and abuse |
| Observability | RED metrics, structured logs, tracing | Enables SLO monitoring and debugging |

📈 SLOs and Failure Design

| Tier | Target | Degradation Strategy |
| --- | --- | --- |
| Availability | 99.99% | Serve from cache → read replicas → failover region |
| Latency | p50 ≤ 10 ms, p99 ≤ 50 ms | Prefer cached reads; fall back to replicas |
| Durability | RPO ≤ 10 s, RTO ≤ 1 min | Promote replica, replay Kafka logs |
| Scalability | Linear, ≈1M req/s per region | Horizontal app + cache scaling |

🧠 Interview Framework: How to Use This

When asked to "Design a URL shortener" in interviews, use this 5-minute outline:

| Step | What to Say | Key Points |
| --- | --- | --- |
| 1. Requirements | Shorten, redirect, delete, analytics | Functional + non-functional (scale, latency) |
| 2. MVP | One Go server + in-memory map | Start small |
| 3. Persistence | Add Postgres for durability | Read/write paths |
| 4. Scale reads | Add Redis read-through cache | Explain cache invalidation |
| 5. Scale writes | Shard DB by key prefix | 62² ≈ 3.8K shards |
| 6. Geo-scale | Multi-region replicas, CDN, rate limiting | Fault tolerance |
| 7. Observability | Metrics, tracing, logging | SLOs and debugging |
| 8. Trade-offs | Latency vs. durability vs. cost | Articulate the why |

💡 Hint: keep numbers handy: 8-char key ≈ 2 × 10¹⁴ combinations, Redis hit ≈ 1 ms, DB read ≈ 50 ms.


๐Ÿ” Key Takeaways

  1. Scale evolves. Each layer solves the previous bottleneck.

  2. Most traffic is read-heavy. Optimize the cached read path (80–90% of requests) first.

  3. Simplicity beats premature scale. Don't shard until you must.

  4. Measure relentlessly. Latency histograms, p99, error budgets.

  5. Design for failure. Everything breaks; plan for graceful degradation.


📚 Further Reading


๐Ÿ Closing Thoughts

Building a URL shortener seems trivial – until you have to do it for billions of users.

Start with a single process.
Measure.
Then add layers only when you must.

That's not just how you scale systems; it's how you scale engineering judgment.


๐Ÿ Code

All source code, tests, and diagrams are available here → https://github.com/AkshayContributes/url-shortener


Written by Akshay Thakur (2025)
Staff-level System Design Reference | Go + Distributed Systems + Scale Engineering