The 8-Second Cache That Saved Twitter: How a Single Redis Cluster Stopped 143,000 Requests Per Second From Killing the Timeline
In 2012, Twitter's home timeline was dying under its own success. Every refresh triggered 3,000 database queries. Then one engineer proposed something crazy: cache the entire timeline in RAM.
The Day the Timeline Died
It was 3:47 AM on a Tuesday in July 2012, and Twitter's San Francisco office was on fire. Not literally, but the on-call engineer's BlackBerry was vibrating so hard it was walking across his nightstand. The home timeline was down. Again. For the third time that week.
Nick Kallen, a senior infrastructure engineer, pulled up his laptop and stared at the graphs. The numbers were terrifying: 143,000 timeline requests per second. Each request was triggering an average of 3,000 database queries. MySQL was drowning. The fan-out service was choking. And somewhere in Japan, 20 million users were staring at spinning loading wheels because a K-pop star had just tweeted.
Twitter had a problem that looked simple on the surface: show people tweets from accounts they follow. But at scale, that "simple" problem was a distributed systems nightmare. And it was about to get worse: mobile was exploding, and every user was refreshing their timeline obsessively, waiting for the next viral moment.
The solution that saved Twitter wasn't a database rewrite or a fancy new algorithm. It was a controversial bet on Redis, an in-memory key-value store that most engineers in 2012 considered "just a cache." What Twitter's team built became one of the most elegant examples of caching architecture in tech history, and the story of how they got there is a masterclass in distributed systems under pressure.
The Fan-Out Problem Nobody Talks About
To understand why Twitter's timeline was so hard to build, you need to understand the fan-out problem.
When @katyperry (108 million followers) tweets "I love pizza," Twitter doesn't just store that tweet in a database and call it a day. The system needs to fan out that tweet to 108 million timelines. Every single person following Katy Perry needs to see that tweet when they open Twitter.
Twitter had two options:
Option 1: Fan-out on Read (Pull Model)
- When you open Twitter, the system queries: "Who does this person follow?"
- Then: "What are the latest tweets from all those people?"
- Then: Sort, rank, merge, and serve
- Simple to build. Impossible to scale.
The math was brutal. The average Twitter user followed 400 accounts. Each timeline load required:
- One query to fetch your following list (400 IDs)
- 400 queries to fetch recent tweets from each account
- One massive sort-and-merge operation
- Repeat for 143,000 requests per second
That's 57 million database queries per second. MySQL laughed, then crashed.
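To make the pull model concrete, here is a minimal sketch of a fan-out-on-read timeline load, written in Python against a hypothetical MySQL schema (the follows and tweets tables and their columns are invented for illustration, not Twitter's actual schema):

import pymysql

def load_timeline_pull(db: pymysql.connections.Connection, user_id: int, limit: int = 50):
    with db.cursor() as cur:
        # One query: who does this user follow? (~400 rows on average)
        cur.execute("SELECT followee_id FROM follows WHERE follower_id = %s", (user_id,))
        followee_ids = [row[0] for row in cur.fetchall()]
        # ~400 more queries: recent tweets from every followed account
        candidates = []
        for followee_id in followee_ids:
            cur.execute(
                "SELECT id, created_at FROM tweets"
                " WHERE user_id = %s ORDER BY created_at DESC LIMIT %s",
                (followee_id, limit),
            )
            candidates.extend(cur.fetchall())
    # One massive sort-and-merge, then keep the newest 50
    candidates.sort(key=lambda t: t[1], reverse=True)
    return [tweet_id for tweet_id, _ in candidates[:limit]]

Roughly 401 queries per load, times 143,000 loads per second, is where the 57 million queries per second came from.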
Option 2: Fan-out on Write (Push Model)
- When @katyperry tweets, immediately push that tweet into 108 million pre-built timelines
- When you open Twitter, just read your pre-built timeline
- Fast reads. Insanely expensive writes.
This is what Twitter initially built. It worked great, right up until it didn't. Celebrity tweets were taking 45 minutes to fully fan out. Lady Gaga's account was causing cascading failures. And the write amplification was insane: one tweet from Justin Bieber meant 40 million database writes.
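For contrast, a hedged sketch of the push model's write path (the store object and its helper methods are invented placeholders meant to show the amplification, not Twitter's actual service code):

def fan_out_on_write(store, tweet_id: int, author_id: int, timestamp: float) -> None:
    # One durable copy of the tweet itself...
    store.save_tweet(tweet_id, author_id, timestamp)
    # ...and one timeline write per follower. For a 40M-follower
    # account, this loop body runs 40 million times per tweet.
    for follower_id in store.get_follower_ids(author_id):
        store.append_to_timeline(follower_id, tweet_id, timestamp)

The read side becomes a single lookup, but the write side inherits the author's entire follower count.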
Twitter was trapped between two bad options: slow reads or explosive writes. They needed a third way.
The Redis Bet
In early 2012, a small team led by Raffi Krikorian (VP of Platform Engineering) and Mazen Rawashdeh (backend architect) proposed something radical: move the entire home timeline into RAM.
The idea was to use Redis, an open-source in-memory data store built by the Italian developer Salvatore Sanfilippo, as the primary storage layer for timelines. Not as a cache in front of MySQL. Not as a temporary speedup. As the source of truth.
The pitch was met with skepticism. Redis was fast, but it was also:
- Volatile: Lose power, lose data
- Memory-hungry: RAM is 30x more expensive than disk
- Unproven: No major company was using Redis for primary storage
"You want to put 300 million people's timelines... in RAM?" one exec asked.
"Just the last 800 tweets per person," Mazen replied. "We don't need infinite history. We need 8 seconds β the time between when you open Twitter and when you refresh."
The insight was brilliant. Twitter didn't need to serve your entire Twitter history on every load. They needed to serve recent tweets, instantly. Anything older than 800 tweets could live in MySQL and be fetched on-demand (which almost never happened; how often do you scroll back 800 tweets?).
The architecture they designed was elegant:
The Timeline Cache Architecture
Data Structure: Redis Sorted Sets
Each user got a Redis sorted set:
Key: timeline:user:123456789
Value: Sorted set of tweet IDs, scored by timestamp
When @katyperry tweets at 10:00:00 AM:
- Tweet is stored in MySQL (durability)
- Tweet ID is pushed to Redis sorted sets for all 108M followers (speed)
- Each sorted set keeps only the top 800 tweets (memory efficiency)
When you open Twitter:
- Run ZREVRANGE timeline:user:123456789 0 49 to fetch the top 50 tweet IDs
- Hydrate tweet content from another Redis cache
- Total latency: 8 milliseconds
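In redis-py terms, that read path is two batched round trips: one ZREVRANGE for the IDs, one MGET to hydrate them. A minimal sketch; the timeline key format comes from the article, while the tweet:<id> key for the content cache is an assumption:

import redis

timelines = redis.Redis()      # timeline sorted sets
tweet_cache = redis.Redis()    # separate tweet-content cache

def load_timeline(user_id: int, count: int = 50):
    # Newest `count` tweet IDs, highest timestamp scores first
    tweet_ids = timelines.zrevrange(f"timeline:user:{user_id}", 0, count - 1)
    if not tweet_ids:
        return []
    # Hydrate all tweet bodies in one batched MGET
    return tweet_cache.mget([f"tweet:{tid.decode()}" for tid in tweet_ids])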
The Hybrid Model
But here's where it got clever. Twitter didn't use pure fan-out-on-write for everyone. They built a hybrid:
- Regular users (< 10K followers): Fan-out on write. When you tweet, it's pushed to all your followers' Redis timelines.
- Celebrities (> 10K followers): Fan-out on read. When you load your timeline, the system checks if you follow any celebrities and fetches their tweets on-demand.
Why? Because fan-out on write for Justin Bieber (40M followers) meant 40 million Redis writes per tweet. Even Redis couldn't handle that. But fan-out on read for 200 celebrities was manageable, and those 200 accounts generated 80% of Twitter's traffic.
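A sketch of what that hybrid read looks like, assuming the precomputed timeline from above plus a per-author sorted set for celebrity tweets (the usertweets:<id> key and the followed_celebrities() helper are invented for illustration):

import redis

r = redis.Redis()

def followed_celebrities(user_id: int) -> list[int]:
    # Hypothetical helper: IDs of >10K-follower accounts this user follows
    return []

def load_hybrid_timeline(user_id: int, count: int = 50):
    # Precomputed part: regular users were fanned out at write time
    entries = r.zrevrange(f"timeline:user:{user_id}", 0, count - 1, withscores=True)
    # On-demand part: recent tweets from each followed celebrity
    for celeb_id in followed_celebrities(user_id):
        entries += r.zrevrange(f"usertweets:{celeb_id}", 0, count - 1, withscores=True)
    # Merge both streams by timestamp score, newest first
    entries.sort(key=lambda pair: pair[1], reverse=True)
    return [tweet_id for tweet_id, _ in entries[:count]]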
Sharding Strategy
Twitter didn't use one giant Redis cluster. They used 3,000+ Redis instances, sharded by user ID:
shard_id = user_id % 3000
Each instance stored ~100K users' timelines. This meant:
- Horizontal scaling (add more Redis boxes as users grew)
- Failure isolation (one Redis crash = 0.03% of users affected)
- No hot-spotting (celebrities spread evenly across shards)
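In client code, routing by that rule is a one-liner; everything else stays the same (hostnames are placeholders, and redis-py connects lazily, so building 3,000 clients up front is cheap):

import redis

NUM_SHARDS = 3000
shards = [redis.Redis(host=f"redis-{i}.internal") for i in range(NUM_SHARDS)]

def shard_for(user_id: int) -> redis.Redis:
    # The article's sharding rule: user_id % 3000
    return shards[user_id % NUM_SHARDS]

# Every timeline command first resolves the shard, then runs as usual
shard_for(123456789).zrevrange("timeline:user:123456789", 0, 49)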
The Moment It Clicked
The new architecture launched in October 2012. The results were immediate:
- Timeline load time: 1,200 ms → 8 ms (150x faster)
- Database queries per timeline: 3,000 → 1 (a 3,000x reduction)
- MySQL CPU usage: 80% → 12%
- Cost per user: Dropped 70% (RAM is expensive, but database queries are worse)
The Redis cluster was handling 800,000 ops/sec with 99.9% of requests under 5ms. The fan-out service was processing 400 million timeline updates per second during peak events.
But the real test came on November 6, 2012: Election Night.
Barack Obama tweeted "Four more years" with a photo. It became the most retweeted tweet in history (at the time). 31 million people saw it in under 90 seconds. The timeline didn't blink.
Nick Kallen watched the graphs from his couch, a beer in hand. No alerts. No crashes. Just smooth, linear scaling.
"We did it," he texted Raffi. "We actually did it."
The Technical Details That Made It Work
1. Memory Optimization: Ziplist Encoding
Redis has a secret weapon for small sorted sets: ziplist encoding. Instead of storing pointers and tree structures (expensive), Redis serializes small sets into a compact byte array. Ziplist encoding only kicks in below a configurable size threshold, so with that threshold raised to cover the 800-tweet cap, most timelines stayed in ziplist mode, saving 60% memory.
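The thresholds live in redis.conf (the defaults are 128 entries and 64-byte values); the settings below match the article's 800-tweet cap, though Twitter's exact values aren't public:

# redis.conf: keep sorted sets compact up to the timeline cap
zset-max-ziplist-entries 800
zset-max-ziplist-value 64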
2. Replication: Master-Slave Reads
Each Redis master had 2 read replicas. Timeline writes went to the master (strong consistency). Timeline reads came from replicas (eventual consistency, but sub-second lag). This tripled read capacity without tripling write load.
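In client terms, the split is just two connections with different jobs (hostnames are placeholders; a sketch, not Twitter's actual client code):

import redis

master = redis.Redis(host="timeline-master.internal")     # all writes
replica = redis.Redis(host="timeline-replica-1.internal") # all reads

def push_to_timeline(user_id: int, tweet_id: int, ts: float) -> None:
    # Writes go to the master for consistency
    master.zadd(f"timeline:user:{user_id}", {tweet_id: ts})

def read_timeline(user_id: int, count: int = 50):
    # Reads come from a replica: eventual consistency, sub-second lag
    return replica.zrevrange(f"timeline:user:{user_id}", 0, count - 1)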
3. Persistence: RDB Snapshots + AOF
Redis wrote snapshots to disk every 5 minutes (RDB) and logged every write command (AOF). If a Redis instance crashed, Twitter could reload the most recent snapshot and replay the AOF on top. Data loss: < 5 minutes of tweets. An acceptable trade-off for 8ms latency.
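The matching redis.conf directives (values mirror the cadence described above; the exact configuration Twitter ran isn't public):

# Snapshot to disk (RDB) every 300 seconds if at least one key changed
save 300 1
# Log every write to the append-only file, fsynced once per second
appendonly yes
appendfsync everysec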
4. Eviction Policy: ZREMRANGEBYRANK
When a timeline exceeded 800 tweets, a trim issued alongside each write evicted the oldest entries. Ranks count up from the lowest timestamp score, so removing ranks 0 through -801 keeps exactly the newest 800:
ZREMRANGEBYRANK timeline:user:123 0 -801
This kept memory bounded and predictable. No manual cleanup. No garbage collection pauses.
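Putting the write path together: each fan-out delivery can pipeline the insert and the trim, so the 800-tweet cap holds without any separate cleanup job. A sketch using the article's key format:

import redis

r = redis.Redis()
TIMELINE_CAP = 800

def deliver(user_id: int, tweet_id: int, ts: float) -> None:
    key = f"timeline:user:{user_id}"
    pipe = r.pipeline()
    # Insert the new tweet ID, scored by its timestamp
    pipe.zadd(key, {tweet_id: ts})
    # Trim away everything except the newest 800 entries
    pipe.zremrangebyrank(key, 0, -(TIMELINE_CAP + 1))
    pipe.execute()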
The Lessons That Lasted
Twitter's Redis architecture became a blueprint for modern social platforms. Instagram, Pinterest, and Snapchat all adopted variations. The core insights:
1. Not All Data Needs Durability
Timelines are ephemeral. If you lose 5 minutes of tweets, users don't notice; they just refresh. Compare that to payments (must never lose) or messages (should never lose). Match your storage layer to your durability requirements.
2. Memory Is Cheaper Than You Think
In 2012, Twitter's Redis cluster cost $2M/year in RAM. Sounds expensive, until you realize the MySQL cluster it replaced cost $8M/year in hardware, licensing, and ops time. RAM costs scale linearly with users; complex database query load scales far worse than linearly.
3. Hybrid Models Beat Pure Models
Pure fan-out-on-write: too slow for celebrities. Pure fan-out-on-read: too slow for everyone else. The hybrid approach (write for regular users, read for celebrities) gave Twitter the best of both worlds. Real systems need practical trade-offs, not textbook purity.
4. The 80/20 Rule at Massive Scale
Twitter's 200 most-followed accounts generated 80% of timeline traffic. Optimize for the 80%, special-case the 20%. That's how you serve billions of requests without a billion servers.
The Legacy
Today, Twitter (now X) still uses Redis for timeline caching, though the architecture has evolved. The cluster now handles 1.2 billion ops/sec and stores timelines for 400 million active users. Redis has become the de facto standard for real-time caching at scale: Uber uses it for geospatial indexing, GitHub for rate limiting, and Slack for presence status.
But the real legacy isn't the technology. It's the mindset.
In 2012, most engineers treated caches as a band-aid, a temporary speedup to hide database problems. Twitter's team proved that caching can be architecture. That sometimes the fastest, cheapest, most reliable solution is to just... put it all in RAM.
As Mazen later wrote in a blog post: "We didn't cache the timeline. We rethought the timeline."
That Redis cluster, those 3,000 instances humming in data centers around the world, doesn't just serve tweets. It serves a lesson: when you're stuck between impossible options, sometimes the answer is to redefine the problem.
And if you can do it in 8 milliseconds, even better.