Scaling Memcache at Facebook

Pre-Read Thoughts

all this time I thought there's just memcached, a simple in-memory KV store, but the title says memcache

so I went to check and found an SO question saying that memcached and memcache are two separate PHP libraries (the PHP connection makes sense since Facebook is written in PHP)

memcached, just like etcd, ends in d ⇒ does d mean distributed?

(paper explains this part in the very beginning)

Overview

users consume more than create ⇒ read heavy

Memcached provides ops get, set, delete

the paper's convention is that "memcached" refers to the library / running binary and "memcache" to the distributed system built from it (the title "Scaling Memcache at Facebook" finally makes sense)

distributed memcached instances ⇒ memcache

(so d means daemon, and not distributed)

Query Cache: the standard look-aside pattern of checking the cache before hitting the db, as in the fig

Generic Cache: a general-purpose KV store (used by apps for things like precomputed ML results, etc.)


serving stale data for a short period is allowed
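The look-aside query cache flow above can be sketched in a few lines. This is a toy model, not Facebook's code: `db` and `cache` are plain dicts standing in for the storage tier and a memcached instance, and the function names are mine.

```python
# Minimal look-aside ("demand-filled") cache sketch: reads try the cache
# first and populate it on a miss; writes go to the db and *delete* the
# cached copy rather than updating it (deletes are idempotent, so
# concurrent writers stay safe).

db = {"user:1": "alice"}
cache = {}

def read(key):
    # 1. try the cache first
    if key in cache:
        return cache[key]
    # 2. on a miss, fall back to the database...
    value = db.get(key)
    # 3. ...and put the value back in the cache for later readers
    if value is not None:
        cache[key] = value
    return value

def write(key, value):
    # update the db, then invalidate (not update) the cache entry
    db[key] = value
    cache.pop(key, None)
```

The next reader after a write takes one cache miss and refills the entry, which is exactly the short window where stale-ish behavior is tolerated.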

Architecture

frontend cluster ⇒ front-facing web servers that connect to their own local memcache servers (servers running the memcached binary)

the storage cluster is replicated to slave regions (only reads are allowed from slaves)

In a Cluster: Latency and Load

cache is distributed in a consistent hashing setup
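A minimal consistent-hash ring for spreading keys across memcached servers might look like the sketch below. The virtual-node count and the use of md5 are illustrative choices of mine, not details from the paper.

```python
import hashlib
from bisect import bisect

# Consistent hashing: servers are hashed onto a ring at many virtual
# points; a key maps to the first server point clockwise from its hash.
# Adding/removing one server only remaps the keys that landed on it.

class Ring:
    def __init__(self, servers, replicas=100):
        # each server gets `replicas` virtual nodes to smooth the load
        self.ring = sorted(
            (self._hash(f"{s}#{i}"), s)
            for s in servers for i in range(replicas)
        )
        self.points = [p for p, _ in self.ring]

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def server_for(self, key):
        # walk clockwise to the first point at or after the key's hash
        idx = bisect(self.points, self._hash(key)) % len(self.ring)
        return self.ring[idx][1]
```

Because the mapping only depends on hashes, every client computes the same key→server assignment without coordination.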

parallel req.s and batching

data dependencies are expressed as a DAG, so that independent fetches for related data can be parallelized and batched

client-server communication

UDP (less overhead, fast) for gets; TCP for sets and deletes

these req.s are coalesced/combined at mcrouter

to prevent (incast) congestion, req.s go through sliding window
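The sliding window idea can be sketched as a tiny client-side limiter: cap outstanding requests, grow on success, shrink on timeout, TCP-congestion-control style. The constants and method names here are illustrative, not from the paper.

```python
# Client-side sliding window to avoid incast congestion: only `size`
# requests may be in flight; successes grow the window additively,
# timeouts shrink it multiplicatively (AIMD, like TCP).

class SlidingWindow:
    def __init__(self, initial=2, max_size=64):
        self.size = initial
        self.max_size = max_size
        self.in_flight = 0

    def can_send(self):
        return self.in_flight < self.size

    def sent(self):
        self.in_flight += 1

    def on_success(self):
        self.in_flight -= 1
        self.size = min(self.size + 1, self.max_size)  # additive increase

    def on_timeout(self):
        self.in_flight -= 1
        self.size = max(self.size // 2, 1)  # multiplicative decrease
```

Requests that don't fit in the window queue up at the client, trading a little latency for not overwhelming the rack switch with simultaneous responses.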

Leases: solve stale sets and thundering herds (related to an outage in the past (2010))

a lease is a token handed to a client on a cache miss, authorizing it to put the data back into the cache

memcached validates this token on set, so if any other request deleted that same key in the meantime, the set won't proceed

thundering herds are prevented by rate-limiting token issuance, so other requests have to wait and retry

in this gap, results can become stale but that’s fine

Memcache pools

pooling high-churn keys and low-churn keys separately, so frequently-changing data doesn't evict entries that are expensive to recompute

Handling failures

a pool of secondary machines, called Gutter, is maintained to mitigate small outages and failed servers

when a request gets no response, it is retried against the Gutter pool (whose entries expire quickly)
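The Gutter failover path is a small wrapper around the normal get. In this sketch `Server` is an in-memory stand-in for a memcached server, and the `up` flag / `ttl` parameter are my simplifications (real Gutter entries just carry short expirations).

```python
# Gutter sketch: if the primary memcached server doesn't answer, the client
# retries against a small dedicated Gutter pool, so a dead server doesn't
# turn every one of its keys into a db hit.

class Server:
    """In-memory stand-in for a memcached server; `up` toggles reachability."""
    def __init__(self):
        self.data = {}
        self.up = True

    def get(self, key):
        if not self.up:
            raise TimeoutError
        return self.data.get(key)

    def set(self, key, value, ttl=None):
        # ttl ignored in this toy model; Gutter entries expire quickly
        if not self.up:
            raise TimeoutError
        self.data[key] = value

def gutter_get(primary, gutter, key):
    # try the primary first; on no response, fall back to the Gutter pool
    try:
        return primary.get(key)
    except TimeoutError:
        return gutter.get(key)
```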

In a Region: Replication

frontend clusters (web + memcache) together with a storage cluster, as seen in the architecture, define a region

Invalidation

initially, the web server that modified data sent the invalidation req.s itself; now mcsqueal, a daemon, tails the commit log of the db and sends invalidation req.s, batched, to mcrouter

mcrouter unpacks the batch and issues the individual deletes
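The mcsqueal → mcrouter path can be sketched as two steps: batch deletes out of the commit log, then fan them out. The log format and function names below are invented for illustration; the real pipeline speaks memcached's delete protocol.

```python
# Invalidation pipeline sketch: mcsqueal batches deletes for *committed*
# transactions from the db commit log; mcrouter unpacks the batch and
# applies each delete to the memcache servers that may hold the key.

def mcsqueal_batch(commit_log):
    """Collect invalidation deletes from committed transactions into one batch."""
    return [("delete", key) for op, key in commit_log if op == "commit"]

def mcrouter_apply(batch, caches):
    # unpack the batch and delete the key from every cache that holds it
    for _, key in batch:
        for cache in caches:
            cache.pop(key, None)
```

Batching many deletes into one packet is where the bandwidth win over per-key invalidation messages comes from.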


multiple frontend clusters sharing the same set of memcache servers form a regional pool

challenge is deciding whether a key needs to be replicated across all frontend clusters or have a single replica per region

when a cluster is newly added, it starts cold with an empty cache to serve from ⇒ more cache misses ⇒ db calls

Cold Cluster Warmup: when the cold cluster misses, it fetches from a warm cluster rather than the db
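Cold Cluster Warmup is a one-extra-hop variation of the look-aside read. A sketch, with dicts standing in for the two caches and the db:

```python
# Cold Cluster Warmup sketch: a miss in the cold cluster's cache is retried
# against a warm cluster's cache before touching the database, so the db
# isn't hammered while the new cluster fills up.

def warmup_get(cold_cache, warm_cache, db, key):
    if key in cold_cache:
        return cold_cache[key]
    # miss in the cold cluster: ask the warm cluster first, not the db
    value = warm_cache.get(key)
    if value is None:
        value = db.get(key)  # only now fall through to storage
    if value is not None:
        cold_cache[key] = value  # warm up the cold cluster as a side effect
    return value
```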

Across Regions

as seen in the architecture diagram, the master region has rw access to the db; slave regions are read-only

mcsqueal daemons avoid a race condition that web servers sending invalidation req.s would hit: the daemon sends invalidations only after a txn has committed, never in between

remote markers are used in slave regions to mark a key's cache as potentially stale and redirect such reads to the master
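The remote-marker flow can be sketched with dicts for the two db replicas, the local cache, and a marker set. This is a simplified model (e.g. it never clears the marker) with names of my choosing:

```python
# Remote markers sketch: after a write from a slave region, a marker for the
# key is set locally; a later cache miss that sees the marker reads from the
# master region's db instead of the (possibly lagging) local replica.

def slave_write(key, value, master_db, local_cache, markers):
    markers.add(key)             # 1. set the remote marker
    master_db[key] = value       # 2. write to the master (replication lags)
    local_cache.pop(key, None)   # 3. invalidate the local cache entry

def slave_read(key, master_db, slave_db, local_cache, markers):
    if key in local_cache:
        return local_cache[key]
    # marker present => local replica may be stale, so go to the master
    value = master_db.get(key) if key in markers else slave_db.get(key)
    local_cache[key] = value
    return value
```

The trade is an expensive cross-region read for a correct one, and only for keys the region itself just wrote.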

Single Server Improvements

Performance optimizations

different slab classes are maintained for different item sizes ⇒ better memory management, since fragmentation is reduced

cache is evicted in LRU policy within a slab class

memcached lazily evicts entries by checking expiration times when serving a get request, or when an item reaches the end of the LRU

short-lived items are placed into a circular buffer of linked lists (indexed by seconds until expiration) called the Transient Item Cache, while the others are evicted lazily
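A toy Transient Item Cache, using sets of keys per bucket instead of linked lists (the class and method names are mine):

```python
# Transient Item Cache sketch: a circular buffer of buckets indexed by
# seconds-until-expiration. Short-lived items go into the bucket for their
# expiry second; once per second the head advances and everything in the new
# head bucket is evicted, so short TTLs free memory promptly instead of
# waiting for lazy eviction or the end of the LRU.

class TransientItemCache:
    def __init__(self, window_s=60):
        self.buckets = [set() for _ in range(window_s)]
        self.head = 0
        self.window_s = window_s
        self.live = {}

    def add(self, key, value, ttl_s):
        assert 0 < ttl_s < self.window_s, "only short-lived items go here"
        self.buckets[(self.head + ttl_s) % self.window_s].add(key)
        self.live[key] = value

    def tick(self):
        # one second passed: advance the head and evict everything that
        # was due to expire this second
        self.head = (self.head + 1) % self.window_s
        for key in self.buckets[self.head]:
            self.live.pop(key, None)
        self.buckets[self.head].clear()

    def get(self, key):
        return self.live.get(key)
```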

Memcache Workload

from reference pdf

Post-Read Thoughts

more of a networking-related paper in the beginning than I expected (fitting, given it was presented at NSDI '13)

using UDP for optimized gets seemed interesting; afaik that's not common

ik cache invalidation is one of the two hard problems in CS, but at fb's scale even caching itself is hard