Notes: https://pdos.csail.mit.edu/6.824/notes/l-zookeeper.txt

Video: https://youtu.be/HYTDDLo2vSE

Prep:


Widely-used system

high perf

async

no strong consistnecy

coordination servers

Replicated Sate Machine

image 25.png

https://excalidraw.com/#json=Wplfj28TgFAdWqlX4_D8N,w0yYEyjqR9zSRF7gn8quzg

ZAB (ZK Atomic Broadcast) is like Raft

image 1 17.png

https://excalidraw.com/#json=tODZrJXsJk1ZNFxWyk6U6,Kcfa-tyvRrqlbpskPaIE-w

leader receives operation put from client, async-ly sends to followers

once quorum is met, operation is successful

how many puts per sec?

1 network roundtrip, Leader to Followers

~1 ms (across same region DataCenters)

2 writes, one at followers (overlapping time considering async), one at leader

~2ms for 1 write to SSD

~4ms for 2 writes

total ~5ms for one op ⇒ 200 puts per sec

ZK’s Throughput

21 ops/s with 0% of reads ⇒ all are writes

  • async + batched

    • 1 write for entire batch result
  • reads can be processed by any server (not only leader)

This violates linearizability

image 2 15.png

Stale & Past Reads

Clients can read stale values if they connected to stale follower

at the same time, if a client is connected to updated follower can read up to date value, and in next session, if connected to a stale follower, it’ll read past values

Linearizability

distributed system behaves as a single machine

  • total order of operations
  • order matches real-time
  • read op returns value of latest write

ZK provides

  • linearizable writes but not reads
  • FIFO client order
    • writes in client order
    • observe last write from same client when reading
    • prefix of log when other clients

client req.s will have zxid which is like log index

zxid helps determine if data is up to date if not server will wait until it’s committed till there