The Chubby lock service for loosely-coupled distributed systems
Introduction
Chubby is a lock service intended for loosely coupled distributed systems
one running Chubby deployment is called a Chubby cell
clients acquire locks on particular resources and use them to synchronize their activities
primary goals ⇒ reliability, availability
secondary ⇒ throughput, storage capacity
helps with coarse-grained synchronization and with leader (primary) election
uses the Paxos algorithm for consensus among replicas
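a common use from the paper: a set of processes elects a primary by competing for a lock on an agreed file, and the winner advertises its identity in that file. A minimal Go sketch of that pattern, written against a made-up LockService interface rather than the real Chubby API:

```go
// Package election sketches the classic Chubby usage pattern:
// processes elect a primary by competing for an exclusive lock on a
// well-known small file, and the winner writes its identity there.
package election

// LockService is a hypothetical, simplified Chubby-like client interface;
// the method names here are illustrative, not the real Chubby API.
type LockService interface {
	TryAcquire(path string) (bool, error)       // try to take the exclusive lock
	SetContents(path string, data []byte) error // overwrite the small file
	GetContents(path string) ([]byte, error)    // read the small file
}

// ElectPrimary returns the current primary's identity and whether we won.
func ElectPrimary(ls LockService, self string) (primary string, isPrimary bool, err error) {
	const path = "/ls/foo/service/primary" // lock file in cell "foo" (example path)

	won, err := ls.TryAcquire(path)
	if err != nil {
		return "", false, err
	}
	if won {
		// Advertise our identity so other replicas and clients can find us.
		if err := ls.SetContents(path, []byte(self)); err != nil {
			return "", false, err
		}
		return self, true, nil
	}
	// Lost the election: read the winner's identity from the file instead.
	data, err := ls.GetContents(path)
	if err != nil {
		return "", false, err
	}
	return string(data), false, nil
}
```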
Design
Rationale
it is easier to maintain a lock service and have applications call it than to embed a consensus/locking library in every client
clients can read/write small files on Chubby (a 256 kB size limit was introduced much later), e.g. so an elected primary can advertise itself
consistent (invalidation-based) caching is preferred over time-based caching (like DNS records with TTLs)
a Chubby cell consists of 5 replicas/servers (a majority, i.e. at least 3, must be up)
thousands of clients can connect to one Chubby cell, and all of them need to be notified when a file they watch changes
to avoid overloading the servers beyond those notifications, clients keep a consistent client-side cache instead of polling
they chose coarse-grained locking (locks held for long periods) over fine-grained locking (locks held briefly)
with coarse-grained locks a master/leader can keep that position for hours or days
clients can implement fine-grained locking inside their own app on top of the coarse-grained locks
System Structure
a Chubby cell consists of a small set of replicas for fault tolerance
the replicas grant the master a master lease (renewed periodically as long as it keeps winning a majority), during which they won't vote for another master
(Raft-like election process)
clients learn the replicas' addresses from DNS (the entries are updated when replicas go down or new ones come up)
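roughly, a client could locate the master like this; the askMaster callback is a stand-in for a real RPC (the paper only says that replicas are listed in DNS and that non-master replicas point clients at the master):

```go
package discovery

import (
	"fmt"
	"net"
)

// findMaster resolves the cell's DNS name to its replicas and asks each one
// who the current master is. askMaster stands in for a real RPC; a replica
// that is not the master is assumed to return the master's address.
func findMaster(cellDNSName string, askMaster func(addr string) (string, error)) (string, error) {
	addrs, err := net.LookupHost(cellDNSName) // replicas are listed in DNS
	if err != nil {
		return "", fmt.Errorf("resolving cell %q: %w", cellDNSName, err)
	}
	for _, addr := range addrs {
		if master, err := askMaster(addr); err == nil {
			return master, nil // any reachable replica can point us at the master
		}
	}
	return "", fmt.Errorf("no replica of %q answered", cellDNSName)
}
```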

Files, directories, and handles
files are named with paths of the form /ls/foo/wombat/pouch
ls ⇒ lock service prefix, foo ⇒ the Chubby cell name (a DNS lookup is done on this), the rest is an ordinary file path within the cell
each file or directory in the namespace is called a node (fig 1 shows a cell of 5 servers); nodes can be permanent or ephemeral: ephemeral nodes are deleted once no client has them open, so they serve as temporary files and as a way to tell whether a client is alive
nodes also carry ACLs controlling access to them
The per-node meta-data includes four monotonically increasing 64-bit numbers that allow clients to detect changes easily (see the sketch after this list):
- an instance number; greater than the instance number of any previous node with the same name.
- a content generation number (files only); this increases when the file’s contents are written.
- a lock generation number; this increases when the node’s lock transitions from free to held.
- an ACL generation number; this increases when the node’s ACL names are written.
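the four counters might be modeled roughly as below; the field names are mine, and the real metadata also includes other items (e.g. ACL names and a content checksum):

```go
package metadata

// NodeMetadata sketches the per-node counters described above.
// Field names are illustrative; each counter is 64-bit and only increases,
// so a client can compare two snapshots to detect what changed.
type NodeMetadata struct {
	InstanceNumber    uint64 // greater than that of any previous node with this name
	ContentGeneration uint64 // files only: bumped whenever the contents are written
	LockGeneration    uint64 // bumped when the node's lock goes from free to held
	ACLGeneration     uint64 // bumped when the node's ACL names are written
}

// Recreated reports whether the node was deleted and re-created between
// two observations, which the per-name instance number makes easy to detect.
func Recreated(old, cur NodeMetadata) bool {
	return cur.InstanceNumber > old.InstanceNumber
}
```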
any node can act as an advisory reader/writer lock: one client may hold it in exclusive (writer) mode, or many clients may hold it in shared (reader) mode
since it's hard to reason about the timing and ordering of requests in a distributed system, a lock holder can request a sequencer (a string naming the lock, its mode, and its generation number) and pass it along with requests so servers can check the lock is still held
if a lock becomes free because the holder has failed or become inaccessible, the lock server will prevent other clients from claiming the lock for a period called the lock-delay.
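a sketch of how a downstream server could check a sequencer before acting on a request; the struct and function names are illustrative, not Chubby's actual wire format:

```go
package sequencer

import "fmt"

// Sequencer sketches the string a lock holder attaches to requests: the
// lock's name, the mode it was acquired in, and the generation number at
// acquisition time.
type Sequencer struct {
	LockPath   string
	Exclusive  bool   // exclusive (writer) vs shared (reader) mode
	Generation uint64 // lock generation number when acquired
}

// LockState is what a resource server would learn about the lock right now
// (from Chubby, or from a recently validated cached copy).
type LockState struct {
	Exclusive  bool
	Generation uint64
}

// Validate is the check a downstream server performs before acting on a
// request: the sequencer must still match the lock's current generation and
// must carry a sufficient mode. This is a sketch of the idea, not the real API.
func Validate(seq Sequencer, current LockState, needExclusive bool) error {
	if seq.Generation != current.Generation {
		return fmt.Errorf("stale sequencer for %s: lock was released or re-acquired", seq.LockPath)
	}
	if needExclusive && !seq.Exclusive {
		return fmt.Errorf("sequencer for %s does not hold the lock exclusively", seq.LockPath)
	}
	return nil
}
```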
Events & API
every open file/directory is accessed through a handle, similar to file handles in GFS
handles are created by Open() and destroyed with Close()
Poison() causes outstanding and subsequent operations on the handle to fail without closing it
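the call names below (Open, Close, Poison, GetContentsAndStat) come from the paper's API, but the Go signatures and the Client/Handle interfaces are assumptions made for this sketch:

```go
package handles

import "context"

// NodeMetadata is redeclared here only to keep the sketch self-contained.
type NodeMetadata struct{ InstanceNumber, ContentGeneration, LockGeneration, ACLGeneration uint64 }

// Handle is a sketch of a Chubby handle as seen by application code.
type Handle interface {
	GetContentsAndStat(ctx context.Context) ([]byte, NodeMetadata, error)
	Acquire(ctx context.Context, exclusive bool) error
	Release(ctx context.Context) error
	Close(ctx context.Context) error // destroys the handle
	Poison()                         // fails outstanding/future ops without closing
}

// Client is a sketch of the entry point that creates handles.
type Client interface {
	Open(ctx context.Context, path string) (Handle, error)
}

// readConfig shows the usual handle lifecycle: Open, use, Close.
func readConfig(ctx context.Context, c Client) ([]byte, error) {
	h, err := c.Open(ctx, "/ls/foo/myapp/config")
	if err != nil {
		return nil, err
	}
	defer h.Close(ctx)

	data, _, err := h.GetContentsAndStat(ctx)
	return data, err
}
```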
Caching
Chubby clients cache file data and node meta-data (including file absence) in a consistent, write-through cache held in memory
consistent in the sense that the Chubby master sends invalidations to clients whenever the cached data changes
KeepAlive heartbeats are maintained b/w server and clients
The caching protocol is simple: it invalidates cached data on a change, and never updates it; clients re-fetch the data the next time they need it
In addition to caching data and meta-data, Chubby clients cache open handles.
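a simplified model of the client cache: entries are dropped on invalidation and refilled only on the next read (the real protocol also acknowledges invalidations and coordinates with the session lease, which is omitted here):

```go
package cache

import "sync"

// fileCache sketches the client-side consistent cache: entries are kept
// until the master sends an invalidation for that path, and are refilled
// only when the application next asks for the data.
type fileCache struct {
	mu      sync.Mutex
	entries map[string][]byte
	fetch   func(path string) ([]byte, error) // stand-in for the read RPC to the master
}

func newFileCache(fetch func(string) ([]byte, error)) *fileCache {
	return &fileCache{entries: make(map[string][]byte), fetch: fetch}
}

// Invalidate is driven by master notifications: the entry is dropped,
// never updated in place.
func (c *fileCache) Invalidate(path string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	delete(c.entries, path)
}

// Get serves from cache when possible, otherwise re-reads from the master.
func (c *fileCache) Get(path string) ([]byte, error) {
	c.mu.Lock()
	if data, ok := c.entries[path]; ok {
		c.mu.Unlock()
		return data, nil
	}
	c.mu.Unlock()

	data, err := c.fetch(path)
	if err != nil {
		return nil, err
	}
	c.mu.Lock()
	c.entries[path] = data
	c.mu.Unlock()
	return data, nil
}
```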
Sessions and KeepAlives
A client requests a new session on first contacting the master of a Chubby cell. It ends the session explicitly either when it terminates or when the session has been idle
the master guarantees not to terminate a session unilaterally until its session lease times out
the master advances the lease timeout in 3 circumstances:
- on creation of the session
- when a master fail-over occurs
- when it responds to a KeepAlive RPC from the client
the client ensures there is almost always a KeepAlive call blocked at the master: the master does not respond immediately, but holds the RPC until the client's current lease interval is close to expiring, then extends the lease and replies
The master allows a KeepAlive to return early when an event or invalidation is to be delivered
the client maintains its own conservative local lease timeout, derived from the master's lease timeout and allowing for the time its KeepAlive RPCs spend in flight
If a client’s local lease timeout expires, it becomes unsure whether the master has terminated its session. The client empties and disables its cache, and we say that its session is in jeopardy.
the client then waits through a grace period (45 s by default); if it cannot complete a successful KeepAlive within that time, it assumes the session has expired and subsequent operations on it fail
when the grace period begins, the Chubby library delivers a jeopardy event to the client application
- if communication with the master is re-established within the grace period, a safe event is sent
- else, an expired event is sent
these events let the application wait out a master outage instead of blocking indefinitely (or aborting) when it cannot tell whether the master is down
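putting the session pieces together, the client side might look like the sketch below: one blocked KeepAlive at a time, a local lease deadline, and a jeopardy/grace phase when that deadline passes. Names, durations, and the retry behaviour are assumptions, not the real library:

```go
package session

import "time"

// Event mirrors the events the client library delivers to the application.
type Event int

const (
	Jeopardy Event = iota // local lease expired; grace period started
	Safe                  // contact re-established within the grace period
	Expired               // grace period ran out; the session is gone
)

// keepAliveLoop sketches the client side of the session protocol. keepAlive
// stands in for the blocking RPC and returns the lease duration granted by
// the master; retry backoff and goroutine cleanup are omitted for brevity.
func keepAliveLoop(keepAlive func() (time.Duration, error), events chan<- Event, grace time.Duration) {
	// A background goroutine keeps exactly one KeepAlive RPC outstanding,
	// retrying on errors (e.g. while a new master is being elected).
	leases := make(chan time.Duration)
	go func() {
		for {
			if lease, err := keepAlive(); err == nil { // blocks at the master
				leases <- lease
			}
		}
	}()

	localDeadline := time.Now().Add(12 * time.Second) // assumed initial lease
	for {
		select {
		case lease := <-leases:
			localDeadline = time.Now().Add(lease) // lease extended, all good
		case <-time.After(time.Until(localDeadline)):
			// Local lease expired: session in jeopardy, cache disabled.
			events <- Jeopardy
			select {
			case lease := <-leases:
				localDeadline = time.Now().Add(lease)
				events <- Safe // a KeepAlive succeeded within the grace period
			case <-time.After(grace):
				events <- Expired // give up; subsequent operations fail
				return
			}
		}
	}
}
```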
Failovers

initially the client has a local lease timeout C1, which is extended to C2 when the master responds to KeepAlive (1)
the master then goes down; on the client side, lease C2 runs out and the client enters the grace period, raising a jeopardy event
when a new master comes online, the client's first RPC is rejected for carrying the old master's epoch number, and the rejection includes the new epoch number
the client resends the request with the new epoch number; the reply delivers a safe event and a new local lease C3 matching the new master's lease M3
everything is back to normal
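the epoch-number dance could look like this from the client's side; the error type and call shape are invented for the sketch:

```go
package failover

import "errors"

// epochError stands in for the rejection a new master sends when a request
// carries the old master's epoch number; the reply includes the new epoch.
type epochError struct{ newEpoch uint64 }

func (e epochError) Error() string { return "wrong master epoch" }

// callMaster stands in for any session RPC (e.g. a KeepAlive): it takes the
// epoch number the client believes is current and fails with epochError if
// the master has changed. Names are illustrative, not the real protocol.
type callMaster func(epoch uint64) error

// withEpochRetry sends a request and, if it is rejected for carrying a stale
// epoch number, retries once with the epoch number returned in the rejection.
func withEpochRetry(call callMaster, epoch *uint64) error {
	err := call(*epoch)
	var ee epochError
	if errors.As(err, &ee) {
		*epoch = ee.newEpoch // adopt the new master's epoch number
		return call(*epoch)  // on success the session continues (safe event)
	}
	return err
}
```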
Database Implementation, Backup, Mirroring
The first version of Chubby used the replicated version of Berkeley DB (a B-tree-based database) as its data store
Every few hours, the master of each Chubby cell writes a snapshot of its database to a GFS file server in a different building.
Mechanisms for scaling & the following sections
(from the reference pdf)
(I should be reading ZK according to schedule but since ZK’s inspiration is chubby, I started this before ZK as I’m going to read chubby afterwards anyway)