Notes: https://pdos.csail.mit.edu/6.824/notes/l-raft.txt

Video: https://youtu.be/R2-9bsKmEbo

Prep:


Pattern

The systems seen so far (GFS, MapReduce) all have a single point of failure (SPF)

The SPF can be the coordinator, the master, or a storage server. These aren’t replicated

The reason is to avoid split-brain syndrome

Idea

test-and-set server replication

A test-and-set server is a server that atomically sets a variable and returns its old value, using locking

image 23.png

https://excalidraw.com/#json=G4enV4OrcE5FRMGdF9djS,dkW_orQyFRcM-DGxe5y-DQ
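
A minimal sketch of a single (unreplicated) test-and-set server, to make the "atomic with locking" part concrete. The names `TASServer` and `TestAndSet` are made up for illustration and don't come from the lecture; a real server would expose this over RPC.

```go
package main

import (
	"fmt"
	"sync"
)

// TASServer is a hypothetical in-memory test-and-set register.
type TASServer struct {
	mu   sync.Mutex
	flag bool
}

// TestAndSet atomically sets the flag to true and returns the previous value.
// The caller that gets back false is the one that "won".
func (s *TASServer) TestAndSet() bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	old := s.flag
	s.flag = true
	return old
}

func main() {
	var s TASServer
	fmt.Println(s.TestAndSet()) // false: first caller wins
	fmt.Println(s.TestAndSet()) // true: flag was already set
}
```

Because this single server decides the winner, it is itself the single point of failure that replication is meant to remove.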

Network Partition

solved by majority rule: an op is successful only when a majority of servers execute it successfully

running above setup with 3 servers, if 2 servers exec successfully, op is success

Raft builds on the same idea: any two majorities overlap, so among the servers that vote for the new term’s leader, at least one has all the committed log entries from the previous term’s leader

to tolerate f faults, 2f+1 servers are needed; as in the setup above, 3 servers can tolerate 1 fault

the majority is counted over all servers, both online and offline
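
The quorum arithmetic above can be sketched as follows. The helper names (`majority`, `faultsTolerated`, `committed`) are made up for this note.

```go
package main

import "fmt"

// majority returns the smallest number of servers that is a majority of n.
func majority(n int) int { return n/2 + 1 }

// faultsTolerated returns the largest f such that n >= 2f+1.
func faultsTolerated(n int) int { return (n - 1) / 2 }

// committed reports whether acks is a majority of total.
// total counts every configured server, online or offline.
func committed(acks, total int) bool { return acks >= majority(total) }

func main() {
	for _, n := range []int{3, 5, 7} {
		fmt.Printf("n=%d majority=%d tolerates=%d\n", n, majority(n), faultsTolerated(n))
	}
	fmt.Println(committed(2, 3)) // true: 2 of 3 is a majority
}
```

Note that 4 servers tolerate no more faults than 3 (majority becomes 3), which is why odd cluster sizes are the usual choice.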

Protocols using quorums

around 1990

  • Paxos
  • Viewstamped Replication

they saw little practical use for about a decade and a half

then came Raft in 2014, designed to be more understandable

Replicated State Machine w/ Raft

image 1 15.png

https://excalidraw.com/#json=kQk2I29vLJKUncYMsJTkd,CFrXC2w27zzjZp18y36zdw

the log is replicated to every server and persisted on disk

Why Logs

  1. Retransmission of previous cmds
  2. Order of cmds
  3. Persistence
  4. Space for tentative (not yet committed) cmds

logs eventually become identical on all servers, though they may differ temporarily

Log Entry

image 2 14.png

https://excalidraw.com/#json=ms1CfmubauhPluHbAWXaK,uuJFrz7pN1eUtbCIK9QKhw
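
A possible Go shape for a log entry, as a sketch only. It assumes an entry carries the client command plus the term of the leader that created it, with the index given by the entry's position in the log. The field and function names are made up here and aren't taken from the figure.

```go
package main

import "fmt"

// LogEntry is a hypothetical shape for one slot in the replicated log.
type LogEntry struct {
	Term    int    // term of the leader that created this entry
	Command string // opaque client command (a string here for simplicity)
}

// lastIndexAndTerm returns the 1-based index and the term of the last
// entry, or (0, 0) when the log is empty.
func lastIndexAndTerm(entries []LogEntry) (int, int) {
	if len(entries) == 0 {
		return 0, 0
	}
	return len(entries), entries[len(entries)-1].Term
}

func main() {
	entries := []LogEntry{
		{Term: 1, Command: "x=1"},
		{Term: 1, Command: "y=2"},
		{Term: 3, Command: "x=3"},
	}
	idx, term := lastIndexAndTerm(entries)
	fmt.Println(idx, term) // 3 3
}
```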

Election

an election starts when heartbeats from the leader go missing

followers have a timer which resets whenever a heartbeat is received

once the timer expires, the follower becomes a candidate and increments its term number

if the old leader comes back, it’ll send requests to the servers it still thinks are its followers

followers that are already in the new term will reject them, replying that the old leader’s term is out of date
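
A sketch of how a follower might reject a stale leader. The RPC name `AppendEntries` follows the Raft paper; the struct layout is simplified for this note, and the log checks and the election timer reset are left out.

```go
package main

import "fmt"

// Follower holds only the state needed for the term check.
type Follower struct {
	currentTerm int
}

type AppendEntriesArgs struct {
	Term int // sender's term
}

type AppendEntriesReply struct {
	Term    int  // follower's term, so a stale leader can update itself
	Success bool
}

// AppendEntries rejects any request carrying a term older than the follower's.
func (f *Follower) AppendEntries(args AppendEntriesArgs) AppendEntriesReply {
	if args.Term < f.currentTerm {
		return AppendEntriesReply{Term: f.currentTerm, Success: false}
	}
	f.currentTerm = args.Term
	// A fuller version would also reset the election timer and check the log here.
	return AppendEntriesReply{Term: f.currentTerm, Success: true}
}

func main() {
	f := &Follower{currentTerm: 5}
	fmt.Println(f.AppendEntries(AppendEntriesArgs{Term: 3})) // {5 false}: old leader rejected
	fmt.Println(f.AppendEntries(AppendEntriesArgs{Term: 5})) // {5 true}: current leader accepted
}
```

On seeing the higher term in the reply, the old leader would step down to follower.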

a follower can vote only once per term

the split vote problem arises when 2 followers become candidates at about the same time and ask each other for votes; this is where randomized timeouts help
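
The one-vote-per-term rule can be sketched like this. `Server`, `votedFor` and `RequestVote` are simplified names for this note; real Raft also checks that the candidate's log is at least as up to date as the voter's, which is deliberately left out here.

```go
package main

import "fmt"

// Server holds only the state needed for the voting rule.
type Server struct {
	currentTerm int
	votedFor    int // candidate ID voted for in currentTerm, -1 if none
}

// RequestVote grants at most one vote per term.
func (s *Server) RequestVote(candidateID, term int) bool {
	if term < s.currentTerm {
		return false // stale candidate
	}
	if term > s.currentTerm {
		// New term: forget the vote cast in the old term.
		s.currentTerm = term
		s.votedFor = -1
	}
	if s.votedFor == -1 || s.votedFor == candidateID {
		s.votedFor = candidateID
		return true
	}
	return false // already voted for someone else in this term
}

func main() {
	s := &Server{currentTerm: 1, votedFor: -1}
	fmt.Println(s.RequestVote(1, 2)) // true: first request in term 2
	fmt.Println(s.RequestVote(2, 2)) // false: already voted in term 2
	fmt.Println(s.RequestVote(2, 3)) // true: new term, new vote
}
```

With 2 candidates and an even split of the remaining voters, neither reaches a majority, so the term ends without a leader and a new election starts after another timeout.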

Election timeouts

During an election the system is not usable and clients are blocked

so elections should be rare; for that

  • the timeout should be ≥ a few heartbeat intervals

  • the random spread between timeouts should be balanced

    • if too small, split votes might occur
    • if too large, the election takes longer
  • the timeout should be short enough that downtime is short
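
A sketch of picking a randomized election timeout under the constraints above. The concrete numbers (100 ms heartbeat, 3 heartbeats minimum, 300 ms spread) are assumptions chosen for illustration, not values from the lecture.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

const (
	heartbeatInterval = 100 * time.Millisecond // assumed value
	minTimeout        = 3 * heartbeatInterval  // at least a few heartbeats
	timeoutSpread     = 300 * time.Millisecond // random gap between servers
)

// electionTimeout returns a duration in [minTimeout, minTimeout+timeoutSpread).
func electionTimeout() time.Duration {
	return minTimeout + time.Duration(rand.Int63n(int64(timeoutSpread)))
}

func main() {
	// Each follower would draw a fresh timeout every time it resets its timer.
	for i := 0; i < 3; i++ {
		fmt.Println(electionTimeout())
	}
}
```

A larger `timeoutSpread` makes it less likely that 2 followers time out together, at the cost of a longer wait before someone starts the election.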

when a leader goes down and comes back online, it won’t know what role it had before (leader/follower), as the role isn’t persisted on disk.

It’ll start as a follower, as usual, and become a candidate only if its timeout fires. It might never become a candidate, since by then there will most likely be another leader in a new term.

Logs may diverge

image 4 2.png

what if the leader (topmost in the fig.) isn’t present? then consider what the next log entry at idx 4 could be

2 will be rejected (overwritten), 4 will be accepted

7 from (d) is possible when d gets elected, similarly 6 from (c)

I have so many questions...