Notes: https://pdos.csail.mit.edu/6.824/notes/l-raft.txt
Video: https://youtu.be/R2-9bsKmEbo
Prep:
- Raft Paper (up to section 5): https://pdos.csail.mit.edu/6.824/papers/raft-extended.pdf
Pattern
The systems seen so far (GFS, MapReduce) all have a single point of failure (SPF)
The SPF can be the coord, master, or storage server. These aren't replicated
The reason is to avoid split-brain syndrome
Idea
test-and-set server replication
A Test & Set server is a server that sets a variable atomically, using locking

https://excalidraw.com/#json=G4enV4OrcE5FRMGdF9djS,dkW_orQyFRcM-DGxe5y-DQ
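A minimal sketch of the single (unreplicated) test-and-set server described above, to make "sets a variable atomically with locking" concrete. The class name `TasServer` and the method `test_and_set` are made up for illustration; the lecture does not give an implementation.

```python
import threading


class TasServer:
    """Hypothetical single-node test-and-set service (illustrative only)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._flag = False

    def test_and_set(self, new_value: bool) -> bool:
        # Store new_value and return the previous value as one atomic step.
        with self._lock:
            old = self._flag
            self._flag = new_value
            return old
```

A caller that gets `False` back from `test_and_set(True)` was the first to set the flag; every later caller sees `True`. Replicating this server naively is where split brain shows up: two copies could each tell a different client that it was first.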
Network Partition
solved by majority rule: an op is successful when a majority of servers execute it successfully
running the above setup with 3 servers, if 2 servers exec successfully, the op is a success
Raft's building block is the same: any two majorities overlap, so at least one follower that has all the logs from the previous term's leader must vote for the new term's leader
2f+1 servers are needed to tolerate f faults; as above, 1 fault can be tolerated with 3 servers
the majority is counted over all servers, both online and offline
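The quorum arithmetic above can be sketched as follows. The function names are made up for illustration; the only facts used are that a majority is more than half of all servers and that 2f+1 servers tolerate f faults.

```python
def majority(n_servers: int) -> int:
    # Smallest number of servers that is more than half of ALL servers.
    return n_servers // 2 + 1


def tolerated_faults(n_servers: int) -> int:
    # With n = 2f + 1 servers, f of them may fail.
    return (n_servers - 1) // 2


def op_succeeds(acks: int, n_servers: int) -> bool:
    # acks is counted against the full membership, online or offline.
    return acks >= majority(n_servers)
```

With 3 servers, `majority(3)` is 2 and `tolerated_faults(3)` is 1, matching the setup above. Adding a fourth server raises the majority to 3 without tolerating any more faults, which is why odd cluster sizes are the usual choice.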
Protocols using quorums
around 1990
- Paxos
- Viewstamped Replication
they found no proper use case for about a decade and a half
then came Raft in 2014, designed to be more understandable
Replicated State Machine w/ Raft

https://excalidraw.com/#json=kQk2I29vLJKUncYMsJTkd,CFrXC2w27zzjZp18y36zdw
log will be replicated on disk
Why Logs
- Retransmission of previous cmds
- Order of cmds
- Persistence
- Space for tentative cmds (not yet committed)
logs eventually become identical on all servers
Log Entry

https://excalidraw.com/#json=ms1CfmubauhPluHbAWXaK,uuJFrz7pN1eUtbCIK9QKhw
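A rough sketch of a log entry and of why identical logs give identical state. The linked diagram is not reproduced here; this assumes, as in the Raft paper, that each entry carries the leader's term and a command, with the index being the entry's position in the log. The `("set", key, value)` command shape and the function `apply_log` are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogEntry:
    term: int       # term of the leader that created the entry
    command: tuple  # hypothetical command shape: ("set", key, value)


def apply_log(log):
    # Apply commands in log order to a key/value state machine.
    state = {}
    for entry in log:
        op, key, value = entry.command
        if op == "set":
            state[key] = value
    return state
```

Two servers that apply the same entries in the same order end up with the same state, which is the point of replicating the log rather than the state itself.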
Election
election starts when heartbeats from the leader are missing
followers have a timer which resets when heartbeats are received
once the timeout happens, a follower transitions into candidate by incrementing the term number
in case the old leader comes back again, it'll send reqs to what it thinks are its followers
followers which are in the new term will reject, saying it's no longer in the correct term
a follower can only vote once per term
the split vote problem arises when 2 followers become candidates (ask for votes) at the same time and neither gets a majority; this is where randomized timeouts help
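The election rules above can be sketched as a small state holder. This is a simplified illustration, not the full Raft RequestVote handling: it deliberately omits the log up-to-date check and all networking, and the class and method names are made up.

```python
class Server:
    """Hypothetical, simplified Raft server state (illustrative only)."""

    def __init__(self, name):
        self.name = name
        self.current_term = 0
        self.voted_for = None
        self.role = "follower"

    def on_election_timeout(self):
        # No heartbeat arrived in time: start an election in a new term.
        self.current_term += 1
        self.role = "candidate"
        self.voted_for = self.name  # a candidate votes for itself
        return self.current_term

    def _observe_term(self, term):
        # Seeing a newer term resets the vote and demotes to follower.
        if term > self.current_term:
            self.current_term = term
            self.voted_for = None
            self.role = "follower"

    def on_request_vote(self, term, candidate):
        self._observe_term(term)
        if term < self.current_term:
            return False  # candidate is in a stale term
        if self.voted_for in (None, candidate):
            self.voted_for = candidate  # at most one vote per term
            return True
        return False

    def on_heartbeat(self, term, leader):
        if term < self.current_term:
            return False  # old leader: tell it its term is stale
        self._observe_term(term)
        self.role = "follower"
        return True
```

If two servers time out together, each votes for itself in the same term and refuses the other, which is the split vote case. A heartbeat from a returning old leader carries a lower term and is rejected.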
Election timeouts
During an election, the system is not usable; clients will be blocked
so elections should be rare; for that, the election timeout:
- should be ≥ a few heartbeats
- should have a balanced gap between random timeouts
  - if too small, a split vote might occur
  - if too large, the election period will be longer
- should be short enough that downtime is short
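A minimal sketch of picking a randomized election timeout under the constraints above. The concrete numbers (100 ms heartbeat, a base of 3 heartbeats, a 300 ms random spread) are assumptions chosen for illustration, not values from the lecture.

```python
import random

HEARTBEAT_MS = 100  # assumed heartbeat interval, illustrative only


def election_timeout_ms(rng, base_ms=3 * HEARTBEAT_MS, spread_ms=300):
    # base_ms keeps the timeout at or above a few heartbeats;
    # spread_ms is the random gap that makes simultaneous candidates unlikely.
    return base_ms + rng.uniform(0, spread_ms)
```

Each follower would call this again every time it resets its timer, so two followers that collided once are unlikely to collide again. A larger `spread_ms` lowers the chance of a split vote but lengthens the worst-case wait before a new leader is elected.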
when a leader goes down and comes back online, it won't know what role it had before (leader/follower), as that's not persisted to disk.
It'll start normally as a follower until its timeout fires, then it becomes a candidate. It might not even become a candidate, since there'll most likely already be another leader in a new term.
Logs may diverge

what if the leader (topmost) in the fig. isn't present? then what are the possibilities for the log entry at idx 4?
the entry from term 2 will be rejected (overwritten), the one from term 4 will be accepted
term 7 from (d) is possible when (d) gets elected, similarly term 6 from (c)
I have so many questions...