Notes: https://pdos.csail.mit.edu/6.824/notes/l-spanner.txt
Video: https://youtu.be/ZulDvY429B8
Prep:
About
wide-area txs: txs that span different machines and DCs
rw txs → 2PC + 2PL + Paxos groups
ro txs → snapshot isolation, synchronized clocks
High Level Organization
Data is sharded; each shard is replicated across different DCs, and the replicas of one shard form a Paxos group
replication helps with parallelism, and reading from a nearby replica lowers latency
Paxos needs only a quorum ⇒ fault tolerance, and a slow replica doesn't hold things up

https://excalidraw.com/#json=GR7cmZ0yQsFbXA26BdQOT,eyV922jcSDUM3RPcPvE1E
Challenges
- reads from a local replica must still return the latest write
- Txs across shards
- txs must be serializable
R/W Txs

https://excalidraw.com/#json=E_4r6FgGr4K90Msqrmk6w,QR2bnnICbv9RG4UI3_KXRg
R/O Txs
the most common kind of tx ⇒ must be fast
reads from local replica, no lock, no 2pc
challenge → correctness / consistency
Correctness
should be serializable
they also support external consistency ⇒ if t1 commits before t2 starts, t2 should see t1’s changes
Bad Plan
each read returns the latest committed value at the moment it runs
T3 will observe x from one point of time, and y from another

https://excalidraw.com/#json=Hfop5zL2NcQgyWsWouocP,adh36WHa7EvS8vPayqZZGQ
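The anomaly can be reproduced with a toy interleaving. This is only a sketch: the dict-based store, the `commit` helper, and the values are made up for illustration and are not Spanner's API.

```python
# Toy store that keeps only the latest committed value per key (no versions).
store = {"x": 0, "y": 0}

def commit(writes):
    """Apply all writes of a committed read/write transaction at once."""
    store.update(writes)

commit({"x": 1, "y": 1})   # T1 writes x and y together

t3_x = store["x"]          # T3 reads x and sees T1's value
commit({"x": 2, "y": 2})   # T2 commits between T3's two reads
t3_y = store["y"]          # T3 reads y and sees T2's value

# T3 observed x from T1 and y from T2: a state no serial order produces.
observed = (t3_x, t3_y)
print(observed)
```

T1 and T2 each write x and y to the same value, so any serial execution shows T3 equal values; the mixed pair is what "x from one point of time, y from another" looks like.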
Good Plan: Snapshot Isolation
assign a ts to every tx; spanner keeps multiple versions of each value, tagged with timestamps (like the versioned cells in bigtable)
ts for rw = commit time; for ro = start time
when Rx runs @15, it reads the newest version with ts < 15, i.e., the one @10

https://excalidraw.com/#json=MM5Z09jeMcDOSYbrU1hHB,MSxCZsx1ScEaTmXqecBTxA
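A minimal sketch of a multi-version read, assuming integer timestamps and an in-memory store. `MultiVersionStore`, `write`, and `read_at` are hypothetical names for illustration, and the sketch treats a version whose ts equals the read ts as visible.

```python
import bisect

class MultiVersionStore:
    """Keeps every committed version of a key, ordered by commit timestamp."""

    def __init__(self):
        self._ts = {}    # key -> sorted list of commit timestamps
        self._vals = {}  # key -> values, parallel to self._ts[key]

    def write(self, key, commit_ts, value):
        ts_list = self._ts.setdefault(key, [])
        val_list = self._vals.setdefault(key, [])
        i = bisect.bisect_right(ts_list, commit_ts)  # keep lists sorted by ts
        ts_list.insert(i, commit_ts)
        val_list.insert(i, value)

    def read_at(self, key, read_ts):
        """Return the newest value with commit_ts <= read_ts, or None."""
        ts_list = self._ts.get(key, [])
        i = bisect.bisect_right(ts_list, read_ts)
        return self._vals[key][i - 1] if i > 0 else None

db = MultiVersionStore()
db.write("x", 10, "x@10")
db.write("y", 10, "y@10")
db.write("x", 20, "x@20")
db.write("y", 20, "y@20")

# A read-only tx with ts 15 sees the @10 versions of both keys,
# even though newer versions @20 already exist.
snapshot = (db.read_at("x", 15), db.read_at("y", 15))
print(snapshot)
```

Because both reads use the same ts, x and y always come from the same point in time, which is what the bad plan could not guarantee.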
Problem: the local replica may be stale and not have seen Wx @10 yet
solved with safe time
paxos delivers writes in ts order
before serving Rx @15, the replica waits until it has seen a write with ts > 15; then it’s sure that no write with ts ≤ 15 can still arrive
(it also waits for txs that are prepared but not yet committed)
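The safe-time check can be sketched as below. This is a simplified model under stated assumptions: `Replica`, `apply_write`, `prepare`, `finish`, and `can_serve` are hypothetical names, timestamps are integers, and a prepared tx is assumed to commit at a ts no smaller than its prepare ts.

```python
class Replica:
    """Toy replica that decides whether a read at read_ts is safe to serve."""

    def __init__(self):
        self.last_write_ts = 0  # ts of the newest write delivered by Paxos
        self.prepared = {}      # tx id -> prepare ts (prepared, not committed)

    def apply_write(self, ts):
        # Paxos delivers writes in timestamp order.
        if ts < self.last_write_ts:
            raise ValueError("writes must arrive in ts order")
        self.last_write_ts = ts

    def prepare(self, tx_id, prepare_ts):
        self.prepared[tx_id] = prepare_ts

    def finish(self, tx_id):
        # Called once the prepared tx has committed or aborted.
        self.prepared.pop(tx_id, None)

    def can_serve(self, read_ts):
        # Need a write newer than read_ts, so nothing <= read_ts is missing.
        if self.last_write_ts <= read_ts:
            return False
        # A prepared tx might still commit at or below read_ts.
        return all(p > read_ts for p in self.prepared.values())

r = Replica()
r.apply_write(10)
stale = r.can_serve(15)      # only seen ts 10 so far: must wait
r.prepare("t9", 12)
r.apply_write(16)
blocked = r.can_serve(15)    # t9 is prepared at 12 and unresolved: must wait
r.finish("t9")               # say t9 aborted
ready = r.can_serve(15)      # safe to serve the read @15 now
print(stale, blocked, ready)
```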
Clocks
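clock accuracy matters only for ro txs
ts too large ⇒ still correct, but the read waits longer for safe time
ts too small ⇒ e.g. t3 @ 9 (clock running behind): it misses recent committed writes, which breaks external consistency (linearizability)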
Clock Sync
hard: ordinary clocks drift ⇒ reference time comes from GPS and atomic clocks
clocks are re-synced against GPS time periodically
the remaining error, epsilon, ranges from a few µs to a few ms
solution: don’t treat a ts as an exact time
TrueTime returns an interval that is guaranteed to contain the actual time
⇒ [earliest, latest]
start rule: tx ts = TT.now().latest (ro: at start; rw: when commit begins), which guarantees ts ≥ real time
commit wait: a rw commit is delayed until ts < TT.now().earliest, which guarantees that ts has already passed in real time
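A small simulation of the start rule and commit wait, assuming a fake clock with a fixed error bound. `FakeTrueTime`, `start_ts`, `commit_wait`, the epsilon of 4 and the step of 1 (think ms) are all made up for illustration; the real TrueTime API is only known here through `now()` returning `[earliest, latest]`.

```python
from dataclasses import dataclass

@dataclass
class Interval:
    earliest: float
    latest: float

class FakeTrueTime:
    """Stand-in for TrueTime: a simulated local clock with error bound epsilon."""

    def __init__(self, epsilon):
        self.epsilon = epsilon
        self.clock = 0.0  # simulated local clock reading

    def now(self):
        return Interval(self.clock - self.epsilon, self.clock + self.epsilon)

    def advance(self, dt):
        self.clock += dt

def start_ts(tt):
    # Start rule: take .latest, so the ts is never behind real time.
    return tt.now().latest

def commit_wait(tt, ts, step=1.0):
    # Commit wait: hold the commit until ts < earliest, i.e. ts has passed.
    waited = 0.0
    while not (ts < tt.now().earliest):
        tt.advance(step)
        waited += step
    return waited

tt = FakeTrueTime(epsilon=4.0)
tt.advance(100.0)

t2_ts = start_ts(tt)             # rw tx T2 picks its commit ts
waited = commit_wait(tt, t2_ts)  # roughly 2 * epsilon in this simulation
t3_ts = start_ts(tt)             # ro tx T3 starts after T2's commit finishes
print(t2_ts, waited, t3_ts)
```

Since T3 starts only after T2's commit wait ends, T3's ts is larger than T2's, so a snapshot read at T3's ts includes T2's write.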

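T2 delays its commit until TT.now().earliest > T2’s own ts, so that ts has definitely passed before the commit completes
T3 starts after T2 commits and takes .latest as its ts, so its ts > T2’s ts and it is guaranteed to see T2’s write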
https://excalidraw.com/#json=9DrMOXUYe_FCRAW_Qfhv_,8ZTTEDpskLSMjPRmGezp-w