Personal repository of ==insights and annotations from research, academic, and white papers== that I have read, as well as those I plan to explore in the future
Papers are primarily focused on ==Distributed Systems==, Database Systems, and ==Operating Systems==, which are analogous to each other
-
an interesting observation?
Turing Award winners from 2013 to 2016 worked in domains similar to the ones I’m interested in (and in the same order, too!)
Lamport in Distributed Systems
Known for Paxos Consensus, Lamport Timestamps
Stonebraker in Database Systems
Known for Ingres, PostgreSQL
Diffie & Hellman in Cryptography
Known for Key Exchange Protocol
(whereas RSA authors won in 2002, the year I was born!)
Berners-Lee in Web Development
Known for World Wide Web

-
[[#How I Read [WIP] ]]
Papershelf
This is the papershelf where anyone can find all the papers that I’ve read so far, along with my notes.
Best thing? Anyone can search, sort, or filter by any property. I use
search for titles
sort for year (to get an idea of the timeline of papers)
filter for Authors or Orgs
Square brackets [ ] in a title mark text that is not part of the original title (added explicitly for easy search)
The view below is restricted to the first 10 records to keep it tidy. To see the complete papershelf as a full page, go to gowthamkalla.com/papershelf or (if on desktop) click on the current view header, i.e., Shelf/Kanban/Timeline, and then “Open as full page”
Papershelf
To Be Read
(I only prioritize and queue around 5 papers as To Be Read in the shelf, and keep the rest below)
Distributed Systems
(Inspired by 6.824 Distributed Systems)
Bluesky and the AT Protocol- Usable Decentralized Social Media
Swarm- Cost-Efficient Video Content Distribution with a Peer-to-Peer System
Apache Flink- Stream and Batch Processing in a Single Engine
Naiad- A Timely Dataflow System
Samza- Stateful Scalable Stream Processing at LinkedIn
Dryad- Distributed Data-Parallel Programs from Sequential Building Blocks
The Hadoop Distributed File System HDFS
Boki- Stateful Serverless Computing with Shared Logs
Grove- a Separation-Logic Library for Verifying Distributed Systems
Chardonnay- Fast and General Datacenter Transactions for On-Disk Databases
Chord- A Scalable Peer-to-peer Lookup Service for Internet Applications
Mesos- A Platform for Fine-Grained Resource Sharing in the Data Center
Large-scale cluster management at Google with Borg
MillWheel- Fault-Tolerant Stream Processing at Internet Scale
No compromises- distributed transactions with consistency, availability, and performance FaRM
Ethereum- A Next-Generation Smart Contract and Decentralized Application Platform
Tango- Distributed Data Structures over a Shared Log
Chain Replication for Supporting High Throughput and Availability
Photon- Fault-tolerant and Scalable Joining of Continuous Data Streams
Paxos Made Live - An Engineering Perspective
CORFU- A Shared Log Design for Flash Clusters
Wormhole- Reliable Pub-Sub to Support Geo-replicated Internet Services
A simple totally ordered broadcast protocol ZAB Zookeeper Atomic Broadcast
Academic
(Inspired by Lamport’s Publications)
Time, Clocks, and the Ordering of Events in a Distributed System
The Byzantine Generals Problem
Practical Byzantine Fault Tolerance pBFT
Impossibility of Distributed Consensus with One Faulty Process FLP
Viewstamped Replication- A New Primary Copy Method to Support Highly-Available Distributed Systems
Conflict-free Replicated Data Types CRDTs
Zab- High-performance broadcast for primary-backup systems Zookeeper Atomic Broadcast
Database Systems
(Inspired by 15-721 Advanced Database Systems)
What Goes Around Comes Around… And Around…
The Snowflake Elastic Data Warehouse
Building An Elastic Query Engine on Disaggregated Storage Snowflake
Photon- A Fast Query Engine for Lakehouse Systems
Lakehouse- A New Generation of Open Platforms that Unify Data Warehousing and Advanced Analytics
Dremel- Interactive Analysis of Web-Scale Datasets BigQuery
Citus- Distributed PostgreSQL for Data-Intensive Applications
CockroachDB- The Resilient Geo-Distributed SQL Database
ClickHouse - Lightning Fast Analytics for Everyone
DuckDB- an Embeddable Analytical Database
MotherDuck- DuckDB in the cloud and in the client
TiDB- A Raft-based HTAP Database
FoundationDB- A Distributed Unbundled Transactional Key Value Store
F1- A Distributed SQL Database That Scales
Mesa- Geo-Replicated, Near Real-Time, Scalable Data Warehousing
Megastore- Providing Scalable, Highly Available Storage for Interactive Services
Large-scale Incremental Processing Using Distributed Transactions and Notifications Percolator
Yellowbrick- An Elastic Data Warehouse on Kubernetes
Aerospike- Architecture of a Real-Time Operational DBMS
Magma- A High Data Density Storage Engine Used in Couchbase
Book
Architecture of a Database System
Machine Learning
TensorFlow- A System for Large-Scale Machine Learning
TensorFlow- Large-Scale Machine Learning on Heterogeneous Distributed Systems
My Reading Setup
Once I have access to a PDF file, I upload it to 2 folders in my Google Drive
- “Untouched” folder, for raw untouched PDFs (just in case)
- “Papers” folder, which is public, for highlighted PDFs
With Adobe Acrobat on the web, which is free and connected to my Google Drive, I highlight and underline the Drive documents in Acrobat and write notes here in Notion
I did try Zotero for a while instead of Acrobat, but I wasn’t pleased with the styling of its annotations and comments. Maybe some other time, with a good config!
How I Read [WIP]
https://github.com/papers-we-love/papers-we-love?tab=readme-ov-file#how-to-read-a-paper
http://ccr.sigcomm.org/online/files/p83-keshavA.pdf
I find papers mostly from the section below, and once I’ve chosen a paper, I queue it up by priority in the TBR section
Before I actually start reading a paper, I glance through it to check the sections and their subheadings
With this, I can guesstimate the number of pages I’m going to concentrate on, and the time to invest as well
The closing sections of a paper usually start with benchmarks or perf numbers; from there on, it’s a relaxed read where I just highlight points
I highlight or annotate with just one color, whereas people in academia generally use 3-4 colors to signify different levels
All my annotated PDFs are publicly available in my Drive folder, and notes can be opened from the papershelf on this very page
Like in the above, I open 3 windows in parallel: the first for Acrobat, the second for Notion + Excalidraw, and the third for general Google searches and GPTs from OpenAI or some other provider
When I start reading, I start a conversation with ChatGPT, telling it which paper I’m reading, and then ask questions about it
Sometimes paragraphs can be tough to grasp: the brain reads them super smoothly but won’t be braining. Since these papers (and related work) have been in the literature for a good while, you can argue with LLMs about them; it’s actually one of the best use cases. So I simply keep asking until I’m satisfied with the answer (it has been good with answers so far, but do validate the responses)
Inception Of Sources
Distributed Systems
- dancres.github.io/pages U awesome-distributed-systems U Papers We Love [Begins: Perfect Start]
- macintux/6227368 [TDK: Peak]
Database Systems
Mixed
- USENIX (OSDI, NSDI) U arXiv U Google Scholar [TDK Rises: Likely uninteresting for most]