Apache Spark: A Unified Engine for Big Data Processing
Pre Read Thoughts
At the time of writing (mid-2025), Databricks, as a business, is in its golden phase for one obvious reason: “Apache Spark”
I feel Databricks achieved what Cloudera wanted to become, as Spark has become almost synonymous with “Data Engineering”
Personally, I’ve used Spark only to get to know it, nothing professional, so my experience doesn’t go too deep
Since this is not exactly a paper but an article, I just gave it a newspaper-style read and took no notes; the Spark papers are listed below
Apache Spark Papers
Spark: Cluster Computing with Working Sets
Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing