Apache Spark: A Unified Engine for Big Data Processing

Pre Read Thoughts

At the time of writing (mid-2025), Databricks, as a business, is in its golden phase for an obvious reason: “Apache Spark”.

I feel Databricks became what Cloudera wanted to be, as Spark has become almost synonymous with “Data Engineering”.

Personally, I’ve used Spark only to get to know it, nothing professional, so my experience with it is not very deep.

Since this is an article rather than a paper, I read it the way I would read a newspaper and took no notes. The original Spark papers are listed below.

Apache Spark Papers

Spark: Cluster Computing with Working Sets

Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing