Apache Spark (English Wikipedia)

Analysis of the information sources cited in the references of the English-language Wikipedia article "Apache Spark".

Refs  Website                 Global rank     English rank
18    apache.org              3,206th place   2,477th place
5     web.archive.org         1st place
3     berkeley.edu            580th place     462nd place
3     databricks.com          low place
3     github.com              383rd place     320th place
3     slideshare.net          2,161st place   2,535th place
2     usenix.org              5,990th place   3,752nd place
2     doi.org                 2nd place
2     semanticscholar.org     11th place      8th place
1     safaribooksonline.com   low place
1     janbasktraining.com     low place
1     arxiv.org               69th place      59th place
1     harvard.edu             18th place      17th place
1     gigaom.com              3,829th place   2,507th place
1     mapr.com                low place
1     pluralsight.com         low place
1     cloudera.com            low place
1     sigmoid.com             low place
1     techtarget.com          2,435th place   1,699th place
1     books.google.com        3rd place
1     microsoft.com           153rd place     151st place
1     computerweekly.com      8,313th place   5,298th place
1     openhub.net             8,551st place   low place
1     47deg.com               low place

47deg.com

"Using Scala 3 with Spark". 47 Degrees. Retrieved 29 July 2022.

apache.org

spark.apache.org

"Spark Release 2.0.0". MLlib in R: SparkR now offers MLlib APIs [..] Python: PySpark now offers many more MLlib algorithms
"Spark 2.2.0 Quick Start". apache.org. 2017-07-11. Retrieved 2017-10-19. we highly recommend you to switch to use Dataset, which has better performance than RDD
"Spark 2.2.0 deprecation list". apache.org. 2017-07-11. Retrieved 2017-10-10.
"Cluster Mode Overview - Spark 2.4.0 Documentation - Cluster Manager Types". apache.org. Apache Foundation. 2019-07-09. Retrieved 2019-07-09.
"Spark Release 1.3.0 | Apache Spark".
"MLlib | Apache Spark". spark.apache.org. Retrieved 2016-01-18.
"Spark 2.4.8 released". spark.apache.org. Archived from the original on 2021-08-25.
"Spark 3.0.3 released". spark.apache.org.
"Spark 3.1.3 released". spark.apache.org. Archived from the original on 2022-06-18.
"Spark 3.2.4 released". spark.apache.org.
"Spark 3.3.3 released". spark.apache.org.
"Spark 3.4.3 released". spark.apache.org.
"Spark 3.5.2 released". spark.apache.org.
"Versioning policy". spark.apache.org.

blogs.apache.org

"The Apache Software Foundation Announces Apache™ Spark™ as a Top-Level Project". apache.org. Apache Software Foundation. 27 February 2014. Retrieved 4 March 2014.
"The Apache Software Foundation Announces Apache™ Spark™ as a Top-Level Project". apache.org. Apache Software Foundation. 27 February 2014. Retrieved 4 March 2014.

mail-archives.apache.org

Doan, DuyHai (2014-09-10). "Re: cassandra + spark / pyspark". Cassandra User (Mailing list). Retrieved 2014-11-21.

projects.apache.org

"Apache Committee Information".

arxiv.org

Xin, Reynold; Rosen, Josh; Zaharia, Matei; Franklin, Michael; Shenker, Scott; Stoica, Ion (June 2013). Shark: SQL and Rich Analytics at Scale (PDF). SIGMOD 2013. arXiv:1211.6176. Bibcode:2012arXiv1211.6176X.

berkeley.edu

amplab.cs.berkeley.edu

Zaharia, Matei; Chowdhury, Mosharaf; Franklin, Michael J.; Shenker, Scott; Stoica, Ion. Spark: Cluster Computing with Working Sets (PDF). USENIX Workshop on Hot Topics in Cloud Computing (HotCloud).
Xin, Reynold; Rosen, Josh; Zaharia, Matei; Franklin, Michael; Shenker, Scott; Stoica, Ion (June 2013). Shark: SQL and Rich Analytics at Scale (PDF). SIGMOD 2013. arXiv:1211.6176. Bibcode:2012arXiv1211.6176X.
Figure showing Spark in relation to other open-source Software projects including Hadoop

books.google.com

Malak, Michael (1 July 2016). Spark GraphX in Action. Manning. p. 89. ISBN 9781617292521. Pregel and its little sibling aggregateMessages() are the cornerstones of graph processing in GraphX. ... algorithms that require more flexibility for the terminating condition have to be implemented using aggregateMessages()

cloudera.com

blog.cloudera.com

Shapira, Gwen (29 August 2014). "Building Lambda Architecture with Spark Streaming". cloudera.com. Cloudera. Archived from the original on 14 June 2016. Retrieved 17 June 2016. re-use the same aggregates we wrote for our batch application on a real-time data stream

computerweekly.com

Clark, Lindsay. "Apache Spark speeds up big data decision-making". ComputerWeekly.com. Retrieved 2018-05-16.

databricks.com

Damji, Jules (2016-07-14). "A Tale of Three Apache Spark APIs: RDDs, DataFrames, and Datasets: When to use them and why". databricks.com. Retrieved 2017-10-19.
Zaharia, Matei (2016-07-28). "Structured Streaming In Apache Spark: A new high-level API for streaming". databricks.com. Retrieved 2017-10-19.
Spark officially sets a new record in large-scale sorting

doi.org

Wang, Yandong; Goldstone, Robin; Yu, Weikuan; Wang, Teng (May 2014). "Characterization and Optimization of Memory-Resident MapReduce on HPC Systems". 2014 IEEE 28th International Parallel and Distributed Processing Symposium. IEEE. pp. 799–808. doi:10.1109/IPDPS.2014.87. ISBN 978-1-4799-3800-1. S2CID 11157612.
Chintapalli, Sanket; Dagit, Derek; Evans, Bobby; Farivar, Reza; Graves, Thomas; Holderbaugh, Mark; Liu, Zhuo; Nusbaum, Kyle; Patil, Kishorkumar; Peng, Boyang Jerry; Poulosky, Paul (May 2016). "Benchmarking Streaming Computation Engines: Storm, Flink and Spark Streaming". 2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW). IEEE. pp. 1789–1792. doi:10.1109/IPDPSW.2016.138. ISBN 978-1-5090-3682-0. S2CID 2180634.

gigaom.com

Harris, Derrick (28 June 2014). "4 reasons why Spark could jolt Hadoop into hyperdrive". Gigaom. Archived from the original on 24 October 2017. Retrieved 25 February 2016.

github.com

dotnet/spark, .NET Platform, 2020-09-14, retrieved 2020-09-14
"GitHub - DFDX/Spark.jl: Julia binding for Apache Spark". GitHub. 2019-05-24.
"Spark.jl". GitHub. 14 October 2021.

harvard.edu

ui.adsabs.harvard.edu

Xin, Reynold; Rosen, Josh; Zaharia, Matei; Franklin, Michael; Shenker, Scott; Stoica, Ion (June 2013). Shark: SQL and Rich Analytics at Scale (PDF). SIGMOD 2013. arXiv:1211.6176. Bibcode:2012arXiv1211.6176X.

janbasktraining.com

"What is Apache Spark? Spark Tutorial Guide for Beginner". janbasktraining.com. 2018-04-13. Retrieved 2018-04-13.

mapr.com

doc.mapr.com

MapR ecosystem support matrix

microsoft.com

dotnet.microsoft.com

".NET for Apache Spark | Big data analytics". 15 October 2019.

openhub.net

Open HUB Spark development activity

pluralsight.com

"Applying the Lambda Architecture with Spark, Kafka, and Cassandra | Pluralsight". www.pluralsight.com. Retrieved 2016-11-20.

safaribooksonline.com

techbus.safaribooksonline.com

Chambers, Bill (2017-08-10). "12". Spark: The Definitive Guide. O'Reilly Media. virtually all Spark code you run, where DataFrames or Datasets, compiles down to an RDD [permanent dead link]

semanticscholar.org

api.semanticscholar.org

Wang, Yandong; Goldstone, Robin; Yu, Weikuan; Wang, Teng (May 2014). "Characterization and Optimization of Memory-Resident MapReduce on HPC Systems". 2014 IEEE 28th International Parallel and Distributed Processing Symposium. IEEE. pp. 799–808. doi:10.1109/IPDPS.2014.87. ISBN 978-1-4799-3800-1. S2CID 11157612.
Chintapalli, Sanket; Dagit, Derek; Evans, Bobby; Farivar, Reza; Graves, Thomas; Holderbaugh, Mark; Liu, Zhuo; Nusbaum, Kyle; Patil, Kishorkumar; Peng, Boyang Jerry; Poulosky, Paul (May 2016). "Benchmarking Streaming Computation Engines: Storm, Flink and Spark Streaming". 2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW). IEEE. pp. 1789–1792. doi:10.1109/IPDPSW.2016.138. ISBN 978-1-5090-3682-0. S2CID 2180634.

sigmoid.com

Kharbanda, Arush (17 March 2015). "Getting Data into Spark Streaming". sigmoid.com. Sigmoid (Sunnyvale, California IT product company). Archived from the original on 15 August 2016. Retrieved 7 July 2016.

slideshare.net

Sparks, Evan; Talwalkar, Ameet (2013-08-06). "Spark Meetup: MLbase, Distributed Machine Learning with Spark". slideshare.net. Spark User Meetup, San Francisco, California. Retrieved 10 February 2014.
Malak, Michael (14 June 2016). "Finding Graph Isomorphisms In GraphX And GraphFrames: Graph Processing vs. Graph Database". slideshare.net. sparksummit.org. Retrieved 11 July 2016.
Malak, Michael (14 June 2016). "Finding Graph Isomorphisms In GraphX And GraphFrames: Graph Processing vs. Graph Database". slideshare.net. sparksummit.org. Retrieved 11 July 2016.

techtarget.com

"On-Premises vs. Cloud Data Warehouses: Pros and Cons". SearchDataManagement. Retrieved 2022-10-16.

usenix.org

Zaharia, Matei; Chowdhury, Mosharaf; Das, Tathagata; Dave, Ankur; Ma, Justin; McCauley, Murphy; Franklin, Michael J.; Shenker, Scott; Stoica, Ion (2010). Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing (PDF). USENIX Symp. Networked Systems Design and Implementation.
Gonzalez, Joseph; Xin, Reynold; Dave, Ankur; Crankshaw, Daniel; Franklin, Michael; Stoica, Ion (Oct 2014). GraphX: Graph Processing in a Distributed Dataflow Framework (PDF). OSDI 2014.

web.archive.org

Harris, Derrick (28 June 2014). "4 reasons why Spark could jolt Hadoop into hyperdrive". Gigaom. Archived from the original on 24 October 2017. Retrieved 25 February 2016.
Shapira, Gwen (29 August 2014). "Building Lambda Architecture with Spark Streaming". cloudera.com. Cloudera. Archived from the original on 14 June 2016. Retrieved 17 June 2016. re-use the same aggregates we wrote for our batch application on a real-time data stream
Kharbanda, Arush (17 March 2015). "Getting Data into Spark Streaming". sigmoid.com. Sigmoid (Sunnyvale, California IT product company). Archived from the original on 15 August 2016. Retrieved 7 July 2016.
"Spark 2.4.8 released". spark.apache.org. Archived from the original on 2021-08-25.
"Spark 3.1.3 released". spark.apache.org. Archived from the original on 2022-06-18.