
In-Memory vs Disk-Based
- Up to 100x faster than disk-based Hadoop MapReduce, especially for iterative algorithms
- Requires a large amount of memory
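Why iterative algorithms gain the most can be illustrated with a toy sketch in plain Python. This is not Spark code: the helper names (`write_dataset`, `iterate_from_disk`, `iterate_in_memory`) and the tiny gradient-descent loop are assumptions made up for illustration. The disk-based variant re-reads its input on every pass, while the in-memory variant reads it once and reuses it, at the cost of holding the whole dataset in memory.

```python
import os
import tempfile


def write_dataset(path, values):
    """Write one number per line (stands in for a dataset on disk)."""
    with open(path, "w") as f:
        for v in values:
            f.write(f"{v}\n")


def read_dataset(path):
    """Load the whole dataset from disk into a list."""
    with open(path) as f:
        return [float(line) for line in f]


def iterate_from_disk(path, steps, lr=0.5):
    """Disk-based style: every iteration re-reads the input from storage."""
    estimate, reads = 0.0, 0
    for _ in range(steps):
        data = read_dataset(path)
        reads += 1
        # Gradient step that moves the estimate toward the mean of the data.
        gradient = sum(estimate - x for x in data) / len(data)
        estimate -= lr * gradient
    return estimate, reads


def iterate_in_memory(path, steps, lr=0.5):
    """In-memory style: read once, then reuse the cached data each iteration."""
    data = read_dataset(path)
    reads = 1
    estimate = 0.0
    for _ in range(steps):
        gradient = sum(estimate - x for x in data) / len(data)
        estimate -= lr * gradient
    return estimate, reads


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "data.txt")
        write_dataset(path, [1.0, 2.0, 3.0, 4.0])
        print("disk-based:", iterate_from_disk(path, steps=20))
        print("in-memory: ", iterate_in_memory(path, steps=20))
```

Both variants arrive at the same estimate; only the number of reads from storage differs, and that gap grows with the number of iterations.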
Accessible & Versatile
- Hadoop MapReduce: native API in Java only
- Spark: APIs in Java, Python, Scala, and R
- Runs on top of Hadoop or in standalone mode
- Can access and process data from Hadoop as well as other sources
Increasingly popular
- Readily available to existing Hadoop users
- A new favorite among developers
- Many built-in tools: Spark SQL, Spark Streaming, MLlib (machine learning), GraphX (graph processing)
- Yahoo deploys Spark for customer behavior data analytics
Lightning-Fast
Developer-Friendly
Flexible
Versatile
Fast, general engine for large-scale data processing
Open-source framework started at UC Berkeley's AMPLab in 2009




Up to 100x faster than Hadoop MapReduce
Java, Python, Scala, R




Handles various data sources
Data Storage

Users





Cluster computing
Computers (nodes) connected together
Working in parallel as one system
Each node runs the same task on its own part of the data
More computing power for big data
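The idea of every node running the same task on its own part of the data can be sketched in plain Python. This is only an analogy, not Spark's API: threads stand in for cluster nodes, and the function names (`split_into_partitions`, `count_words`, `parallel_word_count`) are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(records, num_partitions):
    """Deal records round-robin into num_partitions chunks."""
    return [records[i::num_partitions] for i in range(num_partitions)]


def count_words(partition):
    """The task every worker runs, each on its own partition."""
    counts = Counter()
    for line in partition:
        counts.update(line.lower().split())
    return counts


def parallel_word_count(records, num_workers=4):
    """Run count_words on all partitions in parallel, then merge the results."""
    partitions = split_into_partitions(records, num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partial_counts = list(pool.map(count_words, partitions))
    total = Counter()
    for partial in partial_counts:
        total.update(partial)  # adds counts from each worker's result
    return total


if __name__ == "__main__":
    lines = ["spark is fast", "hadoop is disk based", "spark is versatile"]
    print(parallel_word_count(lines, num_workers=2))
```

On a real cluster the partitions would live on different machines and the merge step would move data over the network, but the split, same-task-per-partition, and combine pattern is the same.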

Lightning-fast cluster computing