

In-Memory vs Disk-Based

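Speed improvement of up to 100x compared to disk-based Hadoop MapReduce, especially for iterative algorithms
Keeps intermediate data in memory instead of writing it to disk between processing steps
Requires a large amount of memory (RAM)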
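To show why in-memory processing matters most for iterative algorithms, here is a toy Python model (not Spark's API). It counts how many times a dataset is read from "disk" when an iterative job re-reads its input every round, compared with reading it once and reusing an in-memory copy. The class `DiskDataset` and function `gradient_descent_mean` are hypothetical names for illustration only; in real Spark the analogous step is calling `cache()` or `persist()` on an RDD or DataFrame.

```python
# Toy model, not Spark: counts "disk reads" to show why keeping a dataset in memory
# helps iterative algorithms that scan the same data many times.

class DiskDataset:
    """Hypothetical stand-in for data stored on disk; every full scan counts as one read."""

    def __init__(self, values):
        self._values = list(values)
        self.disk_reads = 0

    def scan(self):
        self.disk_reads += 1  # each full pass re-reads the data from "disk"
        return list(self._values)


def gradient_descent_mean(scan, iterations=20, lr=0.5):
    """Estimate the mean of the data by gradient descent on squared error.

    `scan` is a zero-argument function returning the data; each iteration
    calls it once, like an iterative job that passes over the data every round.
    """
    theta = 0.0
    for _ in range(iterations):
        data = scan()
        grad = sum(theta - x for x in data) / len(data)  # equals theta - mean(data)
        theta -= lr * grad
    return theta


data = DiskDataset([1.0, 2.0, 3.0, 4.0])

# Disk-based style (MapReduce-like): every iteration re-reads the input.
theta_disk = gradient_descent_mean(data.scan)
reads_without_cache = data.disk_reads

# In-memory style (Spark-like caching): read once, then reuse the cached copy.
data.disk_reads = 0
cached = data.scan()
theta_mem = gradient_descent_mean(lambda: cached)
reads_with_cache = data.disk_reads

print(reads_without_cache, reads_with_cache)  # 20 1
```

Both runs compute the same estimate (close to 2.5); only the number of reads differs. With real data on real disks, those repeated reads are where the disk-based approach loses time.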

Accessible & Versatile

Hadoop MapReduce - mainly Java APIs
Spark - APIs in Java, Python, Scala, R
Runs on top of Hadoop (e.g. on YARN) or in standalone mode
Can access and process data from Hadoop as well as other sources

Increasingly popular

Readily available for existing Hadoop users
New favorite among developers
Built-in libraries: Spark SQL, Spark Streaming, MLlib (machine learning), GraphX
Yahoo deploys Spark for customer behavior data analytics

Lightning-Fast

Dev-friendly

Flexibility

Versatile

Fast, general engine for large-scale data processing

Open-source framework initiated at UC Berkeley's AMPLab in 2009

Up to 100x faster than Hadoop MapReduce

Java, Python, Scala, R

Handles various data sources

Data Storage

Users

Cluster computing

Computers (nodes) connected together

working in parallel as one system

each node running the same task on its own share of the data

more computing power for big data
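The cluster-computing idea above (split the data, let every node run the same task on its own piece, then combine the results) can be sketched in plain Python. This is an illustration under stated assumptions, not how Spark is implemented: threads stand in for the machines of a cluster, and the helper names `partition`, `node_task`, and `merge` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(data, num_parts):
    """Split data into num_parts roughly equal chunks, one per hypothetical node."""
    size, rem = divmod(len(data), num_parts)
    parts, start = [], 0
    for i in range(num_parts):
        end = start + size + (1 if i < rem else 0)
        parts.append(data[start:end])
        start = end
    return parts


def node_task(chunk):
    """The same task every node runs on its own chunk: count words."""
    counts = {}
    for line in chunk:
        for word in line.split():
            counts[word] = counts.get(word, 0) + 1
    return counts


def merge(results):
    """Combine the partial counts from all nodes into one result."""
    total = {}
    for counts in results:
        for word, n in counts.items():
            total[word] = total.get(word, 0) + n
    return total


lines = ["spark is fast", "hadoop is disk based", "spark is in memory", "fast cluster computing"]
parts = partition(lines, num_parts=3)

# Threads play the role of cluster nodes; each processes one partition in parallel.
with ThreadPoolExecutor(max_workers=3) as pool:
    partial_counts = list(pool.map(node_task, parts))

word_counts = merge(partial_counts)
print(word_counts["spark"], word_counts["is"], word_counts["fast"])  # 2 3 2
```

Because every node runs identical code on independent data, adding nodes adds computing power without changing the task itself; only the final merge step has to see results from all of them.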

Lightning-fast cluster computing
