
In-Memory vs Disk-Based

  • Up to 100x faster than disk-based Hadoop MapReduce, especially for iterative algorithms that reuse the same data across passes
  • Needs a large amount of memory to hold the working data
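
Why iterative algorithms benefit most can be shown with a toy model in plain Python. This is only a sketch, not Spark or Hadoop code: the function names, the temporary file, and the simple mean-finding loop are all assumptions made for illustration. One version re-reads its input from disk on every pass, the other reads it once and keeps it in memory.

```python
import os
import tempfile


def write_dataset(path, values):
    """Write one number per line (stands in for a dataset stored on disk)."""
    with open(path, "w") as f:
        for v in values:
            f.write(f"{v}\n")


def read_dataset(path):
    """Load the whole dataset from disk into a list."""
    with open(path) as f:
        return [float(line) for line in f]


def iterate_from_disk(path, steps, rate=0.5):
    """Disk-based style: the input is read again on every iteration."""
    estimate, reads = 0.0, 0
    for _ in range(steps):
        data = read_dataset(path)
        reads += 1
        gradient = sum(estimate - x for x in data) / len(data)
        estimate -= rate * gradient
    return estimate, reads


def iterate_in_memory(path, steps, rate=0.5):
    """In-memory style: the input is read once and reused by every iteration."""
    data = read_dataset(path)
    reads = 1
    estimate = 0.0
    for _ in range(steps):
        gradient = sum(estimate - x for x in data) / len(data)
        estimate -= rate * gradient
    return estimate, reads


def compare(steps=10):
    """Run both styles on the same hypothetical dataset (the numbers 1..100)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "data.txt")
        write_dataset(path, range(1, 101))
        return iterate_from_disk(path, steps), iterate_in_memory(path, steps)


if __name__ == "__main__":
    (disk_result, disk_reads), (mem_result, mem_reads) = compare()
    print("disk-based reads:", disk_reads, "in-memory reads:", mem_reads)
```

Both versions compute the same estimate; the only difference is how many times the data is fetched from storage, which grows with the number of iterations in the disk-based style. The price of the in-memory style is the second bullet above: the whole working set has to fit in memory.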
 

Accessible & Versatile

  • Hadoop MapReduce - native API in Java only
  • Spark - Java, Python, Scala, R
  • Runs on top of Hadoop or in standalone mode
  • Can read and process data from Hadoop as well as other sources
 

Increasingly popular

  • Readily available for existing Hadoop users
  • New favorite among developers
  • Many built-in libraries: Spark SQL, Spark Streaming, MLlib (machine learning), GraphX (graph processing)
  • Yahoo deploys Spark for customer behavior data analytics


Lightning-Fast

Dev-friendly

Flexibility

Versatile

Fast, general engine for large-scale data processing

 

Open-source framework started at UC Berkeley's AMPLab in 2009

Up to 100x faster than Hadoop MapReduce

Java, Python, Scala, R

Handles various data sources

Data Storage

Users

Cluster computing

 

Computers connected together

Working in parallel as one system

Each node running the same task on its own share of the data

More computing power for big data
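
The idea above (split the data, run the same task on every node, combine the results) can be sketched on a single machine. This is a rough illustration, not Spark code: worker threads stand in for cluster nodes, and `split`, `node_task`, and `cluster_sum_of_squares` are hypothetical names chosen for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def split(data, num_nodes):
    """Divide the data into num_nodes contiguous, roughly equal partitions."""
    size, extra = divmod(len(data), num_nodes)
    parts, start = [], 0
    for i in range(num_nodes):
        end = start + size + (1 if i < extra else 0)
        parts.append(data[start:end])
        start = end
    return parts


def node_task(partition):
    """The same task every 'node' runs, each on its own partition."""
    return sum(x * x for x in partition)


def cluster_sum_of_squares(data, num_nodes=4):
    """Run node_task on every partition concurrently, then combine the partial results."""
    partitions = split(data, num_nodes)
    # Threads are only a stand-in for separate machines in a real cluster.
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        partials = list(pool.map(node_task, partitions))
    return sum(partials)


if __name__ == "__main__":
    print(cluster_sum_of_squares(list(range(1, 11)), num_nodes=3))
```

In a real cluster the partitions live on different machines, so adding nodes adds both storage and computing power; the sketch only mirrors the split, process, and combine pattern.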

Lightning-fast cluster computing

© 2015 Muhammad Yusoff
