Drawback of MapReduce: heavy disk I/O → slow

What about Spark?

It loads data into memory, so it is roughly 10 to 100 times faster than MapReduce.

In-memory system
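
To make the disk-versus-memory contrast concrete, here is a toy sketch in plain Python. It is not Spark or Hadoop code: the function names (`run_with_disk`, `run_in_memory`) and the two-stage job are made up for illustration. The first version writes the intermediate result to a file between stages, the way a chain of MapReduce jobs would; the second just hands the in-memory list to the next stage.

```python
import json
import os
import tempfile


def stage_square(values):
    # Stage 1: square every element.
    return [v * v for v in values]


def stage_sum(values):
    # Stage 2: add everything up.
    return sum(values)


def run_with_disk(values):
    # MapReduce-style (hypothetical): the output of stage 1 is written to disk,
    # then read back before stage 2 can start.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "stage1.json")
        with open(path, "w") as f:
            json.dump(stage_square(values), f)
        with open(path) as f:
            intermediate = json.load(f)
    return stage_sum(intermediate)


def run_in_memory(values):
    # In-memory style (hypothetical): the intermediate list never leaves RAM.
    return stage_sum(stage_square(values))


data = list(range(10))
disk_result = run_with_disk(data)
memory_result = run_in_memory(data)
print(disk_result, memory_result)
```

Both versions compute the same answer; the only difference is the extra write and read in between, which is the cost that grows with every additional stage.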

RDD: Resilient Distributed Datasets (RDDs)

Lazy Evaluation

Transformations are collected and then executed all at once, when an action is called!
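
A minimal sketch of the idea in plain Python, not the real Spark API. The `LazyDataset` class and `traced_double` helper are hypothetical names invented for this note. `map` and `filter` only record what should be done; nothing runs until `collect` (standing in for an action) is called.

```python
class LazyDataset:
    """Toy stand-in for an RDD: records transformations, runs them on collect()."""

    def __init__(self, source, ops=()):
        self._source = source
        self._ops = tuple(ops)

    def map(self, fn):
        # Transformation: only remember the function, do not run it yet.
        return LazyDataset(self._source, self._ops + (("map", fn),))

    def filter(self, pred):
        # Transformation: only remember the predicate, do not run it yet.
        return LazyDataset(self._source, self._ops + (("filter", pred),))

    def collect(self):
        # Action: apply every recorded operation in a single pass over the data.
        out = []
        for item in self._source:
            keep = True
            for kind, fn in self._ops:
                if kind == "map":
                    item = fn(item)
                elif not fn(item):
                    keep = False
                    break
            if keep:
                out.append(item)
        return out


calls = []


def traced_double(x):
    # Logs each call so we can see when the work actually happens.
    calls.append(x)
    return x * 2


ds = LazyDataset(range(5)).map(traced_double).filter(lambda x: x > 2)
calls_before_action = len(calls)  # still 0: nothing has been computed
result = ds.collect()             # the whole pipeline runs here
print(calls_before_action, result)
```

Because the whole chain is known before anything runs, the map and the filter are applied in one pass per element instead of building a full intermediate list after each step.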

Spark Driver

The driver is the process where the main() method of your program runs.

It is the process that runs the user code: it creates a SparkContext, creates RDDs, and performs transformations and actions.
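
A rough analogy for the driver's role, again in plain Python and not using Spark itself. Here `driver_main`, `executor_task`, and `split_into_partitions` are hypothetical names, and a thread pool stands in for the worker processes that a real cluster would provide. The "driver" function holds the user logic, splits the data, hands out tasks, and combines the partial results.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(data, num_partitions):
    # Round-robin split, a simplified stand-in for how a dataset is partitioned.
    return [data[i::num_partitions] for i in range(num_partitions)]


def executor_task(partition):
    # Work done on one partition by a worker: sum of squares.
    return sum(x * x for x in partition)


def driver_main():
    # Plays the role of main(): defines the data, plans the job, and
    # gathers the results returned by the workers.
    data = list(range(1, 11))
    partitions = split_into_partitions(data, 3)
    with ThreadPoolExecutor(max_workers=3) as pool:
        partial_sums = list(pool.map(executor_task, partitions))
    return sum(partial_sums)


total = driver_main()
print(total)
```

The point of the analogy is the split of responsibilities: the per-partition work runs elsewhere, while the code that decides what to compute and assembles the final answer lives in one place.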