Spark Learning Notes: A Brief Overview


Overview:

Spark is an open-source cluster computing system based on in-memory computing, designed to make data analysis faster.

Spark is a compact project, developed by a small team at the AMPLab at the University of California, Berkeley. The core of the project is written in Scala, in only 63 Scala files. (The name AMPLab is worth a note: AMP stands for Algorithms, Machines, and People.)

Spark is an open-source cluster computing environment similar to Hadoop, but there are some useful differences between the two that give Spark an advantage on certain workloads. Specifically, Spark enables in-memory distributed datasets, which, in addition to supporting interactive queries, can also optimize iterative workloads.

Spark is implemented in Scala and uses Scala as its application framework. Unlike Hadoop, Spark is tightly integrated with Scala, so Scala code can manipulate distributed datasets as easily as local collection objects.
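In Spark's Scala API, a distributed dataset is built and transformed with collection-style methods such as `sc.parallelize(...)`, `map`, `filter`, `reduce`, and `collect`. The pure-Python toy model below sketches that idea under simplifying assumptions: the hypothetical `MiniRDD` class just holds a list of partitions locally and runs the same filter/map/reduce pipeline that you would write against a plain list. It is not Spark's API, and nothing in it is actually distributed.

```python
from functools import reduce as fold
from typing import Callable, Generic, List, TypeVar

T = TypeVar("T")
U = TypeVar("U")


class MiniRDD(Generic[T]):
    """Hypothetical toy stand-in for an RDD: a list of partitions kept locally.

    It only mimics the shape of Spark's collection-style API; nothing here is
    distributed, lazy, or fault-tolerant.
    """

    def __init__(self, partitions: List[List[T]]) -> None:
        self.partitions = partitions

    @classmethod
    def parallelize(cls, data: List[T], num_slices: int) -> "MiniRDD[T]":
        # Split a local list into consecutive slices of (roughly) equal size.
        size = max(1, -(-len(data) // num_slices))  # ceiling division, at least 1
        return cls([data[i:i + size] for i in range(0, len(data), size)])

    def map(self, f: Callable[[T], U]) -> "MiniRDD[U]":
        # Apply f to every element, partition by partition.
        return MiniRDD([[f(x) for x in part] for part in self.partitions])

    def filter(self, pred: Callable[[T], bool]) -> "MiniRDD[T]":
        # Keep only the elements that satisfy pred, partition by partition.
        return MiniRDD([[x for x in part if pred(x)] for part in self.partitions])

    def reduce(self, f: Callable[[T, T], T]) -> T:
        # Reduce each non-empty partition, then combine the partial results.
        partials = [fold(f, part) for part in self.partitions if part]
        return fold(f, partials)

    def collect(self) -> List[T]:
        # Concatenate all partitions back into one local list.
        return [x for part in self.partitions for x in part]


numbers = list(range(1, 11))

# Plain Python on a local list: sum of squares of the even numbers.
local_result = sum(x * x for x in numbers if x % 2 == 0)

# The same pipeline, written against the partitioned toy "dataset".
rdd_result = (
    MiniRDD.parallelize(numbers, num_slices=3)
    .filter(lambda x: x % 2 == 0)
    .map(lambda x: x * x)
    .reduce(lambda a, b: a + b)
)
print(local_result, rdd_result)  # 220 220
```

The point of the sketch is that the pipeline reads the same whether the data is one local list or several partitions; in real Spark the partitions would live on different nodes and the per-partition work would run there.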

Spark also introduces an abstraction called the RDD (Resilient Distributed Dataset). An RDD is a read-only collection of objects distributed across a set of nodes. These collections are resilient: if part of a dataset is lost, it can be rebuilt.

Rebuilding part of a dataset relies on a fault-tolerance mechanism that maintains "lineage", that is, information that allows part of the dataset to be reconstructed from the process that derived it. An RDD is represented as a Scala object and can be created from a file, from a parallelized collection spread across the nodes, or by transforming another RDD.
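Lineage-based recovery can be illustrated with a small toy model. The sketch below is pure Python and assumes a hypothetical `LineageRDD` class: each derived dataset stores its parent and the transformation that produced it, computed partitions are cached, and a lost partition is rebuilt by replaying the recorded transformations on the parent's partition. It only illustrates the idea; in real Spark, `rdd.toDebugString` prints an RDD's actual lineage.

```python
from typing import Any, Callable, Dict, List, Optional


class LineageRDD:
    """Hypothetical toy RDD that remembers how it was derived (its lineage).

    Each instance stores its parent and the per-partition transformation that
    produced it. Computed partitions are cached; a lost partition is rebuilt
    by replaying the transformation on the parent's partition.
    """

    def __init__(self, parent: Optional["LineageRDD"],
                 fn: Callable[[List[Any]], List[Any]],
                 source: Optional[List[List[Any]]] = None) -> None:
        self.parent = parent
        self.fn = fn
        self.source = source  # only the root holds raw data (e.g. read from a file)
        self.cache: Dict[int, List[Any]] = {}
        self.recomputations = 0  # how many times this RDD computed a partition

    @classmethod
    def from_partitions(cls, partitions: List[List[Any]]) -> "LineageRDD":
        return cls(None, lambda part: list(part), source=partitions)

    def map(self, f: Callable[[Any], Any]) -> "LineageRDD":
        # Record the transformation; nothing is computed yet.
        return LineageRDD(self, lambda part: [f(x) for x in part])

    def num_partitions(self) -> int:
        if self.parent is None:
            return len(self.source)
        return self.parent.num_partitions()

    def partition(self, i: int) -> List[Any]:
        if i not in self.cache:
            # Walk the lineage: get the parent's partition i, then apply fn.
            base = self.source[i] if self.parent is None else self.parent.partition(i)
            self.cache[i] = self.fn(base)
            self.recomputations += 1
        return self.cache[i]

    def lose_partition(self, i: int) -> None:
        # Simulate a failed node dropping its in-memory copy of partition i.
        self.cache.pop(i, None)

    def collect(self) -> List[Any]:
        return [x for i in range(self.num_partitions()) for x in self.partition(i)]


base = LineageRDD.from_partitions([[1, 2], [3, 4]])
doubled = base.map(lambda x: x * 2)
plus_one = doubled.map(lambda x: x + 1)

print(plus_one.collect())       # [3, 5, 7, 9]
doubled.lose_partition(1)       # lose partition 1 of both derived RDDs
plus_one.lose_partition(1)
print(plus_one.collect())       # [3, 5, 7, 9], rebuilt from lineage
print(plus_one.recomputations)  # 3: partitions 0 and 1, then partition 1 again
```

Only the lost partition is recomputed; partition 0 is served from the cache, and the rebuild of partition 1 stops as soon as it reaches an ancestor (`base`) that still holds that partition.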


Summary:
1. Spark is a development library.
2. Any library that runs successfully on Spark can become part of Spark.
3. It is general-purpose: it integrates seamlessly with Spark SQL, Spark Streaming, MLlib (machine learning), and GraphX. It is both a platform and a general-purpose development library.
4. Ideas from various industries and experts can be assembled into Spark to become powerful APIs.

Spark Benefits:

1. Spark performs its computation in memory.

2. It provides a distributed parallel computing framework based on DAGs (directed acyclic graphs), which reduces the I/O overhead of passing intermediate results between successive computations.

3. It provides a caching mechanism that supports multiple iterations and data sharing, reducing I/O overhead.

4. RDDs maintain lineage: if an RDD partition is lost, it can be rebuilt automatically from its parent RDDs, which ensures fault tolerance.

5. It moves computation rather than data: RDD partitions read data blocks from the distributed file system into the memory of the nodes that perform the computation.

6. It uses a multithreaded thread-pool model to reduce task startup overhead.

7. It avoids unnecessary sort operations during the shuffle process.

8. It uses Akka, which is fault-tolerant and highly scalable, as its communication framework.
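The benefit of caching for iterative workloads (point 3 above) can be shown by counting how often the input is loaded. The pure-Python sketch below assumes a hypothetical `Dataset` class with a `cache()` switch and a hypothetical `load_and_parse` function standing in for an expensive read from a distributed file system; the iterative algorithm is a simple gradient descent toward the mean, chosen only as an example. In Spark itself, the corresponding step is calling `cache()` (or `persist()`) on an RDD.

```python
from typing import Callable, List, Optional


class Dataset:
    """Hypothetical toy dataset that is recomputed on every use unless cached."""

    def __init__(self, compute: Callable[[], List[float]]) -> None:
        self._compute = compute
        self._cached: Optional[List[float]] = None
        self._use_cache = False

    def cache(self) -> "Dataset":
        # Mark the dataset to be kept in memory after its first computation.
        self._use_cache = True
        return self

    def collect(self) -> List[float]:
        if self._use_cache:
            if self._cached is None:
                self._cached = self._compute()  # computed once, then reused
            return self._cached
        return self._compute()  # recomputed from scratch on every call


def run_iterations(ds: Dataset, iterations: int, lr: float = 0.5) -> float:
    # Example iterative algorithm: gradient descent on 0.5 * mean((w - x)^2),
    # which converges to the mean of the data.
    w = 0.0
    for _ in range(iterations):
        data = ds.collect()
        grad = sum(w - x for x in data) / len(data)
        w -= lr * grad
    return w


loads = {"count": 0}


def load_and_parse() -> List[float]:
    # Stand-in for an expensive read and parse of the input data.
    loads["count"] += 1
    return [1.0, 2.0, 3.0, 4.0]


run_iterations(Dataset(load_and_parse), 10)
print(loads["count"])  # 10: the input is reloaded on every iteration

loads["count"] = 0
run_iterations(Dataset(load_and_parse).cache(), 10)
print(loads["count"])  # 1: loaded once, then served from memory
```

The result of the algorithm is the same in both runs; only the number of expensive loads changes, which is where in-memory caching pays off for algorithms that scan the same data many times.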


Spark can run on the following platforms:

1. Hadoop YARN (the platform that runs Hadoop MapReduce)

2. Apache Mesos

3. Spark's own standalone cluster manager

4. Amazon AWS
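When submitting a job, the platform is selected with the `--master` option of `spark-submit`. The sketch below uses a hypothetical helper, `master_url`, to build that string for the first three platforms listed above; the host names are placeholders. It relies on Spark's documented conventions: `yarn` takes the cluster location from the Hadoop configuration (`HADOOP_CONF_DIR` or `YARN_CONF_DIR`), Mesos defaults to port 5050, and the standalone master defaults to port 7077. AWS is omitted because it is a hosting environment rather than a separate master URL scheme.

```python
from typing import Optional


def master_url(manager: str, host: str = "", port: Optional[int] = None) -> str:
    """Build a --master value for spark-submit (hypothetical helper)."""
    if manager == "yarn":
        # The ResourceManager address comes from the Hadoop configuration.
        return "yarn"
    if manager == "mesos":
        # Mesos masters listen on port 5050 by default.
        return f"mesos://{host}:{port or 5050}"
    if manager == "standalone":
        # Spark standalone masters listen on port 7077 by default.
        return f"spark://{host}:{port or 7077}"
    raise ValueError(f"unknown cluster manager: {manager}")


for manager, host in [("yarn", ""), ("mesos", "mesos-master"), ("standalone", "spark-master")]:
    print(manager, "->", master_url(manager, host))
```

The printed value is what you would pass as `spark-submit --master <url> ...`; the application code itself does not need to change between platforms.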


Also, as with Hadoop 2.7.0, the community decided that starting with Spark 1.5, JDK 1.6 is no longer supported.
References on JDK 1.7:
http://liujunjie51072.blog.163.com/blog/static/868916212009915105633843/
