A detailed introduction to Spark's working mechanism, Spark source code compilation, and hands-on Spark programming

Last Update:2018-08-21 Source: Internet

Author: User

Tags shuffle


Spark Communication Module
1, The Spark cluster manager supports several deployment modes, including local, standalone, Mesos, and YARN.
Several communication modes
1, RPC (remote procedure call)
Spark communication mechanism: Spark uses the Akka framework for message communication within the cluster.
The advantages and characteristics of Akka are as follows:
1, Parallel and distributed: Akka is designed around asynchronous communication and a distributed architecture.
2, Reliability: Akka provides monitoring and recovery mechanisms for both local and remote settings.
3, High performance: on a single machine Akka can send 50 million messages per second, and 1 GB of memory can hold about 2.5 million actor objects.
4, Decentralized: unlike the master-slave model, Akka adopts an architecture with no central node.
5, Scalability: Akka can scale out in a distributed environment, expanding computing capacity linearly.
As you can see, Akka has powerful concurrent processing capabilities.
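
The asynchronous, message-driven style listed above can be illustrated with a minimal actor-like sketch. This is not Akka or Spark code: the `CounterActor` class, its `tell` method, and the `"increment"` message are hypothetical names, written with Python's standard library only to show one mailbox drained by a single worker thread.

```python
import queue
import threading


class CounterActor:
    """Toy actor: private state, a mailbox, and one thread that drains it."""

    _STOP = object()  # sentinel message that shuts the actor down

    def __init__(self):
        self._mailbox = queue.Queue()
        self._count = 0  # state touched only by the actor's own thread
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def tell(self, message):
        """Fire-and-forget send: enqueue the message and return immediately."""
        self._mailbox.put(message)

    def _run(self):
        # Messages are handled one at a time, so _count needs no lock.
        while True:
            message = self._mailbox.get()
            if message is self._STOP:
                break
            if message == "increment":
                self._count += 1

    def stop_and_get(self):
        """Ask the actor to stop, wait for it, then read the final state."""
        self._mailbox.put(self._STOP)
        self._thread.join()
        return self._count


def run_demo(senders=4, messages_per_sender=1000):
    """Several threads send concurrently; the actor serializes the updates."""
    actor = CounterActor()

    def send_many():
        for _ in range(messages_per_sender):
            actor.tell("increment")

    threads = [threading.Thread(target=send_many) for _ in range(senders)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return actor.stop_and_get()
```

Because senders only enqueue messages and never touch the counter directly, concurrent senders cannot corrupt the state, which is the property that makes the actor model attractive for cluster messaging.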

4.5 Fault-tolerance mechanisms
4.5.1 The lineage mechanism and RDD dependencies
Dependencies between RDDs are either narrow or wide.
Narrow dependency: a lost partition is restored from lineage by recomputing only the corresponding parent partition, which is simple.
Wide dependency: a partition of the child RDD depends on more than one parent partition, so recovery is more expensive. For this reason Spark adds the checkpoint mechanism, which is essentially a backup used for fault tolerance.
You can set the storage path for checkpoint data with SparkContext.setCheckpointDir(). Spark saves the checkpointed RDD under that path and then removes all references from that RDD to its ancestor RDDs. checkpoint() must be called on an RDD before any job has been executed on it.
Official recommendation: persist an RDD in memory before checkpointing it; otherwise, saving it to a file requires recomputing the RDD, which adds overhead.
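
The interplay of lineage and checkpointing can be sketched with a toy model. The `ToyRDD` class below is hypothetical and runs in plain Python, not on Spark; it assumes only narrow, per-element transformations. It shows a partition being rebuilt by walking the lineage, and a checkpoint saving the data and cutting the link to the parent.

```python
class ToyRDD:
    """Toy stand-in for an RDD: partitions are recomputed from the parent on demand."""

    def __init__(self, parent=None, fn=None, source=None):
        self.parent = parent      # lineage: the RDD this one was derived from
        self.fn = fn              # per-element transformation applied to the parent
        self.source = source      # list of partitions, set only on the root RDD
        self.checkpointed = None  # saved partitions once checkpoint() has run

    def map(self, fn):
        """Narrow dependency: partition i of the child needs only partition i of the parent."""
        return ToyRDD(parent=self, fn=fn)

    def num_partitions(self):
        if self.checkpointed is not None:
            return len(self.checkpointed)
        if self.parent is None:
            return len(self.source)
        return self.parent.num_partitions()

    def compute(self, index):
        """Rebuild one partition, reading the checkpoint if there is one."""
        if self.checkpointed is not None:
            return list(self.checkpointed[index])
        if self.parent is None:
            return list(self.source[index])
        return [self.fn(x) for x in self.parent.compute(index)]

    def checkpoint(self):
        """Save every partition, then drop the reference to the parent."""
        self.checkpointed = [self.compute(i) for i in range(self.num_partitions())]
        self.parent = None
        self.fn = None

    def lineage_depth(self):
        """Number of ancestors that would be revisited during recovery."""
        depth, node = 0, self
        while node.parent is not None:
            depth += 1
            node = node.parent
        return depth


base = ToyRDD(source=[[1, 2], [3, 4]])
shifted = base.map(lambda x: x * 2).map(lambda x: x + 1)
recovered = shifted.compute(1)  # recomputes only partition 1 through the lineage
```

In real Spark the corresponding calls are `SparkContext.setCheckpointDir` and `RDD.checkpoint()`; the toy version saves the partitions eagerly only to keep the sketch short.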

4.6 Shuffle mechanism
Shuffle Write
Shuffle Fetch
Shuffle Aggregator
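
The three stages named above can be sketched in plain Python. This is a simplified illustration, not Spark's implementation: the function names, the in-memory buckets, and the byte-sum hash partitioner are all assumptions chosen to keep the example deterministic and short.

```python
def partition_for(key, num_reducers):
    """Deterministic hash partitioner for string keys (illustrative only)."""
    return sum(key.encode("utf-8")) % num_reducers


def shuffle_write(records, num_reducers):
    """Map side: split one map task's (key, value) output into per-reducer buckets."""
    buckets = [[] for _ in range(num_reducers)]
    for key, value in records:
        buckets[partition_for(key, num_reducers)].append((key, value))
    return buckets


def shuffle_fetch(all_map_outputs, reducer_id):
    """Reduce side: collect this reducer's bucket from every map task."""
    fetched = []
    for buckets in all_map_outputs:
        fetched.extend(buckets[reducer_id])
    return fetched


def aggregate(fetched, combine):
    """Reduce side: merge all values that share a key."""
    result = {}
    for key, value in fetched:
        result[key] = combine(result[key], value) if key in result else value
    return result


def run_shuffle(map_task_outputs, num_reducers, combine):
    """Run write, fetch, and aggregate for every reducer and return one dict per reducer."""
    written = [shuffle_write(out, num_reducers) for out in map_task_outputs]
    return [aggregate(shuffle_fetch(written, r), combine) for r in range(num_reducers)]
```

For example, `run_shuffle([[("a", 1), ("b", 1)], [("a", 1), ("c", 1)]], 2, lambda x, y: x + y)` routes every occurrence of a key to the same reducer, so each key is summed in exactly one place.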

During execution, the driver controls the application lifecycle. For scheduling, Spark uses the classic FIFO and FAIR scheduling algorithms to schedule internal resources at different levels. In Spark I/O, data is abstracted into blocks for management, and a partition in an RDD is a block to be processed. Communication within the cluster is essential for delivering commands and state, and Spark uses the Akka framework for cluster message communication. Spark guarantees fault tolerance through the lineage and checkpoint mechanisms: lineage recovers data by re-executing operations, while checkpoint keeps a redundant backup of the data. Finally, this chapter introduced Spark's shuffle mechanism. Spark borrows from the MapReduce model, but its shuffle mechanism has been innovated and optimized.
Chapter 5: Spark development environment configuration and workflow

Chapter 6: Hands-on Spark programming
1, WordCount
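
The article gives no code for this example, so the following is only a sketch of the idea in plain Python, not a Spark program. The `word_count` function is a hypothetical name; its three steps mirror the `flatMap`, `map`, and `reduceByKey` operations that a Spark WordCount typically chains on an RDD.

```python
def word_count(lines):
    """Count word occurrences using the same three steps as a Spark WordCount."""
    # flatMap step: split every line into words and flatten the result
    words = [word for line in lines for word in line.split()]
    # map step: pair each word with an initial count of 1
    pairs = [(word, 1) for word in words]
    # reduceByKey step: sum the counts of pairs that share a word
    counts = {}
    for word, n in pairs:
        counts[word] = counts.get(word, 0) + n
    return counts
```

Calling `word_count(["a b a", "b c"])` yields a dictionary with a count for each distinct word. In Spark the same steps run in parallel across partitions, with a shuffle between the map and reduce sides.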


This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
