I read Spark's original paper and some related materials, and here I record my understanding of the terminology that comes up most often in Spark.
1. Application
An application is a program submitted to Spark with spark-submit, such as the SparkPi example (which computes pi) in Spark's examples. An application usually consists of three parts: reading data from a data source (such as HDFS) to form an RDD, computing on the RDD through transformations and actions, and outputting the results to the console or to external storage (for example, collect brings the results back to the console).
2. Driver
The driver in Spark plays a role similar to the ApplicationMaster in YARN: its main job is scheduling and coordinating with the executors and the cluster manager. There are two deploy modes, client and cluster. In client mode the driver runs on the machine from which the job is submitted, while in cluster mode a machine inside the cluster is chosen to launch the driver. A diagram on the Spark website gives a good overview of the driver's role.
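As a concrete illustration, the deploy mode is chosen with the --deploy-mode flag when submitting the SparkPi example via spark-submit. The master URL and jar path below are placeholders for this sketch; adjust them to your installation:

```shell
# Client mode: the driver runs on the machine where spark-submit is invoked.
# (Master URL and jar path are placeholders, not real values.)
spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master spark://master-host:7077 \
  --deploy-mode client \
  examples/jars/spark-examples.jar 100

# Cluster mode: the driver is launched on one of the cluster's machines instead.
spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master spark://master-host:7077 \
  --deploy-mode cluster \
  examples/jars/spark-examples.jar 100
```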
3. Job
A job in Spark is not the same as a job in MapReduce. In MapReduce a job is essentially a map phase plus a reduce phase. A Spark job is quite different: each action operator, such as count or first, triggers one job.
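As a rough mental model (pure Python, not the Spark API), transformations only record work to be done, and each action kicks off a job that actually runs the recorded lineage:

```python
# Toy model of Spark's lazy evaluation: transformations are recorded,
# and only an action (count, first, ...) triggers a "job" that runs them.
# This is a conceptual sketch, not Spark's actual implementation.

class ToyRDD:
    def __init__(self, data, pending=None):
        self.data = data
        self.pending = pending or []   # recorded transformations, not yet run
        self.jobs_run = 0              # how many jobs (actions) have executed

    def map(self, f):
        # Transformation: lazy, just returns a new ToyRDD with f recorded.
        return ToyRDD(self.data, self.pending + [f])

    def _compute(self):
        # Running the recorded lineage counts as one job.
        self.jobs_run += 1
        out = self.data
        for f in self.pending:
            out = [f(x) for x in out]
        return out

    def count(self):                   # action: triggers a job
        return len(self._compute())

    def first(self):                   # action: triggers another job
        return self._compute()[0]

rdd = ToyRDD([1, 2, 3]).map(lambda x: x * 10)
print(rdd.count())     # → 3   (first action, job 1)
print(rdd.first())     # → 10  (second action, job 2)
print(rdd.jobs_run)    # → 2   (two actions, two jobs)
```

Note that calling map alone runs nothing; the two actions each trigger their own job, which is exactly why chaining many actions on an uncached RDD recomputes the lineage each time.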
4. Task
The task is the smallest execution unit in Spark. An RDD is typically divided into partitions, and the computation of each partition on an executor is one task.
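Again as a pure-Python sketch (not the Spark scheduler), the number of tasks launched for a stage matches the number of partitions, with each partition's computation handed out as one task; a thread pool stands in for the executors here:

```python
from concurrent.futures import ThreadPoolExecutor

# Toy illustration: an "RDD" with 4 partitions yields 4 tasks for one stage.
# The thread pool plays the role of the executors.
partitions = [[1, 2], [3, 4], [5, 6], [7, 8]]

def run_task(partition):
    # One task = apply the stage's work (here: squaring) to one partition.
    return [x * x for x in partition]

with ThreadPoolExecutor(max_workers=2) as executors:
    results = list(executors.map(run_task, partitions))

print(len(partitions))   # → 4, one task per partition
print(results[0])        # → [1, 4]
```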
5. Stage
The stage is a concept specific to Spark. A job is typically divided into a number of stages, which are executed in order. To understand how stages are divided, one first needs the concepts of narrow dependency and wide dependency from the Spark paper. They are easy to distinguish: look at whether the data in a partition of the parent RDD flows into more than one partition of the child RDD. If each parent partition feeds only one child partition, the dependency is narrow; otherwise it is wide. The boundary of a wide dependency is where a stage is split. Two diagrams in Spark's paper illustrate narrow versus wide dependencies and stage division clearly.
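The splitting rule can be sketched in plain Python (a conceptual model, not Spark's DAG scheduler, and the operator names are only illustrative): walk the lineage and start a new stage at every wide dependency, since that is where a shuffle must happen:

```python
# Toy stage division: a lineage is a list of (operator, dependency_type)
# pairs; a wide dependency (shuffle boundary) starts a new stage.
# Simplified sketch: Spark actually splits around the shuffle itself.

def split_into_stages(lineage):
    stages = [[]]
    for op, dep in lineage:
        if dep == "wide" and stages[-1]:
            stages.append([])      # shuffle boundary: begin a new stage
        stages[-1].append(op)
    return stages

lineage = [
    ("map", "narrow"),
    ("filter", "narrow"),
    ("reduceByKey", "wide"),       # shuffles: stage boundary here
    ("map", "narrow"),
    ("groupByKey", "wide"),        # shuffles again: another boundary
]

stages = split_into_stages(lineage)
print(len(stages))                 # → 3
for s in stages:
    print(s)
# → ['map', 'filter']
# → ['reduceByKey', 'map']
# → ['groupByKey']
```

Chains of narrow dependencies stay inside one stage and can be pipelined partition by partition, which is exactly why the boundary matters for performance.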
As for why jobs are divided this way, it is mainly because narrow and wide dependencies differ in fault-tolerant recovery and in processing performance (a wide dependency requires a shuffle).
This is all I know about Spark's terminology for now; my understanding may not be complete, but that's it for this note.
Spark Learning Notes 1: Understanding application, driver, job, task, stage