The core methods of an RDD:
First, look at the source code of the getPartitions method:
getPartitions returns the RDD's collection of partitions, an array of type Partition.
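To make the contract concrete, here is a minimal sketch of the relevant members of RDD, simplified from Spark's RDD.scala (the real class also handles checkpointing and more):

```scala
// Minimal sketch of the getPartitions contract (simplified from RDD.scala).
trait Partition extends Serializable {
  def index: Int  // this partition's position within its parent RDD
}

abstract class RDD[T] {
  // Subclasses implement this to enumerate the RDD's partitions.
  protected def getPartitions: Array[Partition]

  @transient private var partitions_ : Array[Partition] = null

  // Public accessor: computes the partition array once, then caches it.
  final def partitions: Array[Partition] = {
    if (partitions_ == null) partitions_ = getPartitions
    partitions_
  }
}
```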
Let's drill into the HadoopRDD implementation (a condensed sketch of the whole flow follows the numbered steps below):
1. getJobConf(): used to obtain the job configuration. It can be obtained in clone or non-clone mode, but clone mode is not thread-safe and is disabled by default. In non-clone mode the configuration is fetched from a cache; if it is not in the cache, a new one is created and then put into the cache.
2. Enter the getInputFormat(jobConf) method:
3. Enter the InputFormat.getSplits(jobConf, minPartitions) method:
4. Enter the getSplits method of the FileInputFormat class:
5. Enter HadoopPartition:
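Putting the five steps together, this is roughly what HadoopRDD.getPartitions looks like, condensed from the Spark 1.x sources (exact code varies between versions):

```scala
// Condensed from HadoopRDD.getPartitions (Spark 1.x; details vary by version).
override def getPartitions: Array[Partition] = {
  val jobConf = getJobConf()                   // step 1: cached, non-clone mode by default
  val inputFormat = getInputFormat(jobConf)    // step 2: instantiate the InputFormat
  val inputSplits = inputFormat.getSplits(jobConf, minPartitions)  // steps 3-4: compute the splits
  val array = new Array[Partition](inputSplits.size)
  for (i <- 0 until inputSplits.size) {
    array(i) = new HadoopPartition(id, i, inputSplits(i))  // step 5: wrap each split
  }
  array
}
```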
getDependencies expresses the dependencies between RDDs, as follows:
getDependencies returns a Seq collection of Dependency objects; the underscore in Dependency[_] is a wildcard type placeholder.
We enter the getDependencies method of the ShuffledRDD class:
We enter the ShuffleDependency class:
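For reference, ShuffledRDD.getDependencies is roughly the following; the ShuffleDependency constructor arguments differ across Spark versions:

```scala
// Condensed from ShuffledRDD.getDependencies (Spark 1.x).
override def getDependencies: Seq[Dependency[_]] = {
  // A ShuffledRDD has exactly one dependency: a ShuffleDependency on its parent RDD.
  // Dependency[_] is the wildcard ("underscore") type placeholder mentioned above.
  List(new ShuffleDependency(prev, part, serializer))
}
```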
Each RDD has a compute function, as follows:
We enter the compute method of HadoopMapPartitionsWithSplitRDD:
The compute method computes each partition of the RDD; the source code of the TaskContext parameter is as follows:
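Condensed, that compute method looks roughly like this (based on the Spark 1.x sources; the signature def compute(split: Partition, context: TaskContext): Iterator[T] is fixed by the RDD base class):

```scala
// Condensed from HadoopMapPartitionsWithSplitRDD.compute (Spark 1.x).
override def compute(split: Partition, context: TaskContext): Iterator[U] = {
  val partition = split.asInstanceOf[HadoopPartition]
  val inputSplit = partition.inputSplit.value
  // f is the user function; it receives the Hadoop InputSplit plus an
  // iterator over the parent partition's records.
  f(inputSplit, firstParent[T].iterator(split, context))
}
```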
getPreferredLocations finds the preferred locations of a partition:
We enter NewHadoopRDD's getPreferredLocations:
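It is roughly the following, condensed from the Spark 1.x sources (newer versions also consult the HDFS cache):

```scala
// Condensed from NewHadoopRDD.getPreferredLocations (Spark 1.x).
override def getPreferredLocations(split: Partition): Seq[String] = {
  val theSplit = split.asInstanceOf[NewHadoopPartition]
  // Ask the underlying Hadoop InputSplit which hosts hold its data blocks,
  // dropping "localhost" entries, which carry no useful locality information.
  theSplit.serializableHadoopSplit.value.getLocations.filter(_ != "localhost")
}
```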
In fact, an RDD also has an optional partitioning strategy:
The source code of Partitioner is as follows:
As can be seen, the default is HashPartitioner; note the special case where the key is an array (array keys do not hash consistently, since Java arrays use identity-based hashCode).
spark.default.parallelism should be set explicitly; otherwise the number of partitions is inherited from the parent RDD with the most partitions, which can easily lead to OOM.
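The two points above come together in the following sketch, condensed from Partitioner.defaultPartitioner and HashPartitioner in the Spark 1.x sources:

```scala
// Condensed from Partitioner.defaultPartitioner and HashPartitioner (Spark 1.x).
object Partitioner {
  def defaultPartitioner(rdd: RDD[_], others: RDD[_]*): Partitioner = {
    val bySize = (Seq(rdd) ++ others).sortBy(_.partitions.size).reverse
    // Reuse an existing partitioner from the parent with the most partitions, if any.
    for (r <- bySize if r.partitioner.isDefined) {
      return r.partitioner.get
    }
    if (rdd.context.conf.contains("spark.default.parallelism")) {
      new HashPartitioner(rdd.context.defaultParallelism)  // explicit setting wins
    } else {
      new HashPartitioner(bySize.head.partitions.size)     // else inherit the largest parent's count
    }
  }
}

class HashPartitioner(partitions: Int) extends Partitioner {
  def numPartitions: Int = partitions
  // Arrays inherit identity-based hashCode, so equal array keys can land in
  // different partitions; hence the warning about array keys above.
  def getPartition(key: Any): Int = key match {
    case null => 0
    case _    => Utils.nonNegativeMod(key.hashCode, numPartitions)
  }
}
```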