Spark RDD aggregateByKey

Tags: spark, rdd

The aggregateByKey operation on RDDs is a bit cumbersome, so here is a tidied-up usage example for reference.

Straight to the code:

import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkContext, SparkConf}

/**
 * Created by Edward on 2016/10/27.
 */
object AggregateByKey {
  def main(args: Array[String]) {
    val sparkConf: SparkConf = new SparkConf()
      .setAppName("AggregateByKey")
      .setMaster("local")
    val sc: SparkContext = new SparkContext(sparkConf)

    val data = List((1, 3), (1, 2), (1, 4), (2, 3))
    val rdd = sc.parallelize(data, 2) // split the data into two partitions

    // combines values across partitions; both a and b have the zeroValue's data type
    def comb(a: String, b: String): String = {
      println("comb: " + a + "\t" + b)
      a + b
    }

    // combines values within one partition; a has the zeroValue's data type,
    // b has the original value's data type
    def seq(a: String, b: Int): String = {
      println("seq: " + a + "\t" + b)
      a + b
    }

    rdd.foreach(println)

    // zeroValue: a neutral value that defines the result type and takes part in the computation
    // seqOp:     combines values within a single partition
    // combOp:    combines values across partitions
    val aggregateByKeyRdd: RDD[(Int, String)] = rdd.aggregateByKey("100")(seq, comb)

    // print the output
    aggregateByKeyRdd.foreach(println)
    sc.stop()
  }
}

Explanation of the output:

/*
The data is split into two partitions:
  partition one holds (1,3) (1,2)
  partition two holds (1,4) (2,3)

Within partition one, values with the same key are merged:
seq: 100   3    // (1,3) is merged with the neutral value, giving 1003
seq: 1003  2    // (1,2) is merged with the previous result, giving 10032

Within partition two, values with the same key are merged:
seq: 100   4    // (1,4) is merged with the neutral value, giving 1004
seq: 100   3    // (2,3) is merged with the neutral value, giving 1003

Then the results of the two partitions are merged:
key 2 exists in only one partition, so no merge is needed: (2,1003)
key 1 exists in both partitions with consistent types, so its results are merged:
comb: 10032  1004
Final output: (2,1003) (1,100321004)
*/
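
The string concatenation above mainly serves to trace the call order; its result depends on how the data is partitioned. A more typical use lets the aggregated type U differ from the value type V, for example computing a per-key average with a (sum, count) pair. Here is a minimal sketch of my own (not from the original article), reusing the sc from the example above:

val pairs = sc.parallelize(List((1, 3), (1, 2), (1, 4), (2, 3)), 2)

// zeroValue (0, 0) is the neutral (sum, count) pair
val sumCount = pairs.aggregateByKey((0, 0))(
  (acc, v) => (acc._1 + v, acc._2 + 1),   // seqOp: fold one value into (sum, count)
  (a, b) => (a._1 + b._1, a._2 + b._2)    // combOp: merge two (sum, count) pairs
)

val avgByKey = sumCount.mapValues { case (sum, count) => sum.toDouble / count }
avgByKey.foreach(println) // prints (1,3.0) and (2,3.0)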

Use the code above together with the following descriptions to understand it.

Description from the official website

aggregateByKey(zeroValue)(seqOp, combOp, [numTasks]): When called on a dataset of (K, V) pairs, returns a dataset of (K, U) pairs where the values for each key are aggregated using the given combine functions and a neutral "zero" value. Allows an aggregated value type that is different from the input value type, while avoiding unnecessary allocations. Like in groupByKey, the number of reduce tasks is configurable through an optional second argument.
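
To illustrate the optional [numTasks] argument, a small sketch of my own (the value 4 is an arbitrary choice, reusing rdd, seq, and comb from the example above):

// pass the shuffle parallelism explicitly: the aggregation runs with 4 reduce tasks
val aggregated: RDD[(Int, String)] = rdd.aggregateByKey("100", 4)(seq, comb)
println(aggregated.getNumPartitions) // 4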

Description of the function in the Spark source code

/**
 * Aggregate the values of each key, using given combine functions and a neutral "zero value".
 * This function can return a different result type, U, than the type of the values in this RDD,
 * V. Thus, we need one operation for merging a V into a U and one operation for merging two U's,
 * as in scala.TraversableOnce. The former operation is used for merging values within a
 * partition, and the latter is used for merging values between partitions. To avoid memory
 * allocation, both of these functions are allowed to modify and return their first argument
 * instead of creating a new U.
 */
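
The last point, that both functions may modify and return their first argument, is what makes mutable accumulators cheap here. A minimal sketch of my own (not from the article), reusing rdd from the example above to collect the distinct values per key without allocating a new set per element:

import scala.collection.mutable

// U is a mutable HashSet; seqOp and combOp both mutate and return their first argument
val distinctByKey = rdd.aggregateByKey(mutable.HashSet.empty[Int])(
  (set, v) => set += v,   // seqOp: add the value to the set in place
  (s1, s2) => s1 ++= s2   // combOp: merge the second set into the first
)
distinctByKey.foreach(println) // e.g. (2,HashSet(3)) and (1,HashSet(2, 3, 4))

This is safe because Spark serializes the zeroValue and gives each task its own copy, so the set is never actually shared across tasks.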

