The fold, foldByKey, treeAggregate, and treeReduce basic RDD operators for Spark programming

1) fold
def fold(zeroValue: T)(op: (T, T) => T): T

This operator receives an initial value; fold takes a function that merges two values of the same type and returns a value of that same type.

This operator merges the values within each partition, and zeroValue is used as the initial value each time a partition is merged.

val a = sc.parallelize(List(1, 2, 3), 3)
a.fold(0)(_ + _)   // somewhat similar to reduce, except that an initial value is added
res59: Int = 6
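
To make the role of zeroValue more visible, here is a small sketch on the same RDD (assuming, as above, that the three elements land in three separate partitions). Note that fold also applies zeroValue once more when the per-partition results are combined, so with 3 partitions it is added 4 times in total:

val a = sc.parallelize(List(1, 2, 3), 3)
a.fold(1)(_ + _)   // per partition: 1+1, 1+2, 1+3; final combine: 1 + 2 + 3 + 4
res: Int = 10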
2) foldByKey
def foldByKey(zeroValue: V)(func: (V, V) => V): RDD[(K, V)]
def foldByKey(zeroValue: V, numPartitions: Int)(func: (V, V) => V): RDD[(K, V)]
def foldByKey(zeroValue: V, partitioner: Partitioner)(func: (V, V) => V): RDD[(K, V)]

From the API, foldByKey receives an initial value zeroValue, and the value type of the returned key-value pairs is the same as the type of that initial value. Compared with reduceByKey, foldByKey simply adds an initial value.
Let's look at a few examples:

val a = sc.parallelize(List("dog", "cat", "owl", "gnu", "ant"), 2)
val b = a.map(x => (x.length, x))   // b is of type RDD[(Int, String)]
b.foldByKey("")(_ + _).collect      // aggregate by key
// Since every word has length 3, the aggregation result is:
res84: Array[(Int, String)] = Array((3,dogcatowlgnuant))

val a = sc.parallelize(List("dog", "tiger", "lion", "cat", "panther", "eagle"), 2)
val b = a.map(x => (x.length, x))
b.foldByKey("")(_ + _).collect
res85: Array[(Int, String)] = Array((4,lion), (3,dogcat), (7,panther), (5,tigereagle))

scala> val rdd = sc.makeRDD(Array(("A", 0), ("A", 2), ("B", 1), ("B", 2), ("C", 1)), 2)
rdd: org.apache.spark.rdd.RDD[(String, Int)] = ParallelCollectionRDD[1] at makeRDD at <console>:27

scala> rdd.foldByKey(100)(_ + _).collect.foreach(println)
(B,103)
(A,102)
(C,101)

In fact, foldByKey internally calls combineByKey: zeroValue plays a role similar to createCombiner, while mergeValue and mergeCombiners are both the function we pass in. Merging is first performed within each partition, and the per-partition results are then merged again.
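
As a hedged illustration of that relationship, the last foldByKey call above can be rewritten with combineByKey directly (a sketch of the equivalence, not Spark's actual internal code; the initial value 100 is folded into the first value seen for each key within a partition):

rdd.combineByKey(
  (v: Int) => 100 + v,            // createCombiner: apply the initial value to the first value of a key in a partition
  (acc: Int, v: Int) => acc + v,  // mergeValue: merge further values within the partition
  (a: Int, b: Int) => a + b       // mergeCombiners: merge the per-partition results
).collect.foreach(println)
// prints the same (B,103), (A,102), (C,101)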
3) treeAggregate

First, take a look at the API of the treeAggregate operator:

def treeAggregate[U](zeroValue: U)(seqOp: (U, T) => U, combOp: (U, U) => U, depth: Int = 2)(implicit arg0: ClassTag[U]): U

This operator returns a result of type U. An initial value is passed in first; the first function, seqOp, operates within each partition, merging the T-typed elements it encounters into the U type. The U-typed results of the different partitions are then merged with the second function, combOp. In short, the first function works within a partition and the second works across partitions.

treeAggregate is similar to aggregate, except that the aggregation is performed as a multi-level tree. Another difference is that the initial value is only used in the first function, not in the second one. The default depth is 2.

val z = sc.parallelize(List(1, 2, 3, 4, 5, 6), 2)

def myfunc(index: Int, iter: Iterator[Int]): Iterator[String] = {
  iter.toList.map(x => "[partID:" + index + ", val: " + x + "]").iterator
}

z.mapPartitionsWithIndex(myfunc).collect
res28: Array[String] = Array([partID:0, val: 1], [partID:0, val: 2], [partID:0, val: 3], [partID:1, val: 4], [partID:1, val: 5], [partID:1, val: 6])

z.treeAggregate(0)(math.max(_, _), _ + _)
res40: Int = 9
// As before: first find the maximum value within each partition, then merge across partitions.
// The initial value is not used by the second function.
// If the initial value is changed to 5, the 1st partition gives max(5,1,2,3) = 5
// and the 2nd partition gives max(5,4,5,6) = 6.
// The final result is 5 + 6 = 11; the initial value is not added in the final combine.
z.treeAggregate(5)(math.max(_, _), _ + _)
res42: Int = 11
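
For comparison, a small sketch with the plain aggregate operator: there the initial value is also applied in the final combine, so the same call produces a different result:

z.aggregate(5)(math.max(_, _), _ + _)
// per partition: max(5,1,2,3) = 5 and max(5,4,5,6) = 6; final combine: 5 + 5 + 6
res: Int = 16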
4) treeReduce
def treeReduce(f: (T, T) => T, depth: Int = 2): T

treeReduce is similar to the reduce function and does not require an initial value; the difference is that this operator performs the reduce as a multi-level tree.

val z = sc.parallelize(List(1, 2, 3, 4, 5, 6), 2)
z.treeReduce(_ + _)
res49: Int = 21
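
The depth parameter can also be passed explicitly; it only matters when there are many partitions, since it controls how many levels of intermediate combining happen before the results reach the driver. A small sketch (the RDD and depth here are made up for illustration):

val big = sc.parallelize(1 to 1000, 100)
big.treeReduce(_ + _, depth = 3)   // same sum as reduce, but combined in a 3-level tree
res: Int = 500500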
