Spark common functions: Action operators

Summary:

RDD (Resilient Distributed Dataset) is a special collection that supports multiple data sources, has a fault-tolerance mechanism, can be cached, and supports parallel operations. An RDD represents a partitioned dataset.

RDDs have two kinds of operators:
Transformation: transformations are lazily evaluated. Converting one RDD into another RDD does not execute immediately; Spark only remembers the logical operations on the dataset.
Action: an action triggers the execution of a Spark job, which actually runs the computation built up by the transformation operators.
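To make the lazy-evaluation point concrete, here is a minimal sketch (added for illustration, not from the original article): the map transformation only records the logical operation, and no job runs until the count action is called.

import org.apache.spark.{SparkConf, SparkContext}

def main(args: Array[String]) {
  val conf = new SparkConf().setMaster("local").setAppName("LazyDemo")
  val sc = new SparkContext(conf)
  val rdd = sc.parallelize(1 to 10, 2)
  val mapped = rdd.map(_ * 2) // Transformation: nothing executes yet, only the lineage is recorded
  val n = mapped.count()      // Action: triggers the Spark job that actually evaluates the map
  println(n)                  // 10
  sc.stop()
}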

This series focuses on the function operations commonly used in Spark:
1. RDD basic transformations
2. Key-value RDD transformations
3. Action operations

The functions covered in this installment:
1.reduce 2.collect 3.count 4.first 5.take 6.top 7.takeOrdered 8.countByKey 9.collectAsMap 10.lookup 11.aggregate 12.fold 13.saveAsTextFile 14.saveAsSequenceFile

1.reduce(func): aggregates the data within each partition before aggregating across partitions. func receives two parameters and returns one new value; the new value keeps being passed back into func as a parameter until the last element has been processed.
2.collect(): returns all elements of the dataset as an array to the driver program. To prevent the driver program from running out of memory, the size of the returned dataset should generally be kept under control.
3.count(): returns the number of elements in the dataset.
4.first(): returns the first element of the dataset.
5.take(n): returns the first n elements of the dataset as an array.
6.top(n): returns the first n elements by a specified ordering, or by the default ordering, which outputs in descending order.
7.takeOrdered(n, [ordering]): returns the first n elements in natural order or by a specified ordering.

Example 1:

import org.apache.spark.{SparkConf, SparkContext}

def main(args: Array[String]) {
  val conf = new SparkConf().setMaster("local").setAppName("Reduce")
  val sc = new SparkContext(conf)
  val rdd = sc.parallelize(1 to 10, 2)

  val reduceRDD = rdd.reduce(_ + _)
  val reduceRDD1 = rdd.reduce(_ - _)      // if the partition count is 1, the result is -53
  val countRDD = rdd.count()
  val firstRDD = rdd.first()
  val takeRDD = rdd.take(5)               // first five elements
  val topRDD = rdd.top(3)                 // first three elements, from highest to lowest
  val takeOrderedRDD = rdd.takeOrdered(3) // first three elements in natural (ascending) order

  println("Func +: " + reduceRDD)
  println("Func -: " + reduceRDD1)
  println("count: " + countRDD)
  println("first: " + firstRDD)
  println("take:")
  takeRDD.foreach(x => print(x + " "))
  println("\ntop:")
  topRDD.foreach(x => print(x + " "))
  println("\ntakeOrdered:")
  takeOrderedRDD.foreach(x => print(x + " "))

  sc.stop()
}
Output:
Func +: 55
Func -: 15 (if the partition count is 1, the result is -53)
count: 10
first: 1
take:
1 2 3 4 5
top:
10 9 8
takeOrdered:
1 2 3
(RDD dependency graph: a red block represents an RDD, the black blocks represent its partition collection; the same applies to the diagrams below.)
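collect is described above but does not appear in Example 1, so here is a minimal sketch (added for illustration, not from the original article):

val conf = new SparkConf().setMaster("local").setAppName("Collect")
val sc = new SparkContext(conf)
val rdd = sc.parallelize(1 to 10, 2)
// collect() returns every element to the driver as an Array[Int];
// keep the result small, since a large dataset can overflow driver memory
val all = rdd.collect()
println(all.mkString(" ")) // 1 2 3 4 5 6 7 8 9 10
sc.stop()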
8.countByKey(): acting on an RDD of type K-V, counts the number of elements for each key and returns a (K, count) map.
9.collectAsMap(): acting on an RDD of type K-V, returns the elements as a Map. It differs from collect in that collectAsMap does not contain duplicate keys; for duplicate keys, later elements overwrite earlier ones.
10.lookup(k): acting on an RDD of type K-V, returns all V values for the specified key K.

Example 2:
import org.apache.spark.{SparkConf, SparkContext}

def main(args: Array[String]) {
  val conf = new SparkConf().setMaster("local").setAppName("KVFunc")
  val sc = new SparkContext(conf)
  val arr = List(("A", 1), ("B", 2), ("A", 2), ("B", 3))
  val rdd = sc.parallelize(arr, 2)

  val countByKeyRDD = rdd.countByKey()
  val collectAsMapRDD = rdd.collectAsMap()

  println("countByKey:")
  countByKeyRDD.foreach(print)
  println("\ncollectAsMap:")
  collectAsMapRDD.foreach(print)

  sc.stop()
}
Output:
countByKey:
(B,2)(A,2)
collectAsMap:
(A,2)(B,3)
(RDD dependency graph)
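Example 2 does not exercise lookup, so here is a minimal sketch using the same data (added for illustration, not from the original article; it assumes the sc and rdd from Example 2):

// lookup("A") returns all values stored under key "A"
val lookupRDD = rdd.lookup("A")
println("lookup: " + lookupRDD.mkString(" ")) // 1 2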
11.aggregate(zeroValue: U)(seqOp: (U, T) => U, combOp: (U, U) => U): the seqOp function aggregates the data of each partition into a single value of type U, and the combOp function then aggregates the per-partition values of type U into one value of type U.
import org.apache.spark.{SparkConf, SparkContext}

def main(args: Array[String]) {
  val conf = new SparkConf().setMaster("local").setAppName("Fold")
  val sc = new SparkContext(conf)
  val rdd = sc.parallelize(List(1, 2, 3, 4), 2)
  val aggregateRDD = rdd.aggregate(2)(_ + _, _ * _)
  println(aggregateRDD)
  sc.stop()
}
Output:
90
Step 1: partition 1: zeroValue + 1 + 2 = 5; partition 2: zeroValue + 3 + 4 = 9
Step 2: zeroValue * (result of partition 1) * (result of partition 2) = 2 * 5 * 9 = 90

(RDD dependency graph)
12.fold(zeroValue: T)(op: (T, T) => T): uses the op function to aggregate the elements within each partition and then to merge the per-partition results. op takes two parameters; at the start, the first parameter passed in is zeroValue. T is the element type of the RDD dataset. fold is equivalent to an aggregate in which the seqOp and combOp functions are the same.

Example 3:
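The source text announces Example 3 but the example itself is missing; the following is a minimal reconstruction in the same style as the previous examples (an assumption, not the author's original code):

import org.apache.spark.{SparkConf, SparkContext}

def main(args: Array[String]) {
  val conf = new SparkConf().setMaster("local").setAppName("Fold")
  val sc = new SparkContext(conf)
  val rdd = sc.parallelize(List(1, 2, 3, 4), 2)
  // Partition 1: 2 + 1 + 2 = 5; partition 2: 2 + 3 + 4 = 9
  // Merge step reuses the same op: 2 + 5 + 9 = 16
  val foldRDD = rdd.fold(2)(_ + _)
  println(foldRDD) // 16
  sc.stop()
}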
