Summary:
RDD (Resilient Distributed Dataset) is a special collection that supports multiple data sources, has a fault-tolerance mechanism, can be cached, and supports parallel operations. An RDD represents a dataset divided into partitions.
An RDD has two kinds of operators:
Transformation: transformations are lazily evaluated. When one RDD is converted into another, the conversion is not executed immediately; Spark only records the logical operation on the dataset.
Action: an action triggers a Spark job, which actually executes the computation of the transformation operators.
This series focuses on the function operations commonly used in Spark:
1. RDD basic transformations
2. Key-value RDD transformations
3. Action operations

This article covers the Action functions:
1.reduce 2.collect 3.count 4.first 5.take 6.top 7.takeOrdered 8.countByKey 9.collectAsMap 10.lookup 11.aggregate 12.fold 13.saveAsTextFile 14.saveAsSequenceFile

1.reduce(func): aggregates the data across partitions through the function func. func takes two parameters and returns a new value; the new value is passed back into func as an argument together with the next element, until the last element is processed.
2.collect(): returns all elements of the dataset as an array to the driver program. To prevent the driver program from running out of memory, it is generally advisable to control the size of the returned dataset.
3.count(): returns the number of elements in the dataset.
4.first(): returns the first element of the dataset.
5.take(n): returns the first n elements of the dataset as an array.
6.top(n): returns the first n elements by the default ordering or a specified ordering; by default they are output in descending order.
7.takeOrdered(n, [ordering]): returns the first n elements in natural order or by a specified ordering.
Example 1:
def main(args: Array[String]) {
  val conf = new SparkConf().setMaster("local").setAppName("Reduce")
  val sc = new SparkContext(conf)
  val rdd = sc.parallelize(1 to 10, 2)
  val reduceRDD = rdd.reduce(_ + _)
  val reduceRDD1 = rdd.reduce(_ - _)      // if the data is in one partition, the result is -53
  val countRDD = rdd.count()
  val firstRDD = rdd.first()
  val takeRDD = rdd.take(5)               // first five elements
  val topRDD = rdd.top(3)                 // first three elements, from high to low
  val takeOrderedRDD = rdd.takeOrdered(3) // first three elements in natural order, from low to high
  println("func +: " + reduceRDD)
  println("func -: " + reduceRDD1)
  println("count: " + countRDD)
  println("first: " + firstRDD)
  println("take:")
  takeRDD.foreach(x => print(x + " "))
  println("\ntop:")
  topRDD.foreach(x => print(x + " "))
  println("\ntakeOrdered:")
  takeOrderedRDD.foreach(x => print(x + " "))
  sc.stop
}
Output:
func +: 55
func -: 15  (if the data is in one partition, the result is -53)
count: 10
first: 1
take:
1 2 3 4 5
top:
10 9 8
takeOrdered:
1 2 3
(RDD dependency graph: the red block represents an RDD, and the black blocks represent its partition collection; the same applies below.)
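Example 1 only uses the default orderings and never calls collect() directly. As a supplement, the sketch below shows collect() and the optional Ordering parameter of top/takeOrdered mentioned above. It is a minimal sketch assuming a local SparkContext; the object name and app name are illustrative, not from the article:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object OrderingSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local").setAppName("OrderingSketch")
    val sc = new SparkContext(conf)
    val rdd = sc.parallelize(1 to 10, 2)

    // collect() pulls the whole dataset back to the driver as an array,
    // so it should only be used when the result is known to be small
    val all = rdd.collect() // Array(1, 2, ..., 10)

    // with a reversed Ordering, top() returns the smallest elements
    // and takeOrdered() returns the largest ones
    val reverseOrd = Ordering[Int].reverse
    val smallest = rdd.top(3)(reverseOrd)         // Array(1, 2, 3)
    val largest  = rdd.takeOrdered(3)(reverseOrd) // Array(10, 9, 8)

    println(all.mkString(" "))
    println(smallest.mkString(" "))
    println(largest.mkString(" "))
    sc.stop()
  }
}
```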
8.countByKey(): acts on an RDD of key-value (K, V) type; counts the number of elements for each key and returns a Map of (K, count).
9.collectAsMap(): acts on an RDD of (K, V) type. It differs from collect in that collectAsMap returns a Map and therefore does not contain duplicate keys; for duplicate keys, later elements overwrite earlier ones.
10.lookup(k): acts on an RDD of (K, V) type; returns all values for the specified key k.
Example 2:
def main(args: Array[String]) {
  val conf = new SparkConf().setMaster("local").setAppName("KVFunc")
  val sc = new SparkContext(conf)
  val arr = List(("A", 1), ("B", 2), ("A", 2), ("B", 3))
  val rdd = sc.parallelize(arr, 2)
  val countByKeyRDD = rdd.countByKey()
  val collectAsMapRDD = rdd.collectAsMap()
  println("countByKey:")
  countByKeyRDD.foreach(print)
  println("\ncollectAsMap:")
  collectAsMapRDD.foreach(print)
  sc.stop
}
Output:
countByKey: (B,2)(A,2)
collectAsMap: (A,2)(B,3)
(RDD dependency graph)
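Example 2 does not actually exercise lookup. A minimal sketch of it on the same data, assuming a local SparkContext (the object name and app name are illustrative), could look like this:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object LookupSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local").setAppName("LookupSketch")
    val sc = new SparkContext(conf)
    val rdd = sc.parallelize(List(("A", 1), ("B", 2), ("A", 2), ("B", 3)), 2)

    // lookup(k) returns ALL values for key k, unlike collectAsMap,
    // which keeps only the last value for a duplicate key
    val valuesForA = rdd.lookup("A") // Seq(1, 2)
    println(valuesForA.mkString(" "))
    sc.stop()
  }
}
```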
11.aggregate(zeroValue: U)(seqOp: (U, T) => U, combOp: (U, U) => U):
The seqOp function aggregates the data of each partition into a single value of type U, and the combOp function then merges the U-typed results of all partitions into one value of type U.
def main(args: Array[String]) {
  val conf = new SparkConf().setMaster("local").setAppName("Fold")
  val sc = new SparkContext(conf)
  val rdd = sc.parallelize(List(1, 2, 3, 4), 2)
  val aggregateRDD = rdd.aggregate(2)(_ + _, _ * _)
  println(aggregateRDD)
  sc.stop
}
Output:
90
Step 1: partition 1: zeroValue + 1 + 2 = 5; partition 2: zeroValue + 3 + 4 = 9
Step 2: zeroValue * (result of partition 1) * (result of partition 2) = 2 * 5 * 9 = 90
(RDD dependency graph)
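In the example above U and T are both Int. To show why aggregate takes two separate functions, the sketch below (illustrative, assuming a local SparkContext; names are not from the article) computes an average by aggregating Int elements into a (sum, count) pair, i.e. U = (Int, Int) while T = Int:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object AggregateSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local").setAppName("AggregateSketch")
    val sc = new SparkContext(conf)
    val rdd = sc.parallelize(List(1, 2, 3, 4), 2)

    // seqOp folds one element into a partition's running (sum, count);
    // combOp merges the per-partition (sum, count) pairs
    val (sum, count) = rdd.aggregate((0, 0))(
      (acc, x) => (acc._1 + x, acc._2 + 1),
      (a, b)   => (a._1 + b._1, a._2 + b._2)
    )
    println(sum.toDouble / count) // average of 1..4: 2.5
    sc.stop()
  }
}
```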
12.fold(zeroValue: T)(op: (T, T) => T):
Uses the op function to aggregate the elements within each partition and then to merge the per-partition results. op takes two parameters; at the start, the first parameter passed in is zeroValue. T is the element type of the RDD dataset. fold is equivalent to an aggregate in which seqOp and combOp are the same function.
Example 3