Spark Transformations and Actions

Source: Internet
Author: User

First, transformations

map(func) returns a new distributed dataset formed by passing each element of the source dataset through the function func.

filter(func) returns a new dataset consisting of those elements of the source dataset for which func returns true.

flatMap(func) is similar to map, but each input element can be mapped to 0 or more output elements, so func returns a sequence rather than a single item.

mapPartitions(func) is similar to map, but runs once per partition of the RDD rather than once per element: func receives an iterator over a partition's elements and returns an iterator.

intersection(otherDataset) returns a new RDD containing the elements that appear in both the source dataset and otherDataset.

distinct([numTasks]) returns a new dataset containing the distinct elements of the source dataset.
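The transformations listed so far can be illustrated with a small plain-Python model. This is only a sketch of the semantics, not Spark code: an RDD is modeled as a list of partitions, the helper names (rdd_map, rdd_flat_map, and so on) are hypothetical, and everything runs eagerly on one machine. In PySpark the corresponding calls are rdd.map, rdd.filter, rdd.flatMap, rdd.mapPartitions, rdd.intersection and rdd.distinct.

```python
from itertools import chain

# Hypothetical stand-in for an RDD: a list of partitions, each a list of elements.
partitions = [[1, 2, 3], [4, 5, 6]]

def flatten(parts):
    # Helper for display: concatenate all partitions into one list.
    return list(chain.from_iterable(parts))

def rdd_map(parts, func):
    # map: apply func to every element, one output per input.
    return [[func(x) for x in p] for p in parts]

def rdd_filter(parts, func):
    # filter: keep only the elements for which func returns true.
    return [[x for x in p if func(x)] for p in parts]

def rdd_flat_map(parts, func):
    # flatMap: func returns a sequence (possibly empty); results are flattened.
    return [list(chain.from_iterable(func(x) for x in p)) for p in parts]

def rdd_map_partitions(parts, func):
    # mapPartitions: func sees an iterator over a whole partition at once.
    return [list(func(iter(p))) for p in parts]

def rdd_intersection(parts_a, parts_b):
    # intersection: elements present in both datasets, without duplicates.
    # Sorted here only to make the output deterministic.
    return sorted(set(flatten(parts_a)) & set(flatten(parts_b)))

def rdd_distinct(parts):
    # distinct: each value appears once (sorted only for display).
    return sorted(set(flatten(parts)))

doubled = flatten(rdd_map(partitions, lambda x: x * 2))
evens = flatten(rdd_filter(partitions, lambda x: x % 2 == 0))
repeated = flatten(rdd_flat_map(partitions, lambda x: [x] * (x % 3)))
partition_sums = flatten(rdd_map_partitions(partitions, lambda it: [sum(it)]))
common = rdd_intersection(partitions, [[5, 6, 7], [6, 8]])
unique = rdd_distinct([[1, 1, 2], [2, 3]])
```

Note how flatMap can drop an element (3 and 6 map to empty lists) or emit it more than once, while mapPartitions produces one sum per partition instead of one value per element.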

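groupByKey([numTasks]) is called on a dataset of (K, V) pairs and returns a dataset of (K, Seq[V]) pairs, in which each key is paired with the collection of all its values.

reduceByKey(func, [numTasks]) is called on a dataset of (K, V) pairs and returns a dataset of (K, V) pairs in which the values for each key are aggregated using the function func, which must be of type (V, V) => V.

sortByKey([ascending], [numTasks]) is called on a dataset of (K, V) pairs and returns a dataset of (K, V) pairs sorted by key K, in ascending or descending order as specified by the boolean ascending argument.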
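The three key-value transformations can be sketched in plain Python as follows. This is an illustrative single-machine model under stated assumptions, not Spark code: the helper names and the sample pairs are hypothetical, and partitioning and shuffling are ignored entirely. In PySpark the corresponding calls are rdd.groupByKey, rdd.reduceByKey and rdd.sortByKey.

```python
from collections import defaultdict
from functools import reduce

# Hypothetical sample dataset of (K, V) pairs.
pairs = [("b", 2), ("a", 1), ("b", 3), ("a", 10), ("c", 5)]

def group_by_key(kv):
    # groupByKey: collect every value that shares a key.
    groups = defaultdict(list)
    for k, v in kv:
        groups[k].append(v)
    return dict(groups)

def reduce_by_key(kv, func):
    # reduceByKey: fold the values of each key with func, of type (V, V) => V.
    return {k: reduce(func, vs) for k, vs in group_by_key(kv).items()}

def sort_by_key(kv, ascending=True):
    # sortByKey: order the pairs by key only; values are not compared.
    return sorted(kv, key=lambda pair: pair[0], reverse=not ascending)

grouped = group_by_key(pairs)
summed = reduce_by_key(pairs, lambda a, b: a + b)
ordered = sort_by_key(summed.items())
descending = sort_by_key(summed.items(), ascending=False)
```

The model builds reduce_by_key on top of group_by_key for brevity. On a real cluster the two differ in cost, because reduceByKey can combine values within each partition before data is moved between nodes.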

Second, actions

reduce(func) aggregates all elements of the dataset using the function func, which takes two arguments and returns one. func should be commutative and associative so that the result can be computed correctly in parallel.

collect() returns all elements of the dataset as an array to the driver program.

count() returns the number of elements in the dataset.

foreach(func) runs the function func on each element of the dataset, typically to update an accumulator variable or to interact with an external storage system.
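The actions above can be modeled the same way. The following is a plain-Python sketch, not Spark code: the dataset is a hypothetical list of partitions, the helper names are invented for illustration, and the list named seen merely stands in for an accumulator or an external store. In PySpark the corresponding calls are rdd.reduce, rdd.collect, rdd.count and rdd.foreach.

```python
from functools import reduce
from itertools import chain

# Hypothetical stand-in for an RDD: a list of partitions.
partitions = [[1, 2, 3], [4, 5, 6]]

def rdd_reduce(parts, func):
    # Reduce each non-empty partition first, then combine the partial results.
    # This two-step shape is why func should be commutative and associative.
    partials = [reduce(func, p) for p in parts if p]
    return reduce(func, partials)

def rdd_collect(parts):
    # collect: bring every element back to the caller as one list.
    return list(chain.from_iterable(parts))

def rdd_count(parts):
    # count: total number of elements across all partitions.
    return sum(len(p) for p in parts)

def rdd_foreach(parts, func):
    # foreach: run func on each element for its side effect; returns nothing.
    for p in parts:
        for x in p:
            func(x)

total = rdd_reduce(partitions, lambda a, b: a + b)
everything = rdd_collect(partitions)
n = rdd_count(partitions)

seen = []  # stand-in for an accumulator or external storage system
rdd_foreach(partitions, seen.append)
```

Unlike the transformations, each of these returns a plain value (or nothing, in the case of foreach) rather than a new dataset.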

When a transformation is applied, Spark does not start computing. Instead, it records the operation in a DAG (directed acyclic graph) of pending work, and the job is submitted to the cluster only when an action is encountered.
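This lazy behavior can be demonstrated with a toy model. The LazyDataset class below is hypothetical and greatly simplified: it records a linear list of steps rather than a true DAG, supports only map and filter, and runs locally. It is meant only to show that no user function is called until an action (here collect) is invoked.

```python
class LazyDataset:
    """Hypothetical toy model: transformations are recorded, actions run them."""

    def __init__(self, data, plan=()):
        self._data = data
        self._plan = plan  # tuple of (name, func) steps not yet executed

    def map(self, func):
        # Transformation: return a new dataset with one more recorded step.
        return LazyDataset(self._data, self._plan + (("map", func),))

    def filter(self, func):
        # Transformation: likewise, nothing is computed here.
        return LazyDataset(self._data, self._plan + (("filter", func),))

    def collect(self):
        # Action: only now is the recorded plan actually executed.
        items = iter(self._data)
        for name, func in self._plan:
            if name == "map":
                items = map(func, items)
            else:
                items = filter(func, items)
        return list(items)

calls = []  # records every invocation of the traced function

def traced_square(x):
    calls.append(x)
    return x * x

pending = LazyDataset([1, 2, 3, 4]).map(traced_square).filter(lambda x: x > 4)
calls_before_action = len(calls)  # no work has been done yet
result = pending.collect()        # the action triggers the computation
calls_after_action = len(calls)
```

Here calls_before_action is 0 even though map and filter have already been chained, and traced_square runs only once collect is called.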
