First, transformation operations
map(func) returns a new distributed dataset formed by passing each element of the source dataset through the function func
filter(func) returns a new dataset formed by selecting the source elements for which func returns true
flatMap(func) is similar to map, but each input element can be mapped to 0 or more output elements, so func returns a sequence rather than a single item
mapPartitions(func) is similar to map, but runs once per partition of the RDD instead of once per element; func receives an iterator over the partition and returns an iterator
intersection(otherDataset) returns a new RDD containing the elements common to the source dataset and otherDataset
distinct([numTasks]) returns a new dataset containing the distinct elements of the source dataset
groupByKey([numTasks]) is called on a dataset of (K, V) pairs and returns a dataset of (K, Iterable[V]) pairs, that is, each key together with all of its values
reduceByKey(func, [numTasks]) is called on a dataset of (K, V) pairs and returns a dataset of (K, V) pairs in which the values for each key are aggregated with func, which must have type (V, V) => V
sortByKey([ascending], [numTasks]) is called on a dataset of (K, V) pairs and returns a dataset of (K, V) pairs sorted by key K, in ascending or descending order according to the boolean ascending argument
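The transformations listed above can be illustrated with a small plain-Python model. This is only a sketch of what each operation computes, not Spark code: ordinary lists stand in for RDDs, the sample data is made up, and the corresponding RDD method is named in each comment. Real Spark does not guarantee element order for distinct or intersection, so the results are sorted here for readability.

```python
from collections import defaultdict
from functools import reduce
from itertools import chain

# Plain-Python stand-ins for RDD contents (assumed sample data, no Spark involved).
words = ["a b", "b c", "c"]
pairs = [("b", 2), ("a", 1), ("b", 3)]

mapped = [len(s) for s in words]                             # map(func): one output per input
filtered = [s for s in words if " " in s]                    # filter(func): keep elements where func is true
flat = list(chain.from_iterable(s.split() for s in words))   # flatMap(func): 0 or more outputs per input
unique = sorted(set(flat))                                   # distinct()
common = sorted(set(flat) & {"b", "c", "d"})                 # intersection(otherDataset)

# mapPartitions(func): func is applied to a whole partition, not to single elements.
partitions = [[1, 2], [3, 4, 5]]
partition_sums = [sum(p) for p in partitions]

# groupByKey(): collect every value that shares a key.
grouped = defaultdict(list)
for key, value in pairs:
    grouped[key].append(value)

# reduceByKey(func): fold the values of each key with a (V, V) => V function.
reduced = {key: reduce(lambda x, y: x + y, values) for key, values in grouped.items()}

# sortByKey(): order the pairs by key (ascending here).
sorted_pairs = sorted(pairs, key=lambda kv: kv[0])

print(mapped, filtered, flat, unique, common)
print(partition_sums, dict(grouped), reduced, sorted_pairs)
```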
Second, action operations
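reduce(func) aggregates all elements of the dataset using the function func, which takes two arguments and returns one; func should be commutative and associative so that it can be computed correctly in parallel
collect() returns all elements of the dataset as an array to the driver program; this is only practical when the result is small enough to fit in the driver's memory
count() returns the number of elements in the dataset
foreach(func) runs the function func on each element of the dataset, typically to update an accumulator variable or to interact with an external storage system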
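The four actions above can be modelled the same way. Again this is a plain-Python sketch of the semantics rather than Spark code: the list `data` stands in for an RDD, and the dictionary `accumulator` is a hypothetical stand-in for a Spark accumulator variable.

```python
from functools import reduce

# Stand-in for the elements of an RDD (assumed sample data, no Spark involved).
data = [1, 2, 3, 4]

total = reduce(lambda a, b: a + b, data)   # reduce(func): func is commutative and associative
collected = list(data)                     # collect(): bring every element back to the caller
n = len(data)                              # count(): number of elements

# foreach(func): func is run only for its side effects, here updating an accumulator.
accumulator = {"sum_of_squares": 0}


def add_square(x):
    accumulator["sum_of_squares"] += x * x


for element in data:
    add_square(element)

print(total, collected, n, accumulator["sum_of_squares"])
```

Unlike the transformations, each of these produces a value for the caller (or a side effect) rather than another dataset.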
When a transformation is applied, Spark does not start computing. It only records the operation in a DAG (directed acyclic graph) of pending work, and the job is submitted to the cluster for execution when an action is encountered.
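This deferred execution can be imitated in a few lines of plain Python. The class `LazyDataset` below is a hypothetical toy, not part of Spark: it records map and filter steps in a simple linear plan (a much-simplified stand-in for the DAG) and runs them only when the action collect() is called.

```python
class LazyDataset:
    """Toy model of lazy evaluation: transformations are recorded, actions run them."""

    def __init__(self, source, plan=()):
        self.source = source
        self.plan = plan  # recorded steps; a linear stand-in for Spark's DAG

    def map(self, func):
        # Transformation: return a new dataset with one more recorded step.
        return LazyDataset(self.source, self.plan + (("map", func),))

    def filter(self, func):
        # Transformation: nothing is computed here either.
        return LazyDataset(self.source, self.plan + (("filter", func),))

    def collect(self):
        # Action: only now are the recorded steps applied to the data.
        items = iter(self.source)
        for kind, func in self.plan:
            items = map(func, items) if kind == "map" else filter(func, items)
        return list(items)


calls = []


def double(x):
    calls.append(x)  # remember that the function really ran
    return x * 2


ds = LazyDataset([1, 2, 3]).map(double).filter(lambda x: x > 2)
calls_before_action = len(calls)  # still 0: no transformation has executed yet
result = ds.collect()             # the action triggers the whole plan

print(calls_before_action, result, calls)
```

The list `calls` stays empty until collect() runs, which mirrors how Spark defers work until an action forces it.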