Spark API in Detail, a Plain-English Interpretation: map, mapPartitions, mapValues, mapWith, flatMap, flatMapWith, flatMapValues

Source: Internet
Author: User
Tags: foreach, spark, rdd

map(function)
map applies the given function to every element of an RDD and produces a new RDD. Each element of the original RDD corresponds to exactly one element of the new RDD.

Example:

val a = sc.parallelize(1 to 9, 3)
val b = a.map(x => x * 2) // x => x * 2 is a function: x is the input parameter (each element of the RDD) and x * 2 is the return value
a.collect
// result: Array[Int] = Array(1, 2, 3, 4, 5, 6, 7, 8, 9)
b.collect
// result: Array[Int] = Array(2, 4, 6, 8, 10, 12, 14, 16, 18)

Of course, map can also turn each element into a key-value pair.

val a = sc.parallelize(List("dog", "tiger", "lion", "cat", "panther", "eagle"), 2)
val b = a.map(x => (x, 1))
b.collect.foreach(println(_))
/*
(dog,1)
(tiger,1)
(lion,1)
(cat,1)
(panther,1)
(eagle,1)
*/

mapPartitions(function)
The function passed to map() is applied to each element of the RDD, whereas the function passed to mapPartitions() is applied to each partition as a whole.

package test

import scala.Iterator

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext

object TestRdd {
  // Sums all elements of one partition.
  def sumOfEveryPartition(input: Iterator[Int]): Int = {
    var total = 0
    input.foreach { elem =>
      total += elem
    }
    total
  }
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("Spark Rdd Test")
    val spark = new SparkContext(conf)
    val input = spark.parallelize(List(1, 2, 3, 4, 5, 6), 2) // the RDD has 6 elements, divided into 2 partitions
    val result = input.mapPartitions(
      partition => Iterator(sumOfEveryPartition(partition))) // partition is the input parameter, an Iterator over the partition's elements; the return value must also be an Iterator, hence Iterator(sumOfEveryPartition(partition))
    result.collect().foreach {
      println(_) // prints 6, then 15
    }
    spark.stop()
  }
}
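
A common reason to reach for mapPartitions instead of map is to pay a setup cost once per partition rather than once per element. The following is a minimal sketch of that idea and is not taken from the original example: it assumes a spark-shell session or a local master, and the DecimalFormat merely stands in for any object that is expensive to create (a database connection, a parser, and so on).

```scala
import java.text.{DecimalFormat, DecimalFormatSymbols}
import java.util.Locale

import org.apache.spark.{SparkConf, SparkContext}

// Assumed setup: reuse the shell's context if one exists, otherwise start a local one.
val sc = SparkContext.getOrCreate(
  new SparkConf().setMaster("local[2]").setAppName("mapPartitions-sketch"))

val input = sc.parallelize(List(1, 2, 3, 4, 5, 6), 2)

val formatted = input.mapPartitions { partition =>
  // Built once per partition, not once per element.
  val formatter = new DecimalFormat("000", DecimalFormatSymbols.getInstance(Locale.ROOT))
  // Iterator.map is lazy, so elements are still processed one at a time.
  partition.map(n => formatter.format(n.toLong))
}

val result = formatted.collect()
result.foreach(println) // 001 through 006, one per line
```

With map, the formatter would have to be created inside the function body and therefore once for every element; here it is created twice in total, once for each of the two partitions.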

mapValues(function)
mapValues applies the function to each value while the key of the original RDD stays unchanged; each key, together with its new value, forms an element of the new RDD. The function is therefore available only on RDDs whose elements are key-value pairs.

val a = sc.parallelize(List("dog", "tiger", "lion", "cat", "panther", "eagle"), 2)
val b = a.map(x => (x.length, x))
b.mapValues("x" + _ + "x").collect

"x" + _ + "x" is equivalent to everyInput => "x" + everyInput + "x"
Result:
Array(
(3,xdogx),
(5,xtigerx),
(4,xlionx),
(3,xcatx),
(7,xpantherx),
(5,xeaglex)
)
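
Because mapValues cannot touch the key, Spark can keep an existing partitioner on the result, while a plain map that rebuilds the pair loses it. The sketch below illustrates this difference under the assumption of a spark-shell session or a local master; the names viaMapValues and viaMap are only illustrative.

```scala
import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}

// Assumed setup: reuse the shell's context if one exists, otherwise start a local one.
val sc = SparkContext.getOrCreate(
  new SparkConf().setMaster("local[2]").setAppName("mapValues-sketch"))

val pairs = sc
  .parallelize(List("dog", "tiger", "lion", "cat", "panther", "eagle"), 2)
  .map(x => (x.length, x))
  .partitionBy(new HashPartitioner(2)) // give the pair RDD a known partitioner

// mapValues leaves keys alone, so the partitioner is preserved.
val viaMapValues = pairs.mapValues("x" + _ + "x")

// map could change the keys, so Spark drops the partitioner.
val viaMap = pairs.map { case (k, v) => (k, "x" + v + "x") }

println(viaMapValues.partitioner.isDefined) // true
println(viaMap.partitioner.isDefined)       // false
```

Both RDDs hold the same pairs; the practical difference is that a later key-based operation on viaMapValues can reuse the existing partitioning.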

mapWith and flatMapWith
These two do not seem to be used much; for details, refer to http://blog.csdn.net/jewes/article/details/39896301
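
As far as I know, mapWith and flatMapWith were deprecated in Spark 1.0 in favor of mapPartitionsWithIndex and are no longer available in Spark 2.x. A minimal sketch of the replacement, assuming a spark-shell session or a local master, tags every element with the index of the partition it lives in:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Assumed setup: reuse the shell's context if one exists, otherwise start a local one.
val sc = SparkContext.getOrCreate(
  new SparkConf().setMaster("local[2]").setAppName("mapPartitionsWithIndex-sketch"))

val x = sc.parallelize(1 to 6, 2)

// The function receives the partition index and an Iterator over that partition.
val tagged = x.mapPartitionsWithIndex { (index, partition) =>
  partition.map(elem => (index, elem))
}

val result = tagged.collect()
result.foreach(println)
/*
(0,1)
(0,2)
(0,3)
(1,4)
(1,5)
(1,6)
*/
```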

flatMap(function)
Similar to map. The difference is that with map each element of the original RDD produces exactly one element, whereas with flatMap each element of the original RDD can produce several elements (or none at all).

val a = sc.parallelize(1 to 4, 2)
val b = a.flatMap(x => 1 to x) // each element is expanded into the range 1 to x
b.collect
/*
result: Array[Int] = Array(1,
                           1, 2,
                           1, 2, 3,
                           1, 2, 3, 4)
*/
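
To make the contrast with map concrete, here is a small sketch that splits lines into words. It assumes a spark-shell session or a local master, and the sample lines are made up for illustration. map keeps one (nested) result per line, while flatMap flattens the pieces and lets the empty line contribute nothing.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Assumed setup: reuse the shell's context if one exists, otherwise start a local one.
val sc = SparkContext.getOrCreate(
  new SparkConf().setMaster("local[2]").setAppName("flatMap-sketch"))

// Hypothetical sample data; the middle line is intentionally empty.
val lines = sc.parallelize(List("spark makes rdds", "", "flatMap flattens"), 2)

// map: exactly one output per input line, each output being a Seq of words.
val nested = lines.map(line => line.split(" ").toSeq)

// flatMap: the words of all lines in one flat RDD; the empty line yields zero words.
val words = lines.flatMap(line => line.split(" ").filter(_.nonEmpty).toSeq)

println(nested.count()) // 3
println(words.count())  // 5
```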

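flatMapValues(function)
Similar to mapValues, except that the function returns a collection for each value; every item of that collection is paired with the original key, so one key-value pair can produce several.

val a = sc.parallelize(List((3, 4), (5, 6)))
val b = a.flatMapValues(x => 1 to x)
b.collect.foreach(println(_))
/*
(3,1)
(3,2)
(3,3)
(3,4)
(5,1)
(5,2)
(5,3)
(5,4)
(5,5)
(5,6)
*/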
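
A slightly more everyday use of flatMapValues is splitting a delimited value into one record per item. The sketch below assumes a spark-shell session or a local master; the keys and the comma-separated values are hypothetical sample data.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Assumed setup: reuse the shell's context if one exists, otherwise start a local one.
val sc = SparkContext.getOrCreate(
  new SparkConf().setMaster("local[2]").setAppName("flatMapValues-sketch"))

// Hypothetical sample data: each key maps to a comma-separated list.
val raw = sc.parallelize(List(("fruit", "apple,pear"), ("veg", "kale")))

// Every item of the split value is paired with the original key.
val items = raw.flatMapValues(v => v.split(",").toSeq)

val result = items.collect()
result.foreach(println)
/*
(fruit,apple)
(fruit,pear)
(veg,kale)
*/
```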
