The map and flatMap Operators in Spark RDD Operations

This blog introduces the basic usage of the Spark RDD map and flatMap operators.

    1. map

    map passes each element of the RDD to the call method, one at a time, and each return value becomes one element of a new RDD, so the number of records is unchanged by the transformation. The sample below adds 10 to each number and prints the results:

import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.api.java.function.VoidFunction;

public class Map {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("spark map").setMaster("local[*]");
        JavaSparkContext javaSparkContext = new JavaSparkContext(conf);
        JavaRDD<Integer> listRDD = javaSparkContext.parallelize(Arrays.asList(1, 2, 3, 4));

        // map: each element is passed to call(), and each return value
        // becomes one element of the new RDD.
        JavaRDD<Integer> numRDD = listRDD.map(new Function<Integer, Integer>() {
            private static final long serialVersionUID = 1L;

            @Override
            public Integer call(Integer num) throws Exception {
                return num + 10;
            }
        });

        // Print each transformed element.
        numRDD.foreach(new VoidFunction<Integer>() {
            private static final long serialVersionUID = 1L;

            @Override
            public void call(Integer num) throws Exception {
                System.out.println(num);
            }
        });

        javaSparkContext.close();
    }
}
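
    With Java 8 or later, the same job can be written more compactly with lambdas, since Function and VoidFunction are functional interfaces. A minimal sketch of the equivalent calls (the variable name plusTen is introduced here for illustration):

// Lambda form of the map job above: same semantics, less boilerplate.
JavaRDD<Integer> plusTen = listRDD.map(num -> num + 10); // 1, 2, 3, 4 -> 11, 12, 13, 14
plusTen.foreach(num -> System.out.println(num));         // one output record per input record
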
    2. flatMap

    flatMap is invoked the same way as map: every element of the original RDD is passed to the call method. The difference is that flatMap's call method returns an Iterator, so a single input record can produce zero or more output records, which are all flattened into the new RDD. The sample below splits each sentence into words:

import java.util.Arrays;
import java.util.Iterator;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.FlatMapFunction;
import org.apache.spark.api.java.function.VoidFunction;

public class FlatMap {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("spark flatMap").setMaster("local[*]");
        JavaSparkContext javaSparkContext = new JavaSparkContext(conf);
        JavaRDD<String> listRDD = javaSparkContext
                .parallelize(Arrays.asList("hello world", "hello java", "hello spark"));

        // flatMap: call() returns an Iterator, and every value it yields
        // becomes its own record in the new RDD.
        JavaRDD<String> rdd = listRDD.flatMap(new FlatMapFunction<String, String>() {
            private static final long serialVersionUID = 1L;

            @Override
            public Iterator<String> call(String input) throws Exception {
                return Arrays.asList(input.split(" ")).iterator();
            }
        });

        // Print each word on its own line.
        rdd.foreach(new VoidFunction<String>() {
            private static final long serialVersionUID = 1L;

            @Override
            public void call(String word) throws Exception {
                System.out.println(word);
            }
        });

        javaSparkContext.close();
    }
}
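
    To see the flattening effect, it helps to compare record counts. A quick check, assuming the listRDD and rdd variables from the example above (the nested variable is introduced here for illustration):

// map keeps one output record per input record, so the word lists stay nested:
JavaRDD<java.util.List<String>> nested = listRDD.map(s -> Arrays.asList(s.split(" ")));
System.out.println(nested.count()); // 3 records, one List per sentence
// flatMap flattens each returned Iterator, so every word is its own record:
System.out.println(rdd.count());    // 6 records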
