This blog introduces the basic usage of the Spark RDD map and flatMap operators.
1. map
map passes each element of the RDD to the call method one at a time, and each value returned by call becomes an element of the new RDD. The number of records does not change: one input element produces exactly one output element. The sample code below adds 10 to each number and prints the results:
import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.api.java.function.VoidFunction;

public class Map {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("spark map").setMaster("local[*]");
        JavaSparkContext javaSparkContext = new JavaSparkContext(conf);
        JavaRDD<Integer> listRDD = javaSparkContext.parallelize(Arrays.asList(1, 2, 3, 4));
        // map: one output element per input element
        JavaRDD<Integer> numRDD = listRDD.map(new Function<Integer, Integer>() {
            @Override
            public Integer call(Integer num) throws Exception {
                return num + 10;
            }
        });
        numRDD.foreach(new VoidFunction<Integer>() {
            @Override
            public void call(Integer num) throws Exception {
                System.out.println(num);
            }
        });
        javaSparkContext.close();
    }
}
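Since Java 8, the anonymous Function can also be written as a lambda, e.g. listRDD.map(num -> num + 10). The per-element semantics of map mirror java.util.stream's map, so the same transformation can be sketched with only the JDK, no Spark dependency. MapSketch and addTen are illustrative names of my own choosing:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class MapSketch {
    // Same transformation as the RDD example: add 10 to every element.
    // One output element per input element, so the count never changes.
    static List<Integer> addTen(List<Integer> nums) {
        return nums.stream()
                   .map(n -> n + 10)
                   .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(addTen(Arrays.asList(1, 2, 3, 4))); // [11, 12, 13, 14]
    }
}
```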
2. flatMap
flatMap processes elements the same way as map: every element of the original RDD is passed to call. The difference is that flatMap's call returns an Iterator, so a single input element can produce zero, one, or many output elements, all of which are flattened into the new RDD. The sample code below splits each sentence into words:
import java.util.Arrays;
import java.util.Iterator;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.FlatMapFunction;
import org.apache.spark.api.java.function.VoidFunction;

public class FlatMap {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("spark flatMap").setMaster("local[*]");
        JavaSparkContext javaSparkContext = new JavaSparkContext(conf);
        JavaRDD<String> listRDD = javaSparkContext
                .parallelize(Arrays.asList("hello world", "hello java", "hello spark"));
        // flatMap: each sentence yields an Iterator of words, flattened into one RDD
        JavaRDD<String> rdd = listRDD.flatMap(new FlatMapFunction<String, String>() {
            private static final long serialVersionUID = 1L;

            @Override
            public Iterator<String> call(String input) throws Exception {
                return Arrays.asList(input.split(" ")).iterator();
            }
        });
        rdd.foreach(new VoidFunction<String>() {
            private static final long serialVersionUID = 1L;

            @Override
            public void call(String word) throws Exception {
                System.out.println(word);
            }
        });
        javaSparkContext.close();
    }
}
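The flatten-then-merge behavior also mirrors java.util.stream's flatMap, so it can be sketched with only the JDK. FlatMapSketch and splitWords are illustrative names of my own choosing:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class FlatMapSketch {
    // Each input line yields zero or more words; flatMap flattens them
    // into a single list, so the output can be longer than the input.
    static List<String> splitWords(List<String> lines) {
        return lines.stream()
                    .flatMap(line -> Arrays.stream(line.split(" ")))
                    .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // 3 sentences in, 6 words out
        System.out.println(splitWords(Arrays.asList("hello world", "hello java", "hello spark")));
    }
}
```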