Background:
I have only recently started developing with Spark and am not yet familiar with its API. The usage of the four operators covered here (map, mapPartitions, flatMap and flatMapToPair) was often unclear to me, so I wrote this article as a reference.
If you see things differently, please leave a comment.
The test scenario imitates splitting sentences into words: each line is split on spaces, which is the step that comes before a word-frequency count.
Maven dependencies:
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.11</artifactId>
<version>2.2.0</version>
</dependency>
Method Introduction
map: (not recommended)
The map function applies the specified operation to each input element and returns exactly one object for each input.
Example code:
JavaRDD<String[]> mapResult = linesRDD.map(new Function<String, String[]>() {
    @Override
    public String[] call(String s) throws Exception {
        return s.split(" ");
    }
});
mapPartitions:
The mapPartitions function operates on all of the data in one partition at a time and returns an iterator over the result objects.
I recommend using mapPartitions instead of map, for the following reasons:
Advantage 1:
For initialization work, map may have to perform the initialization once for every record, whereas mapPartitions needs to perform it only once per partition, so resources are used more efficiently.
Advantage 2:
mapPartitions makes it easy to filter the returned results (for example, dropping bad records); this is harder to achieve with map alone.
mapPartitions can also do something similar to flatMap (although the underlying implementation may differ); see the flatMap section later in this article.
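To make Advantage 1 concrete, here is a minimal plain-Java sketch. It is not Spark code: each inner list stands in for one partition, and ExpensiveResource is a hypothetical placeholder for whatever needs initializing. It only counts how often the setup runs in each style.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java model, not Spark code: each inner list stands in for one partition.
class PartitionInitSketch {

    // Hypothetical stand-in for something costly to set up.
    static final class ExpensiveResource {
        static int created = 0;            // counts how many times setup ran
        ExpensiveResource() { created++; }
        String[] split(String line) { return line.split(" "); }
    }

    // map-style: the setup runs once for every element.
    static List<String[]> perElement(List<List<String>> partitions) {
        List<String[]> out = new ArrayList<>();
        for (List<String> partition : partitions) {
            for (String line : partition) {
                ExpensiveResource resource = new ExpensiveResource();
                out.add(resource.split(line));
            }
        }
        return out;
    }

    // mapPartitions-style: the setup runs once per partition and is reused.
    static List<String[]> perPartition(List<List<String>> partitions) {
        List<String[]> out = new ArrayList<>();
        for (List<String> partition : partitions) {
            ExpensiveResource resource = new ExpensiveResource();
            for (String line : partition) {
                out.add(resource.split(line));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<String>> partitions = Arrays.asList(
                Arrays.asList("you are a bad man", "just a test job"),
                Arrays.asList("one more line"));

        ExpensiveResource.created = 0;
        perElement(partitions);
        System.out.println("per element:   " + ExpensiveResource.created + " setups");

        ExpensiveResource.created = 0;
        perPartition(partitions);
        System.out.println("per partition: " + ExpensiveResource.created + " setups");
    }
}
```

With three lines spread over two partitions, the per-element version performs three setups and the per-partition version two. The gap grows with the number of records per partition.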
Example code:
JavaRDD<String[]> mapPartitionsResult = linesRDD.mapPartitions(new FlatMapFunction<Iterator<String>, String[]>() {
    @Override
    public Iterator<String[]> call(Iterator<String> stringIterator) throws Exception {
        List<String[]> resultArr = new ArrayList<>();
        while (stringIterator.hasNext()) {
            String line = stringIterator.next();
            String[] tmpResult = line.split(" ");
            resultArr.add(tmpResult);
        }
        return resultArr.iterator();
    }
});
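Advantage 2 (filtering) can be sketched in the same way. The method below has the same Iterator-in, Iterator-out shape as the call() method passed to mapPartitions, but it is plain Java so that it runs without Spark. Treating null or blank lines as "bad data" is my own assumption for the illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Plain-Java sketch with the same shape as a mapPartitions call() body.
class PartitionFilterSketch {

    // Splits each line on a single space and drops null or blank lines.
    static Iterator<String[]> splitSkippingBadLines(Iterator<String> lines) {
        List<String[]> result = new ArrayList<>();
        while (lines.hasNext()) {
            String line = lines.next();
            if (line == null || line.trim().isEmpty()) {
                continue; // bad record: emit nothing for it
            }
            result.add(line.trim().split(" "));
        }
        return result.iterator();
    }

    public static void main(String[] args) {
        Iterator<String> partition =
                Arrays.asList("you are a bad man", "", "   ", "just a test job").iterator();
        Iterator<String[]> words = splitSkippingBadLines(partition);
        while (words.hasNext()) {
            System.out.println(Arrays.toString(words.next()));
        }
    }
}
```

Because the function decides for itself how many objects go into the result list, skipping a record is a one-line check. A map function has to return one object for every input, so the same effect would need an extra filtering step.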
flatMap:
The flatMap function combines two operations; it "maps first, then flattens":
Operation 1: as with map, apply the specified operation to each input and return an object (here, a collection) for each input.
Operation 2: merge all of the returned collections into a single collection.
The main differences between flatMap and map:
map converts one record into exactly one record.
flatMap converts one record into a group of records (returned as an iterator). It is mainly used to turn one record into several, for example splitting each line of an article into words
and returning all of the words of every line.
Example code:
JavaRDD<String> flatMapResult = linesRDD.flatMap(new FlatMapFunction<String, String>() {
    @Override
    public Iterator<String> call(String s) throws Exception {
        return Arrays.asList(s.split(" ")).iterator();
    }
});
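The difference between map and flatMap can also be seen without Spark. The sketch below is only an analogy built on java.util.stream, whose map and flatMap follow the same idea. Note that the stream version of flatMap returns a Stream per element, whereas the Spark function above returns an Iterator.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Plain-Java analogy using java.util.stream, not the Spark API.
class MapVsFlatMapSketch {

    // map: one output (a String[]) per input line.
    static List<String[]> mapLines(List<String> lines) {
        return lines.stream()
                .map(line -> line.split(" "))
                .collect(Collectors.toList());
    }

    // flatMap: each line becomes several words, merged into one flat list.
    static List<String> flatMapLines(List<String> lines) {
        return lines.stream()
                .flatMap(line -> Arrays.stream(line.split(" ")))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("you are a bad man", "just a test job");
        System.out.println(mapLines(lines).size());     // 2 arrays, one per line
        System.out.println(flatMapLines(lines).size()); // 9 words in total
        System.out.println(flatMapLines(lines));
    }
}
```

Two input lines give two arrays with map, but nine separate words with flatMap. This is the "one record in, several records out" behaviour described above.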
In fact, as far as I understand it, mapPartitions can also do something similar to flatMap (although the underlying implementation may differ).
Suppose the task is to split each sentence on " " (a space) and return a list of words:
System.out.println("mapPartitions simulating flatMap");
JavaRDD<String> mapPartitionLikeFlatMapResult = linesRDD.mapPartitions(new FlatMapFunction<Iterator<String>, String>() {
    @Override
    public Iterator<String> call(Iterator<String> stringIterator) throws Exception {
        List<String> resultList = new ArrayList<>();
        while (stringIterator.hasNext()) {
            String tmpLine = stringIterator.next();
            String[] tmpWords = tmpLine.split(" ");
            for (String tmpString : tmpWords) {
                resultList.add(tmpString);
            }
        }
        return resultList.iterator();
    }
});
mapPartitionLikeFlatMapResult.foreach(new VoidFunction<String>() {
    @Override
    public void call(String s) throws Exception {
        System.out.println(s);
    }
});
System.out.println("\n");
flatMapToPair:
flatMapToPair builds on flatMap: each returned element is a Tuple2, that is, a key-value pair. This makes it convenient to run follow-up operations, such as counting, on records that share the same key.
Example code:
JavaPairRDD<String, Integer> flatMapToPairResult = linesRDD.flatMapToPair(
        new PairFlatMapFunction<String, String, Integer>() {
            @Override
            public Iterator<Tuple2<String, Integer>> call(String s) throws Exception {
                List<Tuple2<String, Integer>> resultTuple = new ArrayList<>();
                String[] tmpList = s.split(" ");
                for (String tmpString : tmpList) {
                    resultTuple.add(new Tuple2<>(tmpString, 1));
                }
                return resultTuple.iterator();
            }
        });
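To show why the key-value form is convenient for the statistics step, here is a small plain-Java model of "emit (word, 1) pairs, then sum the values per key". It is not Spark code: Map.Entry stands in for Tuple2, and the sumByKey helper is a hypothetical name for the kind of aggregation that in Spark would typically be done with reduceByKey on the JavaPairRDD.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of "emit (word, 1) pairs, then sum per key"; not Spark code.
class PairCountSketch {

    // Step 1: what the flatMapToPair call() does, applied to every line.
    static List<Map.Entry<String, Integer>> toPairs(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            for (String word : line.split(" ")) {
                pairs.add(new SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // Step 2: add up the values that share a key (insertion order is kept).
    static Map<String, Integer> sumByKey(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            counts.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("you are a bad man", "just a test job");
        System.out.println(sumByKey(toPairs(lines)));
    }
}
```

For the two test sentences, the word "a" ends up with a count of 2 and every other word with a count of 1.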
Complete sample code:
package com.spark.test.batch.job;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.FlatMapFunction;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.api.java.function.PairFlatMapFunction;
import org.apache.spark.api.java.function.VoidFunction;
import scala.Tuple2;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/**
 * Created by Szh on 2018/5/2.
 *
 * @author Szh
 * @date 2018/5/2
 */
public class MultiMapCompare {

    public static void main(String[] args) {

        SparkConf sparkConf = new SparkConf();
        sparkConf.setAppName("MultiMapCompare").setMaster("local[2]");
        JavaSparkContext sparkContext = new JavaSparkContext(sparkConf);
        sparkContext.setLogLevel("ERROR");

        // Two test sentences; with local[2] each one ends up in its own partition.
        List<String> linesList = new ArrayList<>();
        linesList.add("you are a bad man");
        linesList.add("just a test job");
        JavaRDD<String> linesRDD = sparkContext.parallelize(linesList);

        // map: one String[] per input line
        System.out.println("map Result");
        JavaRDD<String[]> mapResult = linesRDD.map(new Function<String, String[]>() {
            @Override
            public String[] call(String s) throws Exception {
                return s.split(" ");
            }
        });
        mapResult.foreach(new VoidFunction<String[]>() {
            @Override
            public void call(String[] strings) throws Exception {
                for (String tmp : strings) {
                    System.out.println(tmp);
                }
            }
        });
        System.out.println("\n");

        // mapPartitions: the function receives all lines of one partition at once
        System.out.println("mapPartitions Result");
        JavaRDD<String[]> mapPartitionsResult = linesRDD.mapPartitions(new FlatMapFunction<Iterator<String>, String[]>() {
            @Override
            public Iterator<String[]> call(Iterator<String> stringIterator) throws Exception {
                List<String[]> resultArr = new ArrayList<>();
                while (stringIterator.hasNext()) {
                    String line = stringIterator.next();
                    String[] tmpResult = line.split(" ");
                    resultArr.add(tmpResult);
                }
                return resultArr.iterator();
            }
        });
        mapPartitionsResult.foreach(new VoidFunction<String[]>() {
            @Override
            public void call(String[] strings) throws Exception {
                for (String tmp : strings) {
                    System.out.println(tmp);
                }
            }
        });
        System.out.println("\n");

        // flatMap: every line is split and the words are merged into one RDD
        System.out.println("flatMap Result");
        JavaRDD<String> flatMapResult = linesRDD.flatMap(new FlatMapFunction<String, String>() {
            @Override
            public Iterator<String> call(String s) throws Exception {
                return Arrays.asList(s.split(" ")).iterator();
            }
        });
        flatMapResult.foreach(new VoidFunction<String>() {
            @Override
            public void call(String s) throws Exception {
                System.out.println(s);
            }
        });
        System.out.println("\n");

        // flatMapToPair: like flatMap, but every word is emitted as a (word, 1) pair
        System.out.println("flatMapToPair Result");
        JavaPairRDD<String, Integer> flatMapToPairResult = linesRDD.flatMapToPair(new PairFlatMapFunction<String, String, Integer>() {
            @Override
            public Iterator<Tuple2<String, Integer>> call(String s) throws Exception {
                List<Tuple2<String, Integer>> resultTuple = new ArrayList<>();
                String[] tmpList = s.split(" ");
                for (String tmpString : tmpList) {
                    resultTuple.add(new Tuple2<>(tmpString, 1));
                }
                return resultTuple.iterator();
            }
        });
        flatMapToPairResult.foreach(new VoidFunction<Tuple2<String, Integer>>() {
            @Override
            public void call(Tuple2<String, Integer> stringIntegerTuple2) throws Exception {
                System.out.println(stringIntegerTuple2);
            }
        });
        System.out.println("\n");

        sparkContext.close();
    }
}
========
Output:
map Result
you
are
a
bad
man
just
a
test
job
mapPartitions Result
just
a
test
job
you
are
a
bad
man
flatMap Result
just
a
test
job
you
are
a
bad
man
flatMapToPair Result
(just,1)
(a,1)
(test,1)
(job,1)
(you,1)
(are,1)
(a,1)
(bad,1)
(man,1)
Note: the order in which the two sentences appear can differ between sections and between runs, because with local[2] the two partitions are processed in parallel.