Contents
- Preface
- Data preparation
- WordCount
- Yarn
- New MapReduce
- Sample Download
- Series Index
This article is copyrighted by Mephisto and shared on Blog Park; reprinting is welcome, but this statement and a link to the original must be retained. Thank you for your cooperation.
Written by Mephisto. Source link
Preface
In the previous article we finished setting up the Eclipse plugin, and with that our MapReduce tour begins.
Here we first run the official WordCount example, then write one by hand, so you can better understand how a job works.
Data preparation
One: Description
The WordCount class counts the occurrences of each distinct word, so we need to prepare some data first. It doesn't have to be a large amount; after all, this is just our own experiment.
Two: Build the data
Open Notepad and enter assorted words, some repeated and some distinct. Then save the file as Words_01.txt.
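For reference, such a data file might look like the following (these particular words are our illustration, not the original data). A mix of short and long words is handy, since our custom mapper later filters out words shorter than 5 characters:

```text
apple apple banana
hadoop mapreduce hadoop
cat dog elephant
```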
Three: Upload
Open Eclipse and, in the DFS Locations view, upload the data file we prepared to /tmp/input.
With that, our data is ready.
WordCount
One: Official website example
WordCount is an official Hadoop example, packaged in hadoop-mapreduce-examples-<ver>.jar.
The tutorial for the 2.7.1 version: http://hadoop.apache.org/docs/r2.7.1/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduceTutorial.html
Two: Find the example
The search shows the jar in two places; pick whichever is handier.
find / -name "*hadoop-mapreduce-examples*"
Four: Enter the directory
We chose the copy under /usr/hdp/.
cd /usr/hdp/2.3.0.0-2557/hadoop-mapreduce
Five: Execution
We run the job with the hadoop jar command.
Command syntax: hadoop jar <jar> <class/method> <input file/directory> <output directory>
# switch user
su hdfs
# execute
hadoop jar hadoop-mapreduce-examples-2.7.1.2.3.0.0-2557.jar wordcount /tmp/input/words_01.txt /tmp/output/1007_01
Command execution result (screenshot)
Plugin result (screenshot)
Job page result (screenshot)
With that, our first job has completed smoothly.
Yarn
One: Introduction
Hadoop 2.x brings two major, indeed fundamental, changes over Hadoop 1.x: the NameNode single point of failure is addressed, and YARN is introduced. We won't expand on these here; later chapters will be arranged to cover them.
Two: Yarn command
If we look closely, we can see that a warning appears after the hadoop jar command executes; the same job can be run with yarn jar instead.
yarn jar hadoop-mapreduce-examples-2.7.1.2.3.0.0-2557.jar wordcount /tmp/input/words_01.txt /tmp/output/1007_02
New MapReduce
One: New project via Plug-in
Nothing new here: in the previous article we created a project through the plug-in, so we use that project, com.first, directly.
Two: New WordCountEx class
This is our custom WordCount class, written after the official example with a little DIY, to make it easier to follow.
After creation (screenshot)
Three: New Mapper
Create an inner class, MyMapper, inside the WordCountEx class.
Our little DIY here: words shorter than 5 characters are excluded, which makes the processing easier to observe when comparing results.
static class MyMapper extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    @Override
    protected void map(Object key, Text value, Mapper<Object, Text, Text, IntWritable>.Context context)
            throws IOException, InterruptedException {
        // split the line into tokens
        StringTokenizer itr = new StringTokenizer(value.toString());
        while (itr.hasMoreTokens()) {
            // exclude words with fewer than 5 characters
            String tmp = itr.nextToken();
            if (tmp.length() < 5)
                continue;
            word.set(tmp);
            context.write(word, one);
        }
    }
}
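The core of the map step is plain JDK code, so it can be sanity-checked outside Hadoop. A minimal sketch (the class and method names here are our own, not part of the job) of the tokenize-and-filter logic:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class MapperLogicDemo {
    // Mirrors MyMapper: split a line into tokens, drop words shorter than 5 characters.
    static List<String> keptWords(String line) {
        List<String> kept = new ArrayList<>();
        StringTokenizer itr = new StringTokenizer(line);
        while (itr.hasMoreTokens()) {
            String tmp = itr.nextToken();
            if (tmp.length() < 5)
                continue; // excluded, just like in the mapper
            kept.add(tmp);
        }
        return kept;
    }

    public static void main(String[] args) {
        // prints [hadoop, small]
        System.out.println(keptWords("big hadoop is a small word test"));
    }
}
```

Each kept word would be emitted by the real mapper as a (word, 1) pair.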
Four: New Reducer
Likewise with a DIY twist: we multiply the map counts by 2, and give the output key a prefix.
static class MyReduce extends Reducer<Text, IntWritable, Text, IntWritable> {
    private IntWritable result = new IntWritable();
    private Text keyEx = new Text();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values,
            Reducer<Text, IntWritable, Text, IntWritable>.Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable val : values) {
            // enlarge the map result by multiplying it by 2
            sum += val.get() * 2;
        }
        result.set(sum);
        // custom output key with a prefix
        keyEx.set("output:" + key.toString());
        context.write(keyEx, result);
    }
}
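The reduce-side arithmetic is also plain Java. A small sketch (again with our own names) of "sum the values, doubling each, and prefix the key":

```java
import java.util.Arrays;
import java.util.List;

public class ReducerLogicDemo {
    // Mirrors MyReduce: sum the counts, doubling each one.
    static int doubledSum(List<Integer> values) {
        int sum = 0;
        for (int val : values) {
            sum += val * 2;
        }
        return sum;
    }

    // Mirrors the custom output key.
    static String prefixedKey(String key) {
        return "output:" + key;
    }

    public static void main(String[] args) {
        // three occurrences of a word, each counted once by the mapper
        System.out.println(prefixedKey("hadoop") + "\t" + doubledSum(Arrays.asList(1, 1, 1)));
        // prints: output:hadoop	6
    }
}
```

This is why, in the final output, every count is even and every key carries the "output:" prefix.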
Five: New main
In the main method we have to define a job and configure it.
public static void main(String[] args) throws Exception {
    // configuration information
    Configuration conf = new Configuration();
    // job name
    Job job = Job.getInstance(conf, "myWordCount");
    job.setJarByClass(WordCountEx.class);
    job.setMapperClass(MyMapper.class);
    // job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(MyReduce.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    // input and output paths
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    // exit with the job status
    System.exit(job.waitForCompletion(true) ? 0 : 1);
}
Six: Export the jar package
Export the finished jar package, named com.first.jar.
Seven: Copy to Linux
Place the exported jar package under /var/tmp on H31.
cd /var/tmp
ls
Eight: Execution
Look at the command and its result, and see what is different.
yarn jar com.first.jar /tmp/input/words_01.txt /tmp/output/1007_03
If you look closely, you'll notice the class argument is missing this time. Why? Because the main class was specified when the jar package was exported.
Nine: Export a jar package without a main entry
This time, when exporting, we do not specify the main class.
Ten: Execution, take two
We find that one more parameter is needed here: the entry class, given by its fully qualified name.
yarn jar com.first.jar com.first.WordCountEx /tmp/input/words_01.txt /tmp/output/1007_04
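The difference between the two runs comes down to the jar's manifest. When an entry point is chosen at export time, the jar's META-INF/MANIFEST.MF carries a Main-Class attribute, roughly like this (a sketch; the exact manifest Eclipse writes may differ):

```text
Manifest-Version: 1.0
Main-Class: com.first.WordCountEx
```

With Main-Class present, yarn jar com.first.jar <args> launches that class directly; without it, the fully qualified class name must be passed as the first argument.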
Eleven: Results
Looking at the output, we can clearly see that words shorter than 5 characters were excluded and that the counts were multiplied by 2. The prefix shows up garbled; don't worry about it, changing the encoding fixes it.
--------------------------------------------------------------------
Here, the content of this chapter is complete.
Sample Download
Github:https://github.com/sinodzh/hadoopexample/tree/master/2015/com.first
Series Index
[Source] Self-Learning Hadoop from Zero: Series Index
[Source] Self-Learning Hadoop from Zero (08): The First MapReduce