"Source" self-learning from zero Hadoop (08): First MapReduce

Table of Contents
    • Preface
    • Data preparation
    • WordCount
    • Yarn
    • New MapReduce
    • Sample Download
    • Series Index

This article is copyrighted by Mephisto and shared on Blog Park. Reprints are welcome, but you must retain this statement and link to the original. Thank you for your cooperation.

Written by Mephisto. Source link

Preface

In the previous article we finished setting up the Eclipse plugin, and with that we can begin our MapReduce journey.

Here we first run the official WordCount example, then write one by hand, so that you get a better feel for what a job is.

Data preparation
One: Description

The WordCount class counts the occurrences of different words, so we need to prepare some data first. Of course, it doesn't need to be a large amount; after all, this is just our own experiment.

Two: Build data

Open Notepad and type in a variety of words, some repeated and some unique. Then save the file as words_01.txt.
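For example, the contents might look like this (the exact words don't matter, as long as some repeat and the lengths vary):

hello world hello hadoop
mapreduce hadoop world hello
data data pipeline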

Three: Upload

Open Eclipse and, in the DFS Locations view, upload the data file we prepared to /tmp/input.

So our data is ready.
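Alternatively, the same upload can be done from the command line instead of the plugin (a sketch; it assumes words_01.txt is in the current directory and that the hdfs user has write access):

su hdfs
hdfs dfs -mkdir -p /tmp/input
hdfs dfs -put words_01.txt /tmp/input/words_01.txt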

WordCount
One: Official website example

WordCount is an official Hadoop example, packaged in hadoop-mapreduce-examples-<ver>.jar.

The 2.7.1 version is documented at: http://hadoop.apache.org/docs/r2.7.1/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduceTutorial.html

Two: Find an example

Search for the examples jar; the results show it in two places, so we pick the closer one.

find / -name "*hadoop-mapreduce-examples*"

Four: Enter the directory

We choose the copy under /usr/hdp/ and enter that directory.

cd /usr/hdp/2.3.0.0-2557/hadoop-mapreduce
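As a quick sanity check, running the examples jar with no arguments should print the list of bundled example programs, wordcount among them:

hadoop jar hadoop-mapreduce-examples-2.7.1.2.3.0.0-2557.jar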
Five: Execution

We start with the hadoop jar command.

Command format: hadoop jar <jar> <program> <input file/directory> <output directory>

# switch user
su hdfs
# execute the job
hadoop jar hadoop-mapreduce-examples-2.7.1.2.3.0.0-2557.jar wordcount /tmp/input/words_01.txt /tmp/output/1007_01

Command execution results

Plugin results

Job Page Results

With that, our first job has completed successfully.
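To inspect the result files from the shell (a sketch; the part-r-00000 file name assumes the default single reducer):

hdfs dfs -ls /tmp/output/1007_01
hdfs dfs -cat /tmp/output/1007_01/part-r-00000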

Yarn
One: Introduction

Hadoop 2.x brought two major changes over Hadoop 1.x, and they are fundamental ones.

One is the resolution of the NameNode single point of failure; the other is the introduction of YARN. We won't expand on them here; later chapters will cover them.

Two: Yarn command

If you looked closely, you may have noticed that after the hadoop jar command finished there was a warning suggesting that yarn jar be used to launch YARN applications. So let's rerun the job with yarn jar:

yarn jar hadoop-mapreduce-examples-2.7.1.2.3.0.0-2557.jar wordcount /tmp/input/words_01.txt /tmp/output/1007_02
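To confirm the job really went through YARN, you can list finished applications with the standard yarn CLI (a sketch):

yarn application -list -appStates FINISHED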

New MapReduce
One: New project via Plug-in

There's nothing new here: in the previous article we created a project through the plug-in, so we use that project, "com.first", directly.

Two: New WordCountEx class

This is our custom WordCount class, adapted from the official website example with a little DIY to make it easier to follow.


Three: New mapper

We create an inner class, MyMapper, inside the WordCountEx class.

Here's our little DIY: we exclude words shorter than 5 characters, which makes it easier to see the effect when comparing results.

// imports needed by the complete WordCountEx class (shown once here)
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

static class MyMapper extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    @Override
    protected void map(Object key, Text value,
            Mapper<Object, Text, Text, IntWritable>.Context context)
            throws IOException, InterruptedException {
        // split the line into words
        StringTokenizer itr = new StringTokenizer(value.toString());
        while (itr.hasMoreTokens()) {
            // exclude words shorter than 5 characters
            String tmp = itr.nextToken();
            if (tmp.length() < 5)
                continue;
            word.set(tmp);
            context.write(word, one);
        }
    }
}
Four: New reducer

Ditto: we multiply the map result by 2, and add a prefix to the output key.

static class MyReduce extends Reducer<Text, IntWritable, Text, IntWritable> {
    private IntWritable result = new IntWritable();
    private Text keyEx = new Text();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values,
            Reducer<Text, IntWritable, Text, IntWritable>.Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable val : values) {
            // enlarge the map result by multiplying it by 2
            sum += val.get() * 2;
        }
        result.set(sum);
        // custom output key: add a prefix
        keyEx.set("Output:" + key.toString());
        context.write(keyEx, result);
    }
}
Five: New main

In the main method we have to define a job and configure it.

public static void main(String[] args) throws Exception {
    // configuration information
    Configuration conf = new Configuration();
    // job name
    Job job = Job.getInstance(conf, "myWordCount");
    job.setJarByClass(WordCountEx.class);
    job.setMapperClass(MyMapper.class);
    // job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(MyReduce.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    // input and output paths
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    // exit with the job's completion status
    System.exit(job.waitForCompletion(true) ? 0 : 1);
}
Six: Export the jar package

Export the jar package we've just written, naming it com.first.jar.
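If you'd rather not use the Eclipse export wizard, a rough command-line equivalent looks like this (a sketch; the source layout and paths are assumptions):

# compile against the Hadoop classpath
mkdir -p classes
javac -cp "$(hadoop classpath)" -d classes src/com/first/WordCountEx.java
# 'e' bakes the main class into the jar's manifest, like the Eclipse export here
jar cfe com.first.jar com.first.WordCountEx -C classes .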

Seven: Copy to Linux

Place the exported jar package under /var/tmp on H31.
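For example, copied over from the development machine (the username is an assumption):

scp com.first.jar root@H31:/var/tmp/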

cd /var/tmp
ls
Eight: Execution

Look at the command and the results, and we'll see what's different.

yarn jar com.first.jar /tmp/input/words_01.txt /tmp/output/1007_03

If you look closely, you'll notice that the wordcount program name is missing. Why? Because the main entry point was already specified when we exported the jar package.
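You can verify this by peeking at the jar's manifest; a Main-Class entry means the entry point is baked in (a sketch, run on H31):

unzip -p com.first.jar META-INF/MANIFEST.MF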

Nine: Export a jar package without a main entry

This time, when exporting, we don't specify the main entry point.

Ten: Execution 2

Now we have to pass one extra parameter: the entry class, given by its fully qualified name.

yarn jar com.first.jar com.first.WordCountEx /tmp/input/words_01.txt /tmp/output/1007_04

Eleven: Results

Looking at the output, we can clearly see that words shorter than 5 characters have been excluded, and the counts have been multiplied by 2. The key prefix may appear garbled when it contains non-ASCII characters; don't worry about it, changing the encoding fixes it.
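To double-check from the shell, view the new job's output the same way as before (single reducer assumed):

hdfs dfs -cat /tmp/output/1007_04/part-r-00000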

--------------------------------------------------------------------

And with that, this chapter is complete.

Sample Download

GitHub: https://github.com/sinodzh/hadoopexample/tree/master/2015/com.first

Series Index

"Source" Self-Learning Hadoop series index from zero

This article is copyrighted by Mephisto and shared on Blog Park. Reprints are welcome, but you must retain this statement and link to the original. Thank you for your cooperation.

Written by Mephisto. Source link

"Source" self-learning from zero Hadoop (08): First MapReduce

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.