Experiment 2-2: English word-frequency statistics with Eclipse & Hadoop, tested on the cluster


Create a directory and upload an English test document (skip this if one is already available).
A. Create an input directory on DFS:
bin/hadoop fs -mkdir -p input
B. Copy README.txt from the Hadoop directory into the new DFS input directory:
bin/hadoop fs -copyFromLocal README.txt input
(Both commands are run from the Hadoop installation directory, ~/data/hadoop-2.5.2.)
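The same preparation can also be done programmatically; a minimal sketch using the FileSystem API (the hdfs:// URI is assumed from the run configuration used later in this article):

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Creates the DFS input directory and uploads README.txt, mirroring the
// two fs commands above.
public class PrepareInput {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(new URI("hdfs://192.168.0.6:9000"), conf);
        fs.mkdirs(new Path("input"));  // hadoop fs -mkdir -p input
        fs.copyFromLocalFile(new Path("README.txt"), new Path("input"));
        fs.close();
    }
}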

—————————————————————————————————

Note: the concrete steps of Method One and Method Two can be skipped over, but the notes highlighted in red between them must be read.

————————————————————————————————

Method One:

    1. Create a Map/Reduce project
      1) New project: File -> New -> Other -> Map/Reduce Project, named MR1 (at this step you can see that the Hadoop libraries are added automatically).


2) Create the class org.apache.hadoop.examples.WordCount, copying and pasting it from hadoop-2.5.2-src
(E:\hadoop\hadoop-2.5.2-src\hadoop-mapreduce-project\hadoop-mapreduce-examples\src\main\java\org\apache\hadoop\examples\WordCount.java)

3) Then create the class org.apache.hadoop.io.nativeio.NativeIO, copying and pasting it from hadoop-2.5.2-src
(E:\hadoop\hadoop-2.5.2-src\hadoop-common-project\hadoop-common\src\main\java\org\apache\hadoop\io\nativeio\NativeIO.java)

The following two steps can be skipped:
4) Create the log4j.properties file (this step can be skipped)
Create the log4j.properties file in the src directory, with the following contents:
log4j.rootLogger=DEBUG,stdout,R
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%5p - %m%n
log4j.appender.R=org.apache.log4j.RollingFileAppender
log4j.appender.R.File=mapreduce_test.log
log4j.appender.R.MaxFileSize=1MB
log4j.appender.R.MaxBackupIndex=1
log4j.appender.R.layout=org.apache.log4j.PatternLayout
log4j.appender.R.layout.ConversionPattern=%p %t %c - %m%n
log4j.logger.com.codefutures=DEBUG
5) Resolve java.lang.UnsatisfiedLinkError (a step I did not do):

    • the org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Ljava/lang/String;I) exception
      (Your environment may not be the same as mine, so you may only hit this issue after the change above.)
      Copy the source file org.apache.hadoop.io.nativeio.NativeIO into the project,
      then go to line 570 and change the method to directly return true;
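A minimal sketch of that change (the surrounding method is quoted from the 2.5.2 source as best I recall, so treat it as an assumption to verify against your copy):

    // In the Windows inner class of the copied
    // org.apache.hadoop.io.nativeio.NativeIO, around line 570:
    public static boolean access(String path, AccessRight desiredAccess)
            throws IOException {
        // Original: return access0(path, desiredAccess.accessRight());
        return true; // skip the native Windows access check (needs hadoop.dll)
    }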


3. Configure the Windows runtime environment (if it does not take effect, you need to restart the machine).
This requires hadoop.dll and winutils.exe (these two files are in hadoop2.5.2(x64).zip). Copy them into the E:\hadoop\bin directory, and also add the two files to E:\hadoop\hadoop-2.5.2\bin.
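If the environment change still does not take effect, a workaround I have seen used (not part of the original write-up; the path is this article's) is to point Hadoop at the installation from code, at the very top of the driver's main method:

    // Tell Hadoop's shell utilities where winutils.exe lives:
    System.setProperty("hadoop.home.dir", "E:\\hadoop\\hadoop-2.5.2");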
4. Run the project
In Eclipse, right-click WordCount.java, choose Run As -> Run Configurations, and configure the run arguments, i.e. the input and output folders:
hdfs://192.168.0.6:9000/input hdfs://192.168.0.6:9000/output2

Note: when you run another project in the future, after you open the dialog box above you need to right-click Java Application and create a new configuration.

After the run completes, you can see that an output folder appears in the HDFS file system, and under that folder you can see the run result part-r-00000. Double-click to open the file.

Note: if the output directory already exists, delete it or change the name, e.g. output01, output02, ...
In addition, when there are problems you can read the logs (http://192.168.0.6:8088/logs/).
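Deleting the old output directory by hand before every re-run gets tedious; a minimal sketch of doing it from code instead (the cluster URI is taken from the run configuration above; the rest is my assumption):

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    // Inside a method declared "throws Exception", before submitting the job:
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(new URI("hdfs://192.168.0.6:9000"), conf);
    Path out = new Path("/output2");
    if (fs.exists(out)) {
        fs.delete(out, true); // true = delete recursively
    }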

Method Two: for details see http://www.2cto.com/kf/201212/173857.html

This method specifies the input/output paths in the program, so you do not need to specify an input/output path when running in Eclipse, as follows.
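A minimal sketch of what that looks like inside the driver's main (the paths are assumed to match the run configuration from Method One; see the linked article for the full program):

    // Paths fixed in code, so no command-line arguments are needed:
    Job job = Job.getInstance(conf, "EnglishWordCount");
    // ... mapper/combiner/reducer and output types set as usual ...
    FileInputFormat.setInputPaths(job, new Path("hdfs://192.168.0.6:9000/input"));
    FileOutputFormat.setOutputPath(job, new Path("hdfs://192.168.0.6:9000/output2"));
    System.exit(job.waitForCompletion(true) ? 0 : 1);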

Package into a jar:

Select the src package, then right-click -> Export -> Java -> JAR file -> Next. On the left, select only the src folder; the jars under lib come with Hadoop itself and do not need to be added to the JAR file. (Note for the Paoding Chinese word segmentation used later: the Paoding jar does not come with Hadoop, so it does need to be added to the JAR file.) Take care not to add the .classpath and .project files on the right to the JAR file. Then, in the "JAR file" field below, choose the path and file name for the jar package.

Upload the generated .jar file to %HADOOP_HOME% (i.e. /home/hadoop/hadoop-2.5.2) on the master node; for the upload procedure see Experiment One.

cd /home/hadoop/hadoop-2.5.2

bin/hadoop jar MyWordCount.jar mywordcount.MyDriver

(Because the input and output paths are already specified in the program, the command does not need to specify them at this point; even if specified, they would be useless.)

If this reports an error because the program was just executed in Eclipse and the output directory has already been generated, delete the directory first.

You can view the results on the Eclipse side.

Attention:

As above: if you already specified the main class when packaging the jar, do not name the main class again when you hand the jar to Hadoop, or the program may treat the main-class argument as the input path and the input path as the output path. Here the main class is written in the command because it was not specified when the jar was packaged; mixing the two cases up produces all sorts of errors.
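For reference, "specifying the main class at packaging time" means Eclipse writes a Main-Class entry into the jar's META-INF/MANIFEST.MF, roughly like this (the class name here is assumed from the command above):

    Main-Class: mywordcount.MyDriver

With that entry present, bin/hadoop jar MyWordCount.jar is run without naming the class.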

——————————————————————————————————

The next method is the method from our textbook and needs to be read.

——————————————————————————————————

Method Three:

This method I redid after experiencing a variety of failures; by that point the code encoding in Eclipse had been changed to UTF-8 (originally it was not changed, and after the change the comments turned into garbled characters; the reason is unclear).

    1. Create a Map/Reduce project, named TwoWordCount.
    2. Create a package under the src directory, named twowordcount.
    3. Create three files in twowordcount: the Mapper class, the Reducer class, and the Driver class.

4. Modify the code in conjunction with the textbook, as follows:

MyMapper.java

package twowordcount;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MyMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);

    private Text word = new Text();

    public void map(LongWritable ikey, Text value, Context context)
            throws IOException, InterruptedException {
        /*
         * Parse the string into key-value form.
         * ikey    offset of the line
         * value   line content
         * context job context
         */
        StringTokenizer tokenizer = new StringTokenizer(value.toString());
        while (tokenizer.hasMoreTokens()) {
            word.set(tokenizer.nextToken());
            context.write(word, one);
        }
    }
}
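For example, for the input line "Hello Hadoop Hello", this mapper emits (Hello, 1), (Hadoop, 1), (Hello, 1).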

MyReducer.java

package twowordcount;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class MyReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    private IntWritable result = new IntWritable();

    public void reduce(Text _key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        /*
         * Receive the key-value output of the map method; pairs with the same
         * key are sent to the same reducer. Iterate over each key's values,
         * add them up, and write the result to the HDFS system.
         * _key    a key output by the map side
         * values  the collection of values sharing that key, from the map side
         * context the reducer-side context
         */
        int sum = 0;
        for (IntWritable val : values) {
            sum = sum + val.get();
        }
        result.set(sum);
        context.write(_key, result);
    }
}
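Continuing the example above: the reducer receives (Hadoop, [1]) and (Hello, [1, 1]) and writes (Hadoop, 1) and (Hello, 2). Because this summation is associative and commutative, the same class can also serve as the combiner, which is exactly how the driver below uses it.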

MyDriver.java

package twowordcount;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class MyDriver {

    public static void main(String[] args) throws Exception {

//      Configuration conf = new Configuration();
//      String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
//      // There must be both an input and an output here.
//      if (otherArgs.length != 2) {
//          System.err.println("Usage: wordcount <in> <out>");
//          System.exit(2);
//      }

        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length < 2) {
            System.err.println("Usage: wordcount <in> [<in>...] <out>");
            System.exit(2);
        }

        Job job = Job.getInstance(conf, "EnglishWordCount");
        job.setJarByClass(twowordcount.MyDriver.class);
        // TODO: specify a mapper
        job.setMapperClass(twowordcount.MyMapper.class);    // mapper
        job.setCombinerClass(twowordcount.MyReducer.class); // job combiner class
        // TODO: specify a reducer
        job.setReducerClass(twowordcount.MyReducer.class);  // reducer
        // TODO: specify output types
        job.setOutputKeyClass(Text.class);          // key class of the job output data
        job.setOutputValueClass(IntWritable.class); // value class of the job output data

        // TODO: specify input and output DIRECTORIES (not files)
//      FileInputFormat.setInputPaths(job, new Path(otherArgs[0]));  // file input
//      FileOutputFormat.setOutputPath(job, new Path(otherArgs[1])); // file output
//      if (!job.waitForCompletion(true)) // wait for the run to complete
//          return;

        for (int i = 0; i < otherArgs.length - 1; ++i) {
            FileInputFormat.addInputPath(job, new Path(otherArgs[i]));
        }
        FileOutputFormat.setOutputPath(job,
                new Path(otherArgs[otherArgs.length - 1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Note: the code works either with or without the commented-out sections. The commented-out code is what was used while tracking down an error; the problem encountered is described below (Question One).

5. Then run it on Eclipse and on Hadoop, respectively. Note that when running on Eclipse, the input and output paths are set in the same way as in Method Two; when running on Hadoop, pay attention to whether the main class was specified in the package. If it was, do not specify it in the command, otherwise there will be all sorts of errors; the specific errors can be seen in the problems below.

This completes the test.

