Original article; please credit the source when reposting.
Welcome to my personal blog, www.wuyudong.com, for more articles on cloud computing and big data.
1. Install the Hadoop plugin and configure it in Eclipse
In the previous article, "Compiling the Hadoop Eclipse plugin (hadoop 1.0)", we covered how to build the Eclipse plugin for Hadoop 1.0.
Place the plugin jar in the plugins folder of the Eclipse installation directory, then start Eclipse.
Once Eclipse is up, open the settings under the menu Window -> Preferences:
Click "Ant" and the following panel appears:
Click Browse, select the build directory under the Hadoop source tree, then click OK.
Open Window -> Show View -> Other, select Map/Reduce Tools, and click Map/Reduce Locations to open the view:
Add a Hadoop location. The host and port here correspond to the mapred.job.tracker value in mapred-site.xml; UserName is your user name. I configured localhost and 9001.
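For reference, the host/port pair typically comes from a mapred-site.xml entry like the one below. The values shown match the localhost:9001 setup described here; adjust them for your own cluster (this is a sketch of a typical Hadoop 1.x configuration, not taken from the article's own files):

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>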
At this point, however, the DFS entry is not visible in the Project Explorer on the left side of Eclipse.
Workaround:
In the menu bar, select Window -> Open Perspective -> Map/Reduce. You can then see the files that the HDFS file system contains.
Add a Hadoop location, where host and port are filled in from the conf/hadoop-site.xml configuration and UserName is your user name.
Even after you successfully add the Hadoop location, the following error may occur:
Workaround:
At this point the NameNode needs to be formatted: bin/hadoop namenode -format
Then start the daemons: bin/start-all.sh
If the folders shown look like (1) rather than (2), that is also normal; if you want them to look like (2), run the last few commands from the article "Install and run Hadoop".
Once configured, you can browse the files in DFS from the Project Explorer. Expanding level by level, you can see the in folder we uploaded earlier with its 2 txt files, and, after a job has run, an out folder.
Before writing our own Hadoop program we need to remove this out folder. There are two ways: first, right-click it in the tree and choose Delete; second, use the command line:
$ bin/hadoop fs -rmr out
Check the result with $ bin/hadoop fs -ls
2. Write HelloWorld
With the environment set up (so far we have only run the bundled example programs), we can now write our own HelloWorld. In the Eclipse menu, choose New -> Project; you will see that a Map/Reduce Project option has been added:
Select it and click Next:
Enter the project name, continue (Next), then click Finish.
You can now see the project in the Project Explorer. Expanding it, src is empty, so right-click it and choose New -> Class:
Then click Finish; a Java class has been created:
Now fill in the following code:
public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        StringTokenizer itr = new StringTokenizer(value.toString());
        while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, one);
        }
    }
}

public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();
        }
        result.set(sum);
        context.write(key, result);
    }
}

public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
    if (otherArgs.length != 2) {
        System.err.println("Usage: wordcount <in> <out>");
        System.exit(2);
    }
    Job job = new Job(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
    FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
}
After pasting the code you will see some errors; don't worry, click the red cross markers in the margin and choose the suggested imports:
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;
If you typed the source in directly, the class GenericOptionsParser may still not be found (another red cross). Add the jar commons-cli-1.2.jar, found under build/ivy/lib/Hadoop/common: right-click the MyHelloWorld project in the Project Explorer and choose Build Path -> Configure Build Path.
Under the Libraries tab, click Add External JARs; in the pop-up window, locate the jar in the directory mentioned above and click OK. Back in the project, the red cross disappears and the code compiles.
Once the whole project has no errors, click the small green arrow at the top, then in the small pop-up window choose Run on Hadoop:
After you click OK, another small window pops up:
Choose "Choose an existing server from the list below", locate the location configured earlier, select it, and click Finish. The job then runs, and you can see the results in the Console (double-click it to maximize):
After the run you can see an out folder; double-click the file inside it to see the word-count results.
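As an aside, the map/reduce logic of WordCount can be checked without a cluster. The sketch below (a hypothetical helper, not part of the tutorial's project) reproduces the map step (StringTokenizer splits each line into words) and the reduce step (summing the 1s emitted per word) in plain Java:

```java
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Local, cluster-free imitation of the TokenizerMapper + IntSumReducer pair.
public class LocalWordCount {
    public static Map<String, Integer> count(String text) {
        Map<String, Integer> sums = new TreeMap<>();
        StringTokenizer itr = new StringTokenizer(text);
        while (itr.hasMoreTokens()) {
            String word = itr.nextToken();     // "map": emit (word, 1)
            sums.merge(word, 1, Integer::sum); // "reduce": sum the 1s per word
        }
        return sums;
    }

    public static void main(String[] args) {
        System.out.println(count("Hello World Hello Hadoop"));
    }
}
```

Running it prints the per-word counts, the same result the job produces in the out folder (just computed in one JVM instead of over HDFS input splits).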
3. Problems that may arise
Problem 1:
After running, the console only outputs Usage: wordcount <in> <out>
You need to set the run arguments: click the small arrow next to the Run button, drop down, and click Run Configurations:
On the left, select WordCount under Java Application; on the right, under Arguments, enter in out. Then click Run to see the results.
Problem 2:
The second run will fail; read the message carefully and you can see the error is that the out directory already exists, so it must be deleted manually first.
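On the cluster you would delete it with bin/hadoop fs -rmr out (or FileSystem.delete(path, true)). The sketch below only illustrates the recursive-delete idea against the local filesystem; the class and directory names are illustrative:

```java
import java.io.File;
import java.io.IOException;

// Plain-Java sketch of "remove the out directory before re-running a job".
public class DeleteOutDir {
    public static boolean deleteRecursively(File dir) {
        File[] children = dir.listFiles();
        if (children != null) {
            for (File child : children) {
                deleteRecursively(child); // remove contents first
            }
        }
        return dir.delete();              // then the (now empty) entry itself
    }

    public static void main(String[] args) throws IOException {
        File out = new File("out-demo");
        out.mkdirs();
        new File(out, "part-r-00000").createNewFile(); // fake job output
        System.out.println(deleteRecursively(out));
    }
}
```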
Going further
Above we wrote a MapReduce HelloWorld; now let us also write an HDFS program. HDFS is a distributed file storage system. What are its common operations? From a programming perspective: create, read, and write a file; list the files and folders in a folder; delete a file or folder; move a file or folder; rename a file or folder.
Start Eclipse, create a new Hadoop project named MyHdfsTest with a new class HdfsTest, click OK, and then, as before, configure the project's Build Path to reference all jar packages under build/ivy/lib/Hadoop (not detailed here; see the steps above).
In the class, add the main function:
public static void main(String[] args) { }
Alternatively, when you add the class, tick the checkbox that creates main, and it is added automatically.
In the main function, add the following:
try {
    Configuration conf = new Configuration();
    conf.set("fs.default.name", "hdfs://localhost:9000");
    FileSystem hdfs = FileSystem.get(conf);
    Path path = new Path("in/test3.txt");
    FSDataOutputStream outputStream = hdfs.create(path);
    byte[] buffer = "Hello".getBytes();
    outputStream.write(buffer, 0, buffer.length);
    outputStream.flush();
    outputStream.close();
    System.out.println("Create OK");
} catch (IOException e) {
    e.printStackTrace();
}
Typing this directly produces errors; you need to add the following imports:
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
When there are no errors left, click Run on the toolbar, but this time choose Run As -> Java Application. You can then see "Create OK" in the output box, indicating the program ran successfully.
This code creates test3.txt in the in folder with the content "Hello". After running it, we can check in Eclipse's Project Explorer whether the file and its content are there. You can also check from the command line: $ bin/hadoop fs -ls in
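To see the same create-and-write sequence without a cluster, here is the local-filesystem analogue using java.io instead of the Hadoop FileSystem API (the class name is illustrative; the file name mirrors the tutorial's test3.txt, written to the current directory):

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

// Local analogue of hdfs.create(path) + outputStream.write(...).
public class LocalCreateTest {
    public static void main(String[] args) throws IOException {
        byte[] buffer = "Hello".getBytes();
        try (FileOutputStream outputStream = new FileOutputStream("test3.txt")) {
            outputStream.write(buffer, 0, buffer.length); // create + write
            outputStream.flush();
        }
        System.out.println("Create OK");
        // read the file back to confirm the content
        System.out.println(new String(Files.readAllBytes(Paths.get("test3.txt"))));
    }
}
```

The structure is deliberately one-to-one with the HDFS snippet: FileOutputStream plays the role of FSDataOutputStream, and the local path plays the role of the Path object.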
OK, our first HDFS program runs; the other operations just need the corresponding classes and methods. To make them easy to look up, we list them in a table:
| Operation | Local files | DFS files |
| --- | --- | --- |
| Main namespace | java.io.File, java.io.FileInputStream, java.io.FileOutputStream | org.apache.hadoop.conf.Configuration, org.apache.hadoop.fs.FileSystem, org.apache.hadoop.fs.Path, org.apache.hadoop.fs.FSDataInputStream, org.apache.hadoop.fs.FSDataOutputStream |
| Main object | new File(path) | Configuration, FileSystem hdfs |
| Create a file | file.createNewFile() | FSDataOutputStream out = hdfs.create(path); out.write(buffer, 0, buffer.length) |
| Create a folder | file.mkdir() | hdfs.mkdirs(path) |
| Read a file | new FileInputStream(...); fileInputStream.read(buffer) | FSDataInputStream in = hdfs.open(path); in.read(buffer) |
| Write a file | fileOutputStream.write(buffer, 0, buffer.length) | FSDataOutputStream out = hdfs.append(path); out.write(buffer, 0, buffer.length) |
| Delete a file (or folder) | file.delete() | fileSystem.delete(path) |
| List folder contents | file.list() | fileSystem.listStatus() |
| Rename a file (or folder) | file.renameTo(file) | fileSystem.rename(path, path) |
With this table, you can look these operations up easily whenever needed.
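The local-files column of the table can be exercised end to end in one small program. This sketch (class and file names are illustrative) creates a folder and a file, writes and reads "Hello", lists the folder, renames the file, and deletes everything:

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.Arrays;

// Walks through the "local files" column of the table above.
public class LocalFileOps {
    public static String demo(String dirName) throws IOException {
        File dir = new File(dirName);
        dir.mkdir();                                 // create a folder

        File f = new File(dir, "a.txt");
        f.createNewFile();                           // create a file
        try (FileOutputStream out = new FileOutputStream(f)) {
            byte[] buffer = "Hello".getBytes();
            out.write(buffer, 0, buffer.length);     // write a file
        }

        byte[] read = new byte[5];
        try (FileInputStream in = new FileInputStream(f)) {
            in.read(read);                           // read the file
        }
        String content = new String(read);

        System.out.println(Arrays.toString(dir.list())); // list folder contents

        File g = new File(dir, "b.txt");
        f.renameTo(g);                               // rename the file

        g.delete();                                  // delete the file
        dir.delete();                                // delete the folder
        return content;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(demo("demo-dir"));
    }
}
```

Replacing File with Path, FileOutputStream with hdfs.create(...), and so on, per the table, turns this into the equivalent HDFS program.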
Hadoop Combat – Build the Eclipse development environment and write Hello World