First install and start Hadoop; see http://www.cnblogs.com/wuxun1997/p/6847950.html for how. This post shows how to set up the IDE for Hadoop development. First make sure Eclipse is installed locally, then add the Eclipse Hadoop plugin and you are done. Here are the steps:
1. Download the Eclipse plugin from http://download.csdn.net/detail/wuxun1997/9841487, drop it into Eclipse's plugins directory, and restart Eclipse; "DFS Locations" appears in Project Explorer.
2. Click Window -> Preferences -> Hadoop Map/Reduce, fill in D:\hadoop-2.7.2, and click OK.
3. Click Window -> Show View -> MapReduce Tools -> Map/Reduce Locations, then click the small elephant icon with the + sign ("New Hadoop location") in the top-right corner. Eclipse fills in default parameters, but the following need to be changed to match the core-site.xml and hdfs-site.xml configured earlier:
General -> Map/Reduce (V2) Master -> Port: change to 9001
General -> DFS Master -> Port: change to 9000
Advanced Parameters -> dfs.datanode.data.dir: change to file:/hadoop/data/dfs/datanode
Advanced Parameters -> dfs.namenode.name.dir: change to file:/hadoop/data/dfs/namenode
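The two ports must agree with what the cluster itself was started with. Assuming Hadoop 2.7.2 was configured as in the linked post (host names and exact values may differ on your machine), the corresponding fragments of core-site.xml and hdfs-site.xml would look roughly like this:

```xml
<!-- core-site.xml: the plugin's "DFS Master" port (9000) comes from fs.defaultFS -->
<property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
</property>

<!-- hdfs-site.xml: the directories the plugin's Advanced Parameters must match -->
<property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/hadoop/data/dfs/namenode</value>
</property>
<property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/hadoop/data/dfs/datanode</value>
</property>
```

If the plugin's ports or directories disagree with these files, DFS Locations will fail to connect or will show an empty tree.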
4. Click Finish, then in DFS Locations click the small triangle icon on the left; an hdfs folder appears, and you can operate on HDFS directly from here. Right-click the file icon and select "Create new directory" to add a directory; right-click the folder icon again and choose Refresh to see the new result. At this point the new directory is also visible at localhost:50070 -> Utilities -> Browse the file system.
5. Create a Hadoop project: File -> New -> Project -> Map/Reduce Project -> Next -> enter your own project name, such as hadoop, and click Finish.
6. The code below is a common word-segmentation example: it counts the names in a Chinese novel and sorts them in descending order of frequency. A word-segmentation jar (IK Analyzer) must be imported; download it from http://download.csdn.net/detail/wuxun1997/9841659. The project structure is as follows:
Hadoop
|--src
|--com.wulinfeng.hadoop.wordsplit
|--WordSplit.java
|--IKAnalyzer.cfg.xml
|--myext.dic
|--mystopword.dic
WordSplit.java
package com.wulinfeng.hadoop.wordsplit;

import java.io.IOException;
import java.io.StringReader;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.map.InverseMapper;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;
import org.wltea.analyzer.core.IKSegmenter;
import org.wltea.analyzer.core.Lexeme;

public class WordSplit {

    /**
     * Map: tokenize each line with IK Analyzer
     */
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringReader input = new StringReader(value.toString());
            IKSegmenter ikSeg = new IKSegmenter(input, true); // smart segmentation
            for (Lexeme lexeme = ikSeg.next(); lexeme != null; lexeme = ikSeg.next()) {
                this.word.set(lexeme.getLexemeText());
                context.write(this.word, one);
            }
        }
    }

    /**
     * Reduce: sum the counts for each word
     */
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            this.result.set(sum);
            context.write(key, this.result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String inputFile = "/input/people.txt"; // input file
        Path outDir = new Path("/out"); // output directory
        Path tempDir = new Path("/tmp" + System.currentTimeMillis()); // temp directory

        // First job: word segmentation and counting
        System.out.println("start task...");
        Job job = Job.getInstance(conf, "word split");
        job.setJarByClass(WordSplit.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(inputFile));
        FileOutputFormat.setOutputPath(job, tempDir);

        // The first job's output becomes the second job's input; then start the sort job
        job.setOutputFormatClass(SequenceFileOutputFormat.class);
        if (job.waitForCompletion(true)) {
            System.out.println("start sort...");
            Job sortJob = Job.getInstance(conf, "word sort");
            sortJob.setJarByClass(WordSplit.class);
            sortJob.setMapperClass(InverseMapper.class);
            sortJob.setInputFormatClass(SequenceFileInputFormat.class);

            // Swap the map key and value so the words can be sorted by frequency, descending
            sortJob.setMapOutputKeyClass(IntWritable.class);
            sortJob.setMapOutputValueClass(Text.class);
            sortJob.setSortComparatorClass(IntWritableDecreasingComparator.class);
            sortJob.setNumReduceTasks(1);

            // Write the final result to the out directory
            sortJob.setOutputKeyClass(IntWritable.class);
            sortJob.setOutputValueClass(Text.class);
            FileInputFormat.addInputPath(sortJob, tempDir);

            // If an out directory already exists, delete it first
            FileSystem fileSystem = outDir.getFileSystem(conf);
            if (fileSystem.exists(outDir)) {
                fileSystem.delete(outDir, true);
            }
            FileOutputFormat.setOutputPath(sortJob, outDir);

            if (sortJob.waitForCompletion(true)) {
                System.out.println("finish and quit...");
                // Delete the temp directory
                fileSystem = tempDir.getFileSystem(conf);
                if (fileSystem.exists(tempDir)) {
                    fileSystem.delete(tempDir, true);
                }
                System.exit(0);
            }
        }
    }

    /**
     * Comparator implementing descending order
     */
    private static class IntWritableDecreasingComparator extends IntWritable.Comparator {
        public int compare(WritableComparable a, WritableComparable b) {
            return -super.compare(a, b);
        }

        public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
            return -super.compare(b1, s1, l1, b2, s2, l2);
        }
    }
}
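The second job's trick is worth isolating: InverseMapper swaps each (word, count) pair to (count, word), and IntWritableDecreasingComparator negates the normal key comparison so the shuffle sorts high counts first. A minimal plain-Java sketch of the same idea, without Hadoop (the class and method names here are illustrative, not part of any API):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class DescendingSortSketch {
    /**
     * Invert (word -> count) to (count -> words) and emit "count\tword"
     * lines in descending count order, mirroring what InverseMapper plus
     * the decreasing comparator produce in the sort job.
     */
    public static List<String> sortByCountDesc(Map<String, Integer> counts) {
        // A reverse-ordered TreeMap plays the role of the negated comparator.
        TreeMap<Integer, List<String>> inverted = new TreeMap<>(Comparator.reverseOrder());
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            inverted.computeIfAbsent(e.getValue(), k -> new ArrayList<>()).add(e.getKey());
        }
        List<String> result = new ArrayList<>();
        for (Map.Entry<Integer, List<String>> e : inverted.entrySet()) {
            for (String word : e.getValue()) {
                result.add(e.getKey() + "\t" + word);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("Hou Liangping", 5);
        counts.put("Gao Yuliang", 3);
        counts.put("Qi Tongwei", 4);
        System.out.println(sortByCountDesc(counts));
    }
}
```

In the real job the inversion and the sort happen in the shuffle phase across the cluster; this sketch only shows why negating the comparator yields the descending part-r-00000 output.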
IKAnalyzer.cfg.xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
    <comment>IK Analyzer extended configuration</comment>
    <!-- The user can configure their own extension dictionary here -->
    <entry key="ext_dict">myext.dic</entry>
    <!-- The user can configure their own extension stopword dictionary here -->
    <entry key="ext_stopwords">mystopword.dic</entry>
</properties>
myext.dic
Gao Yuliang, Qi Tongwei, Chen Hai, Chen Yanshi, Hou Liangping, Gao Xiaoqin, Sha Ruijin (character names from the drama; in the actual file they are the Chinese names, one entry per line)
mystopword.dic
you, me, he, is (common pronouns and particles to filter out; in the actual file they are the Chinese words, one entry per line)
Now run the WordSplit class directly in Eclipse: right-click -> Run As -> Run on Hadoop. Because the input file path is hard-coded in the class, create an input directory on the D drive and put a file named people.txt in it; here it is the text of the hit TV drama "In the Name of the People" pulled off the web. For segmentation to work, open people.txt in Notepad++ and set Encoding -> Encode in UTF-8 without BOM. Put the names you do not want split apart into myext.dic, and the verbs and particles you want filtered out into mystopword.dic. When the run finishes, look at the part-r-00000 file under D:\out to see who the protagonist is.
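The "UTF-8 without BOM" requirement matters because a byte-order mark would be read as part of the first line and pollute the first token's count. A small helper sketch for detecting and stripping a UTF-8 BOM before feeding text to the tokenizer (the class and method names are illustrative, not from any library):

```java
import java.nio.charset.StandardCharsets;

public class BomCheck {
    /** Remove a leading UTF-8 BOM (EF BB BF) if present; otherwise return the bytes unchanged. */
    public static byte[] stripUtf8Bom(byte[] bytes) {
        if (bytes.length >= 3
                && (bytes[0] & 0xFF) == 0xEF
                && (bytes[1] & 0xFF) == 0xBB
                && (bytes[2] & 0xFF) == 0xBF) {
            byte[] out = new byte[bytes.length - 3];
            System.arraycopy(bytes, 3, out, 0, out.length);
            return out;
        }
        return bytes;
    }

    public static void main(String[] args) {
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'h', 'i'};
        System.out.println(new String(stripUtf8Bom(withBom), StandardCharsets.UTF_8)); // prints "hi"
    }
}
```

Saving from Notepad++ as described avoids the problem entirely; the helper is only useful if you cannot control how the input file was produced.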
Eclipse configuration of a Hadoop 2.7.2 development environment