Configuring an Eclipse Development Environment for Hadoop 2.7.2

First install and start Hadoop; see http://www.cnblogs.com/wuxun1997/p/6847950.html for how. This article covers setting up the IDE for Hadoop development. Make sure Eclipse is installed locally, then install the Eclipse Hadoop plugin. The steps are as follows:

1. Download the Eclipse plugin from http://download.csdn.net/detail/wuxun1997/9841487, drop it into Eclipse's plugins directory, and restart Eclipse; DFS Locations appears in Project Explorer.

2. Click Window -> Preferences -> Hadoop Map/Reduce, fill in D:\hadoop-2.7.2, and click OK.

3. Click Window -> Show View -> MapReduce Tools -> Map/Reduce Locations, then click the small elephant icon with the + sign in the top-right corner ("New Hadoop location..."). Eclipse fills in default parameters, but the following need to be changed to match the core-site.xml and hdfs-site.xml from the installation article above (a sketch of both files follows this list):

General -> Map/Reduce (V2) Master -> Port: change to 9001

General -> DFS Master -> Port: change to 9000

Advanced Parameters -> dfs.datanode.data.dir: change to file:/hadoop/data/dfs/datanode

Advanced Parameters -> dfs.namenode.name.dir: change to file:/hadoop/data/dfs/namenode
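
For reference, a minimal sketch of what those two configuration files would contain for this setup. Only the port 9000 and the two dfs directories come from the settings above; the rest (localhost as host name, replication factor 1) are assumptions for a single-node install.

core-site.xml

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>

hdfs-site.xml

<configuration>
    <!-- assumed: replication factor 1 for a single-node setup -->
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/hadoop/data/dfs/namenode</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/hadoop/data/dfs/datanode</value>
    </property>
</configuration>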

4. Click Finish. Under DFS Locations on the left, click the triangle icon and an hdfs folder appears; you can operate on HDFS directly here. Right-click a folder icon and select "Create new directory" to add a directory, then right-click again and choose Refresh to see the new result. At this point the new directory is also visible under localhost:50070 -> Utilities -> Browse the file system. A programmatic equivalent of these operations is sketched below.
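
The following is a minimal sketch of doing the same HDFS operations from Java instead of the GUI. The class name and the /input path are illustrative assumptions; the address matches the DFS Master setting above.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsBrowse {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Same address as the DFS Master entry in the plugin
        conf.set("fs.defaultFS", "hdfs://localhost:9000");
        FileSystem fs = FileSystem.get(conf);

        // Equivalent of right-click -> "Create new directory" in DFS Locations
        fs.mkdirs(new Path("/input"));

        // Equivalent of "Refresh": list what is under the root
        for (FileStatus status : fs.listStatus(new Path("/"))) {
            System.out.println(status.getPath());
        }
        fs.close();
    }
}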

5. Create a Hadoop project: File -> New -> Project -> Map/Reduce Project -> Next, enter a project name such as Hadoop, and click Finish.

6. The code here shows a common word-segmentation example: counting the names in a Chinese novel and sorting them in descending order. It needs a segmentation jar; download it from http://download.csdn.net/detail/wuxun1997/9841659. The project structure is as follows:

Hadoop
|--src
|  |--com.wulinfeng.hadoop.wordsplit
|  |  |--WordSplit.java
|  |--IKAnalyzer.cfg.xml
|  |--myext.dic
|  |--mystopword.dic

WordSplit.java

package com.wulinfeng.hadoop.wordsplit;

import java.io.IOException;
import java.io.StringReader;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.map.InverseMapper;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;
import org.wltea.analyzer.core.IKSegmenter;
import org.wltea.analyzer.core.Lexeme;

public class WordSplit {

    /**
     * Mapper: segments each input line with IK and emits (word, 1).
     */
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {

        private static final IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringReader input = new StringReader(value.toString());
            IKSegmenter ikSeg = new IKSegmenter(input, true); // true = smart segmentation
            for (Lexeme lexeme = ikSeg.next(); lexeme != null; lexeme = ikSeg.next()) {
                this.word.set(lexeme.getLexemeText());
                context.write(this.word, one);
            }
        }
    }

    /**
     * Reducer: sums the counts for each word.
     */
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            this.result.set(sum);
            context.write(key, this.result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String inputFile = "/input/people.txt"; // input file
        Path outDir = new Path("/out"); // output directory
        Path tempDir = new Path("/tmp" + System.currentTimeMillis()); // temp directory

        // First job: word segmentation and counting
        System.out.println("start task...");
        Job job = Job.getInstance(conf, "word split");
        job.setJarByClass(WordSplit.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(inputFile));
        FileOutputFormat.setOutputPath(job, tempDir);

        // The first job's output becomes the second job's input: start the sort job
        job.setOutputFormatClass(SequenceFileOutputFormat.class);
        if (job.waitForCompletion(true)) {
            System.out.println("start sort...");
            Job sortJob = Job.getInstance(conf, "word sort");
            sortJob.setJarByClass(WordSplit.class);
            sortJob.setMapperClass(InverseMapper.class);
            sortJob.setInputFormatClass(SequenceFileInputFormat.class);

            // Invert the (word, count) pairs so they can be sorted by frequency, descending
            sortJob.setMapOutputKeyClass(IntWritable.class);
            sortJob.setMapOutputValueClass(Text.class);
            sortJob.setSortComparatorClass(IntWritableDecreasingComparator.class);
            sortJob.setNumReduceTasks(1);

            // Write the final result to the out directory
            sortJob.setOutputKeyClass(IntWritable.class);
            sortJob.setOutputValueClass(Text.class);
            FileInputFormat.addInputPath(sortJob, tempDir);

            // If the out directory already exists, delete it first
            FileSystem fileSystem = outDir.getFileSystem(conf);
            if (fileSystem.exists(outDir)) {
                fileSystem.delete(outDir, true);
            }
            FileOutputFormat.setOutputPath(sortJob, outDir);

            if (sortJob.waitForCompletion(true)) {
                System.out.println("finish and quit...");
                // Delete the temp directory
                fileSystem = tempDir.getFileSystem(conf);
                if (fileSystem.exists(tempDir)) {
                    fileSystem.delete(tempDir, true);
                }
                System.exit(0);
            }
        }
    }

    /**
     * Comparator that sorts IntWritable keys in descending order.
     */
    private static class IntWritableDecreasingComparator extends IntWritable.Comparator {
        public int compare(WritableComparable a, WritableComparable b) {
            return -super.compare(a, b);
        }

        public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
            return -super.compare(b1, s1, l1, b2, s2, l2);
        }
    }
}
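
Before running the full job, it can help to check how IK splits a sentence. Below is a minimal standalone sketch using the same IKSegmenter API as the mapper above; the class name and the sample sentence are illustrative assumptions.

import java.io.StringReader;

import org.wltea.analyzer.core.IKSegmenter;
import org.wltea.analyzer.core.Lexeme;

public class SegmentDemo {
    public static void main(String[] args) throws Exception {
        // true = smart segmentation, the same flag as in TokenizerMapper
        IKSegmenter seg = new IKSegmenter(new StringReader("侯亮平和沙瑞金"), true);
        for (Lexeme lex = seg.next(); lex != null; lex = seg.next()) {
            System.out.println(lex.getLexemeText()); // one token per line
        }
    }
}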

IKAnalyzer.cfg.xml

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
    <comment>IK Analyzer Extended Configuration</comment>
    <!-- Users can configure their own extension dictionary here -->
    <entry key="ext_dict">myext.dic</entry>
    <!-- Users can configure their own extension stop word dictionary here -->
    <entry key="ext_stopwords">mystopword.dic</entry>
</properties>

myext.dic

Gao Yuliang, Qi Tongwei, Chen Hai, Chen Yanshi, Hou Liangping, Gao Xiaoqin, Sha Ruijin (character names from the novel; one entry per line in the actual file)

mystopword.dic

you, me, he, is (stop words to filter out; one entry per line in the actual file)

At this point you can run the WordSplit class directly in Eclipse: right-click it and choose Run As -> Run on Hadoop. Because the input file path is hard-coded in the class, create an input directory on the D: drive and put a file named people.txt into it; here it is the novel behind the hit TV series "In the Name of the People", downloaded from the internet. For segmentation to work, open people.txt in Notepad++ and convert it via Encoding -> Encode in UTF-8 without BOM (a code sketch of the same conversion follows). Put names you don't want split apart into myext.dic, and put verbs and particles you want filtered out into mystopword.dic. When the run finishes, open the part-r-00000 file under D:\out to see who the protagonist is.
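
For completeness, a minimal sketch of the same encoding conversion done in code instead of Notepad++. The source path and its original encoding are assumptions (GBK is common for Chinese text downloaded from the web):

import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ToUtf8 {
    public static void main(String[] args) throws Exception {
        Path src = Paths.get("D:/people-gbk.txt"); // assumed source file and encoding
        Path dst = Paths.get("D:/input/people.txt");
        String text = new String(Files.readAllBytes(src), Charset.forName("GBK"));
        Files.createDirectories(dst.getParent());
        // Files.write emits no BOM, which is the format the job expects
        Files.write(dst, text.getBytes(Charset.forName("UTF-8")));
    }
}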
