A handwritten MapReduce-style word-count framework on HDFS


First, the data-processing (driver) class

package com.css.hdfs;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URI;
import java.net.URISyntaxException;
import java.util.HashMap;
import java.util.Map.Entry;
import java.util.Properties;
import java.util.Set;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocatedFileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;

/**
 * Requirement: given a file (e.g. "hello world hello teacher hello john tom"),
 * count the number of occurrences of each word. The input data lives in HDFS,
 * and the statistical results are written back to HDFS.
 *
 * Background: the 2004 Google papers (GFS / Bigtable / MapReduce) address
 *   1. storage of massive data      -> HDFS
 *   2. computation over massive data -> MapReduce
 *
 * Expected output, one word per line:
 *   hello  3
 *   world  1
 *   ...
 *
 * The framework is driven by user configuration:
 *   - where the input data is
 *   - how each line is processed (the Mapper implementation)
 *   - where the result data is stored
 */
public class HdfsWordCount {

    public static void main(String[] args) throws IOException, ClassNotFoundException,
            InstantiationException, IllegalAccessException, InterruptedException, URISyntaxException {
        // Load the job configuration file from the classpath
        Properties pro = new Properties();
        pro.load(HdfsWordCount.class.getClassLoader().getResourceAsStream("job.properties"));
        Path inPath = new Path(pro.getProperty("in_path"));
        Path outPath = new Path(pro.getProperty("out_path"));

        // Instantiate the user-specified Mapper via reflection
        Class<?> mapperClass = Class.forName(pro.getProperty("mapper_class"));
        Mapper mapper = (Mapper) mapperClass.newInstance();
        Context context = new Context();

        // Build an HDFS client object
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(new URI("hdfs://192.168.146.132:9000/"), conf, "root");

        // Read the user's input files line by line
        RemoteIterator<LocatedFileStatus> iter = fs.listFiles(inPath, false);
        while (iter.hasNext()) {
            LocatedFileStatus file = iter.next();
            // Open the path to get an input stream
            FSDataInputStream in = fs.open(file.getPath());
            BufferedReader br = new BufferedReader(new InputStreamReader(in, "utf-8"));
            String line = null;
            while ((line = br.readLine()) != null) {
                // Call the map method to run the user's business logic
                mapper.map(line, context);
            }
            // Close resources
            br.close();
            in.close();
        }

        // If the directory holding the result file does not exist yet, create it
        Path outDir = outPath.getParent();
        if (!fs.exists(outDir)) {
            fs.mkdirs(outDir);
        }

        // Write the cached results back to HDFS
        HashMap<Object, Object> contextMap = context.getContextMap();
        FSDataOutputStream out = fs.create(outPath);

        // Traverse the HashMap and write one "word \t count" line per entry
        Set<Entry<Object, Object>> entrySet = contextMap.entrySet();
        for (Entry<Object, Object> entry : entrySet) {
            out.write((entry.getKey().toString() + "\t" + entry.getValue() + "\n").getBytes());
        }

        // Close resources
        out.close();
        fs.close();
        System.out.println("Word count finished; results written to " + outPath);
    }
}
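The driver above needs a live HDFS cluster, but its core loop (read a line, map it, collect counts, emit tab-separated results) can be dry-run in plain Java. A minimal sketch, with illustrative names not taken from the framework, feeding two in-memory "file" lines through the same counting logic:

```java
import java.util.HashMap;
import java.util.Map;

// In-memory dry run of the driver's loop: no Hadoop dependency.
public class DriverDryRun {
    public static void main(String[] args) {
        // Stand-in for the lines read from HDFS
        String[] lines = { "hello world hello", "hello tom" };

        // Collect counts; merge() has the same effect as the get-then-write pattern
        Map<String, Integer> counts = new HashMap<>();
        for (String line : lines) {
            for (String word : line.split(" ")) {
                counts.merge(word, 1, Integer::sum);
            }
        }

        // "Emit" one key \t value line per entry, as the driver does
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            out.append(e.getKey()).append('\t').append(e.getValue()).append('\n');
        }
        System.out.print(out);
    }
}
```

Note that because all counts live in one in-process `HashMap`, this framework (unlike real MapReduce) is limited to data that fits in a single JVM's memory.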

Second, the Mapper interface

package com.css.hdfs;

/**
 * User-facing processing interface: the framework calls map() once per input line.
 */
public interface Mapper {
    public void map(String line, Context context);
}

Third, the data-transfer class

package com.css.hdfs;

import java.util.HashMap;

/**
 * Data-transfer class: caches intermediate <word, count> pairs.
 */
public class Context {

    // Data cache
    private HashMap<Object, Object> contextMap = new HashMap<>();

    // Put a key/value pair into the map
    public void write(Object key, Object value) {
        contextMap.put(key, value);
    }

    // Get the current value for a key
    public Object get(Object key) {
        return contextMap.get(key);
    }

    // Expose the whole cache
    public HashMap<Object, Object> getContextMap() {
        return contextMap;
    }
}
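Two properties of this contract matter to the mapper: `get()` returns `null` for a key that has never been written, and a second `write()` to the same key overwrites the first. A small sketch (`MiniContext` is an illustrative stand-in for the class above, not part of the framework):

```java
import java.util.HashMap;

// Demonstrates the Context contract: null for unseen keys, overwrite on re-write.
public class MiniContextDemo {
    static class MiniContext {
        private final HashMap<Object, Object> contextMap = new HashMap<>();
        public void write(Object key, Object value) { contextMap.put(key, value); }
        public Object get(Object key) { return contextMap.get(key); }
    }

    public static void main(String[] args) {
        MiniContext ctx = new MiniContext();
        Object before = ctx.get("hello");   // null: key not seen yet
        ctx.write("hello", 1);
        ctx.write("hello", 2);              // overwrites the previous value
        Object after = ctx.get("hello");
        System.out.println(before + " -> " + after);
    }
}
```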

Fourth, the word-count class

package com.css.hdfs;

/**
 * Splits each line into words and increments the count for each word.
 */
public class WordCountMapper implements Mapper {

    @Override
    public void map(String line, Context context) {
        // Split this line into words
        String[] words = line.split(" ");
        // For each word, increment its count: hello 1, world 1, ...
        for (String word : words) {
            Object value = context.get(word);
            if (null == value) {
                // First occurrence of this word
                context.write(word, 1);
            } else {
                // Seen before: unbox, increment, re-store
                int v = (int) value;
                context.write(word, v + 1);
            }
        }
    }
}
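The mapper can be smoke-tested without HDFS by wiring it to an in-memory context. The sketch below inlines local copies of the counting logic and the map cache (`Ctx` and the static `map` method are illustrative stand-ins, renamed so the example is self-contained):

```java
import java.util.HashMap;

// Self-contained smoke test of the word-count map logic.
public class WordCountSmokeTest {
    // Minimal stand-in for the framework's Context
    static class Ctx {
        final HashMap<Object, Object> map = new HashMap<>();
        void write(Object k, Object v) { map.put(k, v); }
        Object get(Object k) { return map.get(k); }
    }

    // Same get-then-write counting logic as WordCountMapper.map()
    static void map(String line, Ctx context) {
        for (String word : line.split(" ")) {
            Object value = context.get(word);
            if (value == null) {
                context.write(word, 1);
            } else {
                context.write(word, (int) value + 1);
            }
        }
    }

    public static void main(String[] args) {
        Ctx ctx = new Ctx();
        map("hello world hello teacher", ctx);
        map("hello john tom", ctx);
        System.out.println(ctx.map);
    }
}
```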

Fifth, the configuration file job.properties

in_path=/wc/in
out_path=/wc/out/rs.txt
mapper_class=com.css.hdfs.WordCountMapper
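These three keys are all the driver consumes: two paths read as strings, plus a fully qualified class name resolved by reflection (which is why the value's case must match the class exactly). A sketch of that lookup, parsing the same content from a `String` in place of the classpath resource the driver actually loads:

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Shows how the driver reads job.properties and resolves its keys.
public class JobConfigDemo {
    public static void main(String[] args) throws IOException {
        // Stand-in for the job.properties resource on the classpath
        String cfg = "in_path=/wc/in\n"
                   + "out_path=/wc/out/rs.txt\n"
                   + "mapper_class=com.css.hdfs.WordCountMapper\n";

        Properties pro = new Properties();
        pro.load(new StringReader(cfg));
        String inPath = pro.getProperty("in_path");
        String outPath = pro.getProperty("out_path");
        String mapperClass = pro.getProperty("mapper_class");

        // In the driver, this name then goes through
        // Class.forName(mapperClass).newInstance() to produce the Mapper.
        System.out.println(inPath + " " + outPath + " " + mapperClass);
    }
}
```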
