While working on a project, I ran into the following problem: when processing the target data in the mapper and reducer methods, the first step is to fetch an existing tag library, match against it, and then label the fields being processed. Because the tag library is not very large, there is no need to use HBase. My implementation stores the tag library as a file on HDFS and distributes it through the distributed cache, so that every slave node can read the file.
The configuration in the driver's main() method:
// Paths of the files to be placed in the distributed cache
String[] cachePath = {
        "hdfs://10.105.32.57:8020/user/ad-data/tag/tag-set.csv",
        "hdfs://10.105.32.57:8020/user/ad-data/tag/tagedurl.csv" };
// Add the files to the distributed cache
job.addCacheFile(new Path(cachePath[0]).toUri());
job.addCacheFile(new Path(cachePath[1]).toUri());
Files can be added to the distributed cache as shown in the code above.
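For context, here is a minimal, self-contained driver sketch showing where addCacheFile() fits into job setup. The class names TagJobDriver, TagMapper, and TagReducer are assumed for illustration only and are not from the original project:

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class TagJobDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "tag-matching");
        job.setJarByClass(TagJobDriver.class);
        job.setMapperClass(TagMapper.class);    // assumed mapper class
        job.setReducerClass(TagReducer.class);  // assumed reducer class
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        // Register the tag-library files with the distributed cache
        job.addCacheFile(new URI("hdfs://10.105.32.57:8020/user/ad-data/tag/tag-set.csv"));
        job.addCacheFile(new URI("hdfs://10.105.32.57:8020/user/ad-data/tag/tagedurl.csv"));

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}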
Reading the distributed cache files in the Mapper and Reducer:
/** Override the Mapper's setup() method to get the files from the distributed cache */
@Override
protected void setup(Mapper<LongWritable, Text, Text, Text>.Context context)
        throws IOException, InterruptedException {
    super.setup(context);
    URI[] cacheFiles = context.getCacheFiles();
    Path tagSetPath = new Path(cacheFiles[0]);
    Path tagedUrlPath = new Path(cacheFiles[1]);
    // File operations (such as reading the contents into a Set or Map)
}

@Override
public void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
    // Use the data read out in setup()
}
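The "file operations" step is left abstract above. Below is a minimal sketch of one way to fill it in, assuming each line of the cached CSV holds one tag; the tagSet field and loadTagSet helper are illustrative names, not from the original code:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.HashSet;
import java.util.Set;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Inside the Mapper class:
private Set<String> tagSet = new HashSet<String>();

// Read a cached file line by line into the tag set
private void loadTagSet(Path path, Configuration conf) throws IOException {
    FileSystem fs = path.getFileSystem(conf);  // resolves the hdfs:// URI
    BufferedReader reader = new BufferedReader(
            new InputStreamReader(fs.open(path), StandardCharsets.UTF_8));
    try {
        String line;
        while ((line = reader.readLine()) != null) {
            tagSet.add(line.trim());
        }
    } finally {
        reader.close();
    }
}

In setup() this would be called as loadTagSet(tagSetPath, context.getConfiguration()), after which map() can probe tagSet for matches.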
Similarly, to read the distributed cache files in the Reducer:
/** Override the Reducer's setup() method to get the files from the distributed cache */
@Override
protected void setup(Context context) throws IOException, InterruptedException {
    super.setup(context);
    mos = new MultipleOutputs<Text, Text>(context);
    URI[] cacheFiles = context.getCacheFiles();
    Path tagSetPath = new Path(cacheFiles[0]);
    Path tagedUrlPath = new Path(cacheFiles[1]);
    // File read operations
}

@Override
public void reduce(Text key, Iterable<Text> values, Context context)
        throws IOException, InterruptedException {
    for (Text value : values) {
        // Use the data read out in setup()
    }
    context.write(key, new Text(sb.toString()));
}
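The mos variable above is a MultipleOutputs instance (org.apache.hadoop.mapreduce.lib.output.MultipleOutputs). A sketch of the supporting field declarations and the matching cleanup(), which the snippet implies but does not show; the sb accumulator is an assumption based on the context.write() call:

// Inside the Reducer class: fields referenced by setup() and reduce() above
private MultipleOutputs<Text, Text> mos;
private StringBuilder sb = new StringBuilder();  // assumed accumulator for the output value

@Override
protected void cleanup(Context context) throws IOException, InterruptedException {
    mos.close();  // MultipleOutputs buffers its own streams and must be closed here
}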