Using the Hadoop DistributedCache

Source: Internet
Author: User

Please credit the source when reprinting: http://www.cnblogs.com/demievil/p/4059141.html

I ran into a requirement while working on a project: when processing the target data in the mapper and reducer methods, each record first has to be looked up and matched against an existing tag library, and the matched fields are then labeled. Because the tag library is not very large, there is no need to use HBase. My implementation stores the tag library as a file on HDFS and distributes it through the distributed cache, so that every slave node can read the file.

The configuration in the main method:

// File paths to be stored in the distributed cache
String cachePath[] = {
        "hdfs://10.105.32.57:8020/user/ad-data/tag/tag-set.csv",
        "hdfs://10.105.32.57:8020/user/ad-data/tag/tagedurl.csv"
};
// Add the files to the distributed cache
job.addCacheFile(new Path(cachePath[0]).toUri());
job.addCacheFile(new Path(cachePath[1]).toUri());

You can add files to the distributed cache by referencing the above code.

Reading the distributed cache files in the Mapper:

/** Override the Mapper's setup method to get the files from the distributed cache */
@Override
protected void setup(Mapper<LongWritable, Text, Text, Text>.Context context)
        throws IOException, InterruptedException {
    super.setup(context);
    URI[] cacheFiles = context.getCacheFiles();
    Path tagSetPath = new Path(cacheFiles[0]);
    Path tagedUrlPath = new Path(cacheFiles[1]);
    // File operations (such as reading the contents into a Set or Map)
}

@Override
public void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
    // Use the data read in setup()
}
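The "file operations" placeholder in setup() might look like the following sketch, which parses tag lines into a lookup map. The two-column "field,tag" CSV layout and the class and method names here are assumptions for illustration, not from the original post:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.HashMap;
import java.util.Map;

public class TagLibrary {
    // Parse lines of the form "field,tag" into a lookup map.
    // The two-column layout is an assumption; adjust to the real tag-set.csv format.
    public static Map<String, String> load(Reader source) throws IOException {
        Map<String, String> tags = new HashMap<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String[] parts = line.split(",", 2);
                if (parts.length == 2) {
                    tags.put(parts[0].trim(), parts[1].trim());
                }
            }
        }
        return tags;
    }
}
```

Inside setup(), one would open the cached path with FileSystem.get(context.getConfiguration()).open(tagSetPath), wrap the stream in an InputStreamReader, and pass it to load(), keeping the returned map in a field for map() to consult.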

Similarly, to read the distributed cache files in the Reducer:

/** Override the Reducer's setup method to get the files from the distributed cache */
@Override
protected void setup(Context context)
        throws IOException, InterruptedException {
    super.setup(context);
    mos = new MultipleOutputs<Text, Text>(context);
    URI[] cacheFiles = context.getCacheFiles();
    Path tagSetPath = new Path(cacheFiles[0]);
    Path tagedUrlPath = new Path(cacheFiles[1]);
    // File read operations
}

@Override
public void reduce(Text key, Iterable<Text> values, Context context)
        throws IOException, InterruptedException {
    for (Text value : values) {
        // Use the data read in setup()
    }
    context.write(key, new Text(sb.toString()));
}

That's all.
