DistributedCache 2 in Hadoop

Source: Internet
Author: User

Hadoop's distributed cache mechanism lets all map or reduce tasks of a job access the same file. After the job is submitted, Hadoop copies the files specified by the -files and -archives options to HDFS (the JobTracker's file system). Before running a task, the TaskTracker copies those files from the JobTracker's file system to local disk as a cache so that the task can access them. The job itself does not need to care where a file came from. When DistributedCache is used, localized files are usually accessed through a symbolic link, which is more convenient. A file specified by the URI hdfs://namenode/test/input/file1#myfile is symlinked as myfile in the task's current working directory. The job can then open the file directly as myfile, without worrying about its actual local path.
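The "#myfile" convention above is plain URI syntax: the part after "#" is the URI fragment, and the distributed cache uses it as the symlink name. The following is a minimal sketch using only java.net.URI, with no Hadoop dependency; the class name CacheUriDemo and its helper methods are hypothetical and exist only for illustration.

```java
import java.net.URI;

public class CacheUriDemo {
    /** Returns the symlink name carried in the URI fragment, or null if there is none. */
    public static String linkName(URI cacheUri) {
        return cacheUri.getFragment();
    }

    /** Builds a cache URI by appending "#linkName" to a file location. */
    public static URI withLink(String fileLocation, String linkName) {
        return URI.create(fileLocation + "#" + linkName);
    }

    public static void main(String[] args) {
        URI uri = withLink("hdfs://namenode/test/input/file1", "myfile");
        System.out.println(uri.getPath());   // the file's path on the cluster: /test/input/file1
        System.out.println(linkName(uri));   // the name the task will see: myfile
    }
}
```

The path identifies the file to be cached, while the fragment only names the link that tasks see in their working directory.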

Example (WordCount.java):

 

package org.myorg;

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.net.URI;
import java.util.Iterator;
import java.util.StringTokenizer;

import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.util.*;

public class WordCount
{
    // Reads the cached file through its symbolic link "god.txt" in the task's
    // current working directory and prints every line to the task's stdout log.
    public static void useDistributedCacheBySymbolicLink() throws Exception
    {
        FileReader reader = new FileReader("god.txt");
        BufferedReader br = new BufferedReader(reader);
        String s1 = null;
        while ((s1 = br.readLine()) != null)
        {
            System.out.println(s1);
        }
        br.close();
        reader.close();
    }

    public static class Map extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable>
    {

        // Called once per task before any map() call; the cached file is
        // already localized and symlinked at this point.
        public void configure(JobConf job)
        {
            System.out.println("now, use the distributed cache and syslink");
            try {
                useDistributedCacheBySymbolicLink();
            }
            catch (Exception e)
            {
                e.printStackTrace();
            }

        }

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        // Emits (word, 1) for every whitespace-separated token in the input line.
        public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException
        {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens())
            {
                word.set(tokenizer.nextToken());
                output.collect(word, one);
            }
        }
    }

    public static class Reduce extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable>
    {
        // Sums the counts collected for each word.
        public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException
        {
            int sum = 0;
            while (values.hasNext())
            {
                sum += values.next().get();
            }
            output.collect(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception
    {
        JobConf conf = new JobConf(WordCount.class);
        conf.setJobName("wordcount");

        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);

        conf.setMapperClass(Map.class);
        conf.setCombinerClass(Reduce.class);
        conf.setReducerClass(Reduce.class);

        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);

        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        // Enable symlinks, then register the file with "#god.txt" appended so
        // that each task sees it as god.txt in its working directory.
        DistributedCache.createSymlink(conf);
        String path = "/xuxm_dev_test_61_pic/in/WordCount.java";
        Path filePath = new Path(path);
        String uriWithLink = filePath.toUri().toString() + "#" + "god.txt";
        DistributedCache.addCacheFile(new URI(uriWithLink), conf);

        JobClient.runJob(conf);
    }

}

For how to run the program, see http://hadoop.apache.org/common/docs/r0.19.2/cn/mapred_tutorial.html#%E4%BE%8B%E5%AD%90%EF%BC%9AWordCount+v1.0

 

The program's output, the contents of the file /xuxm_dev_test_61_pic/in/WordCount.java, appears in the task's log, which can be viewed through the JobTracker.
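To see why the task can open "god.txt" by its bare name, the localization step can be imitated on a local file system. This is only a rough sketch under stated assumptions: the class SymlinkReadDemo, its method demo, and the temporary "cache" and "work" directories are hypothetical stand-ins for what the TaskTracker does, not Hadoop code, and the sketch needs a platform that permits creating symbolic links.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class SymlinkReadDemo {
    /**
     * Writes content into a stand-in cache directory, links it into a stand-in
     * working directory under linkName, and reads it back through the link.
     */
    public static List<String> demo(String content, String linkName) {
        try {
            Path cacheDir = Files.createTempDirectory("cache"); // where the localized copy lives
            Path workDir = Files.createTempDirectory("work");   // the task's working directory
            Path cached = Files.write(cacheDir.resolve("file1"),
                                      content.getBytes(StandardCharsets.UTF_8));
            // The link name is all the reading code needs to know.
            Files.createSymbolicLink(workDir.resolve(linkName), cached);
            return Files.readAllLines(workDir.resolve(linkName), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        for (String line : demo("first line\nsecond line\n", "god.txt")) {
            System.out.println(line);
        }
    }
}
```

The reading code never refers to the cache directory, which mirrors how useDistributedCacheBySymbolicLink() only ever mentions "god.txt".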

 

If the program uses many small files, accessing them through symbolic links is very convenient.

Put the WordCount.java file at the specified location before running the job; otherwise the file will not be found.
