WordCount.java

Hadoop's distributed cache lets all map or reduce tasks of a job access the same files. After a job is submitted, Hadoop copies the files specified by the -files and -archives options to HDFS (the JobTracker's file system). Before running a task, the TaskTracker copies these files from the JobTracker's file system to its local disk as a cache, so the task can read them locally. The task itself does not need to care where a file originally came from. With DistributedCache, localized files are usually accessed through a symbolic link, which is more convenient. For example, the URI hdfs://namenode/test/input/file1#myfile makes the file available as a symbolic link named myfile in the task's current working directory. The job can then open the file directly as myfile, without knowing its actual local path.
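The link name is simply the fragment (the part after '#') of the cache URI. The minimal sketch below uses the standard java.net.URI class to split such a URI into the file path and the link name. It only illustrates the URI syntax; the CacheUriLink class and its method names are hypothetical, and it makes no Hadoop calls.

```java
import java.net.URI;

public class CacheUriLink {

    // Returns the symlink name carried in the URI fragment (after '#'),
    // or null when the URI has no fragment.
    public static String linkName(String cacheUri) {
        return URI.create(cacheUri).getFragment();
    }

    // Returns the path of the file the link will point to, without the fragment.
    public static String filePath(String cacheUri) {
        return URI.create(cacheUri).getPath();
    }

    public static void main(String[] args) {
        String uri = "hdfs://namenode/test/input/file1#myfile";
        System.out.println("path = " + filePath(uri)); // path = /test/input/file1
        System.out.println("link = " + linkName(uri)); // link = myfile
    }
}
```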
Example:
package org.myorg;

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.net.URI;
import java.util.*;

import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.util.*;
public class WordCount {

    // Reads the cached file through its symbolic link "god.txt" in the task's
    // working directory and prints every line to the task's stdout log.
    public static void useDistributedCacheBySymbolicLink() throws Exception {
        BufferedReader br = new BufferedReader(new FileReader("god.txt"));
        try {
            String s1;
            while ((s1 = br.readLine()) != null) {
                System.out.println(s1);
            }
        } finally {
            br.close(); // also closes the underlying FileReader
        }
    }
    public static class Map extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, IntWritable> {

        // Called once per task before any map() call: read the cached file here.
        @Override
        public void configure(JobConf job) {
            System.out.println("now, use the distributed cache and symlink");
            try {
                useDistributedCacheBySymbolicLink();
            } catch (Exception e) {
                e.printStackTrace();
            }
        }

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        // Emits (word, 1) for every whitespace-separated token in the line.
        public void map(LongWritable key, Text value,
                        OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                output.collect(word, one);
            }
        }
    }
    public static class Reduce extends MapReduceBase
            implements Reducer<Text, IntWritable, Text, IntWritable> {

        // Sums the counts collected for each word.
        public void reduce(Text key, Iterator<IntWritable> values,
                           OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {
            int sum = 0;
            while (values.hasNext()) {
                sum += values.next().get();
            }
            output.collect(key, new IntWritable(sum));
        }
    }
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(WordCount.class);
        conf.setJobName("wordcount");

        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);

        conf.setMapperClass(Map.class);
        conf.setCombinerClass(Reduce.class);
        conf.setReducerClass(Reduce.class);

        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);

        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        // Create symlinks for cached files in each task's working directory.
        DistributedCache.createSymlink(conf);
        // File to cache; the "#god.txt" fragment names the symlink.
        String path = "/xuxm_dev_test_61_pic/in/WordCount.java";
        Path filePath = new Path(path);
        String uriWithLink = filePath.toUri().toString() + "#" + "god.txt";
        DistributedCache.addCacheFile(new URI(uriWithLink), conf);

        JobClient.runJob(conf);
    }
}
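To see what the symlink buys the task, the sketch below imitates the localization step on a single machine with java.nio.file: it writes some content into a stand-in cache directory, then creates a link named god.txt in a stand-in working directory and reads the file only through that name. The SymlinkDemo class and the temporary directories are hypothetical; the real TaskTracker uses its own directory layout, and this code makes no Hadoop calls. It assumes a file system that supports symbolic links (for example, Linux).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SymlinkDemo {

    // Writes `content` into a stand-in cache directory, links it as "god.txt"
    // in a stand-in working directory, and reads it back through the link.
    public static List<String> readThroughSymlink(List<String> content) throws IOException {
        Path cacheDir = Files.createTempDirectory("localized-cache"); // stands in for the local cache
        Path workDir = Files.createTempDirectory("task-workdir");     // stands in for the task's working dir
        Path cached = Files.write(cacheDir.resolve("WordCount.java"), content, StandardCharsets.UTF_8);
        Path link = Files.createSymbolicLink(workDir.resolve("god.txt"), cached);

        List<String> lines = new ArrayList<>();
        // The "task" only knows the link name, not where the cached file lives.
        try (BufferedReader br = Files.newBufferedReader(workDir.resolve("god.txt"), StandardCharsets.UTF_8)) {
            String s;
            while ((s = br.readLine()) != null) {
                lines.add(s);
            }
        } finally {
            // The reader is already closed here; clean up the temporary files.
            Files.deleteIfExists(link);
            Files.deleteIfExists(cached);
            Files.deleteIfExists(workDir);
            Files.deleteIfExists(cacheDir);
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        for (String line : readThroughSymlink(Arrays.asList("first line", "second line"))) {
            System.out.println(line);
        }
    }
}
```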
For how to compile and run the job, refer to the WordCount v1.0 example in the Hadoop 0.19.2 MapReduce tutorial: http://hadoop.apache.org/common/docs/r0.19.2/cn/mapred_tutorial.html#%E4%BE%8B%E5%AD%90%EF%BC%9AWordCount+v1.0
When the job runs, the content of the file /xuxm_dev_test_61_pic/in/WordCount.java appears in the task logs, which can be viewed through the JobTracker.
Symbolic links are especially convenient when a program needs to read many small files.
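With many small files, each one needs its own "path#linkName" URI. The sketch below builds these strings for a batch of files, using each file's last path component as its link name. Because all links end up in the same working directory, it rejects duplicate link names; that check, the CacheLinks class, and the sample paths are hypothetical choices for this illustration. In a real job each resulting string would be passed to DistributedCache.addCacheFile(new URI(...), conf), as in the listing above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class CacheLinks {

    // Builds "path#name" strings, where name is the file's last path component.
    // Throws if two files would produce the same link name.
    public static List<String> withLinks(List<String> paths) {
        Set<String> seen = new HashSet<>();
        List<String> uris = new ArrayList<>();
        for (String p : paths) {
            String name = p.substring(p.lastIndexOf('/') + 1);
            if (!seen.add(name)) {
                throw new IllegalArgumentException("duplicate link name: " + name);
            }
            uris.add(p + "#" + name);
        }
        return uris;
    }

    public static void main(String[] args) {
        List<String> files = Arrays.asList("/data/dict/a.txt", "/data/dict/b.txt");
        for (String u : withLinks(files)) {
            System.out.println(u); // e.g. /data/dict/a.txt#a.txt
        }
    }
}
```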
Before running the job, upload WordCount.java to the specified HDFS location; otherwise the file will not be found.