Hadoop Programming Specification (Hadoop Patent Analysis)

There are plenty of Hadoop examples online, but it doesn't take long to notice that even WordCount comes in many different variants, and we can't always just run someone else's code. So it pays to settle on a standard workflow of our own, one we can adapt quickly even when the API changes. Here we use Hadoop patent-citation analysis as the guinea pig.
In Eclipse, right-click to create a new Map/Reduce project, then right-click the project to create a Mapper, a Reducer, and a MapReduce Driver. In the MapReduce Driver, fill in the class names of the newly created Mapper and Reducer, and change the input and output paths to args[0] and args[1]. Then choose Run As > Run Configurations, click Java Application, and under the Arguments tab set the program arguments to:
hdfs://master:9000/user/input/file1.txt
hdfs://master:9000/user/aa
With that, the standard workflow is in place.
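Once the job finishes, the result can be checked straight from the command line (the output directory is the second argument above; the reducer's output file is conventionally named part-r-00000):

hadoop fs -ls hdfs://master:9000/user/aa
hadoop fs -cat hdfs://master:9000/user/aa/part-r-00000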
Next, we will analyze the case of "patent analysis".
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MapClass extends Mapper<LongWritable, Text, Text, Text> {

    public void map(LongWritable ikey, Text ivalue, Context context)
            throws IOException, InterruptedException {
        String[] citation = ivalue.toString().split(",");
        context.write(new Text(citation[1]), new Text(citation[0]));
    }
}
The source file looks something like this (each line is: patent number, patent number it cites):

K1 , V1
K2 , V2
K3 , V3
K1 , V3
The LongWritable ikey is the key for each line of the input (with the default TextInputFormat it is the line's byte offset in the file), and the Text ivalue holds the content of that line.
String[] citation = ivalue.toString().split(",");

splits the line into its two fields at the comma.
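One caveat, added here as a hypothetical hardening and not part of the original example: split(",") keeps any spaces around the comma, so an input line like "K1 , V1" yields the fields "K1 " and " V1". Trimming each field avoids that:

String[] citation = ivalue.toString().split(",");
String citing = citation[0].trim();  // the patent doing the citing
String cited  = citation[1].trim();  // the patent being cited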
context.write(new Text(citation[1]), new Text(citation[0]));
The Context is the context in which the MapReduce task runs; it carries the configuration and state of the whole job.
context.write: the key is the cited patent number and the value is the patent number that cites it. Since identical keys are grouped, Hadoop automatically collects the values for each key together, namely:
Cited patent    Citing patents
V1              K1
V2              K2
V3              K3, K1
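To make the grouping concrete, here is a small stand-alone Java sketch, not Hadoop code, that simulates what the shuffle phase does to the map output above (real Hadoop also sorts the keys, which here coincides with insertion order):

import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ShuffleDemo {
    public static void main(String[] args) {
        // Map output: (cited patent, citing patent) pairs as emitted by MapClass
        String[][] pairs = {{"V1", "K1"}, {"V2", "K2"}, {"V3", "K3"}, {"V3", "K1"}};

        // The shuffle groups all values that share a key
        Map<String, List<String>> grouped = new LinkedHashMap<>();
        for (String[] p : pairs) {
            grouped.computeIfAbsent(p[0], k -> new ArrayList<>()).add(p[1]);
        }

        // Prints: {V1=[K1], V2=[K2], V3=[K3, K1]}
        System.out.println(grouped);
    }
}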
import java.io.IOException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class Reduce extends Reducer<Text, Text, Text, Text> {

    public void reduce(Text _key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        // Process values
        String csv = "";
        for (Text val : values) {
            if (csv.length() > 0) {
                csv += ",";
            }
            csv += val.toString();
        }
        context.write(_key, new Text(csv));
    }
}
values: this is what the map output looks like after Hadoop has grouped it for you, i.e. cited patent → citing patents, e.g. V3 → K3, K1. The loop simply joins the values with commas so the output is easier to read.
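As a side note, string concatenation inside a loop copies the whole string on every pass; for keys with many citing patents a StringBuilder is the more idiomatic choice. A sketch of the same reduce body with identical behavior:

public void reduce(Text _key, Iterable<Text> values, Context context)
        throws IOException, InterruptedException {
    StringBuilder csv = new StringBuilder();
    for (Text val : values) {
        if (csv.length() > 0) {
            csv.append(",");  // comma-separate the citing patents
        }
        csv.append(val.toString());
    }
    context.write(_key, new Text(csv.toString()));
}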
Finally the driver, which is also auto-generated code; to run it, right-click and choose Run on Hadoop, then select the configuration you just created.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class Driver {

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "JobName");
        job.setJarByClass(Driver.class);
        job.setMapperClass(MapClass.class);
        job.setReducerClass(Reduce.class);

        // TODO: specify output types
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        // TODO: specify input and output DIRECTORIES (not files)
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        if (!job.waitForCompletion(true))
            return;
    }
}
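Outside Eclipse, the same job can be packaged into a jar and run from the command line; the jar name below is just a placeholder. Note that the output directory (the second argument) must not already exist, or FileOutputFormat will abort the job:

hadoop jar patent-analysis.jar Driver \
    hdfs://master:9000/user/input/file1.txt \
    hdfs://master:9000/user/aa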