The Pitfall of Running Hadoop Word Count on a Mac
Word Count embodies the classic idea of MapReduce and is the "Hello World" of distributed computing. Even so, I had the "good fortune" to hit a Mac-specific problem, `Mkdirs failed to create`, which I record here.
I. The Code
- WCMapper.java
```java
package wordcount;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.util.StringUtils;

import java.io.IOException;

/**
 * Four generic parameters: the first two are the Mapper's input types,
 * the last two its output types. KEYIN is the input key type and
 * VALUEIN the input value type; map and reduce exchange data as
 * key-value pairs. By default the framework feeds the mapper one line
 * of input at a time: the key is the byte offset of the line within
 * the file, and the value is the content of that line.
 *
 * long -> LongWritable, String -> Text: Hadoop's own serializable
 * types are leaner and cheaper to ship over the network.
 */
public class WCMapper extends Mapper<LongWritable, Text, Text, LongWritable> {

    // The MapReduce framework calls this method once for every line of data
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // The business logic lives here; the key-value pair to process is passed in

        // Turn this line into a String
        String line = value.toString();

        // Split the line into words on spaces
        String[] words = StringUtils.split(line, ' ');

        // Emit each word through the context
        for (String word : words) {
            context.write(new Text(word), new LongWritable(1));
        }
    }
}
```
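For example (an illustrative input, not from the original post): given the two lines `hello world` and `hello hadoop`, the map calls would emit `(hello, 1)`, `(world, 1)`, `(hello, 1)`, `(hadoop, 1)`. Summing the counts per word is left entirely to the reducer.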
- WCReducer.java
```java
package wordcount;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

import java.io.IOException;

public class WCReducer extends Reducer<Text, LongWritable, Text, LongWritable> {

    // Once the map phase completes, the framework caches all the k-v pairs,
    // groups them by key, and calls reduce once per group <key, values{}>
    @Override
    protected void reduce(Text key, Iterable<LongWritable> values, Context context)
            throws IOException, InterruptedException {
        long count = 0;

        // Traverse the values and accumulate the sum
        for (LongWritable value : values) {
            count += value.get();
        }

        // Emit the count for this word
        context.write(key, new LongWritable(count));
    }
}
```
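Continuing the illustrative example above: the framework groups the mapper's output into `(hadoop, [1])`, `(hello, [1, 1])`, `(world, [1])`, and this reduce method then emits `(hadoop, 1)`, `(hello, 2)`, `(world, 1)`.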
- WCRunner.java (the driver class)
```java
package wordcount;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.io.IOException;

/**
 * Describes a concrete job: which class supplies the map logic, which
 * the reduce logic, where the job's input data lives, and where its
 * output should be written.
 */
public class WCRunner {
    public static void main(String[] args)
            throws IOException, ClassNotFoundException, InterruptedException {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf);

        // Set the jar for the whole job; the dependent WCMapper and
        // WCReducer classes are located through WCRunner
        job.setJarByClass(WCRunner.class);

        // The mapper and reducer classes this job uses
        job.setMapperClass(WCMapper.class);
        job.setReducerClass(WCReducer.class);

        // The reducer's output kv types
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);

        // The mapper's output kv types
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(LongWritable.class);

        // Where the raw input data is stored
        FileInputFormat.setInputPaths(job, new Path("/wc/input/"));

        // Where the processed results should be stored
        FileOutputFormat.setOutputPath(job, new Path("/wc/output/"));

        // Submit the job and wait for it to complete
        job.waitForCompletion(true);
    }
}
```
II. Reproducing the Problem
After writing the code and packaging it into a jar (I did this graphically in IDEA), I submitted it to Hadoop to run:

```bash
hadoop jar hadoopStudy.jar wordcount.WCRunner
```
Instead of the results that the official site and many other tutorials show, it failed with an error:
Exception"main" java.io.IOException: Mkdirs failed to create /var/folders/vf/rplr8k812fj018q5lxcb5k940000gn/T/hadoop-unjar1598612687383099338/META-INF/license at org.apache.hadoop.util.RunJar.ensureDirectory(RunJar.java:146) at org.apache.hadoop.util.RunJar.unJar(RunJar.java:119) at org.apache.hadoop.util.RunJar.unJar(RunJar.java:94) at org.apache.hadoop.util.RunJar.run(RunJar.java:227) at org.apache.hadoop.util.RunJar.main(RunJar.java:153)
After half a day of fiddling, it turned out to be a Mac problem. I found this explanation on Stack Overflow:
> The issue is that a /tmp/hadoop-xxx/xxx/LICENSE file and a /tmp/hadoop-xxx/xxx/license directory are being created on a case-insensitive file system when unjarring the mahout jobs.
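In other words, the jar holds both a `META-INF/LICENSE` file and a `META-INF/license/` directory, and on macOS's default case-insensitive file system the two names collide while RunJar unpacks the jar into a temp directory. A minimal Java sketch of that collision (the file names here are illustrative, not taken from the jar):

```java
import java.io.File;
import java.io.IOException;

// Minimal sketch: on a case-insensitive file system, "LICENSE" and
// "license" resolve to the same path, so creating the file first
// makes the later mkdirs() fail -- the same failure RunJar reports.
public class CaseCollisionDemo {
    public static void main(String[] args) throws IOException {
        File licenseFile = new File("LICENSE");
        licenseFile.createNewFile();            // like the jar entry META-INF/LICENSE

        File licenseDir = new File("license");  // like the jar entry META-INF/license/
        if (!licenseDir.mkdirs()) {
            // This branch is taken on macOS's default case-insensitive volumes
            System.out.println("Mkdirs failed to create " + licenseDir.getAbsolutePath());
        }
    }
}
```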
The fix is to delete `META-INF/LICENSE` from the jar (`zip -d` removes the entry in place, with no need to unpack and re-pack), then list the jar's contents to confirm it is gone:

```bash
zip -d hadoopStudy.jar META-INF/LICENSE
jar tvf hadoopStudy.jar | grep LICENSE
```
Then upload the new jar to Hadoop and run it again:

```bash
hadoop jar hadoopStudy.jar wordcount.WCRunner
```
Bingo!
III. The Results
By the way, let's look at the results of the run.
- Input file
/wc/input/input.txt
- Output file
/wc/output/part-r-00000
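For illustration (a hypothetical input, not the blogger's actual file): if `/wc/input/input.txt` held the two lines used in the examples above,

```
hello world
hello hadoop
```

then, since Word Count is deterministic, `/wc/output/part-r-00000` would contain the tab-separated counts, sorted by key:

```
hadoop	1
hello	2
world	1
```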
My actual run's result was likewise correct; still, I won't be so quick to sing the Mac's praises again...