Running Hadoop Word Count on a Mac: a Pitfall

Word Count embodies the classic idea of MapReduce and is the Hello World of distributed computing. Even so, the blogger was "lucky" enough to hit a Mac-specific problem, "Mkdirs failed to create", which is recorded here.
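
For intuition, here is how a single (illustrative) line of input flows through the job:

map:     "hello world hello"  ->  (hello, 1) (world, 1) (hello, 1)
shuffle: group by key         ->  (hello, [1, 1]) (world, [1])
reduce:  sum each group       ->  (hello, 2) (world, 1)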

I. The Code
    1. WCMapper.java
package wordcount;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.util.StringUtils;

import java.io.IOException;

/**
 * Four generic parameters; the first two are the Mapper's input types:
 * KEYIN is the input key type, VALUEIN is the input value type.
 * The input and output of both map and reduce are key-value pairs.
 * By default, the framework feeds our mapper one line at a time:
 * the key is the byte offset of the line within the file, and the
 * value is the contents of that line.
 *
 * Long -> LongWritable: Hadoop's own serialization types are more
 * compact and transfer more efficiently.
 * String -> Text
 */
public class WCMapper extends Mapper<LongWritable, Text, Text, LongWritable> {

    // The MapReduce framework calls this method once per line of data
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // The specific business logic goes here; the key-value pair to process is passed in

        // Convert the contents of this line into a String
        String line = value.toString();
        // Split the line into words
        String[] words = StringUtils.split(line, ' ');
        // Emit each word via the context
        for (String word : words) {
            context.write(new Text(word), new LongWritable(1));
        }
    }
}
    2. WCReducer.java
package wordcount;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

import java.io.IOException;

public class WCReducer extends Reducer<Text, LongWritable, Text, LongWritable> {

    // After the map phase completes, the framework caches all k-v pairs,
    // groups them by key, and calls the reduce method once per group
    // with <key, values{}>
    @Override
    protected void reduce(Text key, Iterable<LongWritable> values, Context context)
            throws IOException, InterruptedException {
        long count = 0;
        // Traverse the values and accumulate the sum
        for (LongWritable value : values) {
            count += value.get();
        }
        // Output the total for this word
        context.write(key, new LongWritable(count));
    }
}
    3. WCRunner.java (the driver)
package wordcount;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.io.IOException;

/**
 * Describes a concrete job:
 * which class the job uses for its map logic and which for its reduce logic,
 * the path of the data the job reads,
 * and the path where the job's output is written.
 */
public class WCRunner {

    public static void main(String[] args)
            throws IOException, ClassNotFoundException, InterruptedException {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf);

        // Set the jar for the whole job; WCRunner is used to locate
        // the dependent WCMapper and WCReducer classes
        job.setJarByClass(WCRunner.class);

        // The mapper and reducer classes this job uses
        job.setMapperClass(WCMapper.class);
        job.setReducerClass(WCReducer.class);

        // The reducer's output kv types
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);

        // The mapper's output kv types
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(LongWritable.class);

        // Where the input data lives
        FileInputFormat.setInputPaths(job, new Path("/wc/input/"));

        // Where the results are written
        FileOutputFormat.setOutputPath(job, new Path("/wc/output/"));

        // Submit the job and wait for completion
        job.waitForCompletion(true);
    }
}
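
The driver takes for granted that the input path already exists in HDFS. For reference, the input file (input.txt, shown in the results section below) could be uploaded with the standard HDFS shell before submitting the job:

hdfs dfs -mkdir -p /wc/input
hdfs dfs -put input.txt /wc/input/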
II. Reproducing the Problem

After writing the code and packaging it into a jar (the blogger did this through IDEA's graphical tools), it was submitted to Hadoop to run:
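
For readers not using IDEA's GUI, an equivalent command-line packaging step might look like this (the class output directory is an assumption about IDEA's default out/production/<module> layout):

jar cf hadoopStudy.jar -C out/production/hadoopStudy .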

hadoop jar hadoopStudy.jar wordcount.WCRunner

Instead of the output shown on the official website and in many other tutorials, the job failed with an error:

Exception in thread "main" java.io.IOException: Mkdirs failed to create /var/folders/vf/rplr8k812fj018q5lxcb5k940000gn/T/hadoop-unjar1598612687383099338/META-INF/license
    at org.apache.hadoop.util.RunJar.ensureDirectory(RunJar.java:146)
    at org.apache.hadoop.util.RunJar.unJar(RunJar.java:119)
    at org.apache.hadoop.util.RunJar.unJar(RunJar.java:94)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:227)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:153)

After tossing with it for half a day, the blogger found it to be a Mac-specific problem, with an explanation on Stack Overflow:

The issue is that a /tmp/hadoop-xxx/xxx/LICENSE file and a
/tmp/hadoop-xxx/xxx/license directory are being created on a
case-insensitive file system when unjarring the mahout jobs.
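
In other words, unjarring produces both a META-INF/LICENSE file and a META-INF/license directory, and the two cannot coexist on macOS's default case-insensitive (though case-preserving) file system. This is easy to verify in a shell (the file name is arbitrary):

touch /tmp/CaseTest
ls /tmp/casetest    # on a case-insensitive file system this finds the file created above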

The fix is to delete META-INF/LICENSE from the jar and verify that it is gone, which solves the problem:

zip -d hadoopStudy.jar META-INF/LICENSE
jar tvf hadoopStudy.jar | grep LICENSE

Then upload the new jar to Hadoop and run it again:

hadoop jar hadoopStudy.jar wordcount.WCRunner

Bingo!

III. Running Results

By the way, let's look at the results of the run:

    • Input file: /wc/input/input.txt

    • Output file: /wc/output/part-r-00000
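
For reference, the output can be listed and viewed with the HDFS shell:

hdfs dfs -ls /wc/output
hdfs dfs -cat /wc/output/part-r-00000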

The result is clearly correct. The blogger will no longer dare to casually proclaim how great the Mac is...
