The Pitfall of Running Hadoop Word Count on a Mac
Word Count embodies the classic idea of MapReduce and is the "Hello World" of distributed computing. Even so, I had the "good fortune" to hit a Mac-specific problem, `Mkdirs failed to create`, which I record here.
I. The Code
- WCMapper.java
```java
package wordcount;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.util.StringUtils;

import java.io.IOException;

/**
 * Four generic parameters: the first two are the Mapper's input types,
 * the last two its output types. KEYIN is the input key type and
 * VALUEIN the input value type; map and reduce exchange data as
 * key-value pairs. By default the framework feeds the mapper one line
 * of input at a time: the key is the byte offset of the line within
 * the file, and the value is the content of that line.
 *
 * long -> LongWritable, String -> Text: Hadoop's own serializable
 * types are leaner and cheaper to ship over the network.
 */
public class WCMapper extends Mapper<LongWritable, Text, Text, LongWritable> {

    // The MapReduce framework calls this method once for every line of data
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // The business logic lives here; the key-value pair to process is passed in

        // Turn this line into a String
        String line = value.toString();

        // Split the line into words on spaces
        String[] words = StringUtils.split(line, ' ');

        // Emit each word through the context
        for (String word : words) {
            context.write(new Text(word), new LongWritable(1));
        }
    }
}
```
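For example (an illustrative input, not from the original post): given the two lines `hello world` and `hello hadoop`, the map calls would emit `(hello, 1)`, `(world, 1)`, `(hello, 1)`, `(hadoop, 1)`. Summing the counts per word is left entirely to the reducer.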
- WCReducer.java
```java
package wordcount;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

import java.io.IOException;

public class WCReducer extends Reducer<Text, LongWritable, Text, LongWritable> {

    // Once the map phase completes, the framework caches all the k-v pairs,
    // groups them by key, and calls reduce once per group <key, values{}>
    @Override
    protected void reduce(Text key, Iterable<LongWritable> values, Context context)
            throws IOException, InterruptedException {
        long count = 0;

        // Traverse the values and accumulate the sum
        for (LongWritable value : values) {
            count += value.get();
        }

        // Emit the count for this word
        context.write(key, new LongWritable(count));
    }
}
```
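Continuing the illustrative example above: the framework groups the mapper's output into `(hadoop, [1])`, `(hello, [1, 1])`, `(world, [1])`, and this reduce method then emits `(hadoop, 1)`, `(hello, 2)`, `(world, 1)`.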
- WCRunner.java (the driver class)
```java
package wordcount;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.io.IOException;

/**
 * Describes a concrete job: which class supplies the map logic, which
 * the reduce logic, where the job's input data lives, and where its
 * output should be written.
 */
public class WCRunner {
    public static void main(String[] args)
            throws IOException, ClassNotFoundException, InterruptedException {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf);

        // Set the jar for the whole job; the dependent WCMapper and
        // WCReducer classes are located through WCRunner
        job.setJarByClass(WCRunner.class);

        // The mapper and reducer classes this job uses
        job.setMapperClass(WCMapper.class);
        job.setReducerClass(WCReducer.class);

        // The reducer's output kv types
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);

        // The mapper's output kv types
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(LongWritable.class);

        // Where the raw input data is stored
        FileInputFormat.setInputPaths(job, new Path("/wc/input/"));

        // Where the processed results should be stored
        FileOutputFormat.setOutputPath(job, new Path("/wc/output/"));

        // Submit the job and wait for it to complete
        job.waitForCompletion(true);
    }
}
```
II. Reproducing the Problem
After writing the code and packaging it into a jar (I did this graphically in IDEA), I submitted it to Hadoop to run:

```bash
hadoop jar hadoopStudy.jar wordcount.WCRunner
```
Instead of the results that the official site and many other tutorials show, it failed with an error:
Exception"main" java.io.IOException: Mkdirs failed to create /var/folders/vf/rplr8k812fj018q5lxcb5k940000gn/T/hadoop-unjar1598612687383099338/META-INF/license at org.apache.hadoop.util.RunJar.ensureDirectory(RunJar.java:146) at org.apache.hadoop.util.RunJar.unJar(RunJar.java:119) at org.apache.hadoop.util.RunJar.unJar(RunJar.java:94) at org.apache.hadoop.util.RunJar.run(RunJar.java:227) at org.apache.hadoop.util.RunJar.main(RunJar.java:153)
After half a day of fiddling, it turned out to be a Mac problem. I found this explanation on Stack Overflow:
> The issue is that a /tmp/hadoop-xxx/xxx/LICENSE file and a /tmp/hadoop-xxx/xxx/license directory are being created on a case-insensitive file system when unjarring the mahout jobs.
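In other words, the jar holds both a `META-INF/LICENSE` file and a `META-INF/license/` directory, and on macOS's default case-insensitive file system the two names collide while RunJar unpacks the jar into a temp directory. A minimal Java sketch of that collision (the file names here are illustrative, not taken from the jar):

```java
import java.io.File;
import java.io.IOException;

// Minimal sketch: on a case-insensitive file system, "LICENSE" and
// "license" resolve to the same path, so creating the file first
// makes the later mkdirs() fail -- the same failure RunJar reports.
public class CaseCollisionDemo {
    public static void main(String[] args) throws IOException {
        File licenseFile = new File("LICENSE");
        licenseFile.createNewFile();            // like the jar entry META-INF/LICENSE

        File licenseDir = new File("license");  // like the jar entry META-INF/license/
        if (!licenseDir.mkdirs()) {
            // This branch is taken on macOS's default case-insensitive volumes
            System.out.println("Mkdirs failed to create " + licenseDir.getAbsolutePath());
        }
    }
}
```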
The fix is to delete `META-INF/LICENSE` from the jar (`zip -d` removes the entry in place, with no need to unpack and re-pack), then list the jar's contents to confirm it is gone:

```bash
zip -d hadoopStudy.jar META-INF/LICENSE
jar tvf hadoopStudy.jar | grep LICENSE
```
Then upload the new jar to Hadoop and run it again:

```bash
hadoop jar hadoopStudy.jar wordcount.WCRunner
```
Bingo!
III. The Results
By the way, let's look at the results of the run.
- Input file
/wc/input/input.txt
- Output file
/wc/output/part-r-00000
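For illustration (a hypothetical input, not the blogger's actual file): if `/wc/input/input.txt` held the two lines used in the examples above,

```
hello world
hello hadoop
```

then, since Word Count is deterministic, `/wc/output/part-r-00000` would contain the tab-separated counts, sorted by key:

```
hadoop	1
hello	2
world	1
```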
My actual run's result was likewise correct; still, I won't be so quick to sing the Mac's praises again...