Playing with Hadoop (1): Running My Own MapReduce

Source: Internet
Author: User

The title may sound a little boastful, but no offense is meant. I am simply recording my own learning process, and I hope to meet some like-minded people to discuss it with.

First, we have to understand what Hadoop is. The first word that comes to mind is "cloud", which gives it a slightly magical air. There is no denying that Hadoop has its own strengths, but it is not that miraculous either. It is an open source framework for writing and running distributed applications that process large-scale data. After building a Hadoop cluster and running some of the MapReduce examples, my impression comes down to four words: "convenient", "robust", "scalable", and "simple". Hadoop also has its own file system (HDFS), built to hold large-scale data, which invites a comparison:

How does it differ from an SQL database?

1. As data volume grows, scaling up a database server becomes increasingly expensive: a server with twice the capacity costs far more than two PCs. This is where Hadoop has the advantage, because adding a computer (node) is more cost-effective than upgrading a server.

2. Hadoop works with key/value pairs instead of relational tables. In my experience, key/value pairs are the more flexible choice for large-scale data processing.

How to understand Hadoop's MapReduce:

Here is an article I find interesting and recommend to everyone: "How I explained MapReduce to my wife".
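
As a rough illustration of the idea behind WordCount, the following plain-Java sketch imitates the three phases on a single machine: map emits a (word, 1) pair for each token, shuffle groups the pairs by key, and reduce sums each group. It does not use the Hadoop API, and the class and method names are made up for this example.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Illustrative only: a single-machine imitation of map -> shuffle -> reduce.
// None of these names come from Hadoop.
class MapReduceSketch {

    // Map phase: emit a (word, 1) pair for every whitespace-separated token.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        StringTokenizer tokens = new StringTokenizer(line);
        while (tokens.hasMoreTokens()) {
            pairs.add(new SimpleEntry<>(tokens.nextToken(), 1));
        }
        return pairs;
    }

    // Shuffle phase: collect all emitted values under their key, sorted by key.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> groups = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            groups.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return groups;
    }

    // Reduce phase: sum the values collected for one key.
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    // Run the three phases over a list of input lines.
    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : shuffle(pairs).entrySet()) {
            counts.put(group.getKey(), reduce(group.getValue()));
        }
        return counts;
    }

    public static void main(String[] args) {
        // prints {hadoop=1, hello=2, world=1}
        System.out.println(wordCount(Arrays.asList("hello hadoop", "hello world")));
    }
}
```

On a real cluster the map and reduce calls run on different nodes and the framework performs the shuffle; here everything happens in one process so that the data flow is easy to follow.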

The conceptual material can get a little tedious, so let's move on to a MapReduce program of our own.

The Hadoop distribution ships with a WordCount example. Let's take a copy of it and run it as our own MapReduce program.

1. First, find the WordCount source code. It is in the Hadoop directory, at src/examples/org/apache/hadoop/examples/WordCount.java

2. Create a working folder and copy the WordCount file into it:

mkdir playground

mkdir playground/src

mkdir playground/classes

cp src/examples/org/apache/hadoop/examples/WordCount.java playground/src/WordCount.java


3. Compile the copy and package it. The source first has to be compiled into playground/classes with javac, with the Hadoop core jar on the classpath. The compiled classes are then packed into a jar:

[liye@test237 hadoop-0.20.2]$ jar -cvf playground/wordcount.jar -C playground/classes/ .
added manifest
adding: org/(in = 0) (out= 0)(stored 0%)
adding: org/apache/(in = 0) (out= 0)(stored 0%)
adding: org/apache/hadoop/(in = 0) (out= 0)(stored 0%)
adding: org/apache/hadoop/examples/(in = 0) (out= 0)(stored 0%)
adding: org/apache/hadoop/examples/WordCount.class(in = 1911) (out= 996)(deflated 47%)
adding: org/apache/hadoop/examples/WordCount$IntSumReducer.class(in = 1789) (out= 746)(deflated 58%)
adding: org/apache/hadoop/examples/WordCount$TokenizerMapper.class(in = 1903) (out= 819)(deflated 56%)


4. Run the program. Output like the following indicates that the job succeeded:

[liye@test237 hadoop-0.20.2]$ bin/hadoop jar playground/wordcount.jar org.apache.hadoop.examples.WordCount input my_output
11/12/05 21:33:30 INFO input.FileInputFormat: Total input paths to process : 1
11/12/05 21:33:31 INFO mapred.JobClient: Running job: job_201111281334_0014
11/12/05 21:33:32 INFO mapred.JobClient:  map 0% reduce 0%
11/12/05 21:33:41 INFO mapred.JobClient:  map 100% reduce 0%
11/12/05 21:33:53 INFO mapred.JobClient:  map 100% reduce 100%
11/12/05 21:33:55 INFO mapred.JobClient: Job complete: job_201111281334_0014
11/12/05 21:33:55 INFO mapred.JobClient: Counters: 17
11/12/05 21:33:55 INFO mapred.JobClient:   Job Counters
11/12/05 21:33:55 INFO mapred.JobClient:     Launched reduce tasks=1
11/12/05 21:33:55 INFO mapred.JobClient:     Launched map tasks=1
11/12/05 21:33:55 INFO mapred.JobClient:     Data-local map tasks=1
11/12/05 21:33:55 INFO mapred.JobClient:   FileSystemCounters
11/12/05 21:33:55 INFO mapred.JobClient:     FILE_BYTES_READ=25190
11/12/05 21:33:55 INFO mapred.JobClient:     HDFS_BYTES_READ=44253
11/12/05 21:33:55 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=50412
11/12/05 21:33:55 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=17876
11/12/05 21:33:55 INFO mapred.JobClient:   Map-Reduce Framework
11/12/05 21:33:55 INFO mapred.JobClient:     Reduce input groups=1857
11/12/05 21:33:55 INFO mapred.JobClient:     Combine output records=1857
11/12/05 21:33:55 INFO mapred.JobClient:     Map input records=734
11/12/05 21:33:55 INFO mapred.JobClient:     Reduce shuffle bytes=25190
11/12/05 21:33:55 INFO mapred.JobClient:     Reduce output records=1857
11/12/05 21:33:55 INFO mapred.JobClient:     Spilled Records=3714
11/12/05 21:33:55 INFO mapred.JobClient:     Map output bytes=73129
11/12/05 21:33:55 INFO mapred.JobClient:     Combine input records=7696
11/12/05 21:33:55 INFO mapred.JobClient:     Map output records=7696
11/12/05 21:33:55 INFO mapred.JobClient:     Reduce input records=1857

5. View the results. They are written to the my_output directory in the Hadoop file system.
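
To inspect the counts outside Hadoop, one option is to copy the output to local disk and read it with a few lines of Java. The sketch below assumes that each output line has the form word, tab, count, which is what the default text output format writes. The class name, the method names, and the sample lines are all invented for this example.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: reads WordCount-style result lines, assumed to be "word<TAB>count".
class OutputReader {

    // Parse the lines into a map, keeping the order in which they appear.
    static Map<String, Integer> parse(List<String> lines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : lines) {
            int tab = line.indexOf('\t');
            if (tab < 0) {
                continue; // skip lines that do not match the assumed format
            }
            String word = line.substring(0, tab);
            int count = Integer.parseInt(line.substring(tab + 1).trim());
            counts.put(word, count);
        }
        return counts;
    }

    // Return the word with the highest count, or null if there are no entries.
    static String mostFrequent(Map<String, Integer> counts) {
        String best = null;
        int bestCount = -1;
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getValue() > bestCount) {
                best = entry.getKey();
                bestCount = entry.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Made-up sample lines standing in for a real result file.
        List<String> sample = Arrays.asList("hadoop\t3", "hello\t7", "world\t2");
        Map<String, Integer> counts = parse(sample);
        System.out.println(counts);               // prints {hadoop=3, hello=7, world=2}
        System.out.println(mostFrequent(counts)); // prints hello
    }
}
```

For a real run, the sample list could be replaced with the lines of the result file once it has been copied out of the Hadoop file system.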

6. Finally, you can modify the copied WordCount.java to get whatever behavior you want, then compile, package, and run it again.
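
One possible change, sketched here in plain Java so that it can be tried without a cluster, is to normalize tokens before counting so that "Hadoop," and "hadoop" fall into the same group. The helper below is hypothetical and is not part of WordCount.java. In the real program the equivalent logic would go where TokenizerMapper splits each line into tokens.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Illustrative only: a stricter tokenizer that could replace plain whitespace splitting.
class TokenNormalizer {

    // Split on whitespace, lowercase each token, and drop every character
    // that is not an ASCII letter or digit. Tokens that end up empty are skipped.
    static List<String> tokens(String line) {
        List<String> result = new ArrayList<>();
        for (String raw : line.split("\\s+")) {
            String word = raw.toLowerCase(Locale.ROOT).replaceAll("[^a-z0-9]", "");
            if (!word.isEmpty()) {
                result.add(word);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [hadoop, hadoop, hadoop]
        System.out.println(tokens("Hadoop, hadoop! HADOOP?"));
    }
}
```

Note that this rule also discards non-ASCII letters, so it is only a starting point for text that is not plain English.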
