[Linux] [Hadoop] Running WordCount example

Source: Internet
Author: User

Once Hadoop is installed and running, the next step is to run one of the bundled examples. The simplest and most straightforward is WordCount, the "Hello World" of Hadoop.

I followed this blog post: http://xiejianglei163.blog.163.com/blog/static/1247276201443152533684/

First create a directory containing two text files. The location is arbitrary; the structure looks like this:

Examples

--file1.txt

--file2.txt

The file contents can be anything; I copied a paragraph of English text from a news article.
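As a minimal sketch of this setup step, the commands below create an input directory with two small text files. The directory name `data_input` is an assumption chosen to match the `-put` command used further down, and the sample sentences are placeholders, not the text used in the original run.

```shell
# Create a local input directory with two small text files.
# The directory name and the file contents are illustrative assumptions.
mkdir -p data_input
printf 'hello hadoop\nhello world\n' > data_input/file1.txt
printf 'hadoop counts words\n' > data_input/file2.txt

# List what was created.
ls data_input
```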

Execute the following commands:

/usr/local/gz/hadoop-2.4.1$ ./bin/hadoop fs -mkdir /data    # create the /data folder in Hadoop to hold the input data; this is a folder inside Hadoop's file system, not one under the Linux root directory
/usr/local/gz/hadoop-2.4.1$ ./bin/hadoop fs -put -f ./data_input/* /data    # copy the two files created earlier into /data

Execute the wordcount command and view the results:

/usr/local/gz/hadoop-2.4.1$ ./bin/hadoop jar ./share/hadoop/mapreduce/sources/hadoop-mapreduce-examples-2.4.1-sources.jar org.apache.hadoop.examples.WordCount /data /output
WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
INFO input.FileInputFormat: Total input paths to process : 2
INFO mapreduce.JobSubmitter: number of splits:2
INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1406038146260_0001
INFO impl.YarnClientImpl: Submitted application application_1406038146260_0001
INFO mapreduce.Job: The url to track the job: http://ubuntu:8088/proxy/application_1406038146260_0001/
INFO mapreduce.Job: Running job: job_1406038146260_0001
INFO mapreduce.Job: Job job_1406038146260_0001 running in uber mode : false
INFO mapreduce.Job:  map 0% reduce 0%
INFO mapreduce.Job:  map …% reduce 0%
INFO mapreduce.Job:  map …% reduce …%
INFO mapreduce.Job: Job job_1406038146260_0001 completed successfully
INFO mapreduce.Job: Counters: …
    File System Counters
        FILE: Number of bytes read=2521
        FILE: Number of bytes written=283699
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=2280
        HDFS: Number of bytes written=1710
        HDFS: Number of read operations=9
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters
        Launched map tasks=2
        Launched reduce tasks=1
        Data-local map tasks=2
        Total time spent by all maps in occupied slots (ms)=71182
        Total time spent by all reduces in occupied slots (ms)=13937
        Total time spent by all map tasks (ms)=71182
        Total time spent by all reduce tasks (ms)=13937
        Total vcore-seconds taken by all map tasks=71182
        Total vcore-seconds taken by all reduce tasks=13937
        Total megabyte-seconds taken by all map tasks=72890368
        Total megabyte-seconds taken by all reduce tasks=14271488
    Map-Reduce Framework
        Map input records=…
        Map output records=274
        Map output bytes=2814
        Map output materialized bytes=2527
        Input split bytes=202
        Combine input records=274
        Combine output records=195
        Reduce input groups=…
        Reduce shuffle bytes=2527
        Reduce input records=195
        Reduce output records=…
        Spilled Records=390
        Shuffled Maps=2
        Failed Shuffles=0
        Merged Map outputs=2
        GC time elapsed (ms)=847
        CPU time spent (ms)=6410
        Physical memory (bytes) snapshot=426119168
        Virtual memory (bytes) snapshot=1953292288
        Total committed heap usage (bytes)=256843776
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=2078
    File Output Format Counters
        Bytes Written=1710
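One way to sanity-check these counters is to count tokens locally. The stock WordCount mapper splits each line on whitespace and emits one record per token, so the map output records counter (274 in the run above) should equal the total token count of the input files, and the number of lines in the final output should equal the number of distinct tokens. The following is a sketch under those assumptions; `token_stats` is a hypothetical helper name and the sample files are placeholders, not the files from this run.

```shell
# Print the total number of whitespace-separated tokens and the number of
# distinct tokens found in the given files.
token_stats() {
  total=$(cat "$@" | wc -w | tr -d '[:space:]')
  distinct=$(cat "$@" | tr -s '[:space:]' '\n' | grep -v '^$' \
    | LC_ALL=C sort -u | wc -l | tr -d '[:space:]')
  printf 'total tokens: %s\ndistinct tokens: %s\n' "$total" "$distinct"
}

# Example on two tiny throwaway files (hypothetical content).
tmpdir=$(mktemp -d)
printf 'hello hadoop\nhello world\n' > "$tmpdir/a.txt"
printf 'hadoop counts words\n' > "$tmpdir/b.txt"
token_stats "$tmpdir/a.txt" "$tmpdir/b.txt"
```

Run against the real input files, the first number can be compared with the map output records counter and the second with the line count of the job's output file.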

The log above shows the details of the WordCount job. Next, run the following command to view the results:

/usr/local/gz/hadoop-2.4.1$ ./bin/hadoop fs -cat /output/part-r-00000
WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
"As    1
"Atrocious,"    1
…
:help    2
:help<Enter>    1
:q<Enter>    1
<F1>    1
Already,    1
Ban    1
Benjamin    1

Most of the output is omitted here. That completes the WordCount run.
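Because the result file is sorted by word rather than by frequency, it can help to re-sort it by count. WordCount writes one `word<TAB>count` pair per line, so a small filter is enough. This is only a sketch: `top_words` is a hypothetical helper name, and the sample input below is made up for illustration.

```shell
# Read "word<TAB>count" lines on stdin and print the N most frequent words
# (default 10), highest count first, ties broken by word.
top_words() {
  n=${1:-10}
  LC_ALL=C sort -t "$(printf '\t')" -k2,2nr -k1,1 | head -n "$n"
}

# Example with hypothetical sample data in the same tab-separated format.
printf 'hadoop\t2\nhello\t3\nworld\t1\n' | top_words 2
```

Against the real job output, the same function could be fed from the command used above, for example `./bin/hadoop fs -cat /output/part-r-00000 | top_words 20`.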
