MapReduce Example (1)

Source: Internet
Author: User
Tags static class

Original address: http://chenxiaoqiong.com/articles/mapreduce1/

Basic Concepts

Hadoop: The core of the framework is HDFS and MapReduce. HDFS provides storage for massive amounts of data, and MapReduce provides computation over that data.
MapReduce: A programming model for processing large, semi-structured data sets. The simplest MapReduce application consists of at least three parts: a map function, a reduce function, and a main function. My simple understanding is that map processes the input according to certain rules, and reduce produces the output, possibly performing some operations first, such as aggregating statistics.

An Interesting Example

Suppose you want to count the number of spades in a deck of cards. The intuitive way is to go through the cards one by one and count how many are spades.
The MapReduce method is:
1. Distribute the deck among all the players present.
2. Have each player count how many of the cards in their hand are spades and report that number to you.
3. Add up the numbers the players report to get the final total.

Example Requirement

Count the number of occurrences of each word in the input file. Words in the input file are separated by spaces, and each line of the output file must contain a word, a space, and the number of times that word occurs.

Input file

I am a pretty girl

Output file

I   3
a   2
am  3
girl    1
pretty  1
programmer  1
proud   1
so  1   
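
Note that `I` comes before `a` in the output. This is consistent with the keys being sorted by byte value before they reach the reducer, since uppercase ASCII letters sort ahead of lowercase ones. A minimal plain-Java sketch of that ordering, using the words from the output file, follows. The class name `KeyOrderSketch` and the method `sortKeys` are hypothetical, and `String`'s natural ordering is used here as a stand-in for Hadoop's key comparison.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class KeyOrderSketch {

    // Sort a copy of the keys with String's natural ordering, which compares
    // character codes, so "I" (73) precedes "a" (97).
    static List<String> sortKeys(List<String> keys) {
        List<String> sorted = new ArrayList<>(keys);
        Collections.sort(sorted);
        return sorted;
    }

    public static void main(String[] args) {
        // The eight distinct words from the output file, in arbitrary order.
        List<String> keys = Arrays.asList(
                "so", "proud", "girl", "a", "programmer", "I", "pretty", "am");
        for (String key : sortKeys(keys)) {
            System.out.println(key);
        }
    }
}
```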

Design Ideas

Following the map rules, we read each line of the input file, split it into words, and emit [word, 1] for every word. Reduce receives [key, value-list], loops over the list to count how many times each key appears, and outputs [word, count].

Code Implementation
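
Before turning to the Hadoop version, the design above can be sketched in plain Java with no Hadoop dependency. This is only an illustration under stated assumptions, not the repository's code: the class `WordCountSketch` and its `map`, `reduce`, and `run` methods are hypothetical names, and the second and third input lines are assumed, chosen so that the totals agree with the output file shown earlier.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class WordCountSketch {

    // Map step: split one input line on whitespace and emit a [word, 1] pair per word.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(new AbstractMap.SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // Reduce step: sum the value list collected for one key.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    // Main step: run map on every line, group the values by key, then reduce each group.
    static Map<String, Integer> run(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> pair : map(line)) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                       .add(pair.getValue());
            }
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : grouped.entrySet()) {
            counts.put(group.getKey(), reduce(group.getKey(), group.getValue()));
        }
        return counts;
    }

    public static void main(String[] args) {
        // Only the first line appears in the article; the other two are assumed.
        List<String> lines = Arrays.asList(
                "I am a pretty girl", "I am a programmer", "I am so proud");
        // Print each result as: word, space, count.
        run(lines).forEach((word, count) -> System.out.println(word + " " + count));
    }
}
```

In a real Hadoop job the grouping is done by the framework between the map and reduce phases; here `run` does it by hand with a `TreeMap`.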

The code has been uploaded to my Git repository: https://github.com/chenxiaoqiong/worldcountmapreduce
Main code:

/** *  

pom.xml

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>hadoop</groupId>
    <artifactId>countMapReduce</artifactId>
    <version>1.0-SNAPSHOT</version>

    <repositories>
        <repository>
            <id>apache</id>
            <url>http://maven.apache.org</url>
        </repository>
    </repositories>

    <dependencies>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>2.7.3</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-hdfs</artifactId>
            <version>2.7.3</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>2.7.3</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-core</artifactId>
            <version>1.2.1</version>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <plugin>
                <artifactId>maven-dependency-plugin</artifactId>
                <configuration>
                    <excludeTransitive>false</excludeTransitive>
                    <stripVersion>true</stripVersion>
                    <outputDirectory>./lib</outputDirectory>
                </configuration>
            </plugin>
        </plugins>
    </build>
</project>
