Original address: http://chenxiaoqiong.com/articles/mapreduce1/
Basic Concept
Hadoop: The core of the framework's design is HDFS and MapReduce. HDFS provides storage for massive amounts of data, and MapReduce provides computation over massive amounts of data.
MapReduce: A programming model for processing large volumes of semi-structured data. The simplest MapReduce application consists of at least three parts: a map function, a reduce function, and a main function that configures and submits the job. My simple understanding is that map processes the input according to certain rules, and reduce produces the output, possibly performing some operation on the data first, such as counting.
An Interesting Example
Suppose you want to count the number of spades in a stack of cards. The intuitive way is for one person to go through the cards one by one and count how many are spades.
The MapReduce method is:
1. Distribute the stack among all the players present.
2. Have each player count how many spades are in their hand and report that number to you.
3. Add up all the numbers the players report to get the final total.
Example Requirement
Count the number of occurrences of each word in the input file. Words in the input file are separated by spaces, and each line of the output file must contain a word, a space, and the number of times that word occurs.
Input file
I am a pretty girl
Output file
I 3
a 2
am 3
girl 1
pretty 1
programmer 1
proud 1
so 1
Design Ideas
In the map phase, we read each line of the input file, split it into words, and output [word, 1] for every word. The reduce phase receives [key, value-list], loops over the list to count how many times each key appears, and outputs [word, count].
Code Implementation
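The design above can be tried out without a Hadoop cluster. The following plain-Java sketch simulates it in a single process: map emits (word, 1), a shuffle step groups the values into (word, value-list), and reduce sums each list. It is not the author's Hadoop code; the class and method names (WordCountSimulation, map, shuffle, reduce, wordCount) are hypothetical, and the TreeMap only loosely imitates the key sorting Hadoop performs between the map and reduce phases.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical single-process simulation of the word-count design (no Hadoop needed).
class WordCountSimulation {

    // Map phase: split one input line on spaces and emit a (word, 1) pair per word.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.split(" ")) {
            if (!word.isEmpty()) { // skip empty tokens produced by repeated spaces
                pairs.add(Map.entry(word, 1));
            }
        }
        return pairs;
    }

    // Shuffle: collect every emitted value under its key, giving (word, value-list).
    // The TreeMap keeps keys sorted, loosely mimicking Hadoop's sort by key.
    static Map<String, List<Integer>> shuffle(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> pair : map(line)) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                       .add(pair.getValue());
            }
        }
        return grouped;
    }

    // Reduce phase: loop over one key's value list and add up the 1s.
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    // Runs map -> shuffle -> reduce and returns word -> count in sorted key order.
    static Map<String, Integer> wordCount(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : shuffle(lines).entrySet()) {
            counts.put(entry.getKey(), reduce(entry.getValue()));
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> input = List.of("I am a pretty girl");
        // Print each result as "word count", matching the required output format.
        for (Map.Entry<String, Integer> entry : wordCount(input).entrySet()) {
            System.out.println(entry.getKey() + " " + entry.getValue());
        }
    }
}
```

With the single input line shown earlier, main prints "I 1", "a 1", "am 1", "girl 1", and "pretty 1" on separate lines. Because strings compare character by character, uppercase "I" sorts before lowercase "a", which is the same order seen in the sample output file.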
The code has been uploaded to my GitHub repository: https://github.com/chenxiaoqiong/worldcountmapreduce
Main code:
/** *
pom.xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>hadoop</groupId>
    <artifactId>countMapReduce</artifactId>
    <version>1.0-SNAPSHOT</version>

    <repositories>
        <repository>
            <id>apache</id>
            <url>http://maven.apache.org</url>
        </repository>
    </repositories>

    <dependencies>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>2.7.3</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-hdfs</artifactId>
            <version>2.7.3</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>2.7.3</version>
        </dependency>
        <!-- hadoop-core 1.2.1 is the Hadoop 1.x jar; it overlaps with the 2.7.3 artifacts above and may cause class conflicts -->
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-core</artifactId>
            <version>1.2.1</version>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <!-- copies dependency jars (including transitive ones, without version suffixes) into ./lib -->
            <plugin>
                <artifactId>maven-dependency-plugin</artifactId>
                <configuration>
                    <excludeTransitive>false</excludeTransitive>
                    <stripVersion>true</stripVersion>
                    <outputDirectory>./lib</outputDirectory>
                </configuration>
            </plugin>
        </plugins>
    </build>
</project>