Today I started working through the examples in MapReduce Design Patterns. I think this book is very good for learning MapReduce programming; after finishing it, you should be able to handle most of the MapReduce problems you run into. Let's start with the first piece.
HDFS has a high level of fault tolerance, is designed to be deployed on inexpensive (low-cost) hardware, and provides high-throughput access to application data, which suits applications with very large datasets. HDFS relaxes some POSIX requirements so that file-system data can be accessed as a stream. The core design of the Hadoop framework is HDFS plus MapReduce: HDFS provides storage for massive amounts of data, and MapReduce provides the computation over it.
1. Chaining MapReduce jobs (job chaining)
2. Joining data from different data sources
1.1 Chaining MapReduce jobs in a sequence
A MapReduce program can carry out complex data-processing tasks. Generally, such a task is divided into several smaller subtasks, each subtask is run as a Hadoop job, and the results of the subtasks are then collected to complete the overall task.
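As a sketch of that chaining pattern (the driver class, mapper/reducer setup, and paths here are hypothetical, and the boilerplate follows the old `new Job(conf, ...)` style used elsewhere in this article): the output directory of the first job simply becomes the input directory of the second, and waitForCompletion blocks until each subtask finishes.

```java
Configuration conf = new Configuration();

// Subtask 1: read the raw input, write intermediate results
Job job1 = new Job(conf, "subtask1");
job1.setJarByClass(ChainDriver.class);            // hypothetical driver class
FileInputFormat.setInputPaths(job1, new Path("/data/input"));
FileOutputFormat.setOutputPath(job1, new Path("/data/intermediate"));
if (!job1.waitForCompletion(true)) {              // block until subtask 1 is done
    System.exit(1);                               // abort the chain on failure
}

// Subtask 2: consume subtask 1's output
Job job2 = new Job(conf, "subtask2");
job2.setJarByClass(ChainDriver.class);
FileInputFormat.setInputPaths(job2, new Path("/data/intermediate"));
FileOutputFormat.setOutputPath(job2, new Path("/data/final"));
System.exit(job2.waitForCompletion(true) ? 0 : 1);
```

Because each job blocks until it completes, the intermediate data is fully written before the next subtask reads it.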
MapReduce 2.x Programming Series 1: building a basic Maven project
This is a Maven project. After Maven is installed, verify it:
mvn --version
Apache Maven 3.2.3 (33f8c3e1027c3ddde99d3cdebad2656a31e8fdf4; 2014-08-12T04:58:10+08:00)
Maven home: /opt/apache-maven-3.2.3
Java version: 1.7.0_09, vendor: Oracle Corporation
Java home: /data/hadoop/data1/usr/local/jdk1.7.0_09/jre
Default locale: en_US, platform encoding: UTF-8
OS name: "linux", version: "2.6.18-348.6.1.el5", arch: "amd64", family: "unix"
MapReduce Programming Series 3: implementing the Reduce stage
The Reduce code performs the addition and counting:
package org.freebird.reducer;

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.mapreduce.Reducer.Context;
import org.apache.hadoop.mapreduce.Reducer;

public class LogReducer
Iterate through the values, which are all 1, and simply add them up. Then the result is written to the context.
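Stripped of the Hadoop types, the arithmetic this reducer performs is just a summation over the 1s emitted for one key; here is a minimal plain-Java sketch of that step (the class and method names are mine, not part of the original LogReducer):

```java
import java.util.Arrays;
import java.util.List;

public class SumOnes {
    // Each value is a 1 emitted by the mapper for this key;
    // reduce adds them up to get the key's total count.
    static int sum(List<Integer> values) {
        int total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    public static void main(String[] args) {
        // a key that was emitted four times reduces to the count 4
        System.out.println(sum(Arrays.asList(1, 1, 1, 1)));  // prints 4
    }
}
```

In the real reducer the loop runs over Iterable&lt;IntWritable&gt; and the total is wrapped in an IntWritable before being written to the context.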
// Set the jar for the job via a class it uses
job.setJarByClass(WCRunner.class);
job.setMapperClass(WCMapper.class);
job.setReducerClass(WCReducer.class);
// Map output key/value types
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(LongWritable.class);
// Reduce output key/value types
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(LongWritable.class);
// Path of the input data
FileInputFormat.setInputPaths(job, new Path("/wordcount/inpput"));
// Path of the output data
FileOutputForma
FileSystem hdfs = myPath.getFileSystem(conf);  // get the file system
if (hdfs.isDirectory(myPath)) {
    // if this output path already exists in the file system, delete it
    hdfs.delete(myPath, true);
}
Job wcJob = new Job(conf, "WC");  // build a Job object
// Set the jar package via a class used by the entire job
wcJob.setJarByClass(WCRunner.class);
// Mapper and reducer classes used by this job
wcJob.setMapperClass(WCMapper.class);
wcJob.setReducerClass(WCReducer.class);
// Specify the output key/value types
To implement MapReduce you first override two functions: one is map, the other is reduce.

map(key, value)

The map function takes two parameters, key and value. If your input format is TextInputFormat (the default), the input to your map function will be:
Key: the offset of the line within the file (i.e. its position in the file)
Value: a line of text (Hadoop feeds each line of the file to map as input)
Hadoop executes the ma
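To make that (key, value) contract concrete, here is a small plain-Java simulation (not Hadoop code; the class name is mine) of the records TextInputFormat would produce for a three-line file: the key is the byte offset at which each line starts.

```java
import java.util.ArrayList;
import java.util.List;

public class TextInputFormatDemo {
    // Returns the byte offset at which each line of the file starts,
    // mirroring the keys TextInputFormat hands to map (ASCII input assumed).
    static List<Long> lineStartOffsets(String file) {
        List<Long> offsets = new ArrayList<>();
        long offset = 0;
        for (String line : file.split("\n", -1)) {
            offsets.add(offset);
            offset += line.length() + 1; // +1 for the newline byte
        }
        if (file.endsWith("\n")) {
            offsets.remove(offsets.size() - 1); // drop the slot after the trailing newline
        }
        return offsets;
    }

    public static void main(String[] args) {
        String file = "hello world\nhadoop mapreduce\nhello hadoop\n";
        String[] lines = file.split("\n");
        List<Long> keys = lineStartOffsets(file);
        for (int i = 0; i < lines.length; i++) {
            System.out.println("(" + keys.get(i) + ", \"" + lines[i] + "\")");
        }
    }
}
```

For this file the map function would be called with (0, "hello world"), (12, "hadoop mapreduce"), and (29, "hello hadoop").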
) throws IOException, InterruptedException {
    line = value;
    context.write(line, new Text(""));
}

// Reduce copies the key of the input directly to the key of the output and emits it
public static class Reduce extends Reducer<Text, Text, Text, Text> {
    // implement the reduce function
    public void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        context.write(key, new Text(""));
    }
}

public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // this is the key line
    conf.
FileSystem hdfs = myPath.getFileSystem(conf);
if (hdfs.isDirectory(myPath)) {
    hdfs.delete(myPath, true);
}
@SuppressWarnings("deprecation")
Job job = new Job(conf, "gender");               // create a new job
job.setJarByClass(Gender.class);                 // main class
job.setMapperClass(PCMapper.class);              // mapper
job.setReducerClass(PCReducer.class);            // reducer
job.setPartitionerClass(MyHashPartitioner.class);
job.setPartitionerClass(PCPartitioner.class);    // set the Partitioner class
job.setNumReduceTasks(3);                        // number of reduce tasks set to 3
job.setMapOutputKe
job.setPartitionerClass(FirstPartitioner.class);          // partition function
job.setSortComparatorClass(KeyComparator.class);          // this example has no custom SortComparator; it uses IntPair's own sort
job.setGroupingComparatorClass(GroupingComparator.class); // grouping function
job.setMapOutputKeyClass(IntPair.class);
job.setMapOutputValueClass(IntWritable.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
job.setInputFormatClass(TextInputFormat.class);
job.setOutputFormatClass(Text
MapReduce consists of two phases: one is map, the other is reduce. Each phase takes key-value pairs as input and produces key-value pairs as output. The key-value format of the map phase is determined by the input format; with the default TextInputFormat, each line is processed as one record, where the key is the starting offset of the line relative to the beginning of the file and the value is the text of the line. The key-value pair format of the o
used for merging files.
Therefore, when watching the JobTracker, you can see that the reduce operation starts, i.e. enters the copy phase, before the map operations have fully completed.
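The degree of this overlap is configurable. In MRv2 the fraction of maps that must finish before reducers are scheduled is controlled by the property mapreduce.job.reduce.slowstart.completedmaps (default 0.05); for example, to delay reducers until 80% of the maps are done:

```xml
<!-- mapred-site.xml -->
<property>
  <name>mapreduce.job.reduce.slowstart.completedmaps</name>
  <value>0.80</value>
</property>
```

Raising this value frees cluster slots for maps at the cost of less copy-phase overlap.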
2. Parallelism between the sort stage and the calls to the Reducer's reduce function
Sort sorts the
The user-written reduce function then processes the preceding
Parallel algorithms improve program performance; the specific algorithms will be discussed later.
3. Write
Write the result to HDFS.
Reduce optimization p
MapReduce Programming Series 10: using HashPartitioner to adjust the computing load of the Reducers
Example 4 demonstrated how to specify the number of Reducers. This section describes how to use HashPartitioner to group the Mapper output by key and then hand it to the Reducers for processing. A reasonable grouping policy keeps the computing load of the Reducers roughly even, so that overall reduce performance improves.
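Hadoop's default HashPartitioner assigns a key to a reducer by masking the sign bit of the key's hashCode() and taking the remainder modulo the reducer count. That logic, extracted into plain Java for illustration (the demo class and sample keys are mine):

```java
public class HashPartitionDemo {
    // Same arithmetic as Hadoop's HashPartitioner.getPartition:
    // mask off the sign bit, then take the remainder modulo the reducer count.
    static int partitionFor(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        for (String key : new String[] {"male", "female", "unknown"}) {
            System.out.println(key + " -> reducer " + partitionFor(key, 3));
        }
    }
}
```

Because the assignment depends only on the key's hash, all records with the same key land on the same reducer, and a well-spread hash keeps the reducers' loads close to even.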
The source code is adapted from the SecondarySort example shipped with MapReduce.
The map and reduce classes defined in this example are as follows. The key point is their definition of the input and output types (Java generics):
public static class Map extends Mapper
public static class Reduce extends Reducer
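The essence of the IntPair key used in secondary sort is its ordering: compare the first field, and break ties with the second. A plain-Java sketch of that comparison (the class and method names are mine, not the example's exact code):

```java
public class IntPairOrder {
    // Secondary-sort ordering: primary field first, second field as tie-breaker.
    static int compare(int firstA, int secondA, int firstB, int secondB) {
        if (firstA != firstB) {
            return Integer.compare(firstA, firstB);
        }
        return Integer.compare(secondA, secondB);
    }

    public static void main(String[] args) {
        System.out.println(compare(1, 5, 1, 9));  // negative: (1,5) sorts before (1,9)
        System.out.println(compare(2, 0, 1, 9));  // positive: (2,0) sorts after (1,9)
    }
}
```

In the real example this comparison lives in the writable key's compareTo, while the grouping comparator looks only at the first field so that all values sharing it reach one reduce call.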
1. First, let's talk about how it works:
In the map stage, the InputFormat defined by job.setInputFormatClass is used to split the input dataset into small data splits.
This is a Maven project. With Maven installed, run the following command to create the project:
mvn archetype:generate -Dg