MapReduce Programming in Practice

Source: Internet
Author: User

What is MapReduce?

MapReduce is a programming model for Hadoop (a big data processing environment). Because it is called a model, it has a fixed form.

Put another way, the MapReduce programming model is the fixed style of programming used for data analysis and processing in the Hadoop ecosystem.

This fixed form of programming is described below:

A MapReduce job is divided into two phases: the map phase and the reduce phase. Each phase takes key/value pairs as input and produces key/value pairs as output, and the programmer chooses their types.

In other words, the programmer only needs to define two functions, a map function and a reduce function; Hadoop takes care of the rest of the computation.

From the above description, we can see:

The scenarios that MapReduce can handle are actually quite specific and limited: essentially the "statistical analysis of data" scenario.
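The two-phase flow described above can be imitated in plain Java without Hadoop. The following is only a minimal in-memory sketch: the class MiniMapReduce, its MapFn and ReduceFn interfaces, and the "year temperature" input lines are all hypothetical names and data made up for illustration, not part of any Hadoop API. It shows a map function emitting one key/value pair per record, the values being grouped by key, and a reduce function being called once per key.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory imitation of the map -> group -> reduce flow. This is NOT Hadoop code;
// every name here is hypothetical and exists only for illustration.
class MiniMapReduce {

    // The "map function": turns one input record into one key/value pair.
    interface MapFn {
        Map.Entry<String, Integer> map(String record);
    }

    // The "reduce function": collapses all values of one key into one result.
    interface ReduceFn {
        int reduce(String key, List<Integer> values);
    }

    static Map<String, Integer> run(List<String> records, MapFn mapFn, ReduceFn reduceFn) {
        // Map phase plus grouping: collect every emitted value under its key.
        Map<String, List<Integer>> grouped = new TreeMap<String, List<Integer>>();
        for (String record : records) {
            Map.Entry<String, Integer> pair = mapFn.map(record);
            List<Integer> values = grouped.get(pair.getKey());
            if (values == null) {
                values = new ArrayList<Integer>();
                grouped.put(pair.getKey(), values);
            }
            values.add(pair.getValue());
        }
        // Reduce phase: one call per key, with all of that key's values.
        Map<String, Integer> result = new TreeMap<String, Integer>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            result.put(e.getKey(), reduceFn.reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    // Example job: maximum temperature per year over made-up "year temperature" lines.
    static Map<String, Integer> maxPerYear(List<String> records) {
        MapFn mapFn = line -> {
            String[] parts = line.split(" ");
            return new AbstractMap.SimpleEntry<String, Integer>(parts[0], Integer.parseInt(parts[1]));
        };
        ReduceFn reduceFn = (year, temps) -> Collections.max(temps);
        return run(records, mapFn, reduceFn);
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList("1901 12", "1902 -5", "1901 30", "1902 7");
        System.out.println(maxPerYear(records)); // prints {1901=30, 1902=7}
    }
}
```

In a real Hadoop job the grouping step is performed by the framework between the two phases, which is why only the two functions have to be written by hand.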

Input data preparation

Official weather data site: ftp://ftp.ncdc.noaa.gov/pub/data/gsod/

However, the file format on the official site turned out to be inconsistent with the format used in "Hadoop: The Definitive Guide" (http://www.linuxidc.com/Linux/2012-07/65972.htm). It is unclear whether the official format has changed over time, whether the author preprocessed the original files, or whether the site address is simply wrong. I therefore downloaded a copy from the address given in "Hadoop: The Definitive Guide":

https://github.com/tomwhite/hadoop-book/tree/master/input/ncdc/all

For a simple test, you can also paste the following lines into a text file; they are in the correct weather record format:

0035029070999991902010113004+64333+023450FM-12+000599999V0201401N011819999999N0000001N9-01001+99999100311ADDGF104991999999999999999999MW1381
0035029070999991902010120004+64333+023450FM-12+000599999V0201401N013919999999N0000001N9-01171+99999100121ADDGF108991999999999999999999MW1381
0035029070999991902010206004+64333+023450FM-12+000599999V0200901N009819999999N0000001N9-01611+99999100121ADDGF108991999999999999999999MW1381
0029029070999991902010213004+64333+023450FM-12+000599999V0200901N011819999999N0000001N9-01721+99999100121ADDGF108991999999999999999999
0029029070999991902010220004+64333+023450FM-12+000599999V0200901N009819999999N0000001N9-01781+99999100421ADDGF108991999999999999999999

In this article, the text file that stores these weather records is named temperature.txt.
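Because each record is a fixed-width line, fields are read by character position. The sketch below assumes the offsets used by the max-temperature example in "Hadoop: The Definitive Guide" (year at positions 15 to 18, signed air temperature at 87 to 91 in tenths of a degree Celsius, quality code at 92) and assumes the record contains no spaces around the "+" signs. The class name NcdcRecord and its methods are hypothetical helpers, not part of Hadoop.

```java
// Fixed-width parsing sketch for one weather record line.
// Assumption: field offsets follow the example in "Hadoop: The Definitive Guide".
class NcdcRecord {

    // Value used in the data set to mark a missing temperature reading.
    static final int MISSING = 9999;

    // Four-digit year of the observation.
    static String year(String line) {
        return line.substring(15, 19);
    }

    // Air temperature in tenths of a degree Celsius (for example -100 means -10.0).
    static int airTemperature(String line) {
        // Skip an explicit leading '+' before parsing; keep a leading '-'.
        if (line.charAt(87) == '+') {
            return Integer.parseInt(line.substring(88, 92));
        }
        return Integer.parseInt(line.substring(87, 92));
    }

    // One-character quality code that follows the temperature.
    static String quality(String line) {
        return line.substring(92, 93);
    }

    // A reading is usable if it is not the missing marker and its quality code is accepted.
    static boolean isValid(String line) {
        return airTemperature(line) != MISSING && quality(line).matches("[01459]");
    }

    public static void main(String[] args) {
        String line = "0035029070999991902010113004+64333+023450FM-12+000599999"
                + "V0201401N011819999999N0000001N9-01001+99999100311ADDGF104991999999999999999999MW1381";
        // Prints the year and the temperature converted to degrees Celsius.
        System.out.println(year(line) + " " + airTemperature(line) / 10.0); // prints 1902 -10.0
    }
}
```

A map function for this data set would typically emit the year as the key and the temperature as the value for every valid record.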

MapReduce Java programming

There are two Java APIs. The old one lives in the org.apache.hadoop.mapred package, where MapReduce programs are written by implementing interfaces. The new one lives in the org.apache.hadoop.mapreduce package, where MapReduce programs are written by extending abstract base classes. In practice the two are similar, as shown later.

Maven

<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-core</artifactId>
    <version>1.0.4</version>
</dependency>

Instead of the official artifact, you can also use a build that someone else has modified and recompiled, which lets you run MapReduce directly inside Eclipse like an ordinary Java program.

With such a recompiled hadoop-core-1.0.4.jar, you can simulate MapReduce locally.

If the Eclipse workspace is on drive d:, then a directory on d: such as d:\input can serve as the input directory and d:\output as the output directory.

In the MapReduce program, it is enough to write:

FileInputFormat.setInputPaths(job, new Path("/input"));
FileOutputFormat.setOutputPath(job, new Path("/output"));
