"Dismembered" Hadoop program base Template

Distributed programming is relatively complex, and Hadoop itself is wrapped in the mystique of big data and cloud computing, so many beginners are put off. In fact, Hadoop is a very approachable distributed programming framework: it is well packaged and hides much of the complexity of the distributed environment, so ordinary developers can pick it up quickly.

Most Hadoop programs can be written from a simple template and its variants. When writing a new MapReduce program, we usually take an existing MapReduce program and modify it until it does what we want. This approach covers the writing of almost all Hadoop programs.

It is convenient to write MapReduce in the Java language, because the Hadoop API provides the Mapper and Reducer base classes; a developer only needs to extend these two classes and implement the methods inside them.

A static class MapClass that extends Mapper:
This class implements the map(Text key, Text value, Context context) method, which takes three parameters:
Text key: the key parsed from each line of the file (here, a patent number).
Text value: the value parsed from each line of the file (the patent it references).
Context context: the context of the map side.
The map method mainly parses each line into key-value form and sends the pairs to the reduce side to be tallied.
Note: the file input format in this task is KeyValueTextInputFormat, so the map method can use key/value directly as its output.
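
As a minimal sketch of this mapper (the class name MapClass follows the text; everything else is illustrative and assumes Text keys and values throughout):

    import java.io.IOException;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Sketch of the map side, nested inside the driver class.
    // KeyValueTextInputFormat has already split each input line into
    // key/value, so the pair can be emitted unchanged.
    public static class MapClass extends Mapper<Text, Text, Text, Text> {
        @Override
        protected void map(Text key, Text value, Context context)
                throws IOException, InterruptedException {
            // key = a patent number, value = a patent it references
            context.write(key, value);
        }
    }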

A static class ReduceClass that extends Reducer:
This class implements the reduce(Text key, Iterable<Text> values, Context context) method, which takes three parameters:
Text key: the key output by the map side.
Iterable<Text> values: the collection of values output by the map side for the same key.
Context context: the context of the reduce side.
The main function of the reduce method is to receive the key-value output of the map method; all pairs with the same key are sent to the same reducer, which iterates over the values for each key, accumulates them, and writes the result to HDFS.
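
A matching sketch of the reduce side (illustrative; here the values for one key are simply joined into a comma-separated list, while a counting variant would sum them instead):

    import java.io.IOException;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    // Sketch of the reduce side: all values sharing a key arrive together,
    // so they can be accumulated into a single output record.
    public static class ReduceClass extends Reducer<Text, Text, Text, Text> {
        @Override
        protected void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            StringBuilder csv = new StringBuilder();
            for (Text value : values) {
                if (csv.length() > 0) csv.append(",");
                csv.append(value.toString());
            }
            context.write(key, new Text(csv.toString()));
        }
    }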

The driver method run() is explained as follows:
Configuration class:
Reads Hadoop's configuration files, such as core-site.xml, mapred-site.xml, hdfs-site.xml, and so on. You can also reset values with the set method, such as conf.set("fs.default.name", "hdfs://single.hadoop.dajiangtai.com:9000").

Note: a value set with the set method overrides the value configured in the configuration file.
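
For example (a fragment as it would appear inside the driver; the property name and URI are taken from the text above):

    // Loads core-site.xml, hdfs-site.xml, etc. from the classpath first ...
    Configuration conf = new Configuration();
    // ... then overrides one property in code; this wins over the XML value.
    conf.set("fs.default.name", "hdfs://single.hadoop.dajiangtai.com:9000");
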
Job class:
Represents a MapReduce job. A Job is constructed with two parameters: the first is the Configuration, and the second is the job name (the name of this task).
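
Putting the pieces together, the driver's run() method might look like this (a sketch assuming the classes above, a driver class named MyJob, and the usual input/output-path arguments; all names are illustrative):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public int run(String[] args) throws Exception {
        Configuration conf = getConf();        // provided by Configured/Tool
        Job job = new Job(conf, "MyJob");      // second argument is the job name
        job.setJarByClass(MyJob.class);

        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        job.setMapperClass(MapClass.class);
        job.setReducerClass(ReduceClass.class);

        // Split each input line into key/value, as described above.
        job.setInputFormatClass(KeyValueTextInputFormat.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        return job.waitForCompletion(true) ? 0 : 1;
    }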

Main call:
The main method is the program's entry point. It typically hands a Configuration, an instance of the driver class, and the command-line arguments to ToolRunner.run(), which invokes the run() method described above; main then exits with the returned code.
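
A sketch of that entry point (the ToolRunner pattern is the conventional way to let generic Hadoop options populate the Configuration; MyJob is the assumed driver class name):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    // Hypothetical driver class: ToolRunner parses generic options
    // (-D, -conf, ...) into the Configuration, then calls run().
    public class MyJob extends Configured implements Tool {
        // ... MapClass, ReduceClass, and run() as shown above ...

        public static void main(String[] args) throws Exception {
            int exitCode = ToolRunner.run(new Configuration(), new MyJob(), args);
            System.exit(exitCode);
        }
    }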

