A detailed explanation of the running process of Hadoop

Source: Internet
Author: User

We wrote a simple example based on the Hadoop tutorial. It analyzes weather data and finds the highest temperature recorded in each year.
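Before looking at the Hadoop code, the idea of the job can be sketched in plain Java: each input line yields a (year, temperature) pair, and the highest temperature is kept for each year. The sketch below uses no Hadoop classes; the names `MaxTemperatureSketch`, `Reading`, and `maxPerYear` and the temperature values are made up for illustration and do not come from the original example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration only; none of these names come from Hadoop.
class MaxTemperatureSketch {

    // Hypothetical stand-in for one (year, temperature) pair produced by the map step.
    static final class Reading {
        final String year;
        final int temperature;

        Reading(String year, int temperature) {
            this.year = year;
            this.temperature = temperature;
        }
    }

    // Reduce step: keep the largest temperature seen for each year.
    static Map<String, Integer> maxPerYear(List<Reading> readings) {
        Map<String, Integer> max = new TreeMap<>();
        for (Reading r : readings) {
            max.merge(r.year, r.temperature, Integer::max);
        }
        return max;
    }

    public static void main(String[] args) {
        // Made-up sample values, not real weather data.
        List<Reading> readings = Arrays.asList(
                new Reading("1901", -78),
                new Reading("1901", 22),
                new Reading("1902", 111),
                new Reading("1902", 78));
        System.out.println(maxPerYear(readings)); // prints {1901=22, 1902=111}
    }
}
```

In a real job Hadoop performs the grouping by key between the map and reduce phases; here a single in-memory map plays both roles.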

We wrote it against the new API introduced in Hadoop 0.20, as follows:

Mapper class:

package com.charles.parseweather;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

/**
 * Description: This is the map class, which defines the map function. It is based on
 * the new API introduced in Hadoop 0.20.
 * The base class has 4 type parameters: the input key, input value, output key, and
 * output value types. In our example:
 *   Input key:    LongWritable, the offset of the start of a line relative to the start of the file
 *   Input value:  Text, the single line of text at that offset, which carries the temperature along with other information
 *   Output key:   Text, the year in this line
 *   Output value: IntWritable, the temperature in this line
 *
 * So for the following line:
 * 0029029070999991901010106004+64333+023450FM-12+000599999V0202701N015919999999N0000001N9-00781+99999102001ADDGF108991999999999999999999
 * if each line were 106 characters long and this were the 2nd line, then
 *   Input key:    106 (the offset, LongWritable type)
 *   Input value:  the full text of the line
 *   Output key:   the year, here characters 15-19, so 1901
 *   Output value: the temperature, here characters 87-92 (it may be negative), so -78 for this line (IntWritable type)
 *
 * @author Charles.wang
 * @created May, 5:10:27 PM
 */
public class MaxTemperatureMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    // An impossible temperature value; if a line carries this number, no temperature was recorded for it.
    private static final int MISSING = 9999;

    // This map method maps (text offset, line) to (year, temperature) and writes the result to the context.
    // It uses the data types provided by Hadoop itself, whose only role is to optimize network serialization;
    // in the end we still have to convert them to the basic java.lang types for processing.
    @Override
    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {

        // value represents this line of text, so first convert the text transferred over the network to a String.
        String line = value.toString();

        // Extract the year from the line; it will serve as the output key.
        String year = line.substring(15, 19);

        int airTemperature;

        // The temperature can be positive or negative; position 87 holds the sign.
        // For positive values, take the substring from the next position up to position 92 and convert it to an integer.
        // For negative values, take the substring up to position 92 directly and convert it to an integer.
        if (line.charAt(87) == '+') {
            airTemperature = Integer.parseInt(line.substring(88, 92));
        } else {
            airTemperature = Integer.parseInt(line.substring(87, 92));
        }

        // The argument of matches is a regular expression: quantity.matches("[01459]") means
        // the reading is valid only when quantity is 0, 1, 4, 5, or 9.
        String quantity = line.substring(92, 93);
        if (airTemperature != MISSING && quantity.matches("[01459]")) {
            // Only valid temperatures are written to the context object, converted back to Hadoop data types
            // because they are transmitted over the network to the reducer.
            // Here the key is the year and the value is the temperature contained in this line.
            context.write(new Text(year), new IntWritable(airTemperature));
        }
    }
}
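To check the offsets used in the mapper without a Hadoop cluster, the same extraction logic can be run as plain Java against the sample line quoted in the mapper's comment. This is a minimal sketch, not part of the original job: the class name `RecordParseSketch` and its helper methods are hypothetical, and the record string assumes the quoted sample with its upper-case letters restored and the stray space removed.

```java
// Plain-Java check of the substring offsets; no Hadoop classes are involved.
class RecordParseSketch {

    // Same sentinel the mapper uses for "no temperature recorded".
    static final int MISSING = 9999;

    // The year occupies positions 15 to 18, that is substring(15, 19).
    static String year(String line) {
        return line.substring(15, 19);
    }

    // Position 87 is the sign; a leading '+' is skipped, mirroring the mapper's branch.
    static int airTemperature(String line) {
        int start = line.charAt(87) == '+' ? 88 : 87;
        return Integer.parseInt(line.substring(start, 92));
    }

    // A reading counts only if it is not MISSING and the character at position 92 is one of 0, 1, 4, 5, 9.
    static boolean isValid(String line) {
        String quantity = line.substring(92, 93);
        return airTemperature(line) != MISSING && quantity.matches("[01459]");
    }

    public static void main(String[] args) {
        // Assumed reconstruction of the sample record from the mapper's comment.
        String line = "0029029070999991901010106004+64333+023450FM-12+000599999"
                + "V0202701N015919999999N0000001N9-00781+99999102001ADDGF108991999999999999999999";
        // prints "1901 -78 true"
        System.out.println(year(line) + " " + airTemperature(line) + " " + isValid(line));
    }
}
```

Keeping the parsing in small static methods like these makes it possible to test the offsets separately from the `map` method, which needs a Hadoop `Context` to run.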
