Hadoop MapReduce Programming API Entry Series: Mining Meteorological Data, Version 2 (IX)

Source: Internet
Author: User
Tags: hadoop, mapreduce

For reference, version 1 of this series is here:

Hadoop MapReduce Programming API Entry Series: Mining Meteorological Data, Version 1 (I)

This post covers unit testing and debugging of MapReduce code, which is very important in real production development. Without further ado, here is the code.

MRUnit Framework

MRUnit is a unit-testing framework that Cloudera developed specifically for Hadoop MapReduce; its API is concise and practical. MRUnit provides a different driver for each kind of test target:

MapDriver: for testing a map in isolation

ReduceDriver: for testing a reduce in isolation

MapReduceDriver: for testing the map and reduce chained together

PipelineMapReduceDriver: for testing a pipeline of multiple MapReduce jobs

Remember to add the MRUnit jar to the project; here it is placed under the lib directory in the project root.

Code version 2

Write the code for TemperatureMapperTest.java and compile it; if no errors appear, it is correct.

In the test() method, the key/value parameters of withInput are an offset and one row of meteorological data, of types LongWritable and Text, consistent with TemperatureMapper's input types. The key/value parameters of withOutput are the expected output, new Text("03103") and new IntWritable(200). The test passes when this expected output matches TemperatureMapper's actual output.
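The field position the mapper relies on can be checked without Hadoop or MRUnit at all. Below is a minimal plain-Java sketch of the same substring logic; the class name ParseSketch is made up for illustration, and the sample line is hypothetical, padded so that the temperature sits in columns 14 to 18 (the fixed-width spacing of the real data is not preserved in this post):

```java
public class ParseSketch
{
    // Mirrors the mapper's parsing: characters 14 to 19 hold the
    // temperature field, which is trimmed and parsed as an int.
    public static int parseTemperature(String line)
    {
        return Integer.parseInt(line.substring(14, 19).trim());
    }

    public static void main(String[] args)
    {
        // Hypothetical fixed-width line: columns 14-18 contain "  200".
        String line = "1985 07 31 02   200    94 10137";
        System.out.println(parseTemperature(line)); // prints 200
    }
}
```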

The test method is test(); the left-hand panel shows "Runs: 1/1, Errors: 0, Failures: 0", indicating that the Mapper test passed.

Create a TemperatureReduceTest.java to test the Reducer.

In the test() method, the key/value parameters of withInput are new Text(key) and a List of values, respectively. The key/value parameters of withOutput are the expected output, new Text(key) and new IntWritable(150). The test passes when this expected output matches TemperatureReducer's actual output.

Write the code for TemperatureReduceTest.java and compile it; if no errors appear, it is correct.

To run the Reducer unit test, right-click the TemperatureReduceTest class and select Run As > JUnit Test; the results are as follows.

The test method is test(); the left-hand panel shows "Runs: 1/1, Errors: 0, Failures: 0", indicating that the Reducer test passed.

MapReduce Unit Test

The test case code that integrates Mapper and Reducer is as follows.

Create a TemperatureTest.java for the integrated test.

In the test() method, withInput adds two rows of test data, line and line2. The key/value parameters of withOutput are the expected output, new Text("03103") and new IntWritable(150). The test passes when this expected output matches Temperature's actual output.
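The arithmetic behind the expected value 150 is just the reducer's integer average over the two mapped temperatures, 200 and 100. A small plain-Java sketch of that step (the class name AverageSketch is made up, and the values are hardcoded to match the test data above; this is not MRUnit itself):

```java
import java.util.Arrays;
import java.util.List;

public class AverageSketch
{
    // Mirrors the reducer: sum all temperatures for one station,
    // then take the integer average.
    public static int average(List<Integer> temps)
    {
        int sum = 0;
        int count = 0;
        for (int t : temps)
        {
            sum += t;
            count++;
        }
        return sum / count; // integer division, as in the reducer
    }

    public static void main(String[] args)
    {
        // The two temperatures that line and line2 map to for station 03103.
        System.out.println(average(Arrays.asList(200, 100))); // prints 150
    }
}
```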

Write the code for TemperatureTest.java and compile it; if no errors appear, it is correct.

To run the integrated test, right-click the TemperatureTest class and select Run As > JUnit Test; the results are as follows.

The test method is test(); the left-hand panel shows "Runs: 1/1, Errors: 0, Failures: 0", indicating that the MapReduce test passed.

package zhouls.bigdata.myMapReduce.TemperatureTest;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
/**
 * Computes the 30-year average temperature for each US weather station.
 * 1. Write the map() function
 * 2. Write the reduce() function
 * 3. Write the run() method, which is responsible for running the MapReduce job
 * 4. Run the program from the main() method
 *
 * @author zhouls
 */
// Inherit the Configured class and implement the Tool interface
public class Temperature extends Configured implements Tool
{
    public static class TemperatureMapper extends Mapper<LongWritable, Text, Text, IntWritable>
    { // input key, input value, output key, output value
        /**
         * @function the Mapper parses weather station data
         * @input  key = offset, value = one line of weather station data
         * @output key = weatherStationId, value = temperature
         */
        public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException
        { // the map() function receives a Context instance for emitting key-value pairs
            // Step 1: convert each line of station data to a String
            String line = value.toString(); // one line of meteorological data

            // Step 2: extract the temperature value
            // Parse it as an int; take characters 14 to 19 and trim the surrounding spaces
            int temperature = Integer.parseInt(line.substring(14, 19).trim()); // hourly temperature
            if (temperature != -9999) // filter out invalid data
            {
                // Step 3: extract the station ID
                // Get the input split, then derive the station ID from the file name
                FileSplit fileSplit = (FileSplit) context.getInputSplit(); // get the input split and cast it
                // From the split, get the file path, then the file name, then characters 5 to 10 as the station ID
                String weatherStationId = fileSplit.getPath().getName().substring(5, 10); // station ID from the file name
                context.write(new Text(weatherStationId), new IntWritable(temperature)); // station ID, temperature
            }
        }
    }


    public static class TemperatureReducer extends Reducer<Text, IntWritable, Text, IntWritable>
    {
        private IntWritable result = new IntWritable(); // the temperature is an IntWritable

        public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException
        { // the key, the collection of values for that key, and the Context instance
            // Step 1: sum all temperatures for the same weather station
            int sum = 0;
            int count = 0;
            for (IntWritable val : values) // loop over all temperature values of the same station
            { // accumulate all temperature values
                sum += val.get();
                count++;
            }
            result.set(sum / count);
            context.write(key, result);
        }
    }

    public int run(String[] args) throws Exception
    {
        // Step 1: read the configuration file
        Configuration conf = new Configuration();

        // Step 2: delete the output path if it already exists
        Path myPath = new Path(args[1]); // Path object for the output path
        FileSystem hdfs = myPath.getFileSystem(conf); // get the file system for that path
        if (hdfs.isDirectory(myPath)) // if the output path exists
        {
            hdfs.delete(myPath, true); // delete it
        }

        // Step 3: build the Job object
        Job job = new Job(conf, "temperature"); // create a new job named "temperature"
        job.setJarByClass(Temperature.class); // set the main class, Temperature.class, on the Job object

        // Step 4: specify the input and output paths
        FileInputFormat.addInputPath(job, new Path(args[0])); // input path, args[0]
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // output path, args[1]

        // Step 5: specify the Mapper and the Reducer
        job.setMapperClass(TemperatureMapper.class); // Mapper
        job.setReducerClass(TemperatureReducer.class); // Reducer

        // Step 6: set the output types of the map and reduce functions
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // Step 7: submit the job
        return job.waitForCompletion(true) ? 0 : 1; // submit the task
    }


    /**
     * @function main method
     * @param args
     * @throws Exception
     */
    public static void main(String[] args) throws Exception
    {
        // Step 1: the input and output paths (the HDFS paths are commented out; local paths are used here)
        // String[] args0 =
        // {
        //     "hdfs://hadoopmaster:9000/temperature/",
        //     "hdfs://hadoopmaster:9000/out/temperature/"
        // };

        String[] args0 =
        {
            "./data/temperature/",
            "./out/temperature/"
        };

        // Step 2: run the job via ToolRunner
        // The first argument is the configuration, the second is the main class Temperature,
        // and the third is the pair of input and output paths
        int ec = ToolRunner.run(new Configuration(), new Temperature(), args0);
        System.exit(ec);
    }

}

package zhouls.bigdata.myMapReduce.TemperatureTest;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mrunit.mapreduce.MapDriver;
import org.junit.Before;
import org.junit.Test;

/**
 * Unit test for the Mapper
 */
@SuppressWarnings("all")
public class TemperatureMapperTest
{
    private Mapper mapper; // the Mapper object under test
    private MapDriver driver; // the MapDriver object

    @Before
    public void init() // initialization method
    {
        mapper = new Temperature.TemperatureMapper(); // instantiate Temperature's TemperatureMapper
        driver = new MapDriver(mapper); // instantiate the MapDriver
    }

    @Test
    public void test() throws IOException
    { // tests the map
        // One line of test data
        String line = "1985 07 31 02 200 94 10137 220 26 1 0-9999";
        driver.withInput(new LongWritable(), new Text(line)) // matches TemperatureMapper's input types
              .withOutput(new Text("03103"), new IntWritable(200)) // matches TemperatureMapper's output types
              .runTest();
    }
}

package zhouls.bigdata.myMapReduce.TemperatureTest;

import java.io.IOException;

import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mrunit.mapreduce.ReduceDriver;
import org.junit.Before;
import org.junit.Test;

/**
 * Unit test for the Reducer
 */
@SuppressWarnings("all")
public class TemperatureReduceTest
{
    private Reducer reducer; // the Reducer object under test
    private ReduceDriver driver; // the ReduceDriver object

    @Before
    public void init() // initialization method
    {
        reducer = new Temperature.TemperatureReducer(); // instantiate Temperature's TemperatureReducer
        driver = new ReduceDriver(reducer); // instantiate the ReduceDriver
    }

    @Test
    public void test() throws IOException
    {
        String key = "03103"; // the key
        List values = new ArrayList();
        values.add(new IntWritable(200)); // add the first value
        values.add(new IntWritable(100)); // add the second value
        driver.withInput(new Text(key), values) // matches TemperatureReducer's input types
              .withOutput(new Text(key), new IntWritable(150)) // matches TemperatureReducer's output types
              .runTest();
    }
}

package zhouls.bigdata.myMapReduce.TemperatureTest;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mrunit.mapreduce.MapReduceDriver;
import org.junit.Before;
import org.junit.Test;

/**
 * Integrated test of the Mapper and Reducer.
 */
@SuppressWarnings("all")
public class TemperatureTest
{
    private Mapper mapper; // the Mapper object
    private Reducer reducer; // the Reducer object
    private MapReduceDriver driver; // the MapReduceDriver object

    @Before
    public void init() // initialization method
    {
        mapper = new Temperature.TemperatureMapper(); // instantiate Temperature's TemperatureMapper
        reducer = new Temperature.TemperatureReducer(); // instantiate Temperature's TemperatureReducer
        driver = new MapReduceDriver(mapper, reducer); // instantiate the MapReduceDriver
    }

    @Test
    public void test() throws RuntimeException, IOException
    {
        // Two lines of test data
        String line = "1985 07 31 02 200 94 10137 220 26 1 0-9999";
        String line2 = "1985 07 31 11 100 56-9999 50 5-9999 0-9999";
        driver.withInput(new LongWritable(), new Text(line)) // matches TemperatureMapper's input types
              .withInput(new LongWritable(), new Text(line2))
              .withOutput(new Text("03103"), new IntWritable(150)) // matches TemperatureReducer's output types
              .runTest();
    }
}

