How to perform unit tests

Tags: hadoop, mapreduce

Hadoop MapReduce programs are submitted to a clustered environment, where problems are hard to pinpoint; often you have to modify the code, add log statements, and resubmit the job to troubleshoot even a small issue. If the amount of data is large, this kind of debugging is very time-consuming. In addition, some of the parameters of map and reduce, such as the Context and the InputSplit, are passed in by the Hadoop framework at runtime, which makes debugging even harder. A good unit testing framework that helps you find and remove bugs early is therefore very valuable.

The MRUnit framework

MRUnit is a unit testing framework written by Cloudera specifically for Hadoop MapReduce, and its API is simple and practical. For different test targets, MRUnit provides different drivers:

MapDriver: tests a Mapper on its own

ReduceDriver: tests a Reducer on its own

MapReduceDriver: tests a Mapper and a Reducer chained together

PipelineMapReduceDriver: tests several MapReduce jobs run as a pipeline
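As a quick sketch of the API, here is how the three most commonly used drivers are constructed, assuming the new-API drivers in the org.apache.hadoop.mrunit.mapreduce package and the Temperature classes developed below (DriverOverview is just an illustrative holder class, not part of the original tutorial):

package com.dajiangtai.hadoop.test;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mrunit.mapreduce.MapDriver;
import org.apache.hadoop.mrunit.mapreduce.MapReduceDriver;
import org.apache.hadoop.mrunit.mapreduce.ReduceDriver;

public class DriverOverview {
    // Generic parameters mirror the Mapper/Reducer input and output types
    MapDriver<LongWritable, Text, Text, IntWritable> mapDriver =
            new MapDriver<LongWritable, Text, Text, IntWritable>(
                    new Temperature.TemperatureMapper());

    ReduceDriver<Text, IntWritable, Text, IntWritable> reduceDriver =
            new ReduceDriver<Text, IntWritable, Text, IntWritable>(
                    new Temperature.TemperatureReducer());

    // Chains the mapper and reducer; the middle type pair is the intermediate key/value
    MapReduceDriver<LongWritable, Text, Text, IntWritable, Text, IntWritable> mrDriver =
            new MapReduceDriver<LongWritable, Text, Text, IntWritable, Text, IntWritable>(
                    new Temperature.TemperatureMapper(),
                    new Temperature.TemperatureReducer());
}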

Next, we use the Temperature program as a test case to illustrate how to use the MRUnit framework.

Preparing the test case

To demonstrate the unit testing framework, we have prepared a MapReduce program and use the Temperature program below as the test case. In the original program, the station ID (the key) in the map method is extracted from the name of the input file; to make unit testing easier, we will later set the key to the constant 03103. The full code of Temperature is shown below.

package com.dajiangtai.hadoop.test;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

/**
 * Computes the 30-year average temperature for each US weather station.
 * 1. Write the map() function.
 * 2. Write the reduce() function.
 * 3. Write the run() method that configures and runs the MapReduce job.
 * 4. Launch the program from the main() method.
 *
 * @author Zhouls
 */
// Inherit from Configured and implement the Tool interface
public class Temperature extends Configured implements Tool {

    // Generic parameters: input key, input value, output key, output value
    public static class TemperatureMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        /**
         * @function Mapper parses weather station data
         * @input key = offset, value = one line of weather station data
         * @output key = weatherStationId, value = temperature
         */
        // map() receives a Context instance used to emit key-value pairs
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Step 1: convert the line of station data into a String
            String line = value.toString(); // one line of weather data

            // Step 2: extract the temperature value; characters 14 to 19 hold the
            // hourly temperature, and trim() removes the embedded spaces
            int temperature = Integer.parseInt(line.substring(14, 19).trim());

            if (temperature != -9999) { // filter out invalid data
                // Step 3: extract the station ID.
                // Get the input split, take the file name from its path,
                // then characters 5 to 10 of the name are the station ID.
                FileSplit fileSplit = (FileSplit) context.getInputSplit();
                String weatherStationId = fileSplit.getPath().getName().substring(5, 10);
                // Emit (weather station ID, temperature)
                context.write(new Text(weatherStationId), new IntWritable(temperature));
            }
        }
    }

    public static class TemperatureReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        // The temperature is an IntWritable
        private IntWritable result = new IntWritable();

        // reduce() receives a key, the collection of values for that key, and the Context
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            // Step 1: average all temperatures recorded for the same weather station
            int sum = 0;
            int count = 0;
            for (IntWritable val : values) { // loop over all temperatures for this station
                sum += val.get(); // accumulate the temperature values
                count++;
            }
            result.set(sum / count);
            context.write(key, result);
        }
    }

    public int run(String[] args) throws Exception {
        // Step 1: read the configuration
        Configuration conf = new Configuration();

        // Step 2: if the output path already exists, delete it first
        Path myPath = new Path(args[1]); // Path object for the output path
        FileSystem hdfs = myPath.getFileSystem(conf); // get the file system for the path
        if (hdfs.isDirectory(myPath)) { // the output path exists
            hdfs.delete(myPath, true); // delete it
        }

        // Step 3: build the Job object; the job name is "temperature"
        Job job = new Job(conf, "temperature");
        job.setJarByClass(Temperature.class); // set the main class via the Job object

        // Step 4: specify the input and output paths
        FileInputFormat.addInputPath(job, new Path(args[0])); // input path: args[0]
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // output path: args[1]

        // Step 5: specify the Mapper and the Reducer
        job.setMapperClass(TemperatureMapper.class);
        job.setReducerClass(TemperatureReducer.class);

        // Step 6: set the output types of the map and reduce functions
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // Step 7: submit the job
        return job.waitForCompletion(true) ? 0 : 1;
    }

    /**
     * @function main method
     * @param args
     * @throws Exception
     */
    public static void main(String[] args) throws Exception {
        // Step 1: input path and output path
        String[] args0 = {
            "hdfs://djt002:9000/weather/",
            "hdfs://djt002:9000/weather/out"
        };
        // Step 2: run the Tool. The first argument is the configuration, the second
        // is the main class Temperature, and the third is the input/output path array.
        int ec = ToolRunner.run(new Configuration(), new Temperature(), args0);
        System.exit(ec);
    }
}

1. Open MyEclipse; we use the US weather station data as the example. In Temperature.java, rewrite the map method so that the key is set to the constant 03103. Because the station number would otherwise have to be cut out of the input file name, which is awkward to test, we replace it with a constant, as sketched below.
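A minimal sketch of the rewritten map() method, assuming only the emitted key changes and the rest of TemperatureMapper stays as listed above:

public void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
    String line = value.toString(); // one line of weather data
    int temperature = Integer.parseInt(line.substring(14, 19).trim()); // hourly temperature
    if (temperature != -9999) { // filter out invalid data
        // Use the constant station ID instead of reading it from the file name,
        // so the mapper no longer depends on the InputSplit and is easy to test
        context.write(new Text("03103"), new IntWritable(temperature));
    }
}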

2. Create an extralib folder under D:\Software\hadoop-2.2.0\ and put the downloaded mrunit-hadoop.jar into this directory.

3. Next, import the MRUnit package, that is, mrunit-hadoop.jar: right-click the Hadoop project and choose Build Path -> Configure Build Path.

4. On the Java Build Path -> Libraries tab, click Add External JARs... and add mrunit-hadoop.jar.

5. Mapper unit test

Under the com.dajiangtai.hadoop.test package, create a TemperatureMapperTest.java to test the map.

6. Write the code of TemperatureMapperTest.java and compile it; it should compile without errors.

In the test() method, the key/value parameters of withInput are an offset and one line of weather data, whose types (LongWritable and Text) match the input types of TemperatureMapper. The key/value parameters of withOutput are the new Text("03103") and new IntWritable(200) that we expect as output. The test passes when our expected output matches the actual output of TemperatureMapper.
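A minimal sketch of what TemperatureMapperTest.java can look like, assuming the constant-key mapper from step 1; the sample data line is illustrative, chosen so that characters 14 to 19 parse to 200:

package com.dajiangtai.hadoop.test;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mrunit.mapreduce.MapDriver;
import org.junit.Before;
import org.junit.Test;

public class TemperatureMapperTest {
    private MapDriver<LongWritable, Text, Text, IntWritable> mapDriver;

    @Before
    public void setUp() {
        // drive the mapper under test (the constant-key version from step 1)
        mapDriver = new MapDriver<LongWritable, Text, Text, IntWritable>(
                new Temperature.TemperatureMapper());
    }

    @Test
    public void test() throws IOException {
        // illustrative data line: characters 14 to 19 hold the temperature 200
        String line = "1985 07 31 02   200    94 10137   220    26     1     0 -9999";
        mapDriver.withInput(new LongWritable(0), new Text(line))
                 .withOutput(new Text("03103"), new IntWritable(200))
                 .runTest();
    }
}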

Run the test() method as a JUnit test; the dialog on the left shows "Runs: 1/1, Errors: 0, Failures: 0", indicating that the Mapper test succeeded.

7. Reducer unit test

Under the com.dajiangtai.hadoop.test package, create a TemperatureReduceTest.java to test the reduce.

In the test() method, the key/value parameters of withInput are a new Text(key) and a List of IntWritable values. The key/value parameters of withOutput are the new Text(key) and new IntWritable(150) that we expect as output. The test passes when our expected output matches the actual output of TemperatureReducer.
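A minimal sketch of TemperatureReduceTest.java under these assumptions, feeding two temperatures (200 and 100) whose average is the expected 150:

package com.dajiangtai.hadoop.test;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mrunit.mapreduce.ReduceDriver;
import org.junit.Before;
import org.junit.Test;

public class TemperatureReduceTest {
    private ReduceDriver<Text, IntWritable, Text, IntWritable> reduceDriver;

    @Before
    public void setUp() {
        // drive the reducer under test
        reduceDriver = new ReduceDriver<Text, IntWritable, Text, IntWritable>(
                new Temperature.TemperatureReducer());
    }

    @Test
    public void test() throws IOException {
        String key = "03103";
        // two temperatures for the same station; their average is 150
        List<IntWritable> values = new ArrayList<IntWritable>();
        values.add(new IntWritable(200));
        values.add(new IntWritable(100));
        reduceDriver.withInput(new Text(key), values)
                    .withOutput(new Text(key), new IntWritable(150))
                    .runTest();
    }
}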

8. Write the code for TemperatureReduceTest.java and compile it; it should compile without errors.

After the Reducer-side unit test is written, right-click the TemperatureReduceTest class, select Run As -> JUnit Test, and the result appears as follows.

The test method is test(); the dialog on the left shows "Runs: 1/1, Errors: 0, Failures: 0", indicating that the Reducer test succeeded.

9. MapReduce unit test

The test case that integrates the Mapper and the Reducer is built as follows.

Under the com.dajiangtai.hadoop.test package, create a TemperatureTest.java for the test.

In the test() method, withInput adds two lines of test data, line and line2; the key/value parameters of withOutput are the new Text("03103") and new IntWritable(150) that we expect as output. The test passes when our expected output matches the actual output of Temperature.
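A minimal sketch of TemperatureTest.java under the same assumptions, with two illustrative data lines whose temperatures (200 and 100) average to the expected 150:

package com.dajiangtai.hadoop.test;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mrunit.mapreduce.MapReduceDriver;
import org.junit.Before;
import org.junit.Test;

public class TemperatureTest {
    private MapReduceDriver<LongWritable, Text, Text, IntWritable, Text, IntWritable> mrDriver;

    @Before
    public void setUp() {
        // chain the mapper and reducer under test
        mrDriver = new MapReduceDriver<LongWritable, Text, Text, IntWritable, Text, IntWritable>(
                new Temperature.TemperatureMapper(),
                new Temperature.TemperatureReducer());
    }

    @Test
    public void test() throws IOException {
        // two illustrative data lines; characters 14 to 19 hold 200 and 100
        String line = "1985 07 31 02   200    94 10137   220    26     1     0 -9999";
        String line2 = "1985 07 31 11   100    56 -9999    50     5 -9999     5 -9999";
        mrDriver.withInput(new LongWritable(0), new Text(line))
                .withInput(new LongWritable(1), new Text(line2))
                .withOutput(new Text("03103"), new IntWritable(150))
                .runTest();
    }
}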

10. Write the code for TemperatureTest.java and compile it; it should compile without errors.

To run the integrated test, right-click the TemperatureTest class, select Run As -> JUnit Test, and the result appears as follows.

The test method is test(); the dialog on the left shows "Runs: 1/1, Errors: 0, Failures: 0", indicating that the MapReduce test succeeded.
