Original article: cnblogs, by Jing Han Jing, http://gpcuster.cnblogs.com
Prerequisites
1. Familiarity with JUnit 4.x.
2. An understanding of how mock objects are used in unit testing.
3. An understanding of Hadoop's MapReduce programming model.
If you are not familiar with JUnit and mock objects, first read "[Translation] Unit Testing with JUnit 4.x and EasyMock in Eclipse - Tutorial".
If you are not familiar with Hadoop's MapReduce programming model, first read the Map/Reduce Tutorial.
Introduction
MRUnit is a framework developed by Cloudera for writing unit tests for Hadoop MapReduce programs.
It works with both the classic org.apache.hadoop.mapred.* API of version 0.18.x and the new org.apache.hadoop.mapreduce.* API of version 0.20.x.
Cloudera's official description is as follows:
MRUnit is a unit test library designed to facilitate easy integration between your MapReduce development process and standard development and testing tools such as JUnit. MRUnit contains mock objects that behave like classes you interact with during MapReduce execution (e.g., InputSplit and OutputCollector) as well as test harness "drivers" that test your program's correctness while maintaining compliance with the MapReduce semantics. Mapper and Reducer implementations can be tested individually, as well as together to form a full MapReduce job.
Install
The current Apache Hadoop release does not include MRUnit by default, so you need to download the distribution released by Cloudera from its official website.
The recommended version is hadoop-0.20.1+133.tar.gz.
After downloading and extracting this file, you will find the JAR we need in the hadoop-0.20.1+133\contrib\mrunit directory: hadoop-0.20.1+133-mrunit.jar.
To use MRUnit, add hadoop-0.20.1+133-mrunit.jar and the JUnit 4.x JAR (junit.jar) to the classpath of your Hadoop development project.
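To confirm that both JARs are actually visible to your project, a small check like the one below can help. It is a sketch that relies only on the JDK's Class.forName; the helper name ClasspathCheck is hypothetical, while the two class names it probes (org.apache.hadoop.mrunit.MapDriver and org.junit.Test) are the ones used in this article.

```java
// Hypothetical helper: reports whether the classes MRUnit tests need can be loaded.
final class ClasspathCheck {
    private ClasspathCheck() {}

    // Returns true if the named class can be loaded by this class's loader.
    static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only check that the class can be loaded, do not run static initializers
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // LinkageError covers a class whose own dependencies (e.g. Hadoop core) are missing
            return false;
        }
    }

    public static void main(String[] args) {
        String[] required = {
            "org.apache.hadoop.mrunit.MapDriver", // from hadoop-0.20.1+133-mrunit.jar
            "org.junit.Test"                      // from junit.jar (JUnit 4.x)
        };
        for (String name : required) {
            System.out.println(name + (isOnClasspath(name) ? ": found" : ": MISSING"));
        }
    }
}
```

Run it with the same classpath as your tests; a MISSING line points to a JAR that still needs to be added. Note that MapDriver can also be reported missing when the MRUnit JAR is present but the Hadoop core JAR it depends on is not.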
Example
Code is the best documentation. Let's look at a simple unit test for a map function:
package gpcuster.cnblogs.com;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.lib.IdentityMapper;
import org.apache.hadoop.mrunit.MapDriver;
import org.junit.Before;
import org.junit.Test;

// JUnit 4 test for a map function, using MRUnit's MapDriver with the old org.apache.hadoop.mapred API.
public class TestExample {
    // Mapper and MapDriver take four type parameters: input key, input value, output key, output value.
    private Mapper<Text, Text, Text, Text> mapper;
    private MapDriver<Text, Text, Text, Text> driver;

    @Before
    public void setUp() {
        // IdentityMapper emits every input pair unchanged.
        mapper = new IdentityMapper<Text, Text>();
        driver = new MapDriver<Text, Text, Text, Text>(mapper);
    }

    @Test
    public void testIdentityMapper() {
        // Supply one input pair, declare the expected output pair, then run and verify.
        driver.withInput(new Text("foo"), new Text("bar"))
              .withOutput(new Text("foo"), new Text("bar"))
              .runTest();
    }
}
In this sample, the mapper under test is org.apache.hadoop.mapred.lib.IdentityMapper, a very simple map function that outputs exactly what it receives as input.
org.apache.hadoop.mrunit.MapDriver, imported from the MRUnit framework, is the class used specifically to test mappers.
We use withInput to specify the input key/value pair, withOutput to specify the expected output, and runTest to run the test.
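To make the fluent withInput/withOutput/runTest style concrete, here is a simplified, stdlib-only imitation. It is not MRUnit's code: MiniMapDriver and SimpleMapper are hypothetical names, plain Strings stand in for Hadoop's Text, and a BiConsumer stands in for OutputCollector. Each with* method returns the driver itself, which is what makes the chained calls in the test above possible.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;

// Hypothetical stand-in for a map function: receives (key, value) and emits pairs through `out`.
interface SimpleMapper<K1, V1, K2, V2> {
    void map(K1 key, V1 value, BiConsumer<K2, V2> out);
}

// Hypothetical, simplified imitation of a fluent map-test driver; NOT MRUnit's implementation.
final class MiniMapDriver<K1, V1, K2, V2> {
    private final SimpleMapper<K1, V1, K2, V2> mapper;
    private K1 inputKey;
    private V1 inputValue;
    private final List<Map.Entry<K2, V2>> expected = new ArrayList<>();

    MiniMapDriver(SimpleMapper<K1, V1, K2, V2> mapper) {
        this.mapper = mapper;
    }

    // Stores the single input pair and returns this driver so calls can be chained.
    MiniMapDriver<K1, V1, K2, V2> withInput(K1 key, V1 value) {
        this.inputKey = key;
        this.inputValue = value;
        return this;
    }

    // Appends one expected output pair; call repeatedly when several outputs are expected.
    MiniMapDriver<K1, V1, K2, V2> withOutput(K2 key, V2 value) {
        expected.add(new SimpleEntry<>(key, value));
        return this;
    }

    // Runs the mapper once, records every emitted pair, and compares them in order.
    void runTest() {
        List<Map.Entry<K2, V2>> actual = new ArrayList<>();
        mapper.map(inputKey, inputValue, (k, v) -> actual.add(new SimpleEntry<>(k, v)));
        if (!actual.equals(expected)) {
            throw new AssertionError("expected " + expected + " but got " + actual);
        }
    }
}
```

For example, new MiniMapDriver<String, String, String, String>((k, v, out) -> out.accept(k, v)).withInput("foo", "bar").withOutput("foo", "bar").runTest() mirrors the IdentityMapper test and passes, while expecting "baz" instead of "bar" makes runTest throw an AssertionError.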
Features
1. To test a map function, use MapDriver.
2. To test a reduce function, use ReduceDriver.
3. To test a complete MapReduce job (map plus reduce), use MapReduceDriver.
4. To test several MapReduce jobs chained together, use PipelineMapReduceDriver.
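What distinguishes a MapReduce-level test from a map-only test is that both phases run, with intermediate values grouped by key in between. The following stdlib-only sketch models that flow. MiniMapReduce, MapFn and ReduceFn are hypothetical names, and the sketch makes no claim about how MRUnit itself groups or orders keys: it keeps first-seen key order and models no partitioner. Feeding the output of one run into another roughly corresponds to the chained-job case that PipelineMapReduceDriver covers.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;

// Hypothetical map step: one input pair in, any number of intermediate pairs out.
interface MapFn<K1, V1, K2, V2> {
    void map(K1 key, V1 value, BiConsumer<K2, V2> out);
}

// Hypothetical reduce step: one key with all of its values in, any number of pairs out.
interface ReduceFn<K2, V2, K3, V3> {
    void reduce(K2 key, List<V2> values, BiConsumer<K3, V3> out);
}

// Simplified model of a map, group-by-key, reduce run; not MRUnit's implementation.
final class MiniMapReduce {
    private MiniMapReduce() {}

    static <K1, V1, K2, V2, K3, V3> List<Map.Entry<K3, V3>> run(
            List<Map.Entry<K1, V1>> inputs,
            MapFn<K1, V1, K2, V2> mapper,
            ReduceFn<K2, V2, K3, V3> reducer) {
        // Map phase: collect intermediate values per key. LinkedHashMap keeps the
        // order in which keys were first emitted; no partitioner or sort is modeled.
        Map<K2, List<V2>> groups = new LinkedHashMap<>();
        for (Map.Entry<K1, V1> input : inputs) {
            mapper.map(input.getKey(), input.getValue(),
                    (k, v) -> groups.computeIfAbsent(k, unused -> new ArrayList<>()).add(v));
        }
        // Reduce phase: one reduce call per distinct intermediate key.
        List<Map.Entry<K3, V3>> output = new ArrayList<>();
        for (Map.Entry<K2, List<V2>> group : groups.entrySet()) {
            reducer.reduce(group.getKey(), group.getValue(),
                    (k, v) -> output.add(new SimpleEntry<>(k, v)));
        }
        return output;
    }
}
```

With a word-count mapper that emits (word, 1) per token and a reducer that sums the values, the two input lines "a b a" and "b c" produce [a=2, b=2, c=1].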
Implementation
The MRUnit framework is very lightweight, and it relies on JUnit to run its tests.
Because the map and reduce functions we write emit their results through an OutputCollector object, MRUnit implements a set of mock objects that stand in for OutputCollector and capture the pairs written to it, so they can be compared with the expected output.
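The mock-object idea can be shown without any framework: the code under test writes to an interface, and the test passes in a fake implementation that simply records every call so the results can be checked afterwards. In this sketch, Collector is a local interface given the same shape as Hadoop's OutputCollector (void collect(K key, V value) throws IOException) so that it compiles without Hadoop; RecordingCollector and TokenMapper are hypothetical names.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Local interface with the same shape as Hadoop's org.apache.hadoop.mapred.OutputCollector,
// declared here only so the sketch compiles without Hadoop on the classpath.
interface Collector<K, V> {
    void collect(K key, V value) throws IOException;
}

// Mock collector: instead of writing output anywhere, it records each call in order
// as a "key<TAB>value" string.
final class RecordingCollector<K, V> implements Collector<K, V> {
    final List<String> calls = new ArrayList<>();

    @Override
    public void collect(K key, V value) {
        calls.add(key + "\t" + value);
    }
}

// Hypothetical code under test: emits (word, 1) for each whitespace-separated token.
final class TokenMapper {
    void map(long offset, String line, Collector<String, Integer> output) throws IOException {
        for (String word : line.trim().split("\\s+")) {
            // a blank line splits into one empty token, which is skipped
            if (!word.isEmpty()) {
                output.collect(word, 1);
            }
        }
    }
}
```

After new TokenMapper().map(0L, "foo bar foo", mock), mock.calls holds foo\t1, bar\t1 and foo\t1 in that order. This record-then-compare cycle is the kind of work the MRUnit drivers take off your hands.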
Limitations
Reading the MRUnit source code reveals two limitations:
1. Partitioning and sorting in the MapReduce framework are not supported: the values emitted by the map are shuffled and then passed directly to reduce.
2. MapReduce jobs implemented with Hadoop Streaming are not supported.
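Because MRUnit does not exercise partitioning, one workaround is to unit test partitioning logic as an ordinary function. The sketch below reimplements the rule of Hadoop's default HashPartitioner, (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks, in plain Java; PartitionCheck is a hypothetical name. It uses String keys, whose hashCode differs from that of Hadoop's Text, so it checks properties of the rule (valid range, handling of negative hash codes) rather than the exact partition a Text key would get on a cluster.

```java
// Hypothetical helper that reimplements the rule of Hadoop's default HashPartitioner
// so partition logic can be unit tested without MRUnit or a cluster.
final class PartitionCheck {
    private PartitionCheck() {}

    // Returns the reduce task index in [0, numReduceTasks) for the given key.
    static int hashPartition(Object key, int numReduceTasks) {
        // Argument check added by this sketch; it is not part of the HashPartitioner rule.
        if (numReduceTasks <= 0) {
            throw new IllegalArgumentException("numReduceTasks must be positive");
        }
        // Masking with Integer.MAX_VALUE clears the sign bit, so negative hash codes
        // (including Integer.MIN_VALUE) still land in a valid partition.
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }
}
```

A custom Partitioner can be tested the same way: call its partition logic directly with representative keys and assert on the returned index.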
Despite these limitations, MRUnit is sufficient for most needs.
References
http://www.cloudera.com/hadoop-mrunit