Using MRUnit for Unit Testing in Hadoop


Original article: Jing Han Jing's blog on cnblogs, http://gpcuster.cnblogs.com

Prerequisites

1. Know how to use JUnit 4.x.
2. Understand how mocks are used in unit testing.
3. Understand the MapReduce programming model in Hadoop.

If you are not familiar with JUnit and mocks, first read the translated tutorial "Unit Testing with JUnit 4.x and EasyMock in Eclipse".

If you are not familiar with the MapReduce programming model in Hadoop, first read the Hadoop Map/Reduce Tutorial.

Introduction

MRUnit is a framework developed by Cloudera for writing unit tests for Hadoop MapReduce code.

It works with both the classic org.apache.hadoop.mapred.* API of Hadoop 0.18.x and the new org.apache.hadoop.mapreduce.* API introduced in Hadoop 0.20.x.

The official introduction reads:

MRUnit is a unit test library designed to facilitate easy integration between your MapReduce development process and standard development and testing tools such as JUnit. MRUnit contains mock objects that behave like classes you interact with during MapReduce execution (e.g., InputSplit and OutputCollector) as well as test harness "drivers" that test your program's correctness while maintaining compliance with the MapReduce semantics. Mapper and Reducer implementations can be tested individually, as well as together to form a full MapReduce job.

Installation

The current Hadoop release does not include MRUnit by default. You need to download a Hadoop distribution released by Cloudera from its official website.

The recommended version is hadoop-0.20.1+133.tar.gz.

After downloading and unpacking this file, you will find the jar we need, hadoop-0.20.1+133-mrunit.jar, in the hadoop-0.20.1+133\contrib\mrunit directory.

To use MRUnit, add hadoop-0.20.1+133-mrunit.jar and the JUnit 4.x jar (junit.jar) to the classpath of your Hadoop development project.

Example

Code is the best documentation. Let's look at a simple unit test for a map function:

    package gpcuster.cnblogs.com;

    import junit.framework.TestCase;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.lib.IdentityMapper;
    import org.junit.Before;
    import org.junit.Test;
    import org.apache.hadoop.mrunit.MapDriver;

    public class TestExample extends TestCase {

        // Type parameters: input key, input value, output key, output value.
        private Mapper<Text, Text, Text, Text> mapper;
        private MapDriver<Text, Text, Text, Text> driver;

        // Named setUp so it also overrides TestCase.setUp() and runs before each test.
        @Before
        public void setUp() {
            mapper = new IdentityMapper<Text, Text>();
            driver = new MapDriver<Text, Text, Text, Text>(mapper);
        }

        @Test
        public void testIdentityMapper() {
            // Feed one (key, value) pair to the mapper and expect the same pair back.
            driver.withInput(new Text("foo"), new Text("bar"))
                  .withOutput(new Text("foo"), new Text("bar"))
                  .runTest();
        }
    }

In this example, the mapper is org.apache.hadoop.mapred.lib.IdentityMapper, a very simple map function that outputs exactly what it receives.

org.apache.hadoop.mrunit.MapDriver is the class that the MRUnit framework provides specifically for testing map functions.

We use withInput to specify the input record, withOutput to specify the expected output, and runTest to run the test.
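To make the withInput / withOutput / runTest pattern concrete, here is a minimal plain-Java sketch of how such a fluent test driver could work. It is not MRUnit's code: ToyMapDriver and ToyMapper are hypothetical names invented for this illustration, and keys and values are plain strings rather than Hadoop Writable types.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;

// Hypothetical toy driver (not part of MRUnit) that mimics the fluent test style.
final class ToyMapDriver {

    // Hypothetical mapper shape: one input record plus an "emit" callback.
    interface ToyMapper {
        void map(String key, String value, BiConsumer<String, String> emit);
    }

    private final ToyMapper mapper;
    private String inputKey;
    private String inputValue;
    private final List<String> expected = new ArrayList<>();

    ToyMapDriver(ToyMapper mapper) {
        this.mapper = mapper;
    }

    // Records the single input record and returns this driver for chaining.
    ToyMapDriver withInput(String key, String value) {
        this.inputKey = key;
        this.inputValue = value;
        return this;
    }

    // Appends one expected output record, in the order it should be emitted.
    ToyMapDriver withOutput(String key, String value) {
        expected.add(key + "\t" + value);
        return this;
    }

    // Runs the mapper, captures what it emits, and compares with the expectation.
    void runTest() {
        List<String> actual = new ArrayList<>();
        mapper.map(inputKey, inputValue, (k, v) -> actual.add(k + "\t" + v));
        if (!actual.equals(expected)) {
            throw new AssertionError("expected " + expected + " but got " + actual);
        }
    }

    public static void main(String[] args) {
        ToyMapper identity = (k, v, emit) -> emit.accept(k, v);
        new ToyMapDriver(identity)
                .withInput("foo", "bar")
                .withOutput("foo", "bar")
                .runTest();
        System.out.println("identity mapper test passed");
    }
}
```

Each with... method returns the driver itself, which is what allows a test to be written as a single chained expression.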

Features

1. To test a map function, use MapDriver.
2. To test a reduce function, use ReduceDriver.
3. To test a complete MapReduce program, use MapReduceDriver.
4. To test several MapReduce jobs chained together, use PipelineMapReduceDriver.

Implementation

The MRUnit framework is very lean, and its core unit testing relies on JUnit.

Because the map and reduce functions we write take an OutputCollector parameter, MRUnit implements a set of mock objects that stand in for the OutputCollector and control what it does.
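The idea behind a mock collector can be sketched in plain Java as follows. This is an illustration under stated assumptions, not MRUnit's implementation: the Collector interface is a simplified stand-in for Hadoop's OutputCollector, and RecordingCollector, map, and runMap are hypothetical names.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class MockCollectorSketch {

    // Simplified stand-in for Hadoop's OutputCollector (assumption, not the real interface).
    interface Collector<K, V> {
        void collect(K key, V value);
    }

    // Mock collector: keeps every emitted pair in memory so a test can inspect it.
    static final class RecordingCollector<K, V> implements Collector<K, V> {
        private final List<Map.Entry<K, V>> emitted = new ArrayList<>();

        @Override
        public void collect(K key, V value) {
            emitted.add(new SimpleEntry<>(key, value));
        }

        List<Map.Entry<K, V>> getEmitted() {
            return emitted;
        }
    }

    // Toy map function under test: emits (word, 1) for each whitespace-separated word.
    static void map(String line, Collector<String, Integer> out) {
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                out.collect(word, 1);
            }
        }
    }

    // Runs the map function against the mock and returns what it emitted.
    static List<Map.Entry<String, Integer>> runMap(String line) {
        RecordingCollector<String, Integer> collector = new RecordingCollector<>();
        map(line, collector);
        return collector.getEmitted();
    }

    public static void main(String[] args) {
        System.out.println(runMap("foo bar foo")); // prints [foo=1, bar=1, foo=1]
    }
}
```

Because the map function only sees the collector interface, it cannot tell the mock from the real thing, and the test needs no running cluster.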

Limitations

Reading the MRUnit source code, we find that:

1. The partitioning and sorting performed by the MapReduce framework are not supported: the values output by the map are shuffled and then fed directly into the reduce.
2. MapReduce jobs implemented with Streaming are not supported.

Despite these limitations, MRUnit is sufficient for most needs.
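The first limitation can be pictured with a small plain-Java sketch. It is an assumption-laden illustration of the behavior described above, not a copy of MRUnit's source: ShuffleSketch, shuffle, and reduceSum are hypothetical names. Map outputs are grouped by key in a single in-memory step, with no partitioner and no custom sort comparator involved.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ShuffleSketch {

    // Groups map outputs by key in first-seen order; nothing is partitioned or sorted.
    static <K, V> Map<K, List<V>> shuffle(List<Map.Entry<K, V>> mapOutputs) {
        Map<K, List<V>> grouped = new LinkedHashMap<>();
        for (Map.Entry<K, V> pair : mapOutputs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                   .add(pair.getValue());
        }
        return grouped;
    }

    // Toy reduce step: sums the values collected for each key.
    static Map<String, Integer> reduceSum(Map<String, List<Integer>> grouped) {
        Map<String, Integer> result = new LinkedHashMap<>();
        for (Map.Entry<String, List<Integer>> entry : grouped.entrySet()) {
            int sum = 0;
            for (int value : entry.getValue()) {
                sum += value;
            }
            result.put(entry.getKey(), sum);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> mapOutputs = new ArrayList<>();
        mapOutputs.add(new SimpleEntry<>("foo", 1));
        mapOutputs.add(new SimpleEntry<>("bar", 1));
        mapOutputs.add(new SimpleEntry<>("foo", 1));
        System.out.println(reduceSum(shuffle(mapOutputs))); // prints {foo=2, bar=1}
    }
}
```

A job whose correctness depends on a custom partitioner or a particular key order would not be exercised by a test harness that works this way, so that logic needs separate tests.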

References

http://www.cloudera.com/hadoop-mrunit

         
