Hadoop Learning Note 3: Developing MapReduce

Last Update:2016-06-07 Source: Internet

Author: User


Small notes:

Maven is a project management tool that sets up project information through XML configuration.

A Maven POM (Project Object Model) is the XML file that holds this configuration.

Steps:

1. Set up and configure the development environment.

2. Write your map and reduce functions and run them in local (standalone) mode, from the command line or within your IDE.

3. Unit test, then test on a small dataset, then test on the full dataset after unleashing the job on a cluster.

4. Tune the job.

1. Configuration API

Components in Hadoop are configured using Hadoop's own configuration API, found in the org.apache.hadoop.conf package.
Configurations read their properties from resources: XML files with a simple structure for defining name-value pairs.

For example, write a configuration-1.xml like:

<?xml version="1.0"?>
<configuration>
  <property>
    <name>color</name>
    <value>yellow</value>
    <description>Color</description>
  </property>
  <property>
    <name>size</name>
    <value>10</value>
    <description>Size</description>
  </property>
  <property>
    <name>weight</name>
    <value>heavy</value>
    <final>true</final>
    <description>Weight</description>
  </property>
  <property>
    <name>size-weight</name>
    <value>${size},${weight}</value>
    <description>Size and weight</description>
  </property>
</configuration>

Then access it with the code below:

Configuration conf = new Configuration();
conf.addResource("configuration-1.xml");

assertThat(conf.get("color"), is("yellow"));
assertThat(conf.getInt("size", 0), is(10));
assertThat(conf.get("breadth", "wide"), is("wide"));

A second resource can be added in the same way, for example conf.addResource("configuration-2.xml"). Resources are applied in the order they are added, and properties defined in a later resource override those defined earlier.

Note:

Type information is not stored in the XML file.
Instead, properties can be interpreted as a given type when they are read.
Also, the get() methods allow you to specify a default value, which is used if the property is not defined in the XML file, as in the case of breadth here.
When more than one resource is added, they are applied in order, and properties defined later override earlier definitions.
However, properties that are marked as final cannot be overridden in later definitions.
System properties take priority over properties defined in resource files when variables such as ${size} are expanded:

System.setProperty("size", "14");
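The rules in the note above (ordered overrides, final properties, defaults, typing at read time, and variable expansion) can be sketched in plain Java. This is a simplified model under stated assumptions, not Hadoop's implementation: the class LayeredConfig and its define() method are hypothetical names, the values used for the second resource are made up, and expansion is done in a single non-recursive pass.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, simplified model of layered configuration lookup.
// It is NOT org.apache.hadoop.conf.Configuration; it only mirrors the rules in the note.
class LayeredConfig {
    private static final Pattern VAR = Pattern.compile("\\$\\{([^}]+)\\}");
    private final Map<String, String> props = new HashMap<>();
    private final Set<String> finals = new HashSet<>();

    // Adds one property as if it came from the next resource in order.
    // A later definition replaces an earlier one unless the earlier one was final.
    void define(String name, String value, boolean isFinal) {
        if (finals.contains(name)) {
            return; // final properties cannot be overridden
        }
        props.put(name, value);
        if (isFinal) {
            finals.add(name);
        }
    }

    // Returns the expanded value, or defaultValue if the property is undefined.
    String get(String name, String defaultValue) {
        String raw = props.get(name);
        return raw == null ? defaultValue : expand(raw);
    }

    // Values are stored as strings; the caller picks the type at read time.
    int getInt(String name, int defaultValue) {
        String value = get(name, null);
        return value == null ? defaultValue : Integer.parseInt(value.trim());
    }

    // Replaces ${var} references; system properties are consulted first.
    private String expand(String raw) {
        Matcher m = VAR.matcher(raw);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(raw, last, m.start());
            String var = m.group(1);
            String replacement = System.getProperty(var);
            if (replacement == null) {
                replacement = props.get(var);
            }
            // Unknown variables are left as written.
            out.append(replacement == null ? m.group() : replacement);
            last = m.end();
        }
        out.append(raw.substring(last));
        return out.toString();
    }

    public static void main(String[] args) {
        LayeredConfig conf = new LayeredConfig();
        // First resource (values from configuration-1.xml).
        conf.define("size", "10", false);
        conf.define("weight", "heavy", true);
        conf.define("size-weight", "${size},${weight}", false);
        // Second resource (made-up values for illustration).
        conf.define("size", "12", false);
        conf.define("weight", "light", false);

        System.out.println(conf.getInt("size", 0));        // 12: later definition wins
        System.out.println(conf.get("weight", null));      // heavy: final was not overridden
        System.out.println(conf.get("breadth", "wide"));   // wide: default used
        System.out.println(conf.get("size-weight", null)); // 12,heavy if no "size" system property is set
    }
}
```

In this model, setting the system property size to 14 changes the expanded value of size-weight to 14,heavy, while reading size directly still returns 12, because the system property is only consulted during variable expansion.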

Options specified with -D take priority over properties from the configuration files.

For example, passing -D mapreduce.job.reduces=n on the command line would override the number of reducers set on the cluster or set in any client-side configuration files.

% hadoop ConfigurationPrinter -D color=yellow | grep color
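To make the precedence of -D concrete, here is a minimal plain-Java sketch that applies "-D name=value" arguments on top of values already loaded from files. The class DefineOptions and its apply() method are hypothetical names chosen for this illustration. Hadoop's own option handling supports more switches than this, so treat it as a model of the idea only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: applies "-D name=value" arguments on top of file-based values.
// Only the two-token form "-D name=value" is modeled here.
class DefineOptions {
    // Mutates conf with the overrides and returns the remaining arguments.
    static List<String> apply(String[] args, Map<String, String> conf) {
        List<String> rest = new ArrayList<>();
        for (int i = 0; i < args.length; i++) {
            if (args[i].equals("-D") && i + 1 < args.length) {
                String pair = args[++i];
                int eq = pair.indexOf('=');
                if (eq > 0) {
                    // Command-line value replaces whatever the files defined.
                    conf.put(pair.substring(0, eq), pair.substring(eq + 1));
                }
                // Pairs without a name and '=' are ignored in this sketch.
            } else {
                rest.add(args[i]);
            }
        }
        return rest;
    }
}
```

With a map that already holds color=red, calling apply() with the arguments -D color=yellow input output leaves color set to yellow and returns the two remaining arguments for the application to handle.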

2. Set up the development environment

The Maven POM (Project Object Model), which is an XML file, declares the dependencies needed for building and testing MapReduce programs.

The hadoop-client dependency contains all the Hadoop client-side classes needed to interact with HDFS and MapReduce.
For running unit tests, we use JUnit.
For writing MapReduce tests, we use MRUnit.
The hadoop-minicluster library contains the "mini-" clusters that are useful for testing with Hadoop clusters running in a single JVM.

Many IDEs can read Maven POMs directly, so you can just point them at the directory containing the pom.xml file and start writing code.

Alternatively, you can use Maven to generate configuration files for your IDE. For example, the following creates Eclipse configuration files so you can import the project into Eclipse:

% mvn eclipse:eclipse -DdownloadSources=true -DdownloadJavadocs=true

3. Managing configuration

It is common to switch between running the application locally and running it on a cluster.

One way to accommodate this is to have Hadoop configuration files containing the connection settings for each cluster, and to specify which one to use when running a Hadoop application or tool.
We assume the existence of a directory called conf that contains three configuration files: hadoop-local.xml, hadoop-localhost.xml, and hadoop-cluster.xml.
For example, the following command uses the -conf switch to show a directory listing on the HDFS server running in pseudodistributed mode on localhost:

% hadoop fs -conf conf/hadoop-localhost.xml -ls
Found 2 items
drwxr-xr-x   - tom supergroup          0 2014-09-08 10:19 input
drwxr-xr-x   - tom supergroup          0 2014-09-08 10:19 output

4. A MapReduce example

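Mapper: extracts the year and the air temperature from an input line.

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MaxTemperatureMapper
    extends Mapper<LongWritable, Text, Text, IntWritable> {

  @Override
  public void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    // Each input value is one fixed-width weather record.
    String line = value.toString();
    String year = line.substring(15, 19);
    int airTemperature = Integer.parseInt(line.substring(87, 92));
    context.write(new Text(year), new IntWritable(airTemperature));
  }
}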
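The field extraction at the heart of the mapper can be tried without any Hadoop types. The sketch below is plain Java. The class name NcdcRecordFields is hypothetical, and the offsets 15 to 19 and 87 to 92 are assumptions checked against the sample record used in this note's mapper test, which should yield the year 1950 and the temperature -11.

```java
// Plain-Java sketch of fixed-width field extraction (no Hadoop dependencies).
// Offsets are assumptions based on the sample record from the mapper unit test.
class NcdcRecordFields {
    // Characters 15..18 of a record hold the four-digit year.
    static String year(String line) {
        return line.substring(15, 19);
    }

    // Characters 87..91 hold the signed air temperature reading.
    static int airTemperature(String line) {
        return Integer.parseInt(line.substring(87, 92));
    }

    public static void main(String[] args) {
        String record = "0043011990999991950051518004+68750+023550FM-12+0382"
                + "99999V0203201N00261220001CN9999999N9-00111+99999999999";
        System.out.println(year(record) + " " + airTemperature(record)); // prints "1950 -11"
    }
}
```

Keeping the parsing in small static methods like these makes it easy to check the offsets against real records before wiring them into a Mapper.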

Unit test for the Mapper:

import java.io.IOException;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mrunit.mapreduce.MapDriver;
import org.junit.*;

public class MaxTemperatureMapperTest {

  @Test
  public void processesValidRecord() throws IOException, InterruptedException {
    Text value = new Text("0043011990999991950051518004+68750+023550FM-12+0382" +
                                  // Year ^^^^
        "99999V0203201N00261220001CN9999999N9-00111+99999999999");
                              // Temperature ^^^^^
    new MapDriver<LongWritable, Text, Text, IntWritable>()
      .withMapper(new MaxTemperatureMapper())
      .withInput(new LongWritable(0), value)
      .withOutput(new Text("1950"), new IntWritable(-11))
      .runTest();
  }
}

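Reducer: finds the maximum temperature for each year.

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class MaxTemperatureReducer
    extends Reducer<Text, IntWritable, Text, IntWritable> {

  @Override
  public void reduce(Text key, Iterable<IntWritable> values, Context context)
      throws IOException, InterruptedException {
    // Start below any possible reading so the first value always replaces it.
    int maxValue = Integer.MIN_VALUE;
    for (IntWritable value : values) {
      maxValue = Math.max(maxValue, value.get());
    }
    context.write(key, new IntWritable(maxValue));
  }
}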

Unit test for the Reducer:

@Test
public void returnsMaximumIntegerInValues() throws IOException, InterruptedException {
  new ReduceDriver<Text, IntWritable, Text, IntWritable>()
    .withReducer(new MaxTemperatureReducer())
    .withInput(new Text("1950"),
        Arrays.asList(new IntWritable(10), new IntWritable(5)))
    .withOutput(new Text("1950"), new IntWritable(10))
    .runTest();
}
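Between the mapper and the reducer, the framework groups all values emitted for the same key. The following plain-Java sketch walks through that map, group, and reduce flow in memory. It is only an illustration of the data flow, not how Hadoop executes a job. The class MaxTemperatureFlow and the simplified "year,temperature" input format are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory walk-through of map -> group-by-key -> reduce.
class MaxTemperatureFlow {
    static Map<String, Integer> maxByYear(List<String> lines) {
        // "Map" step: turn each line into a (year, temperature) pair.
        // Grouping step: collect all temperatures under their year, sorted by key.
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            String[] parts = line.split(",");
            grouped.computeIfAbsent(parts[0], k -> new ArrayList<>())
                   .add(Integer.parseInt(parts[1]));
        }
        // "Reduce" step: one call per key, keeping the maximum value.
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : grouped.entrySet()) {
            int max = Integer.MIN_VALUE;
            for (int value : entry.getValue()) {
                max = Math.max(max, value);
            }
            result.put(entry.getKey(), max);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "1950,0", "1950,22", "1950,-11", "1949,111", "1949,78");
        System.out.println(maxByYear(lines)); // prints {1949=111, 1950=22}
    }
}
```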

5. Writing a job driver

Using the Tool interface, it's easy to write a driver to run a MapReduce job.

Then run the driver locally:

% mvn compile
% export HADOOP_CLASSPATH=target/classes/
% hadoop v2.MaxTemperatureDriver -conf conf/hadoop-local.xml input/ncdc/micro output

Equivalently, the -fs and -jt options can be given on the command line instead of a configuration file:

% hadoop v2.MaxTemperatureDriver -fs file:/// -jt local input/ncdc/micro output

The local job runner uses a single JVM to run a job, so as long as all the classes that your job needs are on its classpath, things will just work.

6. Running on a cluster

A job's classes must be packaged into a job JAR file to be sent to the cluster.

