Summary of methods for passing parameters in Hadoop

Source: Internet
Author: User

When you write a MapReduce program, you usually need to pass a variety of parameters to the tasks. Choosing an appropriate method for each parameter not only improves efficiency but also helps avoid bugs. Depending on the type and size of the parameter, the methods fall roughly into the following categories.

The most direct method is to use the various set methods of Configuration, which support the basic data types well. A typical use is passing the number of centers for the k-means clustering algorithm.

How do you pass an object-type parameter? Every object is ultimately built from basic types, so you can override the object's toString() method to represent all of its fields as a string, pass that string with Configuration.set(name, value), and then read and parse the string on the Mapper side. This simple method has two disadvantages. First, converting an object to a string can lose precision and wastes space. For example, a double that is formatted with too few digits no longer parses back to the same value, and a value that occupies 8 bytes in binary can take dozens of bytes as text. Second, the code that builds the string and the code that parses it live in different places, so bugs creep in easily, especially when the structure of the object changes. Since this need is so common, does Hadoop not provide a better method? It does, but the API documentation does not describe it directly.
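The two costs of the string approach can be seen with plain JDK code. The following is a minimal sketch, not taken from the original article: the class and method names are hypothetical, and the fixed-decimal format stands in for whatever a hand-written toString() might do.

```java
import java.util.Locale;

// Illustration only: StringParamCost and its methods are hypothetical names.
class StringParamCost {

    // Formats a double with a fixed number of decimals, as a hand-written
    // toString() might. Locale.ROOT keeps the decimal separator a dot.
    static String toFixed(double value, int decimals) {
        return String.format(Locale.ROOT, "%." + decimals + "f", value);
    }

    // Number of characters needed for the full-precision text form.
    static int textSize(double value) {
        return Double.toString(value).length();
    }

    public static void main(String[] args) {
        double original = Math.PI;

        // Precision: the shortened text no longer parses back to the same value.
        double parsedBack = Double.parseDouble(toFixed(original, 4));
        System.out.println("short form round-trips: " + (parsedBack == original));

        // Size: the full-precision text needs more characters than the
        // 8 bytes (Double.BYTES) the value occupies in binary.
        System.out.println("text chars: " + textSize(original)
                + ", binary bytes: " + Double.BYTES);
    }
}
```

Note that Double.toString itself round-trips exactly; the loss appears only when the formatting code chooses fewer digits, which is easy to do by accident.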

A better method is to let the object implement the Writable interface so that it can be serialized, and then use the static methods store(conf, obj, keyName) and load(conf, keyName, itemClass) of org.apache.hadoop.io.DefaultStringifier to set and retrieve the object. The main idea is to serialize the object into a byte array, encode the bytes as a Base64 string, and put that string into the conf. Loading reverses these steps.

How can you pass larger parameters, such as a word segmentation corpus? You can use Hadoop's cache file mechanism, DistributedCache.
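The idea of "serialize to bytes, then Base64-encode into a string" can be sketched with the JDK alone. This is an illustration of the principle under stated assumptions, not Hadoop's actual implementation: the Center class, the key name, and the HashMap standing in for the conf are all hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;

// Illustration only: all names here are hypothetical.
class Base64ParamSketch {

    // A small parameter object with two fields.
    static final class Center {
        final int id;
        final double weight;

        Center(int id, double weight) {
            this.id = id;
            this.weight = weight;
        }
    }

    // Serialize the fields to a byte array, then encode the bytes as Base64 text.
    static String encode(Center c) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(c.id);
            out.writeDouble(c.weight);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return Base64.getEncoder().encodeToString(bytes.toByteArray());
    }

    // Reverse the steps: decode Base64, then read the fields in the same order.
    static Center decode(String encoded) {
        byte[] raw = Base64.getDecoder().decode(encoded);
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(raw))) {
            return new Center(in.readInt(), in.readDouble());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // A plain map stands in for the string-to-string job configuration.
        Map<String, String> conf = new HashMap<>();
        conf.put("demo.center", encode(new Center(3, 0.1)));

        Center restored = decode(conf.get("demo.center"));
        System.out.println(restored.id + " " + restored.weight);
    }
}
```

Because the double travels as its 8 raw bytes, the value that comes back is bit-for-bit the one that went in, and the read and write code sit next to each other in one class.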

1. Use the set() and get() methods of Configuration. The name and value here are both strings.

conf.set(name, value)

conf.get(name)

This method is suitable for passing basic data types.
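A minimal sketch of this method for the k-means case mentioned earlier is shown here. It assumes hadoop-common is on the classpath; the class name and the key "kmeans.num.centers" are hypothetical. It uses the typed setInt/getInt helpers of Configuration, which store the value as a string internally.

```java
import org.apache.hadoop.conf.Configuration;

// Illustration only: the class name and key name are hypothetical.
class KMeansConfSketch {

    static final String NUM_CENTERS_KEY = "kmeans.num.centers";

    // Driver side: put the parameter into the configuration.
    static Configuration buildConf(int numCenters) {
        Configuration conf = new Configuration();
        conf.setInt(NUM_CENTERS_KEY, numCenters);
        return conf;
    }

    // Task side: in a Mapper this conf would come from
    // context.getConfiguration(), typically inside setup().
    static int readNumCenters(Configuration conf) {
        // The second argument is the default used when the key is missing.
        return conf.getInt(NUM_CENTERS_KEY, 1);
    }

    public static void main(String[] args) {
        Configuration conf = buildConf(5);
        System.out.println(readNumCenters(conf));
    }
}
```

In a real driver, set the values before calling Job.getInstance(conf), because the job works on its own copy of the configuration.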

2. Use DefaultStringifier, an implementation of the Stringifier interface.

DefaultStringifier.store(conf, object, "key");

This serializes the object and stores it in conf under the specified key.

object = DefaultStringifier.load(conf, "key", variableClass );

This retrieves the object from conf.
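The store and load calls can be tried as a round trip. The sketch below assumes hadoop-common is on the classpath and uses the built-in Text class as the Writable; the class name StringifierSketch and the key "demo.title" are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.DefaultStringifier;
import org.apache.hadoop.io.Text;

// Illustration only: the class name and key name are hypothetical.
class StringifierSketch {

    static final String KEY = "demo.title";

    // Driver side: serialize the Writable and store it in conf as a string.
    static void storeTitle(Configuration conf, Text title) {
        try {
            DefaultStringifier.store(conf, title, KEY);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Task side: rebuild the object from the string stored under the key.
    static Text loadTitle(Configuration conf) {
        try {
            return DefaultStringifier.load(conf, KEY, Text.class);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Configuration conf = new Configuration();
        storeTitle(conf, new Text("hello"));
        System.out.println(loadTitle(conf));
    }
}
```

Any class that implements Writable and has a no-argument constructor can take the place of Text here.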

Note that objects passed with the second method must be serializable. Hadoop serialization is done through the Writable interface, which lives in the org.apache.hadoop.io package. That package contains a large number of serializable components that implement the Writable interface. The Writable interface provides two methods, write and readFields, used for serialization and deserialization respectively. A typical example of implementing this interface is as follows:

 

package com.sanyuan.resource.xml.Entity;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;

public class PublishUrl implements Writable {

    private static final long serialVersionUID = 1L;

    private Text url;
    private Text title;

    // Writable classes need a no-argument constructor so that
    // Hadoop can create an instance before calling readFields().
    public PublishUrl() {
        this.url = new Text();
        this.title = new Text();
    }

    public Text getUrl() {
        return url;
    }

    public void setUrl(Text url) {
        this.url = url;
    }

    public Text getTitle() {
        return title;
    }

    public void setTitle(Text title) {
        this.title = title;
    }

    // Deserialization: read the fields in the same order write() wrote them.
    @Override
    public void readFields(DataInput in) throws IOException {
        url.readFields(in);
        title.readFields(in);
    }

    // Serialization: write each field in a fixed order.
    @Override
    public void write(DataOutput out) throws IOException {
        url.write(out);
        title.write(out);
    }
}

3. Larger objects cannot be placed in conf; they require DistributedCache or the HDFS file system.
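A sketch of the cache-file approach for a word corpus is given here under several assumptions. It uses Job.addCacheFile from the newer MapReduce API rather than the DistributedCache class itself, and it assumes the framework exposes the cached file in the task's working directory under the name given after "#" in the URI. The class names, the link name "corpus", and the one-word-per-line file format are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashSet;
import java.util.Set;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;

// Illustration only: all names and the file format are hypothetical.
class CorpusCacheSketch {

    // Name under which the cached file is assumed to appear in each task.
    static final String LINK_NAME = "corpus";

    // Driver side: register a file that already exists on HDFS.
    static void registerCorpus(Job job, String hdfsPath) {
        job.addCacheFile(URI.create(hdfsPath + "#" + LINK_NAME));
    }

    // Reads one word per line into a set, skipping blank lines.
    static Set<String> readWords(BufferedReader reader) {
        Set<String> words = new HashSet<>();
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                String word = line.trim();
                if (!word.isEmpty()) {
                    words.add(word);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return words;
    }

    // Task side: load the corpus once in setup(), then use it in map().
    static class CorpusMapper extends Mapper<LongWritable, Text, Text, NullWritable> {

        private Set<String> words;

        @Override
        protected void setup(Context context) throws IOException {
            try (BufferedReader reader =
                    Files.newBufferedReader(Paths.get(LINK_NAME), StandardCharsets.UTF_8)) {
                words = readWords(reader);
            }
        }

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Emit only the tokens that occur in the corpus.
            for (String token : value.toString().split("\\s+")) {
                if (words.contains(token)) {
                    context.write(new Text(token), NullWritable.get());
                }
            }
        }
    }
}
```

The file is shipped to each node once and read from local disk, so the corpus never has to fit into a configuration string.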
