Hadoop Streaming configuration: field-splitting parameters

Source: Internet
Author: User

Map:

    -D stream.map.output.field.separator=.
        Defines the delimiter between fields of the map output. The default is tab; the user can set any other delimiter.

    -D stream.num.map.output.key.fields=4
        The first four fields form the key and the rest of the line is the value. If a line contains fewer than four separators, the entire line is the key and the value is empty.

Summary: these two options control how each map output line is split into key and value. The output is plain lines of text, so there has to be a dividing mark. This corresponds to context.write(key, value) in a Java job.

Reduce (same idea as above):

    -D stream.reduce.output.field.separator=SEP
    -D stream.num.reduce.output.fields=NUM
        (This is the spelling used in the Hadoop Streaming documentation; the streaming source code appears to read stream.num.reduce.output.key.fields instead, so verify against your version.)

Partitioner:

    -partitioner org.apache.hadoop.mapred.lib.KeyFieldBasedPartitioner
    -D stream.map.output.field.separator=.
    -D stream.num.map.output.key.fields=4
    -D map.output.key.field.separator=.
        Read literally: the separator between fields inside the map output key. The key itself is split again on this character.
    #-D num.key.fields.for.partition=2
        Specifies that the first two fields of the key are used for partitioning.
    -D mapred.text.key.partitioner.options=-k1,2
        Note: -k1,2 specifies that fields 1 through 2 of the split key are used for partitioning.

(I could not find documentation or the original source for the explanation above.)

Example 1

Map output (keys). Because -D stream.num.map.output.key.fields=4 is set, the first 4 fields of each map output line are the key and the rest is the value:

    11.12.1.2
    11.14.2.3
    11.11.4.1
    11.12.1.1
    11.14.2.2

Divided among 3 reducers (the first 2 fields are the partition key):

    11.11.4.1
    -----------
    11.12.1.2
    11.12.1.1
    -----------
    11.14.2.3
    11.14.2.2

Sorted within each reducer's partition (all 4 fields are used for sorting):

    11.11.4.1
    -----------
    11.12.1.1
    11.12.1.2
    -----------
    11.14.2.2
    11.14.2.3

The partitioner therefore works on part of the map output key rather than on the whole key, which corresponds to writing a custom Partitioner in Java.

Example 2

    -D mapred.output.key.comparator.class=org.apache.hadoop.mapred.lib.KeyFieldBasedComparator
    -D stream.map.output.field.separator=.
    -D stream.num.map.output.key.fields=4
    -D map.output.key.field.separator=.
    -D mapred.text.key.comparator.options=-k2,2nr

In -k2,2nr, the -k2,2 part selects the 2nd field of the split key as the sort field, n requests a numeric sort, and r reverses the result.

Map output (keys):

    11.12.1.2
    11.14.2.3
    11.11.4.1
    11.12.1.1
    11.14.2.2

Reducer output (sorted by the second field, numeric, descending):

    11.14.2.3
    11.14.2.2
    11.12.1.2
    11.12.1.1
    11.11.4.1
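
The behaviour described in the two examples can be imitated in plain Python to check one's understanding. The sketch below does not call Hadoop at all. The function names (split_key_value, group_by_partition_key, sort_by_field_numeric_reverse) are hypothetical helpers made up for illustration, and the grouping step only shows which keys share a partition key; in a real job the reducer that receives each group depends on a hash of the partition fields, so three groups are not guaranteed to land on three different reducers.

```python
# Sample map output keys taken from the examples in the text.
KEYS = ["11.12.1.2", "11.14.2.3", "11.11.4.1", "11.12.1.1", "11.14.2.2"]


def split_key_value(line, sep=".", num_key_fields=4):
    """Imitate stream.map.output.field.separator and
    stream.num.map.output.key.fields: the key ends at the N-th separator."""
    parts = line.split(sep)
    # Fewer than N separators: the whole line is the key, the value is empty.
    if len(parts) <= num_key_fields:
        return line, ""
    return sep.join(parts[:num_key_fields]), sep.join(parts[num_key_fields:])


def group_by_partition_key(keys, sep=".", start=1, end=2):
    """Imitate -k1,2 for the partitioner: group keys by fields start..end,
    then sort each group by the full key, as in Example 1."""
    groups = {}
    for key in keys:
        partition_key = sep.join(key.split(sep)[start - 1:end])
        groups.setdefault(partition_key, []).append(key)
    return {pk: sorted(members) for pk, members in sorted(groups.items())}


def sort_by_field_numeric_reverse(keys, sep=".", field=2):
    """Imitate -k2,2nr for the comparator: numeric, descending sort on one
    field. Ties keep their input order in this sketch (stable sort)."""
    return sorted(keys, key=lambda k: -int(k.split(sep)[field - 1]))


print(split_key_value("11.12.1.2.some.value"))
for partition_key, members in group_by_partition_key(KEYS).items():
    print(partition_key, members)
print(sort_by_field_numeric_reverse(KEYS))
```

Changing the start, end, or field arguments is a quick way to see how other -k specifications would regroup or reorder the same keys before trying them in a real job.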
