Hadoop Streaming in Practice: File Distribution and Packaging


If the executable files, scripts, or configuration files that a program needs at run time do not exist on the compute nodes of the Hadoop cluster, you must first distribute them to the cluster for the job to run successfully.

Hadoop provides a mechanism for automatically distributing files and compressed archives: you simply set the appropriate options when you start the streaming job.
1. -file: distributes a local file to the compute nodes
2. -cacheFile: the file is already stored in HDFS, and you want each compute node to treat it as a local file during the computation
3. -cacheArchive: distributes an archive stored in HDFS and unpacks it on the compute nodes

-file in practice: distributing local executables and other files

A. Put the data to be computed into HDFS:
$ hadoop fs -put localfile /user/hadoop/hadoopfile

B. Write the map and reduce scripts, and remember to make them executable.
mapper.sh

#!/bin/sh
wc -l

reducer.sh

#!/bin/sh
a=0
while read i
do
    let a+=$i
done
echo $a
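
Before submitting the job, make both scripts executable (a quick sketch, assuming they sit in the current directory):

$ chmod +x mapper.sh reducer.sh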

hello.txt file contents:

Hello

World
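
Because the scripts simply read stdin and write stdout, you can smoke-test the pipeline locally before touching the cluster; the output should be the line count of hello.txt:

$ cat hello.txt | ./mapper.sh | ./reducer.sh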


C. Run:
$ hadoop streaming -input /user/hadoop/hadoopfile -output /user/hadoop/result -mapper ./mapper.sh -reducer ./reducer.sh -file mapper.sh -file reducer.sh -file hello.txt -jobconf mapred.reduce.tasks=1 -jobconf mapred.job.name="sum_test"

D. View the results:
$ hadoop fs -cat /user/hadoop/result/part-00000

-cacheFile in practice

A. Put the data and files to be computed into HDFS:
$ hadoop fs -put hello.txt /user/hadoop/

B. Run the command (mapper.sh and reducer.sh are the same scripts as above):
$ hadoop streaming -input /user/hadoop/hadoopfile -output /user/hadoop/result -mapper ./mapper.sh -reducer ./reducer.sh -file mapper.sh -file reducer.sh -cacheFile hdfs://host:port/user/hadoop/hello.txt#./hello.txt -jobconf mapred.reduce.tasks=1 -jobconf mapred.job.name="sum_test"
The host and port of the HDFS instance holding the file can be found in the fs.default.name property of the configuration file hadoop-site.xml.
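
For reference, the property in hadoop-site.xml looks like the snippet below (hdfs://host:port is a placeholder; substitute your namenode's actual address):

<property>
  <name>fs.default.name</name>
  <value>hdfs://host:port</value>
</property>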

C. View the results:
$ hadoop fs -cat /user/hadoop/result/part-00000


-cacheArchive in practice

A. Create a directory test containing the files mapper.sh, reducer.sh, and hello.txt.
Modify mapper.sh:

#!/bin/sh
a=`wc -l`
# use the file passed in as an argument
b=`wc -l $1 | awk '{print $1}'`
let c=a+b
echo $c
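
You can check the modified script locally the same way the job will invoke it, reading one copy of hello.txt on stdin and another through the argument, so the output should be twice the file's line count:

$ cat hello.txt | ./mapper.sh hello.txt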

B. Compress the folder and put the archive into HDFS:
$ cd test
$ tar -zcvf test.tar.gz *
$ hadoop fs -put test.tar.gz /user/hadoop/test/

C. Run the command:
$ hadoop streaming -input /user/hadoop/hadoopfile -output /user/hadoop/result -mapper "./test/mapper.sh ./test/hello.txt" -reducer ./test/reducer.sh -cacheArchive hdfs://host:port/user/hadoop/test/test.tar.gz#test -jobconf mapred.reduce.tasks=1 -jobconf mapred.job.name="sum_test"

D. View the results:
$ hadoop fs -cat /user/hadoop/result/part-00000

First, all files and directories in the local test directory are packed into an archive, which is then uploaded to HDFS. When the streaming job starts, the -cacheArchive option distributes test.tar.gz to the compute nodes and unpacks it into a directory named test, and a link named test to that directory is created in the current working directory. The -mapper option specifies the mapper program together with its argument, and the -reducer option specifies the reducer program. Note that when packing locally you must cd into test rather than packing the test directory from its parent; otherwise the script would only be reachable on the compute nodes as test/test/mapper.sh.
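
To verify the archive layout before uploading, list its contents; the entries should sit at the top level (mapper.sh, reducer.sh, hello.txt), with no leading test/ prefix:

$ tar -tzf test.tar.gz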

Reprinted from http://blog.csdn.net/yfkiss/article/details/6399874
