Installing and configuring Lzo on Hadoop 2.x (YARN)

Source: Internet
Author: User

Today I installed and configured Lzo on Hadoop 2.x (YARN) and ran into a lot of pitfalls. The material available online is based on Hadoop 1.x, and almost none of it covers using Lzo on Hadoop 2.x, so I am recording the entire installation and configuration process here.

1. Install Lzo

Download Lzo 2.06, compile a 64-bit build, and sync it to the cluster:

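wget http://www.oberhumer.com/opensource/lzo/download/lzo-2.06.tar.gz
# unpack the tarball and enter the source directory
tar -zxf lzo-2.06.tar.gz && cd lzo-2.06
# force a 64-bit build
export CFLAGS=-m64
./configure --enable-shared --prefix=/usr/local/hadoop/lzo/
make && make test && make install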

Synchronize /usr/local/hadoop/lzo/ to every node in the cluster.
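
The post does not say how the directory is synchronized. One possible approach is a small rsync loop over a list of hosts. The sketch below is an assumption, not the author's method: the function name sync_to_cluster, the hosts file (one hostname per line), and the DRY_RUN switch are all hypothetical, and it presumes passwordless SSH to each node.

```shell
#!/bin/sh
# Sketch: copy a directory to the same path on every host listed in a file.
# sync_to_cluster, the hosts file, and DRY_RUN are illustrative names only.
sync_to_cluster() {
    src_dir=$1      # e.g. /usr/local/hadoop/lzo/ (trailing slash: copy contents)
    hosts_file=$2   # one hostname per line; blank lines and # comments ignored
    while IFS= read -r host; do
        case $host in ''|'#'*) continue ;; esac
        if [ "${DRY_RUN:-0}" = 1 ]; then
            # only print what would be run
            echo "rsync -az $src_dir $host:$src_dir"
        else
            # </dev/null keeps rsync/ssh from consuming the host list on stdin
            rsync -az "$src_dir" "$host:$src_dir" </dev/null
        fi
    done < "$hosts_file"
}
```

Running it first with DRY_RUN=1, for example `DRY_RUN=1; sync_to_cluster /usr/local/hadoop/lzo/ hosts.txt`, prints the rsync commands without touching any node.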

2. Install Hadoop-lzo

Note that for Hadoop 1.x, Cloudera's documentation has you clone https://github.com/kevinweil/hadoop-lzo.git and compile it directly; that repository is a fork of https://github.com/twitter/hadoop-lzo.

However, kevinweil's version has not been updated for a long time and is built against Hadoop 1.x, so it does not work with Hadoop 2.x. twitter/hadoop-lzo switched its build from Ant to Maven three months ago, and its default Hadoop jar dependencies are 2.x. So clone Twitter's hadoop-lzo and use Maven to build the jar and the native library.

Before compiling, change the versions of hadoop-common and hadoop-mapreduce-client-core in the POM to 2.1.0-beta.

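git clone https://github.com/twitter/hadoop-lzo.git
cd hadoop-lzo
# 64-bit build; point the compiler at the Lzo headers and libraries installed in step 1
export CFLAGS=-m64
export CXXFLAGS=-m64
export C_INCLUDE_PATH=/usr/local/hadoop/lzo/include
export LIBRARY_PATH=/usr/local/hadoop/lzo/lib
mvn clean package -Dmaven.test.skip=true
# copy the native libraries into Hadoop's lib/native directory
tar -cBf - -C target/native/Linux-amd64-64/lib . | tar -xBvf - -C /usr/local/hadoop/hadoop-2.1.0-beta/lib/native/
# copy the jar onto Hadoop's common classpath
cp target/hadoop-lzo-0.4.18-SNAPSHOT.jar /usr/local/hadoop/hadoop-2.1.0-beta/share/hadoop/common/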

The files under lib/native now include both the Hadoop native libraries and the Lzo native compression libraries:

-rw-r--r-- 1 hadoop hadoop 104206 Sep 2 10:44 libgplcompression.a
-rw-rw-r-- 1 hadoop hadoop 1121 Sep 2 10:44 libgplcompression.la
lrwxrwxrwx 1 hadoop hadoop 2 10:47 libgplcompression.so -> libgplcompression.so.0.0.0
lrwxrwxrwx 1 hadoop hadoop 2 10:47 libgplcompression.so.0 -> libgplcompression.so.0.0.0
-rwxrwxr-x 1 hadoop hadoop 67833 Sep 2 10:44 libgplcompression.so.0.0.0
-rw-rw-r-- 1 hadoop hadoop 835968 Aug 17:12 libhadoop.a
-rw-rw-r-- 1 hadoop hadoop 1482132 Aug 17:12 libhadooppipes.a
lrwxrwxrwx 1 hadoop hadoop Aug 17:12 libhadoop.so -> libhadoop.so.1.0.0
-rwxrwxr-x 1 hadoop hadoop 465801 Aug 17:12 libhadoop.so.1.0.0
-rw-rw-r-- 1 hadoop hadoop 580384 Aug 17:12 libhadooputils.a
-rw-rw-r-- 1 hadoop hadoop 273642 Aug 17:12 libhdfs.a
lrwxrwxrwx 1 hadoop hadoop Aug 17:12 libhdfs.so -> libhdfs.so.0.0.0
-rwxrwxr-x 1 hadoop hadoop 181171 Aug 17:12 libhdfs.so.0.0.0
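
Since a missing or dangling libgplcompression file is an easy mistake to make when copying, a quick check of the native directory can be useful before syncing. The following is only a sketch: the function name check_gplcompression is hypothetical, and the list of expected file names is taken from the listing above.

```shell
#!/bin/sh
# Sketch: confirm the gplcompression files from the listing are present.
# check_gplcompression is an illustrative name, not part of hadoop-lzo.
check_gplcompression() {
    native_dir=$1
    for name in libgplcompression.a libgplcompression.so \
                libgplcompression.so.0 libgplcompression.so.0.0.0; do
        # -e follows symlinks, so a dangling link also counts as missing
        if [ ! -e "$native_dir/$name" ]; then
            echo "missing: $native_dir/$name" >&2
            return 1
        fi
    done
    echo "ok: $native_dir"
}
```

For example, `check_gplcompression /usr/local/hadoop/hadoop-2.1.0-beta/lib/native` would be run on each node. To confirm that the -m64 flags took effect, `file -L` on libgplcompression.so should describe a 64-bit ELF shared object.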

Synchronize hadoop-lzo-0.4.18-SNAPSHOT.jar and /usr/local/hadoop/hadoop-2.1.0-beta/lib/native/ to the entire cluster.

3. Set Environment Variables

Add to hadoop-env.sh:

export LD_LIBRARY_PATH=/usr/local/hadoop/lzo/lib

Add to core-site.xml:

<property>
    <name>io.compression.codecs</name>
    <value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec,org.apache.hadoop.io.compress.BZip2Codec</value>
</property>
<property>
    <name>io.compression.codec.lzo.class</name>
    <value>com.hadoop.compression.lzo.LzoCodec</value>
</property>
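
Codec class names are case-sensitive, and one typo in the comma-separated list is enough to break codec loading, so it can help to check the value Hadoop actually sees. The helper below is a sketch: has_codec is a hypothetical name, and the usage line assumes the `hdfs getconf -confKey` command is available in your Hadoop 2.x installation.

```shell
#!/bin/sh
# Sketch: test whether a comma-separated codec list contains an exact class name.
# has_codec is an illustrative helper, not a Hadoop command.
has_codec() {
    # strip whitespace, then wrap in commas so only whole entries match
    list=",$(printf '%s' "$1" | tr -d ' \t\n'),"
    case $list in
        *",$2,"*) return 0 ;;
        *) return 1 ;;
    esac
}
```

A possible use on a configured node: `has_codec "$(hdfs getconf -confKey io.compression.codecs)" com.hadoop.compression.lzo.LzoCodec && echo "LzoCodec registered"`. Because whole entries are matched, LzopCodec alone does not satisfy a check for LzoCodec.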

Add to mapred-site.xml:

<property>  
    <name>mapred.compress.map.output</name>  
    <value>true</value>  
</property>  
<property>  
    <name>mapred.map.output.compression.codec</name>  
    <value>com.hadoop.compression.lzo.LzoCodec</value>  
</property>  
<property>  
    <name>mapred.child.env</name>  
    <value>LD_LIBRARY_PATH=/usr/local/hadoop/lzo/lib</value>
</property>
