Today I tried to install and configure LZO on Hadoop 2.x (YARN) and ran into a lot of pitfalls. Most of the information online is based on Hadoop 1.x and says basically nothing about using LZO with Hadoop 2.x, so I am recording the entire installation and configuration process here.
1. Install Lzo
Download LZO 2.06, build the 64-bit version, and sync it to the cluster:
wget http://www.oberhumer.com/opensource/lzo/download/lzo-2.06.tar.gz
tar -zxvf lzo-2.06.tar.gz
cd lzo-2.06
export CFLAGS=-m64
./configure --enable-shared --prefix=/usr/local/hadoop/lzo/
make && make test && make install
Sync /usr/local/hadoop/lzo/ to the entire cluster.
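How the directory gets pushed to the other nodes is not stated above. One possible approach is a small rsync loop over a host list; the sketch below assumes passwordless SSH, rsync installed on every node, and a plain-text host file with one hostname per line (for example Hadoop 2.x's etc/hadoop/slaves). The function name `sync_to_cluster` and the `DRY_RUN` switch are hypothetical, for illustration only.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy a directory to the same path on every host
# listed in a host file (one hostname per line; blank lines and lines
# starting with # are skipped). Assumes passwordless SSH and rsync on all nodes.
# Set DRY_RUN=1 to print the rsync commands instead of running them.
sync_to_cluster() {
  local dir="${1%/}/" hosts_file="$2" host
  # "|| [ -n ... ]" also handles a last line without a trailing newline
  while IFS= read -r host || [ -n "$host" ]; do
    case "$host" in
      '' | '#'*) continue ;;  # skip blank lines and comments
    esac
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo "rsync -az $dir $host:$dir"
    else
      # Trailing slash on the source copies the directory's contents;
      # </dev/null keeps ssh from consuming the host list on stdin
      rsync -az "$dir" "$host:$dir" </dev/null || return 1
    fi
  done < "$hosts_file"
}
```

For example: `sync_to_cluster /usr/local/hadoop/lzo /usr/local/hadoop/hadoop-2.1.0-beta/etc/hadoop/slaves`. The same helper can push lib/native in step 2.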
2. Install Hadoop-lzo
Note that for Hadoop 1.x, Cloudera's documentation has you clone and build https://github.com/kevinweil/hadoop-lzo.git directly, which is a fork of https://github.com/twitter/hadoop-lzo.
However, Kevin Weil's version has not been updated for a long time and is built against Hadoop 1.x, so it does not work with Hadoop 2.x. twitter/hadoop-lzo, on the other hand, switched its build from Ant to Maven three months ago, and its default Hadoop jar dependencies are 2.x. So clone Twitter's hadoop-lzo and build the jar and the native library with Maven.
Before compiling, change the versions of hadoop-common and hadoop-mapreduce-client-core in the POM to 2.1.0-beta.
git clone https://github.com/twitter/hadoop-lzo.git
cd hadoop-lzo
export CFLAGS=-m64
export CXXFLAGS=-m64
export C_INCLUDE_PATH=/usr/local/hadoop/lzo/include
export LIBRARY_PATH=/usr/local/hadoop/lzo/lib
mvn clean package -Dmaven.test.skip=true
tar -cBf - -C target/native/Linux-amd64-64/lib . | tar -xBvf - -C /usr/local/hadoop/hadoop-2.1.0-beta/lib/native/
cp target/hadoop-lzo-0.4.18-SNAPSHOT.jar /usr/local/hadoop/hadoop-2.1.0-beta/share/hadoop/common/
The lib/native directory now contains both the Hadoop native libraries and the native GPL compression libraries built from hadoop-lzo:
-rw-r--r-- 1 hadoop hadoop  104206 libgplcompression.a
-rw-rw-r-- 1 hadoop hadoop    1121 libgplcompression.la
lrwxrwxrwx 1 hadoop hadoop         libgplcompression.so -> libgplcompression.so.0.0.0
lrwxrwxrwx 1 hadoop hadoop         libgplcompression.so.0 -> libgplcompression.so.0.0.0
-rwxrwxr-x 1 hadoop hadoop   67833 libgplcompression.so.0.0.0
-rw-rw-r-- 1 hadoop hadoop  835968 libhadoop.a
-rw-rw-r-- 1 hadoop hadoop 1482132 libhadooppipes.a
lrwxrwxrwx 1 hadoop hadoop         libhadoop.so -> libhadoop.so.1.0.0
-rwxrwxr-x 1 hadoop hadoop  465801 libhadoop.so.1.0.0
-rw-rw-r-- 1 hadoop hadoop  580384 libhadooputils.a
-rw-rw-r-- 1 hadoop hadoop  273642 libhdfs.a
lrwxrwxrwx 1 hadoop hadoop         libhdfs.so -> libhdfs.so.0.0.0
-rwxrwxr-x 1 hadoop hadoop  181171 libhdfs.so.0.0.0
Sync hadoop-lzo-0.4.18-SNAPSHOT.jar and /usr/local/hadoop/hadoop-2.1.0-beta/lib/native/ to the entire cluster.
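After syncing, it is worth confirming on each node that the key files actually arrived. A minimal sketch, assuming the install layout used above and the jar name produced by the 0.4.18-SNAPSHOT build; the function name `check_lzo_artifacts` is hypothetical, and you would adjust the file list if your versions differ.

```shell
#!/usr/bin/env bash
# Hypothetical check: print each expected artifact that is missing under a
# Hadoop install directory. Returns 0 if everything is present, 1 otherwise.
# The jar name assumes the 0.4.18-SNAPSHOT build from step 2.
check_lzo_artifacts() {
  local hadoop_home="${1%/}" missing=0 f
  for f in \
      lib/native/libgplcompression.so \
      lib/native/libhadoop.so \
      share/hadoop/common/hadoop-lzo-0.4.18-SNAPSHOT.jar; do
    # -e follows symlinks, so a dangling .so link is reported as missing
    if [ ! -e "$hadoop_home/$f" ]; then
      echo "missing: $f"
      missing=1
    fi
  done
  return "$missing"
}
```

Run it on each node as `check_lzo_artifacts /usr/local/hadoop/hadoop-2.1.0-beta`; checking with `-e` rather than `-f` catches the case where the .so symlinks were copied but their targets were not.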
3. Set environment variables
Add to hadoop-env.sh:
export LD_LIBRARY_PATH=/usr/local/hadoop/lzo/lib
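Since this setup tends to be re-run on every node, an idempotent append keeps hadoop-env.sh from accumulating duplicate export lines. A minimal sketch; the helper name `append_once` is hypothetical, and only standard `grep`, `tail` and `printf` are used.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append a line to a file only if that exact line is
# not already present, so re-running the setup does not duplicate exports.
append_once() {
  local file="$1" line="$2"
  # -q: quiet, -x: whole-line match, -F: fixed string (no regex)
  grep -qxF -- "$line" "$file" 2>/dev/null && return 0
  # If the file is non-empty and does not end with a newline, add one first
  if [ -s "$file" ] && [ -n "$(tail -c1 "$file")" ]; then
    printf '\n' >> "$file"
  fi
  printf '%s\n' "$line" >> "$file"
}
```

For example: `append_once /usr/local/hadoop/hadoop-2.1.0-beta/etc/hadoop/hadoop-env.sh 'export LD_LIBRARY_PATH=/usr/local/hadoop/lzo/lib'`.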
Add to core-site.xml:
<property>
<name>io.compression.codecs</name>
<value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec,org.apache.hadoop.io.compress.BZip2Codec</value>
</property>
<property>
<name>io.compression.codec.lzo.class</name>
<value>com.hadoop.compression.lzo.LzoCodec</value>
</property>
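Java class names are case-sensitive, so a codec written as `com.hadoop.compression.lzo.lzocodec` will not load. The sketch below is a rough sanity check that the two LZO codec classes appear in a config file with exact capitalization. It is a plain text search, not an XML parse, so it only confirms the strings are present somewhere in the file; the function name `check_lzo_codecs` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical check: confirm the LZO codec class names appear in a config
# file with exact case. Prints each missing class; returns 1 if any is missing.
check_lzo_codecs() {
  local conf="$1" cls status=0
  for cls in com.hadoop.compression.lzo.LzoCodec com.hadoop.compression.lzo.LzopCodec; do
    # -F: literal match (dots are not regex wildcards); no -i, so case matters
    if ! grep -qF -- "$cls" "$conf"; then
      echo "not found: $cls"
      status=1
    fi
  done
  return "$status"
}
```

For example: `check_lzo_codecs /usr/local/hadoop/hadoop-2.1.0-beta/etc/hadoop/core-site.xml`.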
Add to mapred-site.xml:
<property>
<name>mapred.compress.map.output</name>
<value>true</value>
</property>
<property>
<name>mapred.map.output.compression.codec</name>
<value>com.hadoop.compression.lzo.LzoCodec</value>
</property>
<property>
<name>mapred.child.env</name>
<value>LD_LIBRARY_PATH=/usr/local/hadoop/lzo/lib</value>
</property>
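A quick way to exercise the mapred-site.xml settings is to run a small job whose map output has to pass through the configured codec. The sketch below drives the stock wordcount example; the examples jar path, the HDFS directory and the function names are assumptions for illustration, and `hadoop` must be on PATH. Setting DRY_RUN=1 only prints the commands.

```shell
#!/usr/bin/env bash
# Print the command when DRY_RUN=1, otherwise run it.
run_or_echo() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical smoke test: run wordcount so that map output is compressed
# with the codec configured in mapred-site.xml.
# Arguments: local input file, HDFS working dir, path to the examples jar.
# Note: the HDFS "out" directory must not exist yet, or the job will refuse to start.
lzo_map_output_smoke_test() {
  local input="$1" hdfs_dir="${2%/}" examples_jar="$3"
  run_or_echo hadoop fs -mkdir -p "$hdfs_dir/in" || return 1
  run_or_echo hadoop fs -put "$input" "$hdfs_dir/in/" || return 1
  run_or_echo hadoop jar "$examples_jar" wordcount "$hdfs_dir/in" "$hdfs_dir/out"
}
```

For example, with the examples jar that ships with Hadoop 2.x (path assumed): `lzo_map_output_smoke_test words.txt /tmp/lzotest /usr/local/hadoop/hadoop-2.1.0-beta/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.1.0-beta.jar`. If the task JVMs cannot load the native LZO library, I would expect the map tasks to fail with an error about native-lzo rather than complete, which is what makes this job a useful check.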