Today I tried to install and configure LZO on Hadoop 2.x (YARN) and hit quite a few pitfalls. Almost all of the information online is written for Hadoop 1.x and says little about using LZO with Hadoop 2.x, so I am recording the whole installation and configuration process here.
 
1. Install Lzo
 
Download LZO 2.06, compile it as a 64-bit library, and sync it to the cluster:
 
wget http://www.oberhumer.com/opensource/lzo/download/lzo-2.06.tar.gz  
tar -zxvf lzo-2.06.tar.gz  
cd lzo-2.06  
export CFLAGS=-m64  
./configure --enable-shared --prefix=/usr/local/hadoop/lzo/  
make && make test && make install
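Because the point of `CFLAGS=-m64` is a 64-bit library, it can be worth checking the result right after `make install`. The helper below is a sketch, not part of LZO's build: the name `is_elf64` is hypothetical, and it only inspects the ELF header, whose fifth byte (`EI_CLASS`) is 1 for 32-bit and 2 for 64-bit objects.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if "$1" is a 64-bit ELF file.
# An ELF header starts with 0x7f 'E' 'L' 'F' (decimal 127 69 76 70);
# the fifth byte (EI_CLASS) is 1 for 32-bit and 2 for 64-bit objects.
is_elf64() {
  local hdr
  # Dump the first five bytes as unsigned decimals, then collapse the
  # whitespace od emits into single spaces and trim both ends.
  hdr=$(od -An -t u1 -N 5 "$1" 2>/dev/null | tr '\n' ' ' | tr -s ' ' | sed 's/^ //; s/ $//')
  [ "$hdr" = "127 69 76 70 2" ]
}
```

Usage, assuming the `--prefix` above: `is_elf64 /usr/local/hadoop/lzo/lib/liblzo2.so && echo "64-bit"`. If the `file` utility is installed, `file` on the same path should also report "ELF 64-bit".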
 
Synchronize /usr/local/hadoop/lzo/ to every node in the cluster.
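The post does not say how the directory was synchronized. One common approach is a small rsync loop, sketched here under assumptions that are not from the original: passwordless SSH between nodes, `rsync` installed everywhere, and a host file with one hostname per line (for example Hadoop 2.x's `etc/hadoop/slaves`). `sync_to_cluster` is a hypothetical name; with `DRY_RUN=1` it only prints the commands.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy a local directory to the same path on every
# host listed in a file (one hostname per line; blank lines and lines
# starting with '#' are skipped). DRY_RUN=1 prints instead of running.
sync_to_cluster() {
  local src_dir="$1" hosts_file="$2" host
  # Trailing slash makes rsync copy the directory's contents into the
  # same path on the remote side.
  src_dir="${src_dir%/}/"
  # "|| [ -n "$host" ]" also handles a last line without a newline.
  while IFS= read -r host || [ -n "$host" ]; do
    case "$host" in ''|'#'*) continue ;; esac
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "rsync -az $src_dir $host:$src_dir"
    else
      # </dev/null keeps the remote shell from consuming the host list.
      rsync -az "$src_dir" "$host:$src_dir" < /dev/null
    fi
  done < "$hosts_file"
}
```

For example: `sync_to_cluster /usr/local/hadoop/lzo /usr/local/hadoop/hadoop-2.1.0-beta/etc/hadoop/slaves`. Running it once with `DRY_RUN=1` first is a cheap way to confirm the host list.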
 
2. Install Hadoop-lzo
 
Note that for Hadoop 1.x, the usual approach (following Cloudera's documentation) was to clone and build https://github.com/kevinweil/hadoop-lzo.git, which is a fork of https://github.com/twitter/hadoop-lzo.
 
However, the kevinweil version has not been updated for a long time and is built against Hadoop 1.x, so it does not work with Hadoop 2.x. About three months ago, twitter/hadoop-lzo switched its build from Ant to Maven, and its default Hadoop dependencies are 2.x. So clone Twitter's hadoop-lzo instead and build the jar and the native library with Maven.
 
Before compiling, change the versions of hadoop-common and hadoop-mapreduce-client-core in pom.xml to 2.1.0-beta, the Hadoop version installed here.
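One way to make that pom.xml edit from the command line, run inside the hadoop-lzo checkout after `git clone` and before `mvn`, is a GNU sed one-liner. This is a sketch that assumes each dependency's `<version>` sits on the line directly after its `<artifactId>`; `set_dep_version` is a hypothetical name, so review the result with `git diff pom.xml`.

```shell
#!/usr/bin/env bash
# Hypothetical helper (GNU sed, for -i and the {...} block syntax):
# on the line right after <artifactId>ARTIFACT</artifactId>, replace
# whatever <version>...</version> is there with the given version.
# Assumes the version is on the very next line; otherwise nothing changes.
set_dep_version() {
  local pom="$1" artifact="$2" version="$3"
  sed -i "/<artifactId>${artifact}<\/artifactId>/{n;s|<version>[^<]*</version>|<version>${version}</version>|;}" "$pom"
}
```

Usage for this post's two dependencies:

```shell
set_dep_version pom.xml hadoop-common 2.1.0-beta
set_dep_version pom.xml hadoop-mapreduce-client-core 2.1.0-beta
```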
 
git clone https://github.com/twitter/hadoop-lzo.git  
cd hadoop-lzo  
export CFLAGS=-m64  
export CXXFLAGS=-m64  
export C_INCLUDE_PATH=/usr/local/hadoop/lzo/include  
export LIBRARY_PATH=/usr/local/hadoop/lzo/lib  
mvn clean package -Dmaven.test.skip=true  
tar -cBf - -C target/native/Linux-amd64-64/lib . | tar -xBvf - -C /usr/local/hadoop/hadoop-2.1.0-beta/lib/native/  
cp target/hadoop-lzo-0.4.18-SNAPSHOT.jar /usr/local/hadoop/hadoop-2.1.0-beta/share/hadoop/common/
 
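The lib/native directory now holds both Hadoop's own native libraries (libhadoop, libhdfs, and so on) and the native compression library built by hadoop-lzo (libgplcompression.*):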
 
-rw-r--r--   104206  libgplcompression.a  
-rw-rw-r--     1121  libgplcompression.la  
lrwxrwxrwx           libgplcompression.so -> libgplcompression.so.0.0.0  
lrwxrwxrwx           libgplcompression.so.0 -> libgplcompression.so.0.0.0  
-rwxrwxr-x    67833  libgplcompression.so.0.0.0  
-rw-rw-r--   835968  libhadoop.a  
-rw-rw-r--  1482132  libhadooppipes.a  
lrwxrwxrwx           libhadoop.so -> libhadoop.so.1.0.0  
-rwxrwxr-x   465801  libhadoop.so.1.0.0  
-rw-rw-r--   580384  libhadooputils.a  
-rw-rw-r--   273642  libhdfs.a  
lrwxrwxrwx           libhdfs.so -> libhdfs.so.0.0.0  
-rwxrwxr-x   181171  libhdfs.so.0.0.0
 
Copy hadoop-lzo-0.4.18-SNAPSHOT.jar and /usr/local/hadoop/hadoop-2.1.0-beta/lib/native/ to every node in the cluster.
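After copying, a quick per-node check can catch a node that was missed. This is a sketch: `check_lzo_install` is a hypothetical name, and the paths it tests (the hadoop-lzo jar under `share/hadoop/common`, `libgplcompression.so` under `lib/native`, and `liblzo2.so` under the LZO prefix) are simply the locations used in the steps above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: verify the LZO pieces exist on this node.
# Arguments: Hadoop install dir, LZO install prefix.
# Prints each missing item; returns 0 only if everything is present.
check_lzo_install() {
  local hadoop_home="$1" lzo_prefix="$2" status=0 jar_found="" f
  # Any hadoop-lzo jar counts, since the snapshot version may differ.
  for f in "$hadoop_home"/share/hadoop/common/hadoop-lzo-*.jar; do
    if [ -e "$f" ]; then jar_found="$f"; fi
  done
  if [ -z "$jar_found" ]; then
    echo "missing: hadoop-lzo jar in $hadoop_home/share/hadoop/common"; status=1
  fi
  if [ ! -e "$hadoop_home/lib/native/libgplcompression.so" ]; then
    echo "missing: $hadoop_home/lib/native/libgplcompression.so"; status=1
  fi
  if [ ! -e "$lzo_prefix/lib/liblzo2.so" ]; then
    echo "missing: $lzo_prefix/lib/liblzo2.so"; status=1
  fi
  return $status
}
```

Usage on each node: `check_lzo_install /usr/local/hadoop/hadoop-2.1.0-beta /usr/local/hadoop/lzo`.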
 
3. Set environment variables
 
Add the following to hadoop-env.sh:
 
export LD_LIBRARY_PATH=/usr/local/hadoop/lzo/lib
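If this step is scripted across nodes, appending the line blindly will duplicate it on every rerun. A minimal sketch of an idempotent append follows; `append_once` is a hypothetical name and nothing in it is Hadoop-specific.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append a line to a file only if that exact line
# (whole-line, fixed-string match) is not already present.
append_once() {
  local file="$1" line="$2"
  grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}
```

In Hadoop 2.x the file lives under `etc/hadoop`, so a call would look like `append_once /usr/local/hadoop/hadoop-2.1.0-beta/etc/hadoop/hadoop-env.sh 'export LD_LIBRARY_PATH=/usr/local/hadoop/lzo/lib'`.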
 
Add the following to core-site.xml:
 
<property>  
    <name>io.compression.codecs</name>  
    <value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec,org.apache.hadoop.io.compress.BZip2Codec</value>  
</property>  
<property>  
    <name>io.compression.codec.lzo.class</name>  
    <value>com.hadoop.compression.lzo.LzoCodec</value>  
</property>
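With the codecs registered, a quick end-to-end check is to compress a file with the `lzop` command-line tool, put it into HDFS, and index it with the `com.hadoop.compression.lzo.LzoIndexer` class from the hadoop-lzo jar. Assumptions in this sketch: `lzop` is installed, a local `sample.txt` exists, and `lzo_smoke_test` and the HDFS directory are hypothetical names. `DRY_RUN=1` prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# Hypothetical smoke test for the LZO setup.
# Arguments: path to the hadoop-lzo jar, HDFS directory to use.
# Expects a local sample.txt in the current directory.
lzo_smoke_test() {
  local jar="$1" hdfs_dir="$2" run=""
  # In dry-run mode, prefix every command with echo.
  if [ "${DRY_RUN:-0}" = "1" ]; then run="echo"; fi
  $run lzop -f sample.txt                     # writes sample.txt.lzo locally
  $run hadoop fs -mkdir -p "$hdfs_dir"
  $run hadoop fs -put sample.txt.lzo "$hdfs_dir/"
  # Builds sample.txt.lzo.index next to the file if the codec loads.
  $run hadoop jar "$jar" com.hadoop.compression.lzo.LzoIndexer "$hdfs_dir/sample.txt.lzo"
  $run hadoop fs -ls "$hdfs_dir"
}
```

For example: `lzo_smoke_test /usr/local/hadoop/hadoop-2.1.0-beta/share/hadoop/common/hadoop-lzo-0.4.18-SNAPSHOT.jar /tmp/lzo_test`. If the native library cannot be loaded, the indexer step is typically where the error shows up.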
 
Add the following to mapred-site.xml:
 
<property>  
    <name>mapred.compress.map.output</name>  
    <value>true</value>  
</property>  
<property>  
    <name>mapred.map.output.compression.codec</name>  
    <value>com.hadoop.compression.lzo.LzoCodec</value>  
</property>  
<property>  
    <name>mapred.child.env</name>  
    <value>LD_LIBRARY_PATH=/usr/local/hadoop/lzo/lib</value>  
</property>
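The mapred-site.xml settings apply to every job. To try LZO map-output compression on a single job first, the same settings can be passed with `-D` on the command line. The sketch below uses the bundled wordcount example and the Hadoop 2 property names `mapreduce.map.output.compress` and `mapreduce.map.output.compress.codec`, of which the `mapred.*` names above are deprecated aliases. The function name and the input/output paths are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run the bundled wordcount example with LZO
# map-output compression requested for this job only.
# Arguments: Hadoop install dir, HDFS input path, HDFS output path.
run_wordcount_with_lzo() {
  local hadoop_home="$1" input="$2" output="$3" run=""
  # In dry-run mode, print the command instead of submitting the job.
  if [ "${DRY_RUN:-0}" = "1" ]; then run="echo"; fi
  $run hadoop jar "$hadoop_home/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.1.0-beta.jar" \
    wordcount \
    -Dmapreduce.map.output.compress=true \
    -Dmapreduce.map.output.compress.codec=com.hadoop.compression.lzo.LzoCodec \
    "$input" "$output"
}
```

The wordcount example parses generic options, so the `-D` flags go before the input and output paths. Once a single job runs cleanly this way, relying on the cluster-wide mapred-site.xml settings is lower risk.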