LZO-compressed files can be split into blocks and processed in parallel, and decompression performance is also acceptable.
To support our department's testing of the Hadoop cloud computing platform, this article details how to install the software packages LZO requires on Hadoop (gcc, ant, lzo, and the LZO codec) and how to configure the LZO-related files core-site.xml and mapred-site.xml. I hope it helps you.
Recently, our department has been testing the cloud computing platform Hadoop. Getting LZO working cost me three or four days of hard work, so I am writing the process up here for reference.
Operating system: CentOS 5.5; Hadoop version: hadoop-0.20.2-CDH3B4
Software packages required for the LZO installation: gcc, ant, lzo, and the LZO codec. The lzo-devel dependency is also required.
Files to configure for LZO: core-site.xml, mapred-site.xml
General steps:
1) install and update gcc and ant
2) install LZO on each node
3) install the LZO codec
4) modify the configuration files and synchronize them to each node
Note: Unless otherwise specified, all operations are performed on the namenode.
1. LZO installation:
1. Install gcc via yum. Remember to update lib*, glibc*, and gcc* at the same time (a sketch follows below).
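A minimal sketch of this step; the exact package globs are my assumption, adjust them for your mirrors:
# install gcc, then update the related libraries; quoting keeps the shell from expanding the globs locally
yum -y install gcc
yum -y update "lib*" "glibc*" "gcc*"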
2. Ant installation:
Remove the old version: yum remove ant
Install the new version:
wget http://mirror.bjtu.edu.cn/apache//ant/binaries/apache-ant-1.8.2-bin.tar.bz2
tar -jxvf apache-ant-1.8.2-bin.tar.bz2
Add the ant environment variable:
vi /etc/profile
export ANT_HOME=/usr/local/apache-ant-1.8.2
export PATH=$PATH:$ANT_HOME/bin
source /etc/profile
3. LZO installation:
wget http://www.oberhumer.com/opensource/lzo/download/lzo-2.04.tar.gz
tar -zxvf lzo-2.04.tar.gz
cd lzo-2.04
./configure --enable-shared
make && make install
The library files are installed under /usr/local/lib by default. We need to make the LZO library path known to the system; either method works:
1) copy the LZO library files under /usr/local/lib to /usr/lib (32-bit platform) or /usr/lib64 (64-bit platform)
2) create an lzo.conf file under /etc/ld.so.conf.d/, write the LZO library path into it, and run /sbin/ldconfig -v to make the configuration take effect (a sketch of this method follows below)
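A minimal sketch of method 2, assuming the default /usr/local/lib install path:
# register the LZO library path with the dynamic linker
echo "/usr/local/lib" > /etc/ld.so.conf.d/lzo.conf
/sbin/ldconfig -v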
4. Install LZO on each node:
This could have been folded into the step above, but I list it separately to stress that LZO must be installed on every datanode as well as the namenode!
Required software packages: gcc, ant, lzo-2.04.tar.gz, lzo-2.04-1.el5.rf.i386.rpm, lzo-devel-2.04-1.el5.rf.i386.rpm
Installation process: omitted (a sketch follows below)
Adjusting the library file path: omitted
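A minimal sketch of the per-node step, assuming the two rpm packages listed above have already been copied to the node:
# run on every datanode as well as the namenode
rpm -ivh lzo-2.04-1.el5.rf.i386.rpm
rpm -ivh lzo-devel-2.04-1.el5.rf.i386.rpm
# if lzo was also built from source on this node, adjust the library path as in step 3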
5. Install the LZO codec:
Note: if your Hadoop is a Cloudera build, do not use the official Google version of the LZO codec!
Download it from https://github.com/kevinweil/hadoop-lzo instead. The official version stumped me for a long time; I searched through a lot of material before finally figuring this out.
wget https://download.github.com/kevinweil-hadoop-lzo-2ad6654.tar.gz
tar -zxvf kevinweil-hadoop-lzo-2ad6654.tar.gz
cd kevinweil-hadoop-lzo-2ad6654
ant compile-native tar
Compilation failed:
make: *** [impl/lzo/LzoCompressor.lo] Error 1
Solution reference: http://code.google.com/a/apache-extras.org/p/hadoop-gpl-compression/issues/detail?id=18&redir=1
In my case the failure was caused by the missing lzo-devel dependency, and lzo-devel in turn depends on lzo-2.04-1.el5.rf:
wget http://packages.sw.be/lzo/lzo-devel-2.04-1.el5.rf.i386.rpm
wget http://packages.sw.be/lzo/lzo-2.04-1.el5.rf.i386.rpm
rpm -ivh lzo-2.04-1.el5.rf.i386.rpm
rpm -ivh lzo-devel-2.04-1.el5.rf.i386.rpm
Then rerun ant compile-native tar and recompile!
After compilation, you also need to copy the codec jar and the native library into the $HADOOP_HOME/lib directory. For the copy operation, refer to the official Google documentation:
cp build/hadoop-lzo-0.4.10.jar /home/hadoop/hadoop-0.20.2-CDH3B4/lib/
tar -cBf - -C build/native . | tar -xBvf - -C /home/hadoop/hadoop-0.20.2-CDH3B4/lib/native
cd /home/hadoop/hadoop-0.20.2-CDH3B4/lib/
chown -R hdfs:hadoop native/
6. Synchronize hadoop-lzo-0.4.10.jar and Hadoop's native directory to every node (a sketch follows below)
For some reason I also copied them into the corresponding hbase directory, though I do not think that is necessary.
While testing LZO on the cluster I ran into a problem, assumed hbase also needed hadoop-lzo-0.4.10.jar and Hadoop's native directory, and copied them into hbase. After the problem was solved I found hbase was not the cause, but I never removed the copies from hbase, so whether they really need to be copied there remains to be verified.
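A minimal sketch of the synchronization, assuming passwordless ssh and the hypothetical node names datanode1 and datanode2:
# push the codec jar and the native library to each node
for node in datanode1 datanode2; do
    scp /home/hadoop/hadoop-0.20.2-CDH3B4/lib/hadoop-lzo-0.4.10.jar $node:/home/hadoop/hadoop-0.20.2-CDH3B4/lib/
    scp -r /home/hadoop/hadoop-0.20.2-CDH3B4/lib/native $node:/home/hadoop/hadoop-0.20.2-CDH3B4/lib/
done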
2. Configure LZO:
1. Add the following properties to core-site.xml and mapred-site.xml in Hadoop's conf directory:
vi core-site.xml:
<property>
  <name>io.compression.codecs</name>
  <value>org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.BZip2Codec,com.hadoop.compression.lzo.LzopCodec</value>
</property>
<property>
  <name>io.compression.codec.lzo.class</name>
  <value>com.hadoop.compression.lzo.LzoCodec</value>
</property>
vi mapred-site.xml:
<property>
  <name>mapreduce.map.output.compress</name>
  <value>true</value>
</property>
<property>
  <name>mapred.child.env</name>
  <value>JAVA_LIBRARY_PATH=/home/hdfs/hadoop-0.20.2-CDH3B4/lib/native/Linux-amd64-64</value>
</property>
<property>
  <name>mapreduce.map.output.compress.codec</name>
  <value>com.hadoop.compression.lzo.LzoCodec</value>
</property>
2. Synchronize the configuration files to every node!
3. Hadoop cluster LZO testing:
First install lzop, generate some .lzo files, and upload them to HDFS so that our developers can query them directly from Hive.
LZO itself was installed earlier and the library path has already been adjusted, so only lzop still needs to be installed:
wget http://www.lzop.org/download/lzop-1.03.tar.gz
tar -zxvf lzop-1.03.tar.gz
cd lzop-1.03
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib
./configure
make && make install
Note that I did not set LD_LIBRARY_PATH the way the official help document suggests, because that method produced a compile error and I do not know why.
lzop -U -9 66_22_2011-04-14.txt
$HADOOP_HOME/bin/hadoop fs -copyFromLocal /home/hdfs/66_22_2011-04-14.txt.lzo /user/s3/ifocus
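As a quick sanity check that the codec is picked up, and a sketch of the Hive side: the table name and single-column layout here are my assumptions, and com.hadoop.mapred.DeprecatedLzoTextInputFormat is the old-API input format shipped with kevinweil's hadoop-lzo; this only works once the core-site.xml settings above are in effect:
# decompress through the configured codec to verify the setup
$HADOOP_HOME/bin/hadoop fs -text /user/s3/ifocus/66_22_2011-04-14.txt.lzo | head
# hypothetical external table over the upload directory
hive -e "CREATE EXTERNAL TABLE lzo_test (line STRING)
STORED AS INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/user/s3/ifocus';
SELECT count(*) FROM lzo_test;"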
Calling the .lzo file directly from Hive produced an error:
Failed with exception java.io.IOException:java.lang.RuntimeException: native-lzo library not available
This error trapped me for two days! I tried all kinds of fixes and got nowhere until I came across this page: http://sudhirvn.blogspot.com/2010/08/hadoop-lzo-installation-errors-and.html. (Note: a proxy is required to reach the page; network restrictions!)
At the bottom of that page is this sentence: "So, I just deleted the hadoop-gpl-compression and everything started working." So I deleted hadoop-gpl-compression-0.1.0.jar from the $HADOOP_HOME/lib directory, and finally everything was OK!
So if, like me, you have done everything you should but still hit the native-lzo library not available error when using LZO, check whether your $HADOOP_HOME/lib directory contains the official Google LZO codec! (A quick check is sketched below.)
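A quick way to check and remove it, assuming the jar name matches the one above:
# list any Google hadoop-gpl-compression jars that may shadow the kevinweil codec
ls $HADOOP_HOME/lib/hadoop-gpl-compression*.jar
# remove the conflicting jar if present
rm -f $HADOOP_HOME/lib/hadoop-gpl-compression-0.1.0.jar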
Original article: http://share.blog.51cto.com/278008/549393