Install and Configure LZO in a Hadoop Cluster

LZO-compressed files can be split into blocks and processed in parallel by multiple tasks, and decompression speed is also acceptable.

To support our department's testing of the Hadoop platform, the author details how to install the software packages LZO requires on Hadoop: GCC, ant, LZO, and the LZO codec, and how to configure the LZO-related files core-site.xml and mapred-site.xml. I hope it helps. The text follows:

Recently, our department has been testing the cloud computing platform Hadoop. LZO cost me three or four days of hard work, so I am writing it up here for your reference.

Operating system: CentOS 5.5; Hadoop version: hadoop-0.20.2-CDH3B4

Software packages required for the LZO installation: GCC, ant, LZO, and the LZO codec. In addition, the lzo-devel dependency is required.

LZO configuration files: core-site.xml, mapred-site.xml

General steps:

1) Install and update GCC and ant

2) Install LZO on each node

3) Install the LZO codec

4) Modify the configuration files and synchronize them to each node

Note: Unless otherwise specified, all operations are performed on the namenode.

1. LZO installation:

1. Install GCC with yum. Remember to update lib*, glibc*, and gcc* at the same time.
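The original does not spell out the commands; a minimal sketch, assuming the stock CentOS 5.5 yum repositories:

# install gcc, then update the lib*, glibc* and gcc* package groups mentioned above
yum -y install gcc
yum -y update 'lib*' 'glibc*' 'gcc*'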

2. Ant installation:

Delete the old version: yum remove ant

Install the new version:

wget http://mirror.bjtu.edu.cn/apache//ant/binaries/apache-ant-1.8.2-bin.tar.bz2
tar -jxvf apache-ant-1.8.2-bin.tar.bz2

Add the ant environment variable:

vi /etc/profile
export ANT_HOME=/usr/local/apache-ant-1.8.2
export PATH=$PATH:$ANT_HOME/bin
source /etc/profile
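A quick sanity check (my addition, not in the original) that ant is now on the PATH:

ant -version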

3. LZO installation:

wget http://www.oberhumer.com/opensource/lzo/download/lzo-2.04.tar.gz
tar -zxvf lzo-2.04.tar.gz
cd lzo-2.04
./configure --enable-shared
make && make install

The library files are installed to /usr/local/lib by default, so we need to make the LZO library path visible to the system. Either method works:

1) Copy the LZO library files from the /usr/local/lib directory to /usr/lib (32-bit platform) or /usr/lib64 (64-bit platform)

2) Create an lzo.conf file in the /etc/ld.so.conf.d/ directory, write the path of the LZO library files into it, and then run /sbin/ldconfig -v to make the configuration take effect, for example:
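A minimal sketch of method 2, using the default install path from above:

echo "/usr/local/lib" > /etc/ld.so.conf.d/lzo.conf   # register the LZO library directory
/sbin/ldconfig -v                                    # rebuild the linker cache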

4. Install LZO on each node:

This could have been folded into the step above, but I list it separately to stress that LZO must be installed on both the namenode and every datanode!

Required software packages: GCC, ant, lzo-2.04.tar.gz, lzo-2.04-1.el5.rf.i386.rpm, and lzo-devel-2.04-1.el5.rf.i386.rpm

Installation Process: omitted

Adjust the library file path: omitted
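Since the original omits these, here is a hedged guess at the per-node commands, mirroring the rpm packages listed above (rpm installs the libraries into the standard library path; if you instead build from source on a node, use the lzo.conf method from step 3):

# hedged sketch: per-node install using the rpm packages listed above
rpm -ivh lzo-2.04-1.el5.rf.i386.rpm
rpm -ivh lzo-devel-2.04-1.el5.rf.i386.rpm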

5. Install the LZO codec:

Note: if your hadoop is a Cloudera release, do not use Google's official version of the LZO codec!

Download it from https://github.com/kevinweil/hadoop-lzo. The official version had me stuck for a long time; after digging through a lot of material I finally figured it out.

wget https://download.github.com/kevinweil-hadoop-lzo-2ad6654.tar.gz
tar -zxvf kevinweil-hadoop-lzo-2ad6654.tar.gz
cd kevinweil-hadoop-lzo-2ad6654
ant compile-native tar

Compilation failed:

make: *** [impl/lzo/LzoCompressor.lo] Error 1

Solution reference: http://code.google.com/a/apache-extras.org/p/hadoop-gpl-compression/issues/detail?id=18&redir=1

In my case the cause was the missing lzo-devel dependency, and lzo-devel in turn depends on lzo-2.04-1.el5.rf:

wget http://packages.sw.be/lzo/lzo-devel-2.04-1.el5.rf.i386.rpm
wget http://packages.sw.be/lzo/lzo-2.04-1.el5.rf.i386.rpm
rpm -ivh lzo-2.04-1.el5.rf.i386.rpm
rpm -ivh lzo-devel-2.04-1.el5.rf.i386.rpm

Then recompile: ant compile-native tar

After compilation, you also need to copy the codec jar and the native libraries to the $HADOOP_HOME/lib directory. For details about the copy operation, refer to Google's official documentation:

cp build/hadoop-lzo-0.4.10.jar /home/hadoop/hadoop-0.20.2-CDH3B4/lib/
tar -cBf - -C build/native . | tar -xBvf - -C /home/hadoop/hadoop-0.20.2-CDH3B4/lib/native
cd /home/hadoop/hadoop-0.20.2-CDH3B4/lib/
chown -R hdfs:hadoop native/

6. Synchronize hadoop-lzo-0.4.10.jar and Hadoop's native directory to every node, for example as sketched below.
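The original gives no commands for this step; a minimal sketch using scp, where the node names datanode1 and datanode2 are hypothetical:

# hypothetical hostnames; substitute your own node list
for node in datanode1 datanode2; do
  scp /home/hadoop/hadoop-0.20.2-CDH3B4/lib/hadoop-lzo-0.4.10.jar $node:/home/hadoop/hadoop-0.20.2-CDH3B4/lib/
  scp -r /home/hadoop/hadoop-0.20.2-CDH3B4/lib/native $node:/home/hadoop/hadoop-0.20.2-CDH3B4/lib/
done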

For some reason I also copied them into the corresponding hbase directories, though I don't think that is necessary.

When I tested LZO on the cluster I hit a problem, assumed hbase also needed hadoop-lzo-0.4.10.jar and Hadoop's native directory, and copied them over. After the problem was solved I found hbase was not the cause, but I never removed the copies from hbase. So whether copying them to hbase is actually necessary remains to be tested in person.

2. Configure LZO:

1. Add the following properties to the core-site.xml and mapred-site.xml files in the conf directory under the Hadoop directory:

vi core-site.xml:

<property>
  <name>io.compression.codecs</name>
  <value>org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.BZip2Codec,com.hadoop.compression.lzo.LzopCodec</value>
</property>
<property>
  <name>io.compression.codec.lzo.class</name>
  <value>com.hadoop.compression.lzo.LzoCodec</value>
</property>

vi mapred-site.xml:

<property>
  <name>mapreduce.map.output.compress</name>
  <value>true</value>
</property>
<property>
  <name>mapred.child.env</name>
  <value>JAVA_LIBRARY_PATH=/home/hdfs/hadoop-0.20.2-CDH3B4/lib/native/Linux-amd64-64</value>
</property>
<property>
  <name>mapreduce.map.output.compress.codec</name>
  <value>com.hadoop.compression.lzo.LzoCodec</value>
</property>

2. Synchronize the configuration files to each node!
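The same hypothetical scp loop from step 6 above works for the two configuration files:

for node in datanode1 datanode2; do   # hypothetical hostnames, as before
  scp conf/core-site.xml conf/mapred-site.xml $node:/home/hadoop/hadoop-0.20.2-CDH3B4/conf/
done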

3. Hadoop cluster LZO testing:

First, install lzop, generate some LZO files, and upload them to HDFS so that our developers can query them directly from hive.

LZO itself was installed earlier and the library path has already been adjusted, so only lzop needs to be installed now:

wget http://www.lzop.org/download/lzop-1.03.tar.gz
tar -zxvf lzop-1.03.tar.gz
cd lzop-1.03
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib
./configure
make && make install

Note that I set LD_LIBRARY_PATH this way rather than following the method in the official help document, because that method produced a compilation error and I do not know why.

lzop -U -9 66_22_2011-04-14.txt
$HADOOP_HOME/bin/hadoop fs -copyFromLocal /home/hdfs/66_22_2011-04-14.txt.lzo /user/s3/ifocus
An error occurred while calling the LZO file directly from hive:

Failed with exception java.io.IOException: java.lang.RuntimeException: native-lzo library not available

This error trapped me for two days! I tried a variety of methods with no luck, until I came across this page: http://sudhirvn.blogspot.com/2010/08/hadoop-lzo-installation-errors-and.html. (Note: a proxy is required to access the page; network restrictions!)

At the bottom of that page is this sentence: "So, I just deleted the hadoop-gpl-compression and everything started working." So I deleted hadoop-gpl-compression-0.1.0.jar from the $HADOOP_HOME/lib directory, and finally everything worked!

Therefore, if you have done everything described above but calling LZO still raises the native-lzo library not available error, check whether your $HADOOP_HOME/lib directory contains Google's official LZO codec jar!
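Finally, a hedged note on the parallelism claim at the top of this article: for MapReduce to actually split one .lzo file across several map tasks, the kevinweil/hadoop-lzo project ships an LzoIndexer that writes a .index file alongside the .lzo file; without an index, each .lzo file goes to a single mapper. A sketch, reusing the jar and HDFS path from above:

$HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/lib/hadoop-lzo-0.4.10.jar com.hadoop.compression.lzo.LzoIndexer /user/s3/ifocus/66_22_2011-04-14.txt.lzo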

Original article: http://share.blog.51cto.com/278008/549393
