About using LZO compression with Hadoop

Source: Internet
Author: User

I have spent the last few days testing the LZO compression mode and have the following observations:

While working with LZO recently I ran into a java.library.path setup problem. Many posts online say to add the JAVA_LIBRARY_PATH property in the hadoop-env.sh file (the other suggestion, adding HADOOP_CLASSPATH, is valid: it is true that this hadoop-0.20.205.0 release does not automatically load the jar packages under the lib directory at startup).
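For reference, the kind of hadoop-env.sh addition those posts recommend looks roughly like the following; the jar name and library path here are only illustrative examples, not my actual values:

# hadoop-env.sh -- commonly suggested additions (example paths and version)
export JAVA_LIBRARY_PATH=/usr/local/lib:$JAVA_LIBRARY_PATH
export HADOOP_CLASSPATH=$HADOOP_HOME/lib/hadoop-lzo-0.4.15.jar:$HADOOP_CLASSPATH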

Setting that parameter had no effect for me, so I later wrote a small program to print java.library.path. Running it gives the following output:

[hadoop@master hive_testdata]$ java Test_loadlibrary
/home/hadoop/jrockit-jdk1.6.0_29/jre/lib/amd64/jrockit:/home/hadoop/jrockit-jdk1.6.0_29/jre/lib/amd64:/home/hadoop/jrockit-jdk1.6.0_29/jre/../lib/amd64::/usr/local/lib
Load success
libgplcompression.so

The first line is the value of java.library.path on my current system.

The second line shows that the load succeeded after I put the relevant native library under /usr/local/lib.

The third line tests what System.mapLibraryName returns for this library name on Linux.

The source code of Test_loadlibrary.java is as follows:

import java.util.Properties;
import java.util.Set;

public class Test_loadlibrary
{
  public static void main(String[] args)
  {
    // Print the java.library.path property (and "." if it exists)
    Properties props = System.getProperties();
    Set<Object> keys = props.keySet();
    for (Object key : keys) {
      if (((String) key).equals("java.library.path"))
        System.out.println(System.getProperty("java.library.path"));
      if (((String) key).equals("."))
        System.out.println(System.getProperty("."));
    }
    // Try to load the gplcompression native library from java.library.path
    try {
      System.loadLibrary("gplcompression");
      System.out.println("Load success");
    } catch (Throwable t) {
      System.out.println("Error");
      t.printStackTrace();
    }
    // Show the platform-specific file name for this library (libgplcompression.so on Linux)
    System.out.println(System.mapLibraryName("gplcompression"));
  }
}

Yet when I put the relevant native library under one of those directories (/usr/local/lib) and ran the Hadoop job again, it still complained that the gplcompression native library could not be found.

Later, with no better option, I modified the hadoop-lzo GPLNativeCodeLoader code to also print the java.library.path value. The result is as follows:

12/05/08 14:45:39 INFO lzo.GPLNativeCodeLoader: /home/hadoop/hadoop-0.20.205.0/libexec/../lib
12/05/08 14:45:39 INFO lzo.GPLNativeCodeLoader: Loaded native gpl library
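For reference, the modification was essentially one extra print in the loader's static block; a rough sketch of the relevant part (this may not match the upstream GPLNativeCodeLoader source exactly) looks like:

package com.hadoop.compression.lzo;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

public class GPLNativeCodeLoader
{
  private static final Log LOG = LogFactory.getLog(GPLNativeCodeLoader.class);
  private static boolean nativeLibraryLoaded = false;

  static {
    try {
      // added for debugging: show where this JVM will look for native libraries
      LOG.info(System.getProperty("java.library.path"));
      System.loadLibrary("gplcompression");
      nativeLibraryLoaded = true;
      LOG.info("Loaded native gpl library");
    } catch (Throwable t) {
      LOG.error("Could not load native gpl library", t);
    }
  }

  public static boolean isNativeCodeLoaded()
  {
    return nativeLibraryLoaded;
  }
}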

Sure enough, java.library.path was pointing at the $HADOOP_HOME/lib directory. After I placed the relevant native libraries under $HADOOP_HOME/lib, the job finally ran successfully.
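In other words, the fix amounted to copying the native library files into that directory; roughly like this (the source path is only an example of where the libraries might have been installed):

# example: copy the hadoop-lzo native libraries into Hadoop's library directory
cp /usr/local/lib/libgplcompression.* $HADOOP_HOME/lib/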

For the moment I don't know why; it may be related to the JRockit JVM I am using (just a guess).

Testing LZO showed a considerable performance improvement.

One problem, however, is that the streaming (non-Java language) approach becomes a hassle. Since switching to the LZO mode, I have found that many Hive table structures have to be modified, and querying single-column data does not behave normally.

For example: count returns a null value, fetching a single column of data returns garbled characters, and many other problems arise.

There is also an article online describing how to make Hive read LZO data: http://www.mrbalky.com/2011/02/24/hive-tables-partitions-and-lzo-compression/
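For context, the approach described there comes down to declaring the table with hadoop-lzo's input format and enabling compressed output; a rough sketch (the table and column names are made up) looks like:

-- hypothetical table; the format classes come from hadoop-lzo and Hive
CREATE TABLE logs_lzo (line STRING)
STORED AS INPUTFORMAT "com.hadoop.mapred.DeprecatedLzoTextInputFormat"
OUTPUTFORMAT "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat";

SET hive.exec.compress.output=true;
SET mapred.output.compression.codec=com.hadoop.compression.lzo.LzopCodec;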

But this approach generates files in HDFS with the .lzo suffix, as shown in the following figure:

The other issue with using LZO is that the output ends up as a single file (which is fine when the file is not large, but a single file that grows too big is very frustrating).

But when reading it in Hive, no data is returned at all, as shown in the following figure:

My conclusion for now is to keep Hive in the production environment on the original (default) format, because a compression mode can make things difficult for some tools (especially in streaming mode, which may require an LZO decompression step).
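For reference, when a streaming job does have to read .lzo input directly, the usual route is to pass hadoop-lzo's old-API input format on the command line; a sketch (the streaming jar path and HDFS directories are placeholders, and the hadoop-lzo jar must already be on the classpath, e.g. in $HADOOP_HOME/lib):

hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming-0.20.205.0.jar \
  -inputformat com.hadoop.mapred.DeprecatedLzoTextInputFormat \
  -input /user/hadoop/input_lzo \
  -output /user/hadoop/wordcount_out \
  -mapper /bin/cat \
  -reducer /usr/bin/wc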

There are many related problems still to be studied and tackled, to be continued...
