About using LZO compression with Hadoop

Source: Internet
Author: User

I have spent the last few days testing the LZO compression mode and have the following observations:

While working with LZO recently I ran into a java.library.path setup problem. Many posts online say to add the JAVA_LIBRARY_PATH property in the hadoop-env.sh file (the other suggestion, adding HADOOP_CLASSPATH, is valid: it is true that this hadoop-0.20.205.0 release does not automatically load the jar packages under the lib directory at startup).
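For reference, the kind of hadoop-env.sh addition those posts recommend looks roughly like the following; the jar name and library path here are only illustrative examples, not my actual values:

# hadoop-env.sh -- commonly suggested additions (example paths and version)
export JAVA_LIBRARY_PATH=/usr/local/lib:$JAVA_LIBRARY_PATH
export HADOOP_CLASSPATH=$HADOOP_HOME/lib/hadoop-lzo-0.4.15.jar:$HADOOP_CLASSPATH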

Setting that parameter had no effect for me, so I later wrote a small program to print java.library.path. Running it gives the following output:

[hadoop@master hive_testdata]$ java Test_loadlibrary
/home/hadoop/jrockit-jdk1.6.0_29/jre/lib/amd64/jrockit:/home/hadoop/jrockit-jdk1.6.0_29/jre/lib/amd64:/home/hadoop/jrockit-jdk1.6.0_29/jre/../lib/amd64::/usr/local/lib
Load success
libgplcompression.so

The first line is the value of java.library.path on my current system.

The second line shows that the load succeeded after I put the relevant native library under /usr/local/lib.

The third line tests what System.mapLibraryName returns for this library name on Linux.

The source code of Test_loadlibrary.java is as follows:

import java.util.Properties;
import java.util.Set;

public class Test_loadlibrary
{
  public static void main(String[] args)
  {
    // Print the java.library.path property (and "." if it exists)
    Properties props = System.getProperties();
    Set<Object> keys = props.keySet();
    for (Object key : keys) {
      if (((String) key).equals("java.library.path"))
        System.out.println(System.getProperty("java.library.path"));
      if (((String) key).equals("."))
        System.out.println(System.getProperty("."));
    }
    // Try to load the gplcompression native library from java.library.path
    try {
      System.loadLibrary("gplcompression");
      System.out.println("Load success");
    } catch (Throwable t) {
      System.out.println("Error");
      t.printStackTrace();
    }
    // Show the platform-specific file name for this library (libgplcompression.so on Linux)
    System.out.println(System.mapLibraryName("gplcompression"));
  }
}

Yet when I put the relevant native library under one of those directories (/usr/local/lib) and ran the Hadoop job again, it still complained that the gplcompression native library could not be found.

Later, with no better option, I modified the hadoop-lzo GPLNativeCodeLoader code to also print the java.library.path value. The result is as follows:

12/05/08 14:45:39 INFO lzo.GPLNativeCodeLoader: /home/hadoop/hadoop-0.20.205.0/libexec/../lib
12/05/08 14:45:39 INFO lzo.GPLNativeCodeLoader: Loaded native gpl library
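For reference, the modification was essentially one extra print in the loader's static block; a rough sketch of the relevant part (this may not match the upstream GPLNativeCodeLoader source exactly) looks like:

package com.hadoop.compression.lzo;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

public class GPLNativeCodeLoader
{
  private static final Log LOG = LogFactory.getLog(GPLNativeCodeLoader.class);
  private static boolean nativeLibraryLoaded = false;

  static {
    try {
      // added for debugging: show where this JVM will look for native libraries
      LOG.info(System.getProperty("java.library.path"));
      System.loadLibrary("gplcompression");
      nativeLibraryLoaded = true;
      LOG.info("Loaded native gpl library");
    } catch (Throwable t) {
      LOG.error("Could not load native gpl library", t);
    }
  }

  public static boolean isNativeCodeLoaded()
  {
    return nativeLibraryLoaded;
  }
}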

Sure enough, java.library.path was pointing at the $HADOOP_HOME/lib directory. After I placed the relevant native libraries under $HADOOP_HOME/lib, the job finally ran successfully.
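In other words, the fix amounted to copying the native library files into that directory; roughly like this (the source path is only an example of where the libraries might have been installed):

# example: copy the hadoop-lzo native libraries into Hadoop's library directory
cp /usr/local/lib/libgplcompression.* $HADOOP_HOME/lib/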

For the moment I don't know why; it may be related to the JRockit JVM I am using (just a guess).

Testing LZO showed a considerable performance improvement.

One problem, however, is that the streaming (non-Java language) approach becomes a hassle. Since switching to the LZO mode, I have found that many Hive table structures have to be modified, and querying single-column data does not behave normally.

For example: count returns a null value, fetching a single column of data returns garbled characters, and many other problems arise.

There is also an article online describing how to make Hive read LZO data: http://www.mrbalky.com/2011/02/24/hive-tables-partitions-and-lzo-compression/
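For context, the approach described there comes down to declaring the table with hadoop-lzo's input format and enabling compressed output; a rough sketch (the table and column names are made up) looks like:

-- hypothetical table; the format classes come from hadoop-lzo and Hive
CREATE TABLE logs_lzo (line STRING)
STORED AS INPUTFORMAT "com.hadoop.mapred.DeprecatedLzoTextInputFormat"
OUTPUTFORMAT "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat";

SET hive.exec.compress.output=true;
SET mapred.output.compression.codec=com.hadoop.compression.lzo.LzopCodec;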

But this approach generates files in HDFS with the .lzo suffix, as shown in the following figure:

The other issue with using LZO is that the output ends up as a single file (which is fine when the file is not large, but a single file that grows too big is very frustrating).

But when reading it in Hive, no data is returned at all, as shown in the following figure:

My conclusion for now is to keep Hive in the production environment on the original (default) format, because a compression mode can make things difficult for some tools (especially in streaming mode, which may require an LZO decompression step).
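For reference, when a streaming job does have to read .lzo input directly, the usual route is to pass hadoop-lzo's old-API input format on the command line; a sketch (the streaming jar path and HDFS directories are placeholders, and the hadoop-lzo jar must already be on the classpath, e.g. in $HADOOP_HOME/lib):

hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming-0.20.205.0.jar \
  -inputformat com.hadoop.mapred.DeprecatedLzoTextInputFormat \
  -input /user/hadoop/input_lzo \
  -output /user/hadoop/wordcount_out \
  -mapper /bin/cat \
  -reducer /usr/bin/wc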

There are many related problems still to be studied and tackled, to be continued...
