Over the last few days I have been verifying the LZO compression mode, and I came away with the following impressions:
While working on the LZO problem I found a java.library.path setup issue. Many posts online say to set the JAVA_LIBRARY_PATH variable in the hadoop-env.sh file (the related advice to also extend HADOOP_CLASSPATH is valid: in hadoop-0.20.205.0 the jars under the lib directory really are not loaded automatically at startup).
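For reference, the commonly suggested hadoop-env.sh additions look roughly like this (a sketch only; the jar and native-library paths are placeholders for wherever hadoop-lzo happens to be installed):

# hadoop-env.sh -- commonly suggested LZO settings (paths are placeholders)
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/path/to/hadoop-lzo.jar
export JAVA_LIBRARY_PATH=/path/to/hadoop-lzo/native/libs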
Setting that variable had no effect for me, however. So I later wrote a small program that prints out java.library.path; running it gave the following:
[hadoop@master hive_testdata]$ java Test_loadlibrary
/home/hadoop/jrockit-jdk1.6.0_29/jre/lib/amd64/jrockit:/home/hadoop/jrockit-jdk1.6.0_29/jre/lib/amd64:/home/hadoop/jrockit-jdk1.6.0_29/jre/../lib/amd64::/usr/local/lib
Load success
libgplcompression.so
The first line is the value of java.library.path on my current system.
The second line shows the load succeeding after I put the relevant native library under /usr/local/lib.
The third line tests what System.mapLibraryName returns for this library name under Linux.
The source code of Test_loadlibrary.java is as follows:
import java.util.Properties;
import java.util.Set;

public class Test_loadlibrary
{
    public static void main(String[] args)
    {
        // Walk the system properties and print java.library.path when found.
        Properties props = System.getProperties();
        Set<Object> keys = props.keySet();
        for (Object key : keys) {
            if (((String) key).equals("java.library.path"))
                System.out.println(System.getProperty("java.library.path"));
        }
        // Try to load the hadoop-lzo native library from java.library.path.
        try {
            System.loadLibrary("gplcompression");
            System.out.println("Load success");
        } catch (Throwable t) {
            System.out.println("Error");
            t.printStackTrace();
        }
        // Print the platform-specific file name for this library name.
        System.out.println(System.mapLibraryName("gplcompression"));
    }
}
Yet when I put the relevant native library under one of those directories (/usr/local/lib) and then ran Hadoop, it still complained that the gplcompression native library could not be found.
With no other way forward, I modified the GPLNativeCodeLoader code in hadoop-lzo to also print the value of java.library.path; the result was as follows:
12/05/08 14:45:39 INFO lzo.GPLNativeCodeLoader: /home/hadoop/hadoop-0.20.205.0/libexec/../lib
12/05/08 14:45:39 INFO lzo.GPLNativeCodeLoader: Loaded native gpl library
Sure enough, java.library.path was the $HADOOP_HOME/lib directory. After I placed the relevant native libraries under $HADOOP_HOME/lib, the job finally ran successfully.
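For reference, the debug change amounted to one extra log line in the static initializer of GPLNativeCodeLoader; a minimal sketch, assuming the structure of that class in hadoop-lzo (the LOG field, the loadLibrary call, and the nativeLibraryLoaded flag are from the original class; only the first info() call is my addition):

static {
    try {
        // added for debugging: print the effective native-library search path
        LOG.info(System.getProperty("java.library.path"));
        System.loadLibrary("gplcompression");
        nativeLibraryLoaded = true;
        LOG.info("Loaded native gpl library");
    } catch (Throwable t) {
        LOG.error("Could not load native gpl library", t);
        nativeLibraryLoaded = false;
    }
}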
At the moment I don't know why; perhaps it is related to the JRockit JVM I am using (just a guess).
Testing LZO did show a significant performance improvement.
But one problem is that the streaming approach (for non-Java languages) becomes a hassle. Since switching to the LZO mode, I have found that many Hive table definitions need to be modified, and querying individual columns no longer behaves normally.
For example, count returns a null value, and selecting a single column at random comes back as garbled characters, among many other problems.
There is also a post online describing how to make Hive read LZO data, URL: http://www.mrbalky.com/2011/02/24/hive-tables-partitions-and-lzo-compression/
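Following that post, the table is declared on top of the hadoop-lzo input format, roughly like this (a sketch only; the table and column names are made up, and the SET lines for writing compressed output are my summary of the same post):

-- table/column names below are hypothetical; the format classes come from hadoop-lzo and Hive
SET hive.exec.compress.output=true;
SET mapred.output.compression.codec=com.hadoop.compression.lzo.LzopCodec;

CREATE TABLE lzo_test (col1 STRING, col2 STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS INPUTFORMAT "com.hadoop.mapred.DeprecatedLzoTextInputFormat"
OUTPUTFORMAT "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat";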
But that approach produces files in HDFS with a suffix, as shown in the following figure:
The other approach is my existing LZO usage, where the output ends up in a single file (tolerable when the file is small, but very frustrating when everything lands in one oversized file).
But when reading it in Hive, no data comes back at all, as shown in the following figure:
My conclusion for now is that Hive in the production environment should stay on the default format rather than a compressed mode, since compression can make some tools hard to work with (especially the streaming mode, which may need an LZO decompression step of its own).
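For the streaming case, my understanding is that the job can be pointed at the hadoop-lzo input format so that mappers receive already-decompressed lines; a rough sketch only (the streaming jar path, input/output paths, and mapper/reducer commands are placeholders, and I have not verified this on 0.20.205.0):

# streaming over .lzo input; all paths below are placeholders
hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming-0.20.205.0.jar \
    -inputformat com.hadoop.mapred.DeprecatedLzoTextInputFormat \
    -input /data/input.lzo \
    -output /data/output \
    -mapper /bin/cat \
    -reducer /usr/bin/wc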
There are plenty of related problems left to work through one by one; to be continued...