In-depth Hadoop Research: (8) -- Codec


Please indicate the source when reprinting: http://blog.csdn.net/lastsweetop/article/details/9173061

All source code is on GitHub: https://github.com/lastsweetop/styhadoop

Introduction: "codec" is a portmanteau of the words coder and decoder. The CompressionCodec interface defines the compression and decompression methods, and the codecs discussed here are the classes that implement CompressionCodec for particular compression formats. By default these include:

DEFLATE - org.apache.hadoop.io.compress.DefaultCodec
gzip - org.apache.hadoop.io.compress.GzipCodec
bzip2 - org.apache.hadoop.io.compress.BZip2Codec

CompressionCodec can be used in two ways, to compress or to decompress data. Compression: use the createOutputStream(OutputStream out) method to obtain a CompressionOutputStream object. Decompression: use the createInputStream(InputStream in) method to obtain a CompressionInputStream object. Compression sample code:
package com.sweetop.styhadoop;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.CompressionOutputStream;
import org.apache.hadoop.util.ReflectionUtils;

/**
 * Created with IntelliJ IDEA.
 * User: lastsweetop
 * Date: 13-6-25
 */
public class StreamCompressor {
    public static void main(String[] args) throws Exception {
        String codecClassName = args[0];
        Class<?> codecClass = Class.forName(codecClassName);
        Configuration conf = new Configuration();
        CompressionCodec codec =
                (CompressionCodec) ReflectionUtils.newInstance(codecClass, conf);
        CompressionOutputStream out = codec.createOutputStream(System.out);
        IOUtils.copyBytes(System.in, out, 4096, false);
        out.finish();
    }
}
The program accepts the name of a CompressionCodec implementation class as a command-line argument, instantiates that class via ReflectionUtils, wraps the standard output stream in a compressed stream through the CompressionCodec interface, copies the standard input stream to the compressed stream with the copyBytes() method of IOUtils, and finally calls finish() on the compressed stream to complete the compression. Let's look at the command line:
echo "Hello lastsweetop" | ~/hadoop/bin/hadoop com.sweetop.styhadoop.StreamCompressor  org.apache.hadoop.io.compress.GzipCodec | gunzip -

This compresses "Hello lastsweetop" with the GzipCodec class and decompresses it with the gunzip tool.

Let's take a look at the output:
[exec] 13/06/26 20:01:53 INFO util.NativeCodeLoader: Loaded the native-hadoop library
[exec] 13/06/26 20:01:53 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
[exec] Hello lastsweetop
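The wrap-write-finish pattern that StreamCompressor applies to Hadoop's CompressionOutputStream is the same idea as the JDK's own gzip streams. Here is a minimal, Hadoop-free sketch of the full round trip; the GzipRoundTrip class and its helper names are my own, not part of the article's code:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipRoundTrip {

    // Compress a byte array with gzip: the JDK analogue of wrapping a
    // stream in CompressionOutputStream and calling finish()/close().
    static byte[] compress(byte[] data) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    // Decompress gzip bytes: the analogue of reading back through a
    // CompressionInputStream obtained from createInputStream().
    static byte[] decompress(byte[] data) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(data))) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = gz.read(buf)) != -1) {
                bos.write(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) {
        byte[] gz = compress("Hello lastsweetop".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(decompress(gz), StandardCharsets.UTF_8));
    }
}
```

Like the command-line example, the data survives the compress/decompress round trip unchanged; Hadoop's gzip output is byte-compatible with gunzip for the same reason.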
Using CompressionCodecFactory to decompress a file. To read a compressed file, you must first determine which codec to use from the file extension; refer to In-depth Hadoop Research: (7) -- Compression for the mapping. Of course, there is a simpler way: CompressionCodecFactory has already done this for you. Pass a Path to its getCodec() method and you get back the corresponding codec. Let's look at the code:
package com.sweetop.styhadoop;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.CompressionCodecFactory;

import java.io.InputStream;
import java.io.OutputStream;
import java.net.URI;

/**
 * Created with IntelliJ IDEA.
 * User: lastsweetop
 * Date: 13-6-26
 */
public class FileDecompressor {
    public static void main(String[] args) throws Exception {
        String uri = args[0];
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create(uri), conf);
        Path inputPath = new Path(uri);
        CompressionCodecFactory factory = new CompressionCodecFactory(conf);
        CompressionCodec codec = factory.getCodec(inputPath);
        if (codec == null) {
            System.out.println("No codec found for " + uri);
            System.exit(1);
        }
        String outputUri =
                CompressionCodecFactory.removeSuffix(uri, codec.getDefaultExtension());
        InputStream in = null;
        OutputStream out = null;
        try {
            in = codec.createInputStream(fs.open(inputPath));
            out = fs.create(new Path(outputUri));
            IOUtils.copyBytes(in, out, conf);
        } finally {
            IOUtils.closeStream(in);
            IOUtils.closeStream(out);
        }
    }
}

Pay attention to the removeSuffix() method. It is a static method that strips the file suffix, and the resulting path is then used as the output path for decompression. The codecs that CompressionCodecFactory can find are also limited: by default there are only three, org.apache.hadoop.io.compress.GzipCodec, org.apache.hadoop.io.compress.BZip2Codec, and org.apache.hadoop.io.compress.DefaultCodec. If you want to add other codecs, you need to change the io.compression.codecs property and register the codec there.

Native libraries. Native libraries have come up several times already, and Hadoop's codecs are no exception: native libraries can greatly improve performance. For example, the native gzip library speeds up decompression by about 50% and compression by about 10%. However, not every codec has both a Java implementation and a native one; some codecs are available only as a native library. On Linux, Hadoop ships with pre-compiled 32-bit and 64-bit native libraries. Let's take a look:
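The extension lookup and suffix stripping that CompressionCodecFactory performs can be sketched with plain JDK code. The CodecLookup class below is hypothetical, for illustration only; it mirrors the three default extension-to-codec mappings and the removeSuffix() behavior:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CodecLookup {
    // Default extension-to-codec mapping, mirroring the three codecs
    // CompressionCodecFactory knows about out of the box.
    static final Map<String, String> CODECS = new LinkedHashMap<>();
    static {
        CODECS.put(".gz", "org.apache.hadoop.io.compress.GzipCodec");
        CODECS.put(".bz2", "org.apache.hadoop.io.compress.BZip2Codec");
        CODECS.put(".deflate", "org.apache.hadoop.io.compress.DefaultCodec");
    }

    // Pick a codec class name by file extension, or null when none matches,
    // just as getCodec() returns null for an unknown suffix.
    static String getCodec(String path) {
        for (Map.Entry<String, String> e : CODECS.entrySet()) {
            if (path.endsWith(e.getKey())) {
                return e.getValue();
            }
        }
        return null;
    }

    // Strip a known suffix, as removeSuffix() does to build the output path.
    static String removeSuffix(String path, String suffix) {
        return path.endsWith(suffix)
                ? path.substring(0, path.length() - suffix.length())
                : path;
    }

    public static void main(String[] args) {
        System.out.println(getCodec("/data/file.gz"));       // GzipCodec
        System.out.println(removeSuffix("/data/file.gz", ".gz")); // /data/file
    }
}
```

This also shows why FileDecompressor must handle a null codec: any path whose suffix is not registered simply has no match.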

[hadoop@namenode native]$ pwd
/home/hadoop/hadoop/lib/native
[hadoop@namenode native]$ ls -ls
total 8
4 drwxrwxrwx 2 root root 4096 Nov 14  2012 Linux-amd64-64
4 drwxrwxrwx 2 root root 4096 Nov 14  2012 Linux-i386-32

If you are on another platform, you need to compile the native libraries yourself; refer to the compilation instructions for the detailed steps. Hadoop locates the native library like this:

if [ -d "${HADOOP_HOME}/build/native" -o -d "${HADOOP_HOME}/lib/native" -o -e "${HADOOP_PREFIX}/lib/libhadoop.a" ]; then
  if [ -d "$HADOOP_HOME/build/native" ]; then
    JAVA_LIBRARY_PATH=${HADOOP_HOME}/build/native/${JAVA_PLATFORM}/lib
  fi
  if [ -d "${HADOOP_HOME}/lib/native" ]; then
    if [ "x$JAVA_LIBRARY_PATH" != "x" ]; then
      JAVA_LIBRARY_PATH=${JAVA_LIBRARY_PATH}:${HADOOP_HOME}/lib/native/${JAVA_PLATFORM}
    else
      JAVA_LIBRARY_PATH=${HADOOP_HOME}/lib/native/${JAVA_PLATFORM}
    fi
  fi
  if [ -e "${HADOOP_PREFIX}/lib/libhadoop.a" ]; then
    JAVA_LIBRARY_PATH=${HADOOP_PREFIX}/lib
  fi
fi
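The script's decision logic can be sketched in Java as well. The NativeLibPath class below is hypothetical and covers only the build/native and lib/native branches (the libhadoop.a fallback is omitted); the boolean flags stand in for the script's "-d" directory-exists tests:

```java
import java.util.ArrayList;
import java.util.List;

public class NativeLibPath {
    // Mirror of the shell logic: prefer build/native, then append
    // lib/native if it is also present (joined with ":" like a PATH).
    static String resolve(boolean hasBuildNative, boolean hasLibNative,
                          String home, String platform) {
        List<String> parts = new ArrayList<>();
        if (hasBuildNative) {
            parts.add(home + "/build/native/" + platform + "/lib");
        }
        if (hasLibNative) {
            parts.add(home + "/lib/native/" + platform);
        }
        return String.join(":", parts);
    }

    public static void main(String[] args) {
        // Typical installed layout: only lib/native exists.
        System.out.println(resolve(false, true, "/home/hadoop/hadoop", "Linux-amd64-64"));
    }
}
```

For a stock installation, only lib/native exists, so JAVA_LIBRARY_PATH ends up pointing at the platform directory shown in the listing above.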

Hadoop finds the corresponding native library and loads it automatically, so you normally do not need to care about these settings. But sometimes you may not want to use the native library, for example when debugging certain bugs; in that case, set the hadoop.native.lib property to false. If you use the native library for a lot of compression and decompression, consider using CodecPool, which works somewhat like a connection pool, so that you do not have to create compressor objects over and over.

package com.sweetop.styhadoop;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.compress.CodecPool;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.CompressionOutputStream;
import org.apache.hadoop.io.compress.Compressor;
import org.apache.hadoop.util.ReflectionUtils;

/**
 * Created with IntelliJ IDEA.
 * User: lastsweetop
 * Date: 13-6-27
 */
public class PooledStreamCompressor {
    public static void main(String[] args) throws Exception {
        String codecClassName = args[0];
        Class<?> codecClass = Class.forName(codecClassName);
        Configuration conf = new Configuration();
        CompressionCodec codec =
                (CompressionCodec) ReflectionUtils.newInstance(codecClass, conf);
        Compressor compressor = null;
        try {
            compressor = CodecPool.getCompressor(codec);
            CompressionOutputStream out = codec.createOutputStream(System.out, compressor);
            IOUtils.copyBytes(System.in, out, 4096, false);
            out.finish();
        } finally {
            CodecPool.returnCompressor(compressor);
        }
    }
}

The code is easy to understand: obtain a Compressor object with the getCompressor() method of CodecPool, which takes the codec as its argument; pass the Compressor to createOutputStream(); and after use, hand it back with returnCompressor(). The output is as follows:

[exec] 13/06/27 12:00:06 INFO util.NativeCodeLoader: Loaded the native-hadoop library
[exec] 13/06/27 12:00:06 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
[exec] 13/06/27 12:00:06 INFO compress.CodecPool: Got brand-new compressor
[exec] Hello lastsweetop
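The reuse pattern behind CodecPool ("Got brand-new compressor" is logged only when no pooled instance is available) can be illustrated with a tiny generic pool. This Pool class is a hypothetical sketch of the idea, not Hadoop's implementation:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

public class Pool<T> {
    private final Deque<T> free = new ArrayDeque<>();
    private int created = 0;

    // Hand out a previously returned instance if one exists; otherwise
    // create a brand-new one (the "Got brand-new compressor" case).
    public synchronized T get(Supplier<T> factory) {
        if (!free.isEmpty()) {
            return free.pop();
        }
        created++;
        return factory.get();
    }

    // Give the instance back so the next caller can reuse it,
    // like CodecPool.returnCompressor().
    public synchronized void release(T obj) {
        free.push(obj);
    }

    public synchronized int createdCount() {
        return created;
    }

    public static void main(String[] args) {
        Pool<StringBuilder> pool = new Pool<>();
        StringBuilder a = pool.get(StringBuilder::new);
        pool.release(a);
        StringBuilder b = pool.get(StringBuilder::new);
        System.out.println(a == b);              // true: reused, not re-created
        System.out.println(pool.createdCount()); // 1
    }
}
```

Releasing in a finally block, as PooledStreamCompressor does, is what keeps the pool effective: a Compressor that is never returned can never be reused.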
