Write Custom Java to Create LZO Files

Source: Internet
Author: User
Tags table definition

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+LZO

LanguageManual LZO
    • Created by Lefty Leverenz, last modified on Sep

LZO Compression

  • LZO Compression
      • General LZO Concepts
      • Prerequisites
        • LZO/LZOP Installations
        • core-site.xml
        • Table Definition
      • Hive Queries
        • Option 1: Directly Create LZO Files
        • Option 2: Write Custom Java to Create LZO Files

General LZO Concepts

LZO is a lossless data compression library that favors speed over compression ratio. See http://www.oberhumer.com/opensource/lzo and http://www.lzop.org for general information about LZO, and see Compressed Data Storage for information about compression in Hive.

Imagine a simple data file that has three columns:

    • Id
    • First Name
    • Last Name

Let's populate a data file containing 4 records:

19630001     John          Lennon
19630002     Paul          McCartney
19630003     George        Harrison
19630004     Ringo         Starr

Let's call the data file /path/to/dir/names.txt.

To turn it into an LZO file, we can use the lzop utility, which creates a names.txt.lzo file.

Now copy the file names.txt.lzo to HDFS.
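The steps above can be sketched in shell as follows. This is a minimal sketch rather than the page's own script: a temporary directory stands in for /path/to/dir, the HDFS target /path/to/HDFS/dir is a hypothetical placeholder, and the lzop and hadoop steps are skipped when those tools are not on the PATH.

```shell
#!/bin/sh
# Working directory standing in for /path/to/dir (assumption: a temp dir).
workdir=$(mktemp -d)
datafile="$workdir/names.txt"

# Write the four sample records as tab-separated id, first name, last name.
# printf reuses the format string for every group of three arguments.
printf '%s\t%s\t%s\n' \
  19630001 John Lennon \
  19630002 Paul McCartney \
  19630003 George Harrison \
  19630004 Ringo Starr > "$datafile"

# Compress with lzop; by default it writes names.txt.lzo next to the input
# and keeps the original file.
if command -v lzop >/dev/null 2>&1; then
  lzop "$datafile"
fi

# Copy the compressed file to HDFS (hypothetical target directory).
hdfs_dir=/path/to/HDFS/dir
if [ -f "$datafile.lzo" ] && command -v hadoop >/dev/null 2>&1; then
  hadoop fs -put "$datafile.lzo" "$hdfs_dir/"
fi
```

Tab-separated fields are used here because the table definition later in this page declares tab as the field delimiter.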

Prerequisites

LZO/LZOP Installations

lzo and lzop need to be installed on every node in the Hadoop cluster. The details of these installations are beyond the scope of this document.

core-site.xml

Add the following to your core-site.xml:

    • com.hadoop.compression.lzo.LzoCodec
    • com.hadoop.compression.lzo.LzopCodec

For example:

<property>
<name>io.compression.codecs</name>
<value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.BZip2Codec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec</value>
</property>

<property>
<name>io.compression.codec.lzo.class</name>
<value>com.hadoop.compression.lzo.LzoCodec</value>
</property>
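A quick way to confirm that a configuration file lists both codec classes is a plain text search. The helper below is a hypothetical convenience (the name has_lzo_codecs is not from the original page), and because it does not parse the XML it is only a rough sanity check.

```shell
#!/bin/sh
# has_lzo_codecs FILE: succeed only if FILE mentions both LZO codec classes.
# This is a text search, not an XML parse, so it cannot tell which property
# a class name appears in.
has_lzo_codecs() {
  grep -q 'com\.hadoop\.compression\.lzo\.LzoCodec' "$1" &&
  grep -q 'com\.hadoop\.compression\.lzo\.LzopCodec' "$1"
}
```

For example, `has_lzo_codecs /path/to/core-site.xml && echo "LZO codecs listed"` prints the message only when both class names are present; the path is a placeholder for wherever your cluster keeps its configuration.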

Next we run the command to create an LZO index file:

hadoop jar /path/to/jar/hadoop-lzo-cdh4-0.4.15-gplextras.jar com.hadoop.compression.lzo.LzoIndexer /path/to/HDFS/dir/containing/lzo/files

This creates the index file names.txt.lzo.index next to names.txt.lzo on HDFS.

Table Definition

The following hive -e command creates an lzo-compressed external table:

hive -e "CREATE EXTERNAL TABLE IF NOT EXISTS hive_table_name (column_1  datatype_1......column_n datatype_n)
         PARTITIONED BY (partition_col_1 datatype_1 .... col_p  datatype_p)
         ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
         STORED AS INPUTFORMAT  \"com.hadoop.mapred.DeprecatedLzoTextInputFormat\"
                   OUTPUTFORMAT \"org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat\";"

Note: The double quotes have to be escaped so that the hive -e command works correctly.

See CREATE TABLE and Hive CLI for information about command syntax.
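As a concrete illustration, the template above can be filled in for the names.txt example. This is a sketch under assumptions: the table name names_lzo, the column names and types, and the LOCATION path are hypothetical and do not come from the original page, and the partition clause is left out for brevity. Writing the statement to a file and running it with hive -f sidesteps the quote escaping that hive -e requires.

```shell
#!/bin/sh
# Write the DDL to a script file. The quoted heredoc delimiter ('EOF')
# keeps the double quotes and the \t sequence exactly as typed.
ddl_file=$(mktemp)
cat > "$ddl_file" <<'EOF'
CREATE EXTERNAL TABLE IF NOT EXISTS names_lzo (
  id BIGINT,
  first_name STRING,
  last_name STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS INPUTFORMAT "com.hadoop.mapred.DeprecatedLzoTextInputFormat"
          OUTPUTFORMAT "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat"
LOCATION '/path/to/HDFS/dir/containing/lzo/files';
EOF

# Run the script only where the Hive CLI is installed.
if command -v hive >/dev/null 2>&1; then
  hive -f "$ddl_file"
fi
```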

Hive Queries

Option 1: Directly Create LZO Files
    1. Directly create LZO files as the output of the Hive query.
    2. Use the lzop command utility or your custom Java to generate .lzo.index files for the .lzo files.

Hive Query Parameters

SET mapreduce.output.fileoutputformat.compress.codec=com.hadoop.compression.lzo.LzoCodec
SET hive.exec.compress.output=true
SET mapreduce.output.fileoutputformat.compress=true

For example:

hive -e "SET mapreduce.output.fileoutputformat.compress.codec=com.hadoop.compression.lzo.LzoCodec; SET hive.exec.compress.output=true; SET mapreduce.output.fileoutputformat.compress=true; <query-string>"

Note: If the data set is large or the number of output files is large, then this option does not work.

Option 2: Write Custom Java to Create LZO Files
    1. Create text files as the output of the Hive query.
    2. Write custom Java code to
      1. Convert Hive query generated text files to .lzo files
      2. Generate .lzo.index files for the .lzo files generated above

Hive Query Parameters

Prefix the query string with these parameters:

SET hive.exec.compress.output=false
SET mapreduce.output.fileoutputformat.compress=false

For example:

hive -e "SET hive.exec.compress.output=false; SET mapreduce.output.fileoutputformat.compress=false; <query-string>"
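The page calls for custom Java for the two post-processing steps but does not show it. As a stand-in, the same two steps can be sketched in shell with the lzop utility and the LzoIndexer class used earlier. This is a substitute technique, not the custom Java the page refers to. The directories src_dir and lzo_dir and the helper lzo_target are hypothetical, and the sketch assumes that hadoop fs -ls prints each file path as the last field of its line and that paths contain no spaces.

```shell
#!/bin/sh
# Hypothetical HDFS directories (assumptions): text output in, LZO files out.
src_dir=/path/to/HDFS/dir/containing/text/files
lzo_dir=/path/to/HDFS/dir/containing/lzo/files
lzo_jar=/path/to/jar/hadoop-lzo-cdh4-0.4.15-gplextras.jar

# lzo_target FILE: HDFS path of the compressed copy of FILE under lzo_dir.
lzo_target() {
  printf '%s/%s.lzo\n' "$lzo_dir" "$(basename "$1")"
}

# Run only where both the hadoop and lzop commands are available.
if command -v hadoop >/dev/null 2>&1 && command -v lzop >/dev/null 2>&1; then
  # Step 1: stream each text file through lzop into a .lzo file on HDFS.
  # "hadoop fs -put -" reads the data to upload from standard input.
  hadoop fs -ls "$src_dir" | awk '{print $NF}' | grep "^$src_dir/" |
  while read -r f; do
    hadoop fs -cat "$f" </dev/null | lzop -c | hadoop fs -put - "$(lzo_target "$f")"
  done

  # Step 2: generate .lzo.index files for the .lzo files written above.
  hadoop jar "$lzo_jar" com.hadoop.compression.lzo.LzoIndexer "$lzo_dir"
fi
```

Writing the compressed files to a separate directory keeps the uncompressed text files out of the location that the LZO table reads.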
