Hive File Storage Format

Source: Internet
Author: User

Hive supports the following file storage formats:

1. TEXTFILE

2. SEQUENCEFILE

3. RCFILE

4. ORCFILE (available since Hive 0.11)

TEXTFILE is the default format: a table created without an explicit storage format uses it, and loading data simply copies the data file into HDFS without any processing. SEQUENCEFILE, RCFILE, and ORCFILE tables cannot load data directly from a local file. The data must first be loaded into a TEXTFILE table and then inserted from that table into the SEQUENCEFILE, RCFILE, or ORCFILE table.

Prerequisite environment: Hive 0.8

Create a table named testfile_table in TEXTFILE format:

    CREATE TABLE IF NOT EXISTS testfile_table (
        site  STRING,
        url   STRING,
        pv    BIGINT,
        label STRING
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    STORED AS TEXTFILE;


Load data:

    LOAD DATA LOCAL INPATH '/usr/local/src/weibo.txt' OVERWRITE INTO TABLE testfile_table;


First, Textfile

This is the default format. Data is stored uncompressed, so both disk usage and data parsing overhead are high.

It can be combined with Gzip or Bzip2 compression (Hive detects the compression automatically and decompresses the files during query execution). In that case, however, Hive does not split the compressed data, so a compressed file cannot be processed in parallel.

Example:

    -- Create the table
    CREATE TABLE IF NOT EXISTS textfile_table (
        site  STRING,
        url   STRING,
        pv    BIGINT,
        label STRING
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    STORED AS TEXTFILE;

    -- Enable Gzip-compressed output before inserting data
    SET hive.exec.compress.output=true;
    SET mapred.output.compress=true;
    SET mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;
    SET io.compression.codecs=org.apache.hadoop.io.compress.GzipCodec;

    -- Insert the data
    INSERT OVERWRITE TABLE textfile_table SELECT * FROM testfile_table;
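
The same steps should also work with Bzip2 in place of Gzip. The sketch below assumes a second table, textfile_bz2_table (a hypothetical name that does not appear in the original), and uses Hadoop's org.apache.hadoop.io.compress.BZip2Codec class. Only the codec setting differs from the Gzip example.

```sql
-- Hypothetical table name; same schema as textfile_table.
CREATE TABLE IF NOT EXISTS textfile_bz2_table (
    site  STRING,
    url   STRING,
    pv    BIGINT,
    label STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;

-- Write Bzip2-compressed output instead of Gzip.
SET hive.exec.compress.output=true;
SET mapred.output.compress=true;
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.BZip2Codec;

-- Copy the rows from the uncompressed staging table.
INSERT OVERWRITE TABLE textfile_bz2_table SELECT * FROM testfile_table;
```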



Second, Sequencefile

SequenceFile is a binary file format provided by the Hadoop API. It is easy to use, splittable, and compressible. SequenceFile supports three compression types: NONE, RECORD, and BLOCK. RECORD compression yields a low compression ratio, so BLOCK compression is generally recommended.

Example:

    -- Create the table
    CREATE TABLE IF NOT EXISTS seqfile_table (
        site  STRING,
        url   STRING,
        pv    BIGINT,
        label STRING
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    STORED AS SEQUENCEFILE;

    -- Enable block-compressed output before inserting data
    SET hive.exec.compress.output=true;
    SET mapred.output.compression.type=BLOCK;

    -- Insert the data
    INSERT OVERWRITE TABLE seqfile_table SELECT * FROM textfile_table;
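
One way to confirm which storage format a table ended up with is to inspect its metadata and its files. This is only a sketch: DESCRIBE FORMATTED should report the table's InputFormat and OutputFormat classes, and the path below assumes Hive's default warehouse directory, /user/hive/warehouse, which may differ on your cluster.

```sql
-- Show table metadata, including the InputFormat/OutputFormat in use.
DESCRIBE FORMATTED seqfile_table;

-- List the table's data files from inside the Hive CLI.
-- The warehouse path is an assumption (default hive.metastore.warehouse.dir).
dfs -ls /user/hive/warehouse/seqfile_table;
```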



Third, Rcfile

RCFile combines row-oriented and column-oriented storage. First, it partitions the data by rows into blocks, which guarantees that all fields of a record sit in the same block, so reading one record never requires reading multiple blocks. Second, within each block the data is stored column by column, which helps compression and makes access to individual columns fast.
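
To illustrate the column-access benefit: a query that touches only some of the columns gives a columnar reader the chance to skip the rest. A minimal sketch, assuming the rcfile_table from this section's example has already been created and populated (total_pv is just an illustrative alias):

```sql
-- Only the site and pv columns are referenced;
-- url and label are not needed to answer this query.
SELECT site, SUM(pv) AS total_pv
FROM rcfile_table
GROUP BY site;
```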

Example:

    -- Create the table
    CREATE TABLE IF NOT EXISTS rcfile_table (
        site  STRING,
        url   STRING,
        pv    BIGINT,
        label STRING
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    STORED AS RCFILE;

    -- Enable compressed output before inserting data
    SET hive.exec.compress.output=true;
    SET mapred.output.compress=true;

    -- Insert the data
    INSERT OVERWRITE TABLE rcfile_table SELECT * FROM testfile_table;
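
ORCFILE, the fourth format listed at the top, has no example here because it needs Hive 0.11 or later rather than the Hive 0.8 environment used above. On a newer Hive, the same two-step pattern would look roughly like this. The table name orcfile_table and the choice of ZLIB compression are assumptions made for illustration.

```sql
-- Requires Hive 0.11 or later; orcfile_table is a hypothetical name.
CREATE TABLE IF NOT EXISTS orcfile_table (
    site  STRING,
    url   STRING,
    pv    BIGINT,
    label STRING
)
STORED AS ORC
TBLPROPERTIES ("orc.compress"="ZLIB");

-- Populate it from the text-format staging table, as with the other formats.
INSERT OVERWRITE TABLE orcfile_table SELECT * FROM testfile_table;
```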
