Hive (4): Hive file formats

Source: Internet
Author: User

Hive file formats

1. TextFile

The default file format.

Data is stored uncompressed, which costs disk space and parsing time. TextFile can be combined with gzip or bzip2 compression; Hadoop picks the codec automatically from the file extension, and the data is decompressed transparently when a query runs.

However, Hive cannot split a gzip-compressed file, so such data cannot be processed in parallel (bzip2, by contrast, is a splittable codec in Hadoop).
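Why splitting matters can be shown with a minimal Python sketch. It assumes plain newline-delimited text and a simplified version of the rule Hadoop's line reader follows (each split owns the lines that start inside its byte range); the helpers `read_split`, `split_text` and `gzip_readable_from` are hypothetical names for illustration only. Uncompressed text can be cut at arbitrary byte offsets and each piece read independently, while a gzip stream can only be decoded from its header at byte 0.

```python
import gzip


def read_split(data: bytes, start: int, end: int) -> list[bytes]:
    """Return the lines whose first byte lies in [start, end).

    Simplified split rule (an assumption, loosely modeled on Hadoop's
    line reader): a split skips the partial line it starts inside and
    finishes the line that crosses its end boundary.
    """
    size = len(data)
    pos = start
    if start > 0:
        # Jump to the first line that begins at or after `start`.
        nl = data.find(b"\n", start - 1)
        pos = size if nl == -1 else nl + 1
    lines = []
    while pos < end:
        nl = data.find(b"\n", pos)
        stop = size if nl == -1 else nl
        lines.append(data[pos:stop])
        pos = stop + 1
    return lines


def split_text(data: bytes, num_splits: int) -> list[list[bytes]]:
    """Cut `data` into roughly equal byte ranges and read each one on its own."""
    step = max(1, -(-len(data) // num_splits))  # ceiling division
    return [read_split(data, s, min(s + step, len(data)))
            for s in range(0, len(data), step)]


def gzip_readable_from(blob: bytes, offset: int) -> bool:
    """Try to decode a gzip stream starting at `offset` instead of byte 0."""
    try:
        gzip.decompress(blob[offset:])
        return True
    except (OSError, EOFError):  # BadGzipFile is a subclass of OSError
        return False


if __name__ == "__main__":
    table = b"".join(f"{i},user{i}\n".encode() for i in range(100))
    parts = split_text(table, 4)
    # Every row is read exactly once, although the splits were cut blindly.
    assert [row for part in parts for row in part] == table.splitlines()
    compressed = gzip.compress(table)
    print(gzip_readable_from(compressed, 0))  # True
    print(gzip_readable_from(compressed, 1))  # False
```

The text splits can go to different mappers; the gzip file has only one valid starting point, so a single mapper must read it from start to end.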

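Creation command: end the CREATE TABLE statement with STORED AS TEXTFILE (or omit the clause, since TextFile is the default).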

2. SequenceFile

A binary file format provided by the Hadoop API.

It is easy to use, splittable, and compressible.

It supports three compression types: NONE, RECORD (lower compression ratio), and BLOCK (recommended).
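The gap between RECORD and BLOCK compression can be illustrated with a small Python sketch that uses `zlib` as a stand-in codec. This is a simplification and an assumption: a real SequenceFile compresses only the values in RECORD mode, buffers keys and values into size-bounded blocks in BLOCK mode, and adds its own framing. What the sketch does show is that compressing many small, similar records together gives the codec far more repetition to exploit than compressing each record alone. The function names are hypothetical.

```python
import zlib


def record_compressed_size(records: list[bytes]) -> int:
    """RECORD-style: every record is compressed on its own."""
    return sum(len(zlib.compress(r)) for r in records)


def block_compressed_size(records: list[bytes], records_per_block: int) -> int:
    """BLOCK-style: records are batched and each batch is compressed once."""
    total = 0
    for i in range(0, len(records), records_per_block):
        block = b"".join(records[i:i + records_per_block])
        total += len(zlib.compress(block))
    return total


if __name__ == "__main__":
    # Hypothetical log-like rows; real data will compress differently.
    rows = [f"user_{i % 50},click,2015-01-01\n".encode() for i in range(1000)]
    print("none:  ", sum(len(r) for r in rows))
    print("record:", record_compressed_size(rows))
    print("block: ", block_compressed_size(rows, 100))
```

For short records, per-record compression can even exceed the uncompressed size, because each compressed record carries its own header and checksum.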

Creation command: end the CREATE TABLE statement with STORED AS SEQUENCEFILE.


The compression type is chosen with SET statements in the session before the data is written. SequenceFile and RCFile tables cannot load data directly from a local file: the data must first be loaded into a TextFile table and then written into the SequenceFile or RCFile table with INSERT ... SELECT from that TextFile table.

3. RCFile

Developed by Facebook; a storage layout that combines row storage and column storage.

Higher compression ratio.

Faster column reads.

RCFile's storage structure follows a "partition horizontally first, then vertically" design.


First, RCFile guarantees that all data belonging to one row is stored on the same node. Second, like a column store, RCFile can take advantage of per-column data compression and skip columns that a query does not need.
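A toy Python model of the "horizontal first, then vertical" layout may make the idea concrete. It assumes rows are dicts sharing the same keys, values contain no newlines, and `zlib` serves as the per-column codec; `build_row_groups` and `read_columns` are hypothetical names. Real RCFile row groups also carry a metadata header and are sized in bytes rather than rows. Rows are first cut into row groups (horizontal), so every field of a row stays in one group; inside each group every column is stored and compressed separately (vertical), so a reader decompresses only the columns it asks for.

```python
import zlib


def build_row_groups(rows: list[dict], rows_per_group: int) -> list[dict]:
    """Horizontal split into row groups, then vertical split into columns."""
    columns = list(rows[0].keys())
    groups = []
    for i in range(0, len(rows), rows_per_group):
        chunk = rows[i:i + rows_per_group]
        # Each column of the group is serialized and compressed on its own.
        # Assumes no value contains a newline.
        groups.append({
            col: zlib.compress("\n".join(str(r[col]) for r in chunk).encode())
            for col in columns
        })
    return groups


def read_columns(groups: list[dict], wanted: list[str]) -> tuple[dict, int]:
    """Decode only the requested columns; also report compressed bytes read."""
    out = {col: [] for col in wanted}
    compressed_bytes_read = 0
    for group in groups:
        for col in wanted:  # other columns in the group are never touched
            blob = group[col]
            compressed_bytes_read += len(blob)
            out[col].extend(zlib.decompress(blob).decode().split("\n"))
    return out, compressed_bytes_read


if __name__ == "__main__":
    rows = [{"id": i, "name": f"user{i}", "city": "Hangzhou"} for i in range(10)]
    groups = build_row_groups(rows, 4)  # row groups of 4, 4 and 2 rows
    names, cost = read_columns(groups, ["name"])
    _, full_cost = read_columns(groups, ["id", "name", "city"])
    print(names["name"][:3], cost, full_cost)
```

Reading one column touches only that column's compressed bytes, which is the column-skipping benefit described above.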

Creation command: end the CREATE TABLE statement with STORED AS RCFILE.

4. Custom formats

When Hive cannot read the user's data file format, a custom file format can be defined by implementing custom InputFormat and OutputFormat classes.
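Hive's real extension points are Java classes implementing Hadoop's InputFormat and OutputFormat interfaces, which cannot be shown self-contained here. The Python sketch below is only an analogy for the division of labour: the input side turns stored bytes into records, and the output side turns records back into bytes. The class names and the multi-character delimiter `|#|` are hypothetical choices for illustration.

```python
from abc import ABC, abstractmethod


class RecordFormat(ABC):
    """Hypothetical stand-in for an InputFormat/OutputFormat pair."""

    @abstractmethod
    def read(self, raw: bytes) -> list[list[str]]:
        """Input side: turn stored bytes into rows of fields."""

    @abstractmethod
    def write(self, rows: list[list[str]]) -> bytes:
        """Output side: turn rows of fields back into stored bytes."""


class MultiCharDelimitedFormat(RecordFormat):
    """Example custom format: fields separated by a multi-character delimiter."""

    def __init__(self, field_sep: str = "|#|", record_sep: str = "\n"):
        self.field_sep = field_sep
        self.record_sep = record_sep

    def read(self, raw: bytes) -> list[list[str]]:
        text = raw.decode("utf-8")
        # Skip empty lines, e.g. the one after a trailing record separator.
        return [line.split(self.field_sep)
                for line in text.split(self.record_sep) if line]

    def write(self, rows: list[list[str]]) -> bytes:
        lines = (self.field_sep.join(fields) + self.record_sep for fields in rows)
        return "".join(lines).encode("utf-8")


if __name__ == "__main__":
    fmt = MultiCharDelimitedFormat()
    stored = fmt.write([["1", "alice", "Hangzhou"], ["2", "bob", "Beijing"]])
    print(stored)
    print(fmt.read(stored))
```

Keeping reading and writing in one object makes the round trip easy to check: whatever the output side writes, the input side must read back unchanged.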

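Creation command: end the CREATE TABLE statement with STORED AS INPUTFORMAT '...' OUTPUTFORMAT '...', naming the custom classes.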

5. Comparison of the three Hive file formats
