Hive File Formats
1. TextFile
Default file format
The data is not compressed by default, which incurs high disk and parsing overhead; it can be combined with Gzip or Bzip2 compression (Hive auto-detects the codec and decompresses automatically when a query runs).
However, Hive cannot split files compressed this way, so the data cannot be processed in parallel.
Create statement:
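A minimal sketch of a TextFile table; the table name, columns, delimiter, and file path are all illustrative:

```sql
-- TEXTFILE is the default, but stating it makes the intent explicit
CREATE TABLE IF NOT EXISTS textfile_table (
  id   INT,
  name STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;

-- TextFile tables can load raw local files directly (path is hypothetical)
LOAD DATA LOCAL INPATH '/tmp/data.txt' INTO TABLE textfile_table;
```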
2. SequenceFile
A binary file format provided by the Hadoop API.
It is simple to use, splittable, and compressible.
Supports three compression types: NONE, RECORD (lower compression ratio), and BLOCK (recommended).
Create statement:
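A minimal sketch of a SequenceFile table, using the same hypothetical schema as above:

```sql
CREATE TABLE IF NOT EXISTS seqfile_table (
  id   INT,
  name STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS SEQUENCEFILE;
```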
The compression type is set via Hive session properties before loading data. SequenceFile and RCFile tables cannot load data directly from local files: the data must first be loaded into a TextFile table, then copied into the SequenceFile or RCFile table with an INSERT ... SELECT from the TextFile table.
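A sketch of that two-step load, assuming the staging and target table names used above; the SET properties are the classic Hive/Hadoop compression settings, and the codec choice is illustrative:

```sql
-- Enable compressed output and select BLOCK-level compression
SET hive.exec.compress.output=true;
SET mapred.output.compression.type=BLOCK;
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;

-- Populate the SequenceFile table from the TextFile staging table
INSERT OVERWRITE TABLE seqfile_table SELECT * FROM textfile_table;
```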
3. RCFile
Developed by Facebook; a hybrid storage format that combines row and column storage.
Higher compression ratio.
Faster column reads.
The RCFile storage structure follows a design of "horizontal partitioning first, then vertical partitioning": RCFile first guarantees that the data of the same row resides on the same node, and then, like a column store, it can exploit per-column data compression and skip reads of columns that are not needed.
Create statement:
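A minimal sketch of an RCFile table, again with the hypothetical schema, populated from the TextFile staging table as described above:

```sql
CREATE TABLE IF NOT EXISTS rcfile_table (
  id   INT,
  name STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS RCFILE;

-- RCFile tables are also populated via INSERT ... SELECT, not LOAD DATA
INSERT OVERWRITE TABLE rcfile_table SELECT * FROM textfile_table;
```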
4. Custom
When the user's data file format cannot be read by Hive's built-in formats, a custom file format can be defined by implementing custom InputFormat and OutputFormat classes.
Create statement:
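A sketch of a table backed by user-supplied format classes; the fully qualified class names are hypothetical placeholders for your own implementations:

```sql
CREATE TABLE IF NOT EXISTS custom_table (
  id   INT,
  name STRING
)
STORED AS
  INPUTFORMAT  'com.example.hive.MyInputFormat'   -- hypothetical class
  OUTPUTFORMAT 'com.example.hive.MyOutputFormat'; -- hypothetical class
```

The JAR containing these classes must be on Hive's classpath (e.g. registered with ADD JAR) before the table can be queried.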
5. Comparison of the three Hive file formats