Hive Learning Path (VI): Data Types and Storage Formats in Hive SQL

Last Update: 2018-04-07    Source: Internet

Author: User

I. Data Types

1. Basic data types

Hive supports most of the basic data types found in relational databases:

Type	Description	Example
boolean	true/false	true
tinyint	1-byte signed integer, -128 to 127	1Y
smallint	2-byte signed integer, -32768 to 32767	1S
int	4-byte signed integer	1
bigint	8-byte signed integer	1L
float	4-byte single-precision floating-point number	1.0
double	8-byte double-precision floating-point number	1.0
decimal	signed high-precision decimal (precision up to 38 digits)	1.0
string	variable-length string	"A", 'B'
varchar	variable-length string with a maximum length	"A", 'B'
char	fixed-length string	"A", 'B'
binary	byte array	(no literal form)
timestamp	timestamp, nanosecond precision	122327493795
date	date	'2018-04-07'

As in other SQL dialects, these type names are reserved words. Note that all of these data types are implemented on top of Java, so their behavior matches that of the corresponding Java types. For example, string is implemented with Java's String, float with Java's float, and so on.
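Because Hive's integer types follow the corresponding Java types (byte, short, int, long), their ranges follow directly from the byte widths in the table above: an n-byte signed integer spans -2^(8n-1) to 2^(8n-1)-1. The Python sketch below, with hypothetical helper names, computes those ranges and pairs each type with its literal suffix (Y, S, L). An unsuffixed integral literal is read as int, or as bigint if it is too large for int.

```python
# Sketch: value ranges of Hive's integer types, derived from their byte widths.
# Widths come from the basic-type table; helper names are hypothetical.

INTEGER_WIDTHS = {"tinyint": 1, "smallint": 2, "int": 4, "bigint": 8}

# Suffix that marks an integral literal's type; no suffix means int.
LITERAL_SUFFIX = {"tinyint": "Y", "smallint": "S", "int": "", "bigint": "L"}


def signed_range(num_bytes):
    """Return (min, max) of a two's-complement signed integer of num_bytes."""
    bits = 8 * num_bytes
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


def fits(type_name, value):
    """True if value is within the range of the given Hive integer type."""
    lo, hi = signed_range(INTEGER_WIDTHS[type_name])
    return lo <= value <= hi


if __name__ == "__main__":
    for name, width in INTEGER_WIDTHS.items():
        lo, hi = signed_range(width)
        print(f"{name:<9} {lo} .. {hi}   literal example: 1{LITERAL_SUFFIX[name]}")
```

Storing a value outside these ranges (for example 128 in a tinyint column) is not possible, which is why the narrower types are best reserved for columns whose domain is known to be small.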

2. Complex types

Type	Description	Example
array	An ordered collection of elements of the same type	array(1, 2, 3)
map	Key-value pairs; the key must be a primitive type, the value can be any type	map('a', 1, 'b', 2)
struct	A collection of fields whose types may differ	struct('1', 1, 1.0), named_struct('col1', '1', 'col2', 1, 'col3', 1.0)
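To make the argument conventions of these constructors concrete, here is a rough Python model: map() and named_struct() take alternating key/value (or name/value) arguments, and struct() gives its fields the default names col1, col2, ..., so struct('1', 1, 1.0) is equivalent to named_struct('col1', '1', 'col2', 1, 'col3', 1.0). The helper names are hypothetical; this mirrors the semantics, not Hive's implementation.

```python
# Sketch: plain-Python stand-ins for Hive's complex-type constructors.
# hive_array, hive_map, struct and named_struct are hypothetical helpers that
# only mirror the argument conventions of the Hive functions.


def hive_array(*elements):
    """array(e1, e2, ...): ordered; all elements must share one type."""
    if len({type(e) for e in elements}) > 1:
        raise TypeError("array elements must share one type")
    return list(elements)


def hive_map(*kv):
    """map(k1, v1, k2, v2, ...): arguments alternate key, value."""
    if len(kv) % 2:
        raise ValueError("map() needs an even number of arguments")
    keys, values = kv[0::2], kv[1::2]
    for k in keys:
        # Keys must be primitive values; values may be anything.
        if not isinstance(k, (bool, int, float, str)):
            raise TypeError("map keys must be primitive values")
    return dict(zip(keys, values))


def named_struct(*kv):
    """named_struct(name1, v1, name2, v2, ...): fields may differ in type."""
    if len(kv) % 2:
        raise ValueError("named_struct() needs name/value pairs")
    return dict(zip(kv[0::2], kv[1::2]))


def struct(*values):
    """struct(v1, v2, ...): fields get the default names col1, col2, ..."""
    return {f"col{i}": v for i, v in enumerate(values, start=1)}
```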

II. Storage Formats

Hive creates a directory on HDFS for each database. Each table is stored as a subdirectory of its database directory, and the table's data is stored as files inside the table directory. The default database has no directory of its own: its tables are stored directly under /user/hive/warehouse.
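The directory convention described above can be sketched as a small path function. It assumes the default warehouse root /user/hive/warehouse (configurable through the hive.metastore.warehouse.dir property) and that a non-default database's directory is named <database>.db; table_path is a hypothetical helper, not a Hive API.

```python
# Sketch: where Hive places a managed table's data files on HDFS.
# Assumes the default warehouse root and the "<db>.db" directory naming;
# table_path() is a hypothetical helper for illustration.
import posixpath

WAREHOUSE = "/user/hive/warehouse"


def table_path(table, database="default", warehouse=WAREHOUSE):
    """Return the HDFS directory that holds the data files of database.table."""
    # Hive identifiers are case-insensitive and stored in lower case.
    table, database = table.lower(), database.lower()
    if database == "default":
        # Tables of the default database sit directly under the warehouse root.
        return posixpath.join(warehouse, table)
    return posixpath.join(warehouse, f"{database}.db", table)
```

For example, table_path("emp", "sales") gives /user/hive/warehouse/sales.db/emp, and each data file of that table lives inside that directory.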

(1) TextFile

TextFile is the default format and uses row-oriented storage. Data is stored uncompressed by default, so disk usage is high and parsing the data is expensive.

(2) SequenceFile

SequenceFile is a binary file format provided by the Hadoop API. It is easy to use, splittable, and compressible.

SequenceFile supports three compression options: NONE, RECORD, and BLOCK. RECORD compression achieves a low compression ratio, so BLOCK compression is generally recommended.
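The gap between RECORD and BLOCK compression can be illustrated with zlib as a stand-in codec: compressing each short record on its own leaves little redundancy to exploit and pays per-record overhead, while compressing many records together lets the codec find repetition across records. This is a conceptual model only; in a real SequenceFile, RECORD mode compresses only the values, BLOCK mode compresses keys and values in batches, and the codec is configurable.

```python
# Sketch: why BLOCK compression usually beats RECORD compression.
# zlib is only a stand-in codec; this models the idea, not the actual
# SequenceFile on-disk layout.
import zlib


def record_compressed_size(records):
    """RECORD-style: every record is compressed separately."""
    return sum(len(zlib.compress(r)) for r in records)


def block_compressed_size(records):
    """BLOCK-style: many records are buffered and compressed together."""
    return len(zlib.compress(b"".join(records)))


if __name__ == "__main__":
    rows = [f"2018-04-07\tuser{i % 10}\tclick\t/index.html".encode()
            for i in range(1000)]
    print("RECORD:", record_compressed_size(rows), "bytes")
    print("BLOCK: ", block_compressed_size(rows), "bytes")
```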

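(3) RCFile

RCFile combines row and column storage: data is first divided horizontally into row groups, and within each row group the data is stored column by column.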
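The hybrid layout can be sketched in a few lines of Python. The row-group size and the nested-list representation are illustrative assumptions (real RCFile writes bytes with per-group metadata); the point is that a query reading one column touches only that column's slice in each row group.

```python
# Sketch of the RCFile idea: split rows into row groups, then store each
# group column by column. Group size and list layout are assumptions.


def to_row_groups(rows, group_size):
    """Horizontal split: consecutive rows form one row group."""
    return [rows[i:i + group_size] for i in range(0, len(rows), group_size)]


def columnize(group):
    """Vertical split inside a group: one list per column."""
    return [list(col) for col in zip(*group)]


def rcfile_layout(rows, group_size):
    """Row groups, each stored as a list of columns."""
    return [columnize(g) for g in to_row_groups(rows, group_size)]


def read_column(layout, index):
    """Reading one column touches only that column in every row group."""
    return [v for group in layout for v in group[index]]
```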

(4) ORCFile

Data is divided into blocks of rows (called stripes), each block is stored column by column, and each block is stored together with an index. ORC is a newer format introduced by Hive as an upgraded version of RCFile. It performs significantly better, and its data can be compressed and accessed quickly.
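The benefit of storing an index with each block can be sketched as follows. Each simplified stripe keeps its values column by column together with per-column min/max statistics, and an equality scan skips any stripe whose range cannot contain the value. The Stripe class and helpers are hypothetical simplifications, not the ORC file format or its reader API.

```python
# Sketch: a per-stripe min/max index lets a reader skip whole stripes.
# Stripe, build_stripes and scan_equal are simplified, hypothetical names.
from dataclasses import dataclass


@dataclass
class Stripe:
    columns: dict  # column name -> list of values (column-wise storage)
    stats: dict    # column name -> (min, max), the stripe's index


def build_stripes(rows, names, stripe_rows):
    """Cut rows into stripes, store each column-wise, and record min/max."""
    stripes = []
    for start in range(0, len(rows), stripe_rows):
        chunk = rows[start:start + stripe_rows]
        cols = {n: [r[i] for r in chunk] for i, n in enumerate(names)}
        stats = {n: (min(v), max(v)) for n, v in cols.items()}
        stripes.append(Stripe(cols, stats))
    return stripes


def scan_equal(stripes, column, value):
    """Return matching values and the number of stripes actually read."""
    matches, read = [], 0
    for s in stripes:
        lo, hi = s.stats[column]
        if not lo <= value <= hi:
            continue  # the index rules this stripe out, so skip it
        read += 1
        matches += [v for v in s.columns[column] if v == value]
    return matches, read
```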

(5) Parquet

Parquet is also a columnar storage format. It compresses well and reduces the time needed to scan and deserialize large tables.

III. Data Format

When data is stored in a text file, rows and columns must be separated according to some convention; in Hive this is done with delimiters. By default, Hive uses several rarely used control characters that generally do not appear in the content of a record.

The default row and column separators for Hive are shown in the following table.

Delimiter	Description
\n	In a text file each line is one record, so \n separates records.
^A (Ctrl+A)	Separates fields (columns); can also be written as \001.
^B (Ctrl+B)	Separates the elements of an array or struct, and the key-value pairs of a map; can also be written as \002.
^C (Ctrl+C)	Separates a key from its value within a map entry; can also be written as \003.
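To show how the three delimiters nest, here is a Python sketch that splits one line of a text file for a hypothetical table with columns name string, tags array<string>, and props map<string,int>: ^A separates the fields, ^B separates array elements and map entries, and ^C separates each map key from its value. parse_line is illustrative only and ignores details such as NULL handling and escaping that Hive's own SerDe deals with.

```python
# Sketch: split one line of a default-delimited Hive text file.
# Assumes a hypothetical table (name string, tags array<string>,
# props map<string,int>); parse_line() is not Hive's SerDe.
FIELD, ITEM, KEY_VALUE = "\x01", "\x02", "\x03"  # ^A, ^B, ^C


def parse_line(line):
    """Return (name, tags, props) parsed from one record."""
    name, tags, props = line.rstrip("\n").split(FIELD)
    tag_list = tags.split(ITEM) if tags else []
    prop_map = {}
    if props:
        for pair in props.split(ITEM):       # map entries are ^B-separated
            key, value = pair.split(KEY_VALUE)  # key and value are ^C-separated
            prop_map[key] = int(value)
    return name, tag_list, prop_map


if __name__ == "__main__":
    print(parse_line("alice\x01hive\x02hdfs\x01a\x031\x02b\x032\n"))
```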

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only.
