Hive Data Model and Storage
In the last article I walked through a simple example of Hive operations: creating a table named test and loading data into it. These operations resemble those of a relational database, and Hive is often compared with relational databases precisely because so many of its concepts are similar.
Like a relational database, Hive has tables and partitions; in Hive these are collectively called the data model. This article describes Hive's data types, data model, and file storage formats. Most of this material can be understood by analogy with relational databases.
First, let's look at Hive's data types.
Hive supports two kinds of data types: atomic data types and complex data types.
The atomic data types include numeric, Boolean, and string types, as shown in the following table:
Basic data types

| Type | Description | Example |
|---|---|---|
| TINYINT | 1-byte (8-bit) signed integer | 1 |
| SMALLINT | 2-byte (16-bit) signed integer | 1 |
| INT | 4-byte (32-bit) signed integer | 1 |
| BIGINT | 8-byte (64-bit) signed integer | 1 |
| FLOAT | 4-byte (32-bit) single-precision floating-point number | 1.0 |
| DOUBLE | 8-byte (64-bit) double-precision floating-point number | 1.0 |
| BOOLEAN | true/false | TRUE |
| STRING | String | 'Xia', "Xia" |
As the table above shows, the Hive described here has no date type: dates are represented as strings, and common date-format conversions are performed with functions. (Later releases added TIMESTAMP in Hive 0.8.0 and DATE in Hive 0.12.0.)
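As a rough illustration of working with dates stored as strings, the sketch below reformats a date string with Hive's built-in unix_timestamp, from_unixtime, and to_date functions. The literal values are made up for illustration, and a SELECT without a FROM clause assumes Hive 0.13 or later; on older versions, select from any existing table instead.

```sql
-- Dates are kept as STRING values; the literals below are illustrative only.
SELECT
  -- parse 'yyyyMMdd' text into epoch seconds, then render it as 'yyyy-MM-dd'
  from_unixtime(unix_timestamp('20130115', 'yyyyMMdd'), 'yyyy-MM-dd') AS reformatted,
  -- keep only the date part of a timestamp string
  to_date('2013-01-15 10:30:00') AS date_part;
```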
Hive is developed in Java, and its basic data types correspond one-to-one with Java's primitive types, except for STRING. The signed integer types TINYINT, SMALLINT, INT, and BIGINT are equivalent to Java's byte, short, int, and long, which are 1-byte, 2-byte, 4-byte, and 8-byte signed integers respectively. Hive's floating-point types FLOAT and DOUBLE correspond to Java's float and double, and Hive's BOOLEAN is equivalent to Java's boolean.
Hive's STRING type is equivalent to a database VARCHAR: it is a variable-length string, but you cannot declare a maximum length for it. In theory it can hold up to 2 GB of characters.
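To make the primitive types concrete, here is a minimal sketch of a table that uses each of them. The table name primitive_demo and its column names are hypothetical and chosen only for illustration.

```sql
-- Hypothetical table; all names are for illustration only.
CREATE TABLE primitive_demo (
  tiny_col   TINYINT,   -- 1-byte signed integer
  small_col  SMALLINT,  -- 2-byte signed integer
  int_col    INT,       -- 4-byte signed integer
  big_col    BIGINT,    -- 8-byte signed integer
  float_col  FLOAT,     -- single-precision floating point
  double_col DOUBLE,    -- double-precision floating point
  flag_col   BOOLEAN,   -- true/false
  name_col   STRING     -- variable-length text; no length is declared
);
```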
Hive supports implicit conversion between basic types: a type can be converted to a wider type. For example, TINYINT, SMALLINT, and INT can be converted to FLOAT, and all integer types, FLOAT, and STRING can be converted to DOUBLE. These conversions largely mirror type conversion in Java, since Hive is written in Java. Converting a wider type to a narrower one is also supported, but it must be requested explicitly with Hive's CAST function.
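The conversion rules above can be sketched with a few literal expressions. The values are illustrative, and a SELECT without a FROM clause assumes Hive 0.13 or later.

```sql
SELECT
  -- implicit widening: the TINYINT operand is promoted, so the sum is a DOUBLE
  CAST(1 AS TINYINT) + CAST(2.5 AS DOUBLE) AS widened_sum,
  -- narrowing DOUBLE to INT must be requested explicitly with CAST
  CAST(CAST(3.7 AS DOUBLE) AS INT) AS narrowed,
  -- a string that holds a number can be cast to a numeric type
  CAST('42' AS INT) AS parsed,
  -- a string that is not a number cannot be converted; Hive returns NULL
  CAST('abc' AS INT) AS failed;
```

A failed CAST yields NULL rather than raising an error, so it is worth checking for NULL after casting untrusted string data.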
Complex data types include arrays (ARRAY), maps (MAP), and structs (STRUCT), as shown in the following table:
Complex data types

| Type | Description | Example |
|---|---|---|
| ARRAY | An ordered collection of fields. All fields must be of the same type | array(1, 2) |
| MAP | An unordered collection of key/value pairs. Keys must be of an atomic type; values can be of any type. Within one map, all keys must have the same type and all values must have the same type | map('a', 1, 'b', 2) |
| STRUCT | A set of named fields. Field types can differ | struct('a', 1, 1.0) |
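The example expressions in the table can be tried directly, since array, map, struct, and named_struct are built-in Hive constructor functions. This is only a sketch: the values and aliases are illustrative, and a SELECT without a FROM clause assumes Hive 0.13 or later.

```sql
SELECT
  array(1, 2)                              AS arr,  -- ARRAY<INT>
  map('a', 1, 'b', 2)                      AS m,    -- MAP<STRING, INT>
  struct('a', 1, 1.0)                      AS s,    -- fields are auto-named col1, col2, col3
  named_struct('a', 'x', 'b', 1, 'c', 1.0) AS ns;   -- fields are named a, b, c
```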
Now let's look at an example of using complex data types in Hive. First, create the table:
CREATE TABLE complex (col1 ARRAY<INT>, col2 MAP<STRING, INT>, col3 STRUCT<a:STRING, b:INT, c:DOUBLE>);
Query statement: