Hive supports many of the primitive data types found in relational databases, plus three collection data types that relational databases rarely support. A related question is how these types are represented in text files, and what alternatives to text storage exist. Compared with most databases, Hive is unusually flexible about how data is encoded in files. Most databases take full control of the data, including both how it is stored on disk and its lifecycle. Hive instead lets you control these aspects, which makes it easier to manage and process the data with a variety of tools.
Basic data types
Hive supports integer and floating-point types of various sizes, a Boolean type, and a string type of arbitrary length. Hive 0.8.0 added the TIMESTAMP and BINARY types.
Table 3-1 lists the basic data types supported by Hive.
Table 3-1
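To make the integer sizes concrete, the Python sketch below assumes TINYINT, SMALLINT, INT, and BIGINT are 1-, 2-, 4-, and 8-byte signed integers (matching Java's byte, short, int, and long) and picks the narrowest type that can hold a given value. The helper names `int_range` and `smallest_int_type` are hypothetical.

```python
from typing import Optional, Tuple

# Hive's integer types and their widths in bytes; they match Java's
# byte, short, int and long (signed, two's complement).
HIVE_INT_TYPES = [("TINYINT", 1), ("SMALLINT", 2), ("INT", 4), ("BIGINT", 8)]


def int_range(num_bytes: int) -> Tuple[int, int]:
    """Return (min, max) for a signed two's-complement integer of num_bytes bytes."""
    bits = 8 * num_bytes
    return -(1 << (bits - 1)), (1 << (bits - 1)) - 1


def smallest_int_type(value: int) -> Optional[str]:
    """Return the narrowest Hive integer type that can hold value, or None
    if the value is out of range even for BIGINT."""
    for name, width in HIVE_INT_TYPES:
        low, high = int_range(width)
        if low <= value <= high:
            return name
    return None


for v in (100, 40_000, 3_000_000_000):
    print(v, smallest_int_type(v))
```

Here 40,000 exceeds the SMALLINT maximum of 32,767, so it needs an INT, and three billion exceeds the INT maximum of 2,147,483,647, so it needs a BIGINT.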
Each type is implemented in Java, so its detailed behavior matches that of the corresponding Java type. For example, STRING is implemented by the Java String, FLOAT by the Java float, and so on.
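One practical consequence: FLOAT is a 32-bit Java float while DOUBLE is a 64-bit Java double, so a FLOAT value keeps only about seven significant decimal digits. The sketch below imitates storing a value as a 32-bit float using Python's standard `struct` module (Python's own `float` is 64 bits, like Java's double); `to_java_float` is a hypothetical helper name.

```python
import struct


def to_java_float(x: float) -> float:
    """Round a 64-bit value to the nearest 32-bit IEEE 754 float, as storing
    it in a Java float (Hive FLOAT) would, then widen it back to 64 bits."""
    return struct.unpack("f", struct.pack("f", x))[0]


double_value = 0.1                  # what a DOUBLE would hold
float_value = to_java_float(0.1)    # what a FLOAT would hold

print(double_value)                 # 0.1
print(float_value)                  # 0.10000000149011612
print(float_value == double_value)  # False
```

The same rounding is why comparing a FLOAT column with a DOUBLE value can give surprising results.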
Note that, unlike many other SQL dialects, Hive does not support character arrays (strings) with a maximum allowed length. Relational databases offer such fixed-length types as a performance optimization, because fixed-length records are easier to index and scan. Hive works in a looser environment: it may not own the data files and must be flexible about file formats, so it relies on delimiters to separate fields. In addition, Hadoop and Hive emphasize optimizing disk read and write throughput, and fixing the length of column values matters relatively little for that.
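Because fields are found by searching for delimiters rather than by fixed offsets, a text record can hold values of any length. The sketch below assumes Hive's default field delimiter for text files, Control-A (`\x01`); the helper `split_fields` and the sample record are hypothetical.

```python
# Hive's default field delimiter for text files is Control-A (octal \001).
FIELD_DELIMITER = "\x01"


def split_fields(line: str, delimiter: str = FIELD_DELIMITER) -> list:
    """Split one text record into its raw field values.

    No field has a fixed width: each value simply runs until the next
    delimiter, so short and long strings take only the space they need.
    """
    return line.split(delimiter)


record = "John Doe\x01100000.0\x01Mary Smith"  # hypothetical sample record
print(split_fields(record))  # ['John Doe', '100000.0', 'Mary Smith']
```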
Values of the new TIMESTAMP type can be integers, interpreted as seconds since the Unix epoch (1970-01-01 00:00:00); floating-point numbers, interpreted as seconds since the epoch with up to nine decimal places (nanosecond resolution); or strings, interpreted according to the JDBC date string format convention, YYYY-MM-DD hh:mm:ss.fffffffff. TIMESTAMP values are interpreted as UTC times, and Hive provides the built-in functions to_utc_timestamp and from_utc_timestamp for converting to and from other time zones.
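The three accepted forms can be illustrated with a small Python sketch that normalizes each one to a pair (seconds since the epoch, nanoseconds), treating everything as UTC as described above. The function name `parse_timestamp` and the tuple representation are illustrative choices, not Hive's internal format.

```python
import math
from datetime import datetime, timedelta, timezone
from decimal import Decimal

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def parse_timestamp(value):
    """Return (seconds since the epoch, nanoseconds) for a TIMESTAMP source.

    int   -> whole seconds since 1970-01-01 00:00:00 UTC
    float -> seconds since the epoch plus a fraction (digits past 9 are truncated)
    str   -> 'YYYY-MM-DD hh:mm:ss[.fffffffff]', read as UTC
    """
    if isinstance(value, int):
        return value, 0
    if isinstance(value, float):
        # Use the shortest decimal repr to avoid binary rounding noise.
        exact = Decimal(repr(value))
        seconds = math.floor(exact)
        nanos = int((exact - seconds) * 1_000_000_000)
        return seconds, nanos
    if isinstance(value, str):
        main, _, fraction = value.partition(".")
        if len(fraction) > 9 or (fraction and not fraction.isdigit()):
            raise ValueError(f"bad fractional seconds: {value!r}")
        dt = datetime.strptime(main, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)
        seconds = (dt - EPOCH) // timedelta(seconds=1)
        # Right-pad the fraction to 9 digits: ".5" means 500000000 ns.
        nanos = int(fraction.ljust(9, "0")) if fraction else 0
        return seconds, nanos
    raise TypeError(f"unsupported TIMESTAMP source: {type(value).__name__}")


print(parse_timestamp(86400))                    # (86400, 0)
print(parse_timestamp(86400.25))                 # (86400, 250000000)
print(parse_timestamp("1970-01-02 00:00:00.5"))  # (86400, 500000000)
```

All three calls describe instants on 1970-01-02 UTC, one day (86,400 seconds) after the epoch.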
The BINARY type is similar to the VARBINARY type found in many relational databases. Unlike a BLOB (binary large object), a BINARY column is stored within the row rather than separately. One use of BINARY is to include arbitrary bytes in a row while preventing Hive from trying to parse them as numbers, strings, and so on.
Note that you don't need BINARY if your goal is simply to ignore the tail end of each record. If a table schema defines three columns and each row of the data file contains five values, Hive ignores the last two values.
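A minimal sketch of that rule, assuming the raw values of a record have already been split on the field delimiter: only as many values as the schema has columns are kept. The sketch also pads short records with `None`, reflecting Hive's behavior of returning NULL for missing trailing fields; the helper `apply_schema` is hypothetical.

```python
from typing import List, Optional


def apply_schema(fields: List[str], num_columns: int) -> List[Optional[str]]:
    """Fit one record's raw values to a table with num_columns columns.

    Extra trailing values are ignored; missing trailing values become None,
    standing in for NULL.
    """
    kept = fields[:num_columns]
    return kept + [None] * (num_columns - len(kept))


row = "a\x01b\x01c\x01d\x01e".split("\x01")  # five values in the data file
print(apply_schema(row, 3))      # ['a', 'b', 'c']
print(apply_schema(["a"], 3))    # ['a', None, None]
```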
Suppose you run a query that compares a FLOAT column with a DOUBLE column, or compares values of two different integer types. Hive implicitly converts the values so that it compares identical types: it casts an integer to the larger of the two integer types, casts FLOAT to DOUBLE, and casts any integer to DOUBLE, as needed.
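The widening rules just described can be modeled as a small lookup. This Python sketch encodes only the rules stated in this section; Hive's actual type resolution (for example, when an integer is compared with a FLOAT) may differ between versions. The function name `comparison_type` is hypothetical.

```python
# Integer types from narrowest to widest.
INTEGER_TYPES = ["TINYINT", "SMALLINT", "INT", "BIGINT"]
FLOATING_TYPES = ["FLOAT", "DOUBLE"]
NUMERIC_TYPES = INTEGER_TYPES + FLOATING_TYPES


def comparison_type(left: str, right: str) -> str:
    """Common type two numeric operands are converted to before comparing,
    following the rules described in this section."""
    if left == right:
        return left
    if left in INTEGER_TYPES and right in INTEGER_TYPES:
        # Widen to the larger of the two integer types.
        return max(left, right, key=INTEGER_TYPES.index)
    if left in NUMERIC_TYPES and right in NUMERIC_TYPES:
        # FLOAT vs DOUBLE, or any integer vs a floating-point type.
        return "DOUBLE"
    raise ValueError(f"no implicit numeric conversion for {left} and {right}")


print(comparison_type("INT", "BIGINT"))    # BIGINT
print(comparison_type("FLOAT", "DOUBLE"))  # DOUBLE
```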
Now suppose you run a query that needs to interpret a string column as a number. In that case you cast explicitly. For example, if s is a STRING column holding an integer value, cast(s AS INT) converts it to an integer.
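As a rough model of what such a cast does, the Python sketch below turns a decimal integer string into an int and returns `None` (standing in for NULL) when the string cannot be converted, which is how Hive reports a failed cast. It also assumes values outside the 32-bit INT range come back as NULL. Hive's own parser may accept inputs this simplified version rejects; `cast_string_as_int` is a hypothetical name.

```python
import re
from typing import Optional

# Range of a Java int, which backs Hive's INT type.
JAVA_INT_MIN, JAVA_INT_MAX = -(2**31), 2**31 - 1
INTEGER_PATTERN = re.compile(r"[+-]?[0-9]+")


def cast_string_as_int(s: Optional[str]) -> Optional[int]:
    """Rough imitation of cast(s AS INT).

    A decimal integer string becomes an int. Anything that does not parse,
    or does not fit in 32 bits, becomes None, standing in for NULL.
    """
    if s is None or not INTEGER_PATTERN.fullmatch(s):
        return None
    value = int(s)
    if not JAVA_INT_MIN <= value <= JAVA_INT_MAX:
        return None
    return value


for s in ("42", "-7", "abc", "2147483648"):
    print(repr(s), "->", cast_string_as_int(s))
```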
Collection data types
Hive supports three collection types: STRUCT, MAP, and ARRAY.
Table 3-2 lists the collection types supported by Hive.
Table 3-2
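To preview how collection values can be stored in a plain text record, the sketch below assumes Hive's default text-file delimiters: Control-B (`\x02`) between ARRAY elements, STRUCT fields, and MAP entries, and Control-C (`\x03`) between a MAP key and its value. The helper functions and the sample data are hypothetical, and values are left as strings rather than converted to their declared types.

```python
from typing import Dict, List

# Hive's default text-file delimiters below the field level.
COLLECTION_DELIMITER = "\x02"  # Control-B: ARRAY elements, STRUCT fields, MAP entries
MAP_KEY_DELIMITER = "\x03"     # Control-C: separates a MAP key from its value


def parse_array(text: str) -> List[str]:
    """ARRAY: an ordered sequence of elements of the same type."""
    return text.split(COLLECTION_DELIMITER)


def parse_map(text: str) -> Dict[str, str]:
    """MAP: key-value pairs, looked up by key."""
    pairs = (entry.split(MAP_KEY_DELIMITER, 1) for entry in text.split(COLLECTION_DELIMITER))
    return {key: value for key, value in pairs}


def parse_struct(text: str, field_names: List[str]) -> Dict[str, str]:
    """STRUCT: named fields, accessed with dot notation in HiveQL."""
    return dict(zip(field_names, text.split(COLLECTION_DELIMITER)))


print(parse_array("Mary Smith\x02Todd Jones"))
print(parse_map("Federal Taxes\x03.2\x02State Taxes\x03.05"))
print(parse_struct("1 Michigan Ave.\x02Chicago\x02IL", ["street", "city", "state"]))
```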