Hive [3]: data types and file formats

Source: Internet
Author: User

Hive supports most of the basic data types found in relational databases, and also supports three collection types.

3.1 Hive basic data types

Hive supports integer and floating-point types of several sizes. The basic types are as follows (all are reserved words):

    tinyint     1-byte signed integer
    smallint    2-byte signed integer
    int         4-byte signed integer
    bigint      8-byte signed integer
    boolean     true or false
    float       single-precision floating point
    double      double-precision floating point
    string      character sequence
    timestamp   integer, floating point, or string
    binary      byte array

All of these types are implemented in Java, so their detailed behavior is exactly that of the corresponding Java types.

If a query compares a float column with a double column, Hive implicitly converts the float to double, the larger of the two types. An explicit conversion uses cast: cast(s as int) converts s to int.

3.2 Collection data types

Hive columns can also use the collection types struct, map, and array:

    STRUCT  A group of named fields, accessed with dot notation.
            For the column type struct {first STRING, last STRING}, an example value is struct('john', 'doe').
    MAP     A set of key-value pairs; a value is obtained with fieldname['last'].
            Example: map('first', 'john', 'last', 'doe')
    ARRAY   An ordered collection of elements of the same type, indexed by zero-based integers.
            Example: array('john', 'doe'), displayed as ['john', 'doe']; the second element is fieldname[1].

For example, an employees table:

    CREATE TABLE employees (
      name          STRING,               -- name
      salary        FLOAT,                -- salary
      subordinates  ARRAY<STRING>,        -- subordinate employees
      deductions    MAP<STRING, FLOAT>,   -- amounts deducted from wages (such as tax, social security, and provident fund)
      address       STRUCT<street:STRING, city:STRING, state:STRING, zip:INT>   -- employee home address
    );

3.3 Text file data encoding

Text files commonly hold comma-separated values (CSV) or tab-separated values (TSV). The drawback of both is that commas or tabs inside field values, which must not be treated as delimiters, need careful handling.

The default record and field separators in Hive are:

    \n   Separates records; each line of the text file is one record.
    ^A   Separates fields (columns). Written as the octal code \001 in a CREATE TABLE statement.
    ^B   Separates the elements of an ARRAY or STRUCT, or the key-value pairs of a MAP. Octal code \002.
    ^C   Separates a key from its value within a MAP. Octal code \003.

Other separators can be specified in place of these defaults. The following statement spells out the defaults explicitly:

    CREATE TABLE employees (
      name          STRING,
      salary        FLOAT,
      subordinates  ARRAY<STRING>,
      deductions    MAP<STRING, FLOAT>,
      address       STRUCT<street:STRING, city:STRING, state:STRING, zip:INT>
    )
    ROW FORMAT DELIMITED                    -- must be written before the clauses below (except STORED AS)
    FIELDS TERMINATED BY '\001'             -- Hive uses ^A as the column separator
    COLLECTION ITEMS TERMINATED BY '\002'   -- Hive uses ^B as the separator between collection elements
    MAP KEYS TERMINATED BY '\003'           -- Hive uses ^C as the separator between a MAP key and its value
    LINES TERMINATED BY '\n'                -- records are separated by newlines
    STORED AS TEXTFILE;                     -- separate from ROW FORMAT DELIMITED; rarely written, since TEXTFILE is the default
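To illustrate how the three collection types in the employees table are read back, the query below picks one element from each collection column and applies an explicit cast. It is a sketch rather than part of the original text: the map key 'Federal Taxes' and the column aliases are assumed for illustration, and the table is assumed to already contain data.

```sql
-- Sketch only: assumes the employees table defined above has been loaded,
-- and that deductions contains a key named 'Federal Taxes' (hypothetical).
SELECT
  name,
  CAST(salary AS INT)          AS salary_int,         -- explicit conversion to INT
  subordinates[0]              AS first_subordinate,  -- ARRAY: zero-based index
  deductions['Federal Taxes']  AS federal_tax,        -- MAP: lookup by key
  address.city                 AS city                -- STRUCT: dot notation
FROM employees;
```

An array index past the end of the array, or a key that is absent from the map, yields NULL rather than an error.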
Note: at the time of writing, Hive supports only '\n' for LINES TERMINATED BY, so the delimiter between rows can only be a newline.

Hive also supports other text formats, which are covered in lesson 15. For example, a table whose fields are separated by commas is defined as:

    CREATE TABLE some_data (
      first   FLOAT,
      second  FLOAT,
      third   FLOAT
    )
    ROW FORMAT DELIMITED
    FIELDS TERMINATED BY ',';

3.4 Schema on read

Hive does not check the schema when data is written, and performs no verification when data is loaded. Instead, the schema is applied when a query runs; this is called schema on read. If the schema does not match the file contents, Hive recovers from the mismatch as best it can:

    If a record has fewer fields than the schema defines, the missing fields appear as null in the query results.
    If a field is declared as a numeric type but Hive finds a non-numeric string when reading it, the value is returned as null.
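The schema-on-read behavior can be tried with a small comma-separated file. The sketch below is illustrative only: the table name some_data_demo, its column names, the directory /tmp/some_data_demo, and the file contents shown in the comments are all assumptions, not taken from the original text.

```sql
-- Hypothetical file stored under the directory /tmp/some_data_demo:
--   1.5,2.5,3.5
--   4.0,abc
CREATE EXTERNAL TABLE some_data_demo (
  col1  FLOAT,
  col2  FLOAT,
  col3  FLOAT
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LOCATION '/tmp/some_data_demo';

-- Nothing has been validated so far; the schema is applied only at query time.
SELECT col1, col2, col3 FROM some_data_demo;
-- First record: all three values parse as FLOAT.
-- Second record: 'abc' is not numeric and the third field is missing,
-- so col2 and col3 are both expected to come back as NULL.
```

Because the CREATE statement succeeds regardless of what the file contains, a column that comes back as all NULLs is often the first visible sign of a wrong delimiter or a wrong column type.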
