Research on Hive, part 2: the Hive data model

Source: Internet
Author: User

1. Hive data types:

Basic data types: tinyint, smallint, int, bigint, float, double, boolean, string

Composite data types:

Array: An ordered collection of elements that must all be of the same type

Map: An unordered set of key/value pairs; the key must be of a primitive (atomic) type

Struct: A named set of fields that can be of different types

The composite data types are used as follows:

create table complex (
  col1 ARRAY<INT>,
  col2 MAP<STRING, INT>,
  col3 STRUCT<a:STRING, b:INT, c:DOUBLE>
);

select col1[0], col2['b'], col3.c from complex;

2. Hive data model:

The data model mainly includes: database, table, partition, and bucket.

(1) Database

A database is equivalent to a namespace in a relational database. Its role is to isolate users and applications into separate databases or schemas. Hive provides statements such as create database dbname, use dbname, and drop database dbname.

(2) Table

A table consists of the stored data and the metadata that describes the table. The data is stored in a distributed file system, and the metadata is stored in a relational database. Before any data is loaded, only a directory is created on HDFS; for a table named a, the path is ${hive warehouse path}/a. After data is loaded, the data file is copied into that directory under the same file name as the loaded file, for example ${hive warehouse path}/a/empinfo.txt.

Hive has two kinds of tables:

1> Managed table: The data files for this table are loaded into the data warehouse directory configured for Hive.

2> External table: The data for this table is stored in an HDFS directory outside the Hive data warehouse directory.

To create a managed table in Hive's data warehouse:

hive> create table tuoguan_tbl (field string);

hive> load data local inpath 'home/hadoop/test.txt' into table tuoguan_tbl;

To create an external table:

hive> create external table external_tbl (field string)
    > location '/user/username/input/tb_wordcount';

If no location is specified, the data is stored in Hive's data warehouse directory.

hive> load data local inpath 'test.txt' into table external_tbl;

Besides the directory into which the data is loaded, managed tables and external tables also differ in how the drop command behaves: dropping a managed table deletes both the stored data and the metadata, whereas dropping an external table removes only the metadata and leaves the stored data in place.
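
The difference in drop behavior can be sketched as follows. This is a minimal illustration, not a tested script: the table names demo_managed and demo_external are hypothetical, and the location simply reuses the path from the example above.

```sql
-- Hypothetical table names, used only for illustration.
CREATE TABLE demo_managed (field STRING);

CREATE EXTERNAL TABLE demo_external (field STRING)
LOCATION '/user/username/input/tb_wordcount';

-- Managed table: the metadata and the table directory under the
-- warehouse path are both removed.
DROP TABLE demo_managed;

-- External table: only the metadata is removed; the files under
-- /user/username/input/tb_wordcount stay on HDFS.
DROP TABLE demo_external;
```

If the HDFS trash is enabled, the data of a dropped managed table is normally moved to the trash rather than erased immediately, so it may still be recoverable for a while.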

To view detailed information about a table, use:

desc tablename or desc formatted tablename
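
As a short sketch, the two tables from the examples above could be inspected like this. In the output of desc formatted, the Location row shows the HDFS directory that holds the data, and the Table Type row shows MANAGED_TABLE or EXTERNAL_TABLE, which is a quick way to tell how a drop will behave.

```sql
-- Column names and types only.
DESCRIBE tuoguan_tbl;

-- Full details, including the Location and Table Type rows.
DESCRIBE FORMATTED tuoguan_tbl;
DESCRIBE FORMATTED external_tbl;
```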

(3) Partition

Hive partitions divide a table by the values of one or more partition columns, and each partition corresponds to a directory on HDFS. For example:

Suppose there are two directories, /user/username/input/2015/01 and /user/username/input/2015/02, and the table should be partitioned by year and month. The table can be created like this:

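create table logs (id int, line string)
partitioned by (year string, month string);

A query such as select * from logs where month='02' then scans only the directory of the matching partition, here /user/username/input/2015/02, instead of the whole table. This assumes the partition has been registered with that directory as its location; by default Hive names partition directories in the form year=2015/month=02 under the table directory.

(4) Bucket

To use buckets, first enable bucketing enforcement in Hive:

hive> set hive.enforce.bucketing = true;

Rows are assigned to buckets by hashing the value of the specified column, and each bucket is a file in the table directory.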
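
The partition and bucket descriptions above can be sketched together as follows. This is a hedged sketch rather than a verified script: it assumes the logs table from the example already exists, that the two directories contain data files in the table's format, and it introduces a hypothetical table logs_bucketed with an arbitrarily chosen bucket count of 4.

```sql
-- Register the existing directories as partitions of logs, since they
-- do not follow Hive's default year=.../month=... directory naming.
ALTER TABLE logs ADD PARTITION (year='2015', month='01')
LOCATION '/user/username/input/2015/01';
ALTER TABLE logs ADD PARTITION (year='2015', month='02')
LOCATION '/user/username/input/2015/02';

-- Only the partition for February 2015 needs to be scanned.
SELECT * FROM logs WHERE year='2015' AND month='02';

-- Hypothetical bucketed table: rows are distributed by hash of id.
CREATE TABLE logs_bucketed (id INT, line STRING)
CLUSTERED BY (id) INTO 4 BUCKETS;

-- Needed on older Hive versions so that inserts honor the bucket definition.
SET hive.enforce.bucketing = true;

INSERT OVERWRITE TABLE logs_bucketed
SELECT id, line FROM logs;

-- Read a single bucket, for example to sample the data.
SELECT * FROM logs_bucketed TABLESAMPLE (BUCKET 1 OUT OF 4 ON id);
```

Newer Hive releases (2.0 and later) enforce bucketing by default, so the set statement may be unnecessary there. Partition values are written as quoted strings because the partition columns were declared as string.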
