1. Hive data types:
Basic (primitive) data types: tinyint, smallint, int, bigint, float, double, boolean, string
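The four integer types differ only in width. In Hive, tinyint, smallint, int and bigint are signed 1-, 2-, 4- and 8-byte integers. A minimal sketch that derives their value ranges from those widths using two's-complement arithmetic (the helper name `signed_range` is illustrative, not part of Hive):

```python
# Byte widths of Hive's signed integer types.
HIVE_INT_WIDTHS = {"tinyint": 1, "smallint": 2, "int": 4, "bigint": 8}


def signed_range(num_bytes):
    """Return (min, max) of a two's-complement integer of the given byte width."""
    bits = 8 * num_bytes
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


for name, width in HIVE_INT_WIDTHS.items():
    low, high = signed_range(width)
    print(f"{name:>8}: {low} .. {high}")
```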
Composite (complex) data types:
Array: an ordered collection of elements, all of which must be of the same type
Map: an unordered set of key/value pairs; the key type must be a primitive (atomic) type
Struct: a named set of fields, which can be of different types
The complex data types are used as follows:
CREATE TABLE complex(
  col1 ARRAY<INT>,
  col2 MAP<STRING, INT>,
  col3 STRUCT<a:STRING, b:INT, c:DOUBLE>
);
SELECT col1[0], col2['b'], col3.c FROM complex;
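To make the three accessors concrete, here is a Python analogy of one row of the complex table: an ARRAY behaves like a list indexed from 0, a MAP like a dict looked up by key, and a STRUCT like a record with named fields. The row values and helper names are made up for illustration; the None results mirror Hive returning NULL for an out-of-range array index or a missing map key.

```python
from typing import NamedTuple, Optional


class Col3(NamedTuple):
    # STRUCT<a:STRING, b:INT, c:DOUBLE>: named fields of different types
    a: str
    b: int
    c: float


# One hypothetical row of table `complex`.
row = {
    "col1": [10, 20, 30],             # ARRAY<INT>: ordered, one element type
    "col2": {"a": 1, "b": 2},         # MAP<STRING, INT>: key/value pairs
    "col3": Col3(a="x", b=5, c=3.5),  # STRUCT
}


def array_get(arr, i) -> Optional[int]:
    """col1[i]: 0-based index; out of range gives None (NULL in Hive)."""
    return arr[i] if 0 <= i < len(arr) else None


def map_get(m, key) -> Optional[int]:
    """col2['key']: a missing key gives None (NULL in Hive)."""
    return m.get(key)


# Analogue of: SELECT col1[0], col2['b'], col3.c FROM complex;
result = (array_get(row["col1"], 0), map_get(row["col2"], "b"), row["col3"].c)
print(result)  # (10, 2, 3.5)
```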
2. Hive data model:
Hive's data model mainly consists of databases, tables, partitions and buckets.
(1) Database: equivalent to a namespace in a relational database. Its role is to isolate applications into separate database schemas. Hive provides statements such as CREATE DATABASE dbname, USE dbname and DROP DATABASE dbname.
(2) Table: a table consists of the stored data plus the metadata that describes the table. The data is stored in a distributed file system (HDFS), and the metadata is stored in a relational database. Before any data is loaded, creating a table only creates a directory on HDFS: for table a, the path is ${hive warehouse path}/a. When data is loaded, the data file is copied (or, for a file already on HDFS, moved) into that directory under the same file name as the loaded file, e.g. ${hive warehouse path}/a/empinfo.txt.
Hive has two kinds of tables:
1> Managed table: the data files of this table are loaded into the data warehouse directory configured for Hive.
2> External table: the data of this table is stored in an HDFS directory outside Hive's data warehouse directory.
To create a managed table in Hive's data warehouse:
hive> CREATE TABLE tuoguan_tbl (field string);
hive> LOAD DATA LOCAL INPATH 'home/hadoop/test.txt' INTO TABLE tuoguan_tbl;
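The directory layout described under (2) Table can be sketched as a small path helper. This assumes the common default warehouse directory /user/hive/warehouse (the value of hive.metastore.warehouse.dir varies by installation), that tables in a non-default database live under <warehouse>/<db>.db/, and that no file-name collision occurs on load. The function names and the file /tmp/empinfo.txt are hypothetical.

```python
import posixpath

# Assumed default of hive.metastore.warehouse.dir; real clusters may override it.
WAREHOUSE = "/user/hive/warehouse"


def table_dir(table, database="default", warehouse=WAREHOUSE):
    """Directory of a managed table: default-database tables sit directly
    under the warehouse, others under <warehouse>/<db>.db/."""
    if database == "default":
        return posixpath.join(warehouse, table)
    return posixpath.join(warehouse, f"{database}.db", table)


def loaded_file_path(table, source_file, database="default"):
    """LOAD DATA keeps the source file name inside the table directory."""
    return posixpath.join(table_dir(table, database), posixpath.basename(source_file))


print(table_dir("a"))                              # /user/hive/warehouse/a
print(loaded_file_path("a", "/tmp/empinfo.txt"))   # /user/hive/warehouse/a/empinfo.txt
print(table_dir("tuoguan_tbl", database="dbname")) # /user/hive/warehouse/dbname.db/tuoguan_tbl
```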
To create an external table:
hive> CREATE EXTERNAL TABLE external_tbl (field string)
    > LOCATION '/user/username/input/tb_wordcount';
hive> LOAD DATA LOCAL INPATH 'test.txt' INTO TABLE external_tbl;
If LOCATION is omitted, the table's data is stored in Hive's data warehouse directory by default.
Besides the directory the data is loaded into, managed and external tables also differ in how DROP TABLE behaves: dropping a managed table deletes both its metadata and its stored data, whereas dropping an external table deletes only the metadata and leaves the stored data in place.
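The DROP TABLE difference above can be modelled with a toy in-memory "Hive": a metastore dict holding table metadata and an "HDFS" dict mapping directories to files. This is an illustrative simulation only; the class and method names are hypothetical and nothing here talks to a real metastore or HDFS.

```python
from dataclasses import dataclass, field


@dataclass
class ToyHive:
    """Toy model: `metastore` holds table metadata, `hdfs` maps dirs to files."""
    metastore: dict = field(default_factory=dict)
    hdfs: dict = field(default_factory=dict)

    def create_table(self, name, location, external=False):
        self.metastore[name] = {"location": location, "external": external}
        self.hdfs.setdefault(location, [])

    def load(self, name, filename):
        self.hdfs[self.metastore[name]["location"]].append(filename)

    def drop_table(self, name):
        meta = self.metastore.pop(name)      # metadata is always removed
        if not meta["external"]:
            del self.hdfs[meta["location"]]  # managed: the data goes too


hive = ToyHive()
hive.create_table("tuoguan_tbl", "/user/hive/warehouse/tuoguan_tbl")
hive.create_table("external_tbl", "/user/username/input/tb_wordcount", external=True)
hive.load("tuoguan_tbl", "test.txt")
hive.load("external_tbl", "test.txt")
hive.drop_table("tuoguan_tbl")
hive.drop_table("external_tbl")
print(hive.metastore)  # {}
print(hive.hdfs)       # {'/user/username/input/tb_wordcount': ['test.txt']}
```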
To view detailed information about a table, use:
DESC tableName or DESC FORMATTED tableName
(3) Partition: Hive partitions a table by the values of one or more columns, and each partition corresponds to a directory on HDFS. For example, suppose the data already sits in the two directories /user/username/input/2015/01 and /user/username/input/2015/02, and the table should be partitioned by year and month. The table can be created as:
CREATE TABLE logs(id int, line string)
PARTITIONED BY (year string, month string);
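The partition layout and the pruning described next can be sketched in a few lines. This assumes Hive's default key=value directory naming and an assumed table location of /user/hive/warehouse/logs; in real Hive the query planner does the pruning using partition metadata from the metastore, and the helpers here are hypothetical.

```python
import posixpath


def partition_dir(table_dir, spec):
    """Default Hive layout: one key=value subdirectory per partition column, in order."""
    return posixpath.join(table_dir, *(f"{k}={v}" for k, v in spec.items()))


def prune(partitions, **predicate):
    """Keep only partitions whose column values match every equality predicate."""
    return [p for p in partitions if all(p.get(k) == v for k, v in predicate.items())]


TABLE_DIR = "/user/hive/warehouse/logs"  # assumed location of table logs
partitions = [
    {"year": "2015", "month": "01"},
    {"year": "2015", "month": "02"},
]

# SELECT * FROM logs WHERE month = '02': only matching directories are read.
scanned = [partition_dir(TABLE_DIR, p) for p in prune(partitions, month="02")]
print(scanned)  # ['/user/hive/warehouse/logs/year=2015/month=02']
```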
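With this table, a query such as SELECT * FROM logs WHERE month = '02' scans only the directories of the partitions with month = '02' (here the 2015/02 directory) instead of the whole table. Because month is a string column, the value is quoted. By default Hive names partition directories after the partition columns, e.g. ${hive warehouse path}/logs/year=2015/month=02; to use existing directories such as /user/username/input/2015/02, each partition can be registered explicitly with ALTER TABLE logs ADD PARTITION (year='2015', month='02') LOCATION '/user/username/input/2015/02';
(4) Bucket: To use buckets, first enable Hive's bucketing enforcement:
hive> SET hive.enforce.bucketing = true;
This setting applies to Hive 1.x; it was removed in Hive 2.0, where bucketing is always enforced.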
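Rows are assigned to buckets by hashing the value of the bucketing column (declared with CLUSTERED BY (column) INTO n BUCKETS) modulo the number of buckets, and each bucket is stored as one file in the table (or partition) directory.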
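A minimal sketch of bucket assignment, assuming the classic scheme (bucketing_version 1, the pre-Hive 3 default) and an INT bucketing column, whose hash is simply its value: the bucket is (hash & Integer.MAX_VALUE) % numBuckets. Hive 3 introduced a Murmur3-based hash (bucketing_version 2), so actual bucket numbers can differ there; the function name is hypothetical.

```python
INT_MAX = 2**31 - 1  # Java Integer.MAX_VALUE


def bucket_for_int(value, num_buckets):
    """Bucket id for a 32-bit INT column under the assumed classic scheme:
    the int's hash is the value itself, masked to be non-negative."""
    return (value & INT_MAX) % num_buckets


ids = [1, 2, 3, 4, 5, 6, -7]
print({i: bucket_for_int(i, 4) for i in ids})
# {1: 1, 2: 2, 3: 3, 4: 0, 5: 1, 6: 2, -7: 1}
```

Masking with INT_MAX (rather than taking an absolute value) keeps negative keys in range the same way the Java expression does: -7 becomes 2147483641, which lands in bucket 1.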