A Summary of Hive SQL Usage and Data Loading

Source: Internet
Author: User

Hive is an open-source query tool, originally developed at Facebook, that runs on top of the Hadoop framework. In other words, if you want to use Hive, you must install Hadoop first. This post is a summary of my recent experience with Hive; the installation and configuration of Hadoop, Hive, Pig, and HBase may be covered in a separate write-up.
So I will assume you already have a working Hive installation.
1. Field data types. Hive lets you assign a data type to each field in a table to suit your data-mining needs. You can also build an index on fields that appear frequently in WHERE clauses.
The data types are as follows:
STRING variable-length string
TINYINT 1-byte integer (up to 3 decimal digits)
SMALLINT 2-byte integer (up to 5 decimal digits)
INT 4-byte integer (up to 10 decimal digits)
BIGINT 8-byte integer (up to 19 decimal digits)
FLOAT single-precision floating-point number
DOUBLE double-precision floating-point number
BOOLEAN, that is, TRUE and FALSE

The integer types hold different numbers of digits. When creating a table, make sure the chosen type has enough digits so that data is not truncated; on the other hand, a type far larger than necessary wastes storage space.

There are also three complex types:
STRUCTS a struct of named fields
ARRAY an ordered list of elements of one type
MAP a collection of key-value pairs
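As a sketch of how the complex types fit into a table definition (the table and column names here are invented for illustration):

```sql
-- Hypothetical table using all three complex types.
CREATE TABLE example_complex (
  name    STRING,
  scores  ARRAY<INT>,                         -- ordered list of ints
  props   MAP<STRING, STRING>,                -- key/value pairs
  address STRUCT<city:STRING, zip:STRING>     -- named fields
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\t'
  COLLECTION ITEMS TERMINATED BY ','
  MAP KEYS TERMINATED BY ':';

-- Array elements and map values are accessed with [],
-- struct fields with dot notation:
-- SELECT scores[0], props['color'], address.city FROM example_complex;
```

The delimiter clauses tell Hive how to split a text line into collection items and map keys; they only matter for delimited text storage.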

2. Creating a data table.
Hive tables come in two kinds: internal (managed) tables and external tables.
An internal table is one that Hive creates itself and that is then populated with LOAD DATA INPATH. You can think of it as a table whose data and structure are managed together: when you run DROP TABLE table_name, Hive deletes the table structure from the metastore and also deletes the table's data from HDFS.
An external table is used when the data already exists in HDFS before the table structure is created; creating the table simply maps the table structure onto those files. When you run DROP TABLE table_name, Hive deletes only the table structure from the metastore and does not delete the files in HDFS. External tables are therefore safer to use than internal tables.
Statement for creating an internal table:
CREATE TABLE database.table1
(
  column1 STRING COMMENT 'comment1',
  column2 INT COMMENT 'comment2'
);
Statement for creating an external table, here over plain-text files in HDFS (no LZO compression):
CREATE EXTERNAL TABLE IF NOT EXISTS database.table1
(
  column1 STRING COMMENT 'comment1',
  column2 STRING COMMENT 'comment2'
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION 'hdfs://data/dw/asf/8080';

The following creates an external table over files in HDFS that are LZO-compressed. Note that you need hadoop-gpl support to read LZO files as text:
CREATE EXTERNAL TABLE IF NOT EXISTS database.table1
(
  column1 STRING COMMENT 'comment1',
  column2 STRING COMMENT 'comment2'
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION 'hdfs://data/dw/asf/100';
The STORED AS clause is the tricky part: almost every article on the Internet that covers it has been copied and pasted from another, and nearly all of them misspell STORED as SORTED. If you follow one of those, creating the external table will fail with an error.
3. Deleting a data table.
As mentioned above, when a table is dropped, an internal table is deleted together with its data, while dropping an external table deletes only the table structure and keeps the data. Keep this distinction in mind.
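A minimal sketch of the two cases (the table names are placeholders):

```sql
-- Internal (managed) table: DROP removes both the metastore entry
-- and the underlying files in HDFS.
DROP TABLE IF EXISTS database.internal_table;

-- External table: DROP removes only the metastore entry;
-- the files under the table's LOCATION remain in HDFS.
DROP TABLE IF EXISTS database.external_table;
```

The IF EXISTS guard simply avoids an error when the table is missing; the internal/external behavior is determined by how the table was created, not by the DROP statement.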
4. Modifying the table structure.
Adding new fields:
ALTER TABLE table1 ADD COLUMNS
(
  column1 STRING COMMENT 'comment1',
  column2 STRING COMMENT 'comment2'
);
Renaming a table:
ALTER TABLE database.table1 RENAME TO database.table2;
Switching a table to read LZO-compressed data:
ALTER TABLE table1
SET FILEFORMAT
INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';
With this statement, even if the original text data is later re-compressed with LZO, the table can be used again without being recreated.
5. Loading data into tables.
Loading from the local file system:
LOAD DATA LOCAL INPATH '/opt/data/1.txt' INTO TABLE table1;
This loads the local file /opt/data/1.txt into table1: Hive reads the file and writes its contents into table1's location in HDFS.
Loading from HDFS:
LOAD DATA INPATH '/data/datawash/1.txt' INTO TABLE table1;
This moves the HDFS file /data/datawash/1.txt into the directory where table1 is stored.
As for OVERWRITE in a load:
LOAD DATA LOCAL INPATH '/opt/data/1.txt' OVERWRITE INTO TABLE table1;
Adding OVERWRITE replaces any data already in the table. If you are sure the table holds no data, you can write the load without it.
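After a load it is worth running a quick sanity check; the queries below assume the table1 used above:

```sql
-- Count rows to confirm the load landed in the table.
SELECT COUNT(*) FROM table1;

-- Inspect a few rows to confirm the field delimiter was parsed correctly;
-- misparsed delimiters typically show up as NULL columns here.
SELECT * FROM table1 LIMIT 10;
```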
Hive also offers many other features, such as views, indexes, and user-defined functions.
