A Summary of Hive SQL Usage and Data Loading

Source: Internet
Author: User

Hive is an open-source query tool, originally developed at Facebook, that runs on top of the Hadoop framework. In other words, if you want to use Hive, you must install Hadoop first. This post is a summary of my recent experience with Hive; the installation and configuration of Hadoop, Hive, Pig, and HBase may be covered in a separate write-up.
So I will assume you already have a working Hive installation.
1. Field data types. Hive lets you assign a data type to each field in a table to suit your data-mining needs. You can also build an index on fields that appear frequently in WHERE clauses.
The data types are as follows:
STRING variable-length string
TINYINT 1-byte integer (up to 3 decimal digits)
SMALLINT 2-byte integer (up to 5 decimal digits)
INT 4-byte integer (up to 10 decimal digits)
BIGINT 8-byte integer (up to 19 decimal digits)
FLOAT single-precision floating-point number
DOUBLE double-precision floating-point number
BOOLEAN, that is, TRUE and FALSE

The integer types hold different numbers of digits. When creating a table, make sure the chosen type has enough digits so that data is not truncated; on the other hand, a type far larger than necessary wastes storage space.

There are also three complex types:
STRUCTS a struct of named fields
ARRAY an ordered list of elements of one type
MAP a collection of key-value pairs
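As a sketch of how the complex types fit into a table definition (the table and column names here are invented for illustration):

```sql
-- Hypothetical table using all three complex types.
CREATE TABLE example_complex (
  name    STRING,
  scores  ARRAY<INT>,                         -- ordered list of ints
  props   MAP<STRING, STRING>,                -- key/value pairs
  address STRUCT<city:STRING, zip:STRING>     -- named fields
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\t'
  COLLECTION ITEMS TERMINATED BY ','
  MAP KEYS TERMINATED BY ':';

-- Array elements and map values are accessed with [],
-- struct fields with dot notation:
-- SELECT scores[0], props['color'], address.city FROM example_complex;
```

The delimiter clauses tell Hive how to split a text line into collection items and map keys; they only matter for delimited text storage.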

2. Creating a data table.
Hive tables come in two kinds: internal (managed) tables and external tables.
An internal table is one that Hive creates itself and that is then populated with LOAD DATA INPATH. You can think of it as a table whose data and structure are managed together: when you run DROP TABLE table_name, Hive deletes the table structure from the metastore and also deletes the table's data from HDFS.
An external table is used when the data already exists in HDFS before the table structure is created; creating the table simply maps the table structure onto those files. When you run DROP TABLE table_name, Hive deletes only the table structure from the metastore and does not delete the files in HDFS. External tables are therefore safer to use than internal tables.
Statement for creating an internal table:
CREATE TABLE database.table1
(
  column1 STRING COMMENT 'comment1',
  column2 INT COMMENT 'comment2'
);
Statement for creating an external table, here over plain-text files in HDFS (no LZO compression):
CREATE EXTERNAL TABLE IF NOT EXISTS database.table1
(
  column1 STRING COMMENT 'comment1',
  column2 STRING COMMENT 'comment2'
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION 'hdfs://data/dw/asf/8080';

The following creates an external table over files in HDFS that are LZO-compressed. Note that you need hadoop-gpl support to read LZO files as text:
CREATE EXTERNAL TABLE IF NOT EXISTS database.table1
(
  column1 STRING COMMENT 'comment1',
  column2 STRING COMMENT 'comment2'
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION 'hdfs://data/dw/asf/100';
The STORED AS clause is the tricky part: almost every article on the Internet that covers it has been copied and pasted from another, and nearly all of them misspell STORED as SORTED. If you follow one of those, creating the external table will fail with an error.
3. Deleting a data table.
As mentioned above, when a table is dropped, an internal table is deleted together with its data, while dropping an external table deletes only the table structure and keeps the data. Keep this distinction in mind.
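A minimal sketch of the two cases (the table names are placeholders):

```sql
-- Internal (managed) table: DROP removes both the metastore entry
-- and the underlying files in HDFS.
DROP TABLE IF EXISTS database.internal_table;

-- External table: DROP removes only the metastore entry;
-- the files under the table's LOCATION remain in HDFS.
DROP TABLE IF EXISTS database.external_table;
```

The IF EXISTS guard simply avoids an error when the table is missing; the internal/external behavior is determined by how the table was created, not by the DROP statement.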
4. Modifying the table structure.
Adding new fields:
ALTER TABLE table1 ADD COLUMNS
(
  column1 STRING COMMENT 'comment1',
  column2 STRING COMMENT 'comment2'
);
Renaming a table:
ALTER TABLE database.table1 RENAME TO database.table2;
Switching a table to read LZO-compressed data:
ALTER TABLE table1
SET FILEFORMAT
INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';
With this statement, even if the original text data is later re-compressed with LZO, the table can be used again without being recreated.
5. Loading data into tables.
Loading from the local file system:
LOAD DATA LOCAL INPATH '/opt/data/1.txt' INTO TABLE table1;
This loads the local file /opt/data/1.txt into table1: Hive reads the file and writes its contents into table1's location in HDFS.
Loading from HDFS:
LOAD DATA INPATH '/data/datawash/1.txt' INTO TABLE table1;
This moves the HDFS file /data/datawash/1.txt into the directory where table1 is stored.
As for OVERWRITE in a load:
LOAD DATA LOCAL INPATH '/opt/data/1.txt' OVERWRITE INTO TABLE table1;
Adding OVERWRITE replaces any data already in the table. If you are sure the table holds no data, you can write the load without it.
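After a load it is worth running a quick sanity check; the queries below assume the table1 used above:

```sql
-- Count rows to confirm the load landed in the table.
SELECT COUNT(*) FROM table1;

-- Inspect a few rows to confirm the field delimiter was parsed correctly;
-- misparsed delimiters typically show up as NULL columns here.
SELECT * FROM table1 LIMIT 10;
```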
Hive also offers many other features, such as views, indexes, and user-defined functions.
