Hive Learning Notes - Data Manipulation

Source: Internet
Author: User
Tags: hdfs dfs


Hive Data Manipulation

Hive Command-line Operations
hive -d,--define <key=value>       define a key=value variable that can be used on the command line
hive --database <databasename>     specify the database to use
hive -e "HQL"                      execute the HQL statement without entering the CLI; useful in scripts
hive -f filename                   execute the HQL in a file; the SQL statements come from the file
hive -h hostname                   connect to the Hive server on the given host
hive -H,--help                     print help information
hive --hiveconf <property=value>   set the value of the given configuration property
hive --hivevar <key=value>         define a variable that can be referenced in HQL
hive -i filename                   Hive initialization file; put initialization statements here, for example adding the JAR of a custom function
hive -S,--silent                   silent mode; suppress extra output in the shell
hive -v,--verbose                  verbose mode; print execution details such as the SQL being executed

Example: query the test table and write the result to /home/data/select_result.txt
hive -S -e "select * from test" > /home/data/select_result.txt
Also print the SQL statement being executed:
hive -v -e "select * from test" > /home/data/select_result.txt
Execute the SQL placed in a file:
hive -f /home/colin/hive-1.2.1/select_test

In the Hive CLI, use list to view the files|jars|archives in the distributed cache (for example, a jar added with add jar can be viewed with list jars).
In the Hive CLI, use source to execute a file at a given path, e.g. a SQL file: source /home/colin/hive-1.2.1/select_test; (see the sketch below)
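A minimal CLI session sketch combining the two (the jar path is hypothetical):
add jar /home/colin/udf/my_udf.jar;
list jars;
source /home/colin/hive-1.2.1/select_test;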

Hive Variable Operations
Configuration variables
set val_name=val_value;
${hiveconf:val_name}
Viewing Linux environment variables
${env:variable_name}; env lists all environment variables
Example: define the variable val_test, set it to yang, and use it as a query condition
set val_test=yang;
select * from test2 where name='${hiveconf:val_test}';
View the HIVE_HOME environment variable
select '${env:HIVE_HOME}' from test;
Note: the path is printed once for every record in the test table, i.e. as many times as there are rows.
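Variables can also be passed in from the command line with --hivevar; a sketch reusing the test2 table above (the outer single quotes keep the shell from expanding the variable reference, so the HQL string literal uses double quotes):
hive --hivevar val_test=yang -e 'select * from test2 where name="${hivevar:val_test}"'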

Hive Data Loading
Internal (managed) table data loading
Loading when creating a table
create table tableName as select col_1, col_2 ... from tableName2;
Specifying the data location when creating a table
create table tableName (col_name type comment '...') location 'path';  (path is an HDFS path, and note that it must be the parent directory of the data files, i.e. the files are identified by their parent directory; all data in that directory is used as the table's data. This method does not create a table directory under hive/warehouse, because the specified HDFS path is used as the table directory.)
Note: with this approach the table takes ownership of the data, and the data (together with the directory) is deleted when the table is dropped. A sketch follows below.
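A minimal sketch of this approach, assuming a hypothetical HDFS directory /data/test_loc that already contains tab-delimited files:
create table test_loc (id int, name string)
row format delimited fields terminated by '\t'
location '/data/test_loc';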
Local data loading
load data local inpath 'localpath' [overwrite] into table tableName;
Loading data already in HDFS
load data inpath 'hdfspath' [overwrite] into table tableName;
Note: this moves the data from the given HDFS location into the table directory (sketches below).
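Sketches of both forms, using hypothetical file paths and the test table from earlier:
load data local inpath '/home/data/students.txt' overwrite into table test;
load data inpath '/data/students.txt' into table test;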
Copying data into place with Hadoop commands (shell execution in Hive vs. shell execution in Linux)
hdfs dfs -copyFromLocal /home/data/data
In the Hive shell, dfs -copyFromLocal /home/data/data can be executed directly; Hive can also run Linux commands, but they must be prefixed with !
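A sketch of both styles inside the Hive CLI (the paths are hypothetical):
dfs -ls /user/hive/warehouse/test;
!ls /home/data;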

Loading data from query statements
insert [overwrite|into] table tableName select col1, col2 ... from tableName2 where ...;
from tableName2 insert [overwrite|into] table tableName select col1, col2 ... where ...;
select col1, col2 ... from tableName2 where ... insert [overwrite|into] table tableName;
Note: the selected columns may have names that differ from the target table's; Hive performs no field or type checking when the data is loaded, only at query time. A multi-insert sketch follows below.
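A sketch of the from ... insert (multi-insert) form, assuming test2 has name and age columns and that the hypothetical target tables test and test3 have matching columns:
from test2
insert overwrite table test select name, age where age > 20
insert into table test3 select name, age where age <= 20;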
External table data loading
Specify the data location when creating the table (an external table does not take ownership of its data)
create external table tableName (col_name type comment '...') location 'path';
INSERT statements work the same way as for internal tables
Hadoop commands work the same way as for internal tables
Hive partition table data loading
Internal partition tables load data much like internal tables
External partition tables load data much like external tables
The difference is that the partition must be specified, and for an external partition table the directory layout must match the table's partition structure. If a new partition directory appears but the partition has not been added to the table, its data cannot be queried even though the files exist; once the directory structure matches, add the partition with alter table tableName add partition (dt=20150820);
load data local inpath 'path' [overwrite] into table tableName partition (pname='...');
insert [overwrite|into] table tableName partition (pname='...') select col1, col2 ... from tableName2 where ...;
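A sketch for a hypothetical partitioned table test_part (name string, time string) partitioned by (dt string), with a hypothetical source table test_src and hypothetical paths:
alter table test_part add partition (dt='20150820');
load data local inpath '/home/data/20150820.log' into table test_part partition (dt='20150820');
insert overwrite table test_part partition (dt='20150821') select name, time from test_src;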

Notes: if multiple characters are given as the field delimiter in row format delimited, only the first character takes effect.
With load data, field values are not converted between types; a value that cannot be converted to the column type is loaded as NULL.
When inserting data, if a selected value cannot be converted to the target type, NULL is inserted.
In HDFS, NULL is stored as \N.


Hive Data Export
Export methods:
Hadoop commands (get, text)
insert ... directory:
insert overwrite [local] directory 'path' [row format delimited fields terminated by '\t' lines terminated by '\n'] select col1, col2 ... from tableName;
Note: with local the result goes to the local file system, otherwise to HDFS; row format used to work only for local exports (as of Hive 1.2.1 it also works for HDFS exports).
Piping through shell commands
Third-party tools
Examples:
hdfs dfs -get /user/hive/warehouse/test4/* ./data/newdata
hdfs dfs -text /user/hive/warehouse/test5/* > ./data/newdata   (text can read multiple formats: compressed, serialized, etc.)
hive -S -e "select * from test4" | grep yang > ./data/newdata

Hive Dynamic Partitioning
The partition values are not known in advance; they come from the query results, so there is no need to run alter table ... add partition for every partition.
Parameters that need to be configured to use dynamic partitioning:
set hive.exec.dynamic.partition=true;                  // enable dynamic partitioning
set hive.exec.dynamic.partition.mode=nonstrict;        // two partition modes: strict requires at least one static partition, placed first; nonstrict has no such restriction
set hive.exec.max.dynamic.partitions.pernode=10000;    // maximum number of dynamic partitions created per node
set hive.exec.max.dynamic.partitions=100000;           // maximum total number of dynamic partitions created
set hive.exec.max.created.files=150000;                // maximum number of files a job may create
set dfs.datanode.max.xcievers=8192;                    // limit on the number of files a datanode can open at once

insert overwrite table test7 partition (dt) select name, time as dt from test6;
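A sketch illustrating the strict-mode requirement of a leading static partition, assuming a hypothetical table logs (name string) partitioned by (country string, dt string) and a hypothetical source table logs_src with name, time and country columns:
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=strict;
-- country is supplied statically; dt is filled dynamically from the last selected column
insert overwrite table logs partition (country='cn', dt)
select name, time as dt from logs_src where country='cn';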


Table Property Operations
Modify Table Name:
alter table tableName rename to newName;
Modify a Column:
alter table tableName change column old_col new_col newType comment '...' after col_name;  (use first instead of after col_name to move the column to the first position)
Add Columns:
alter table tableName add columns (c1 type comment '...', c2 type comment '...');
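A sketch of both operations on the test2 table (the new column name, type and comments are hypothetical):
alter table test2 change column name user_name string comment 'renamed name column' first;
alter table test2 add columns (age int comment 'user age');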
Modify Table Properties
View Table Properties
desc formatted tableName;
The table properties that can be modified are shown under:
Table Parameters:
COLUMN_STATS_ACCURATE   false
last_modified_by        colin
last_modified_time      1440154819
numFiles                0
numRows                 -1
rawDataSize             -1
totalSize               0
transient_lastDdlTime   1440154819
To modify a property:
alter table tableName set tblproperties ('property_name'='...');
For example, to modify the comment:
alter table tableName set tblproperties ('comment'='xxxxx');
To modify serialization information:
Non-partitioned table
alter table tableName set serdeproperties ('field.delim'='\t');
Partitioned table
alter table tableName partition (dt='xxxx') set serdeproperties ('field.delim'='\t');
Modify Location:
alter table tableName [partition (...)] set location 'path';
Internal table / external table conversion:

alter table tableName set tblproperties ('EXTERNAL'='TRUE|FALSE');  EXTERNAL must be uppercase
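A sketch converting the test2 table to an external table and checking the result:
alter table test2 set tblproperties ('EXTERNAL'='TRUE');
desc formatted test2;   (Table Type should now show EXTERNAL_TABLE)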

More property operations: https://cwiki.apache.org/confluence/display/Hive/Home
