Contents
- Preface
- Import files to Hive
- Import query results from other tables into a table
- Dynamic partition insertion
- Inserting values into a table with an SQL statement
- Sample data file download
- Series index
This article is copyrighted by Mephisto and shared on Blog Park. Reprinting is welcome, but please retain this statement and include a link to the original. Thank you for your cooperation.
Written by Mephisto. Source link
Preface
In a previous article, we gave a brief introduction to and practice with Hive table operations. In actual use, you will often need to import and export data. Although tools such as Sqoop can handle import and export for relational data, sometimes all you need is a simple way to move data in and out.
This article covers data import, export, and cluster data migration for Hive.
Import files to Hive
One: Syntax
LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)]
Two: Import from local
Add the LOCAL keyword to load the file from the local file system.
Three: Import from the cluster
Omit the LOCAL keyword to load the file from HDFS.
Four: OVERWRITE
With this parameter, any existing data in the target table or partition is deleted and replaced; without it, the loaded files are added alongside the existing data.
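As a minimal sketch of the difference (assuming the score table and the /data/tmp file path used later in this article):

```sql
-- Without OVERWRITE: the file is added to the 201507 partition,
-- existing data is kept
load data local inpath '/data/tmp/score_7.txt'
into table score partition (openingtime=201507);

-- With OVERWRITE: existing data in the 201507 partition is deleted
-- first, then replaced by the new file
load data local inpath '/data/tmp/score_7.txt'
overwrite into table score partition (openingtime=201507);
```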
Five: Practice
Based on the partitioned score table we created earlier, we construct two text files, score_7 and score_8, for July and August respectively; the files can be downloaded from the link at the end of this article.
Because no delimiter was specified when the table was created, these two text files must use Hive's default field delimiter.
First, put the files onto the Linux host under the path /data/tmp.
Import Local Data
load data local inpath '/data/tmp/score_7.txt' into table score partition (openingtime=201507);
We find that 001 became 1. This is because the column type is int, so the value is converted to an int.
Put score_8.txt into the cluster (HDFS):
su hdfs
hadoop fs -put score_8.txt /tmp/input
Importing cluster data
load data inpath '/tmp/input/score_8.txt' into table score partition (openingtime=201508);
Import query results from other tables into a table
One: Syntax
Standard syntax:
INSERT OVERWRITE TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...) [IF NOT EXISTS]] select_statement1 FROM from_statement;
INSERT INTO TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...)] select_statement1 FROM from_statement;

Hive extension (multiple inserts):
FROM from_statement
INSERT OVERWRITE TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...) [IF NOT EXISTS]] select_statement1
[INSERT OVERWRITE TABLE tablename2 [PARTITION ... [IF NOT EXISTS]] select_statement2]
[INSERT INTO TABLE tablename2 [PARTITION ...] select_statement2] ...;
FROM from_statement
INSERT INTO TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...)] select_statement1
[INSERT INTO TABLE tablename2 [PARTITION ...] select_statement2]
[INSERT OVERWRITE TABLE tablename2 [PARTITION ... [IF NOT EXISTS]] select_statement2] ...;

Hive extension (dynamic partition inserts):
INSERT OVERWRITE TABLE tablename PARTITION (partcol1[=val1], partcol2[=val2] ...) select_statement FROM from_statement;
INSERT INTO TABLE tablename PARTITION (partcol1[=val1], partcol2[=val2] ...) select_statement FROM from_statement;
Two: OVERWRITE
With this parameter, any existing content of the target table or partition is replaced by the query results.
Three: INSERT INTO
This syntax is available starting with Hive 0.8.0. It appends the query results, preserving the existing data of the target table or partition.
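To make the contrast concrete, a minimal sketch against the score and score1 tables used in this article (the partition value is illustrative):

```sql
-- OVERWRITE: the 201509 partition is replaced entirely by the query result
insert overwrite table score partition (openingtime=201509)
select id, studentid, score from score1;

-- INSERT INTO (Hive 0.8.0+): rows are appended, existing data is kept
insert into table score partition (openingtime=201509)
select id, studentid, score from score1;
```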
Four: Practice
We create a table score1 with the same structure as the score table:
create table score1
(
  id int,
  studentid int,
  score double
)
partitioned by (openingtime string);
Insert data:
insert into table score1 partition (openingtime=201509) values (1,1,96),(2,2,94);
We import the query results from table score1 into score, specifying the 201509 partition:
insert into table score partition (openingtime=201509) select id, studentid, score from score1;
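The multi-insert extension shown in the syntax section scans the source table once and feeds several targets; a sketch, assuming both monthly partitions exist in score1:

```sql
-- One pass over score1 fills two partitions of score
from score1
insert into table score partition (openingtime=201507)
  select id, studentid, score where openingtime=201507
insert into table score partition (openingtime=201508)
  select id, studentid, score where openingtime=201508;
```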
Dynamic partition insertion
One: Description
Strictly speaking, dynamic partition insert is just a form of inserting query results from another table, but the feature is useful enough to deserve its own section. It is supported starting with Hive 0.6.
Two: Parameters
Dynamic partition parameters are valid for the lifetime of the session, so the set commands that modify them are generally issued just before the import.
| Property | Default | Note |
| --- | --- | --- |
| hive.error.on.empty.partition | false | Whether to throw an exception if dynamic partition insert generates empty results |
| hive.exec.dynamic.partition | false | Needs to be set to true to enable dynamic partition inserts |
| hive.exec.dynamic.partition.mode | strict | In strict mode, the user must specify at least one static partition in case the user accidentally overwrites all partitions; in nonstrict mode all partitions are allowed to be dynamic |
| hive.exec.max.created.files | 100000 | Maximum number of HDFS files created by all mappers/reducers in a MapReduce job |
| hive.exec.max.dynamic.partitions | 1000 | Maximum number of dynamic partitions allowed to be created in total |
| hive.exec.max.dynamic.partitions.pernode | 100 | Maximum number of dynamic partitions allowed to be created in each mapper/reducer node |
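The strict/nonstrict distinction is the one that trips people up most often; a sketch, assuming a table partitioned by (dt, country) as in the official example below (column lists are illustrative):

```sql
-- Rejected in strict mode (the default): every partition column is dynamic
-- insert overwrite table page_view partition (dt, country)
--   select userid, ip, dt, country from page_view_stg;

-- Accepted even in strict mode: dt is static, only country is dynamic
insert overwrite table page_view partition (dt='2008-06-08', country)
  select userid, ip, country from page_view_stg;
```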
Three: Official website example
We can take a look at the example from the Hive website:
from page_view_stg pvs
insert overwrite table page_view partition (dt='2008-06-08', country)
  select null, null, pvs.ip, pvs.cnt;
Here the country partition is created dynamically based on the value of pvs.cnt. Note that the column name itself is not used, only its value. In nonstrict mode, the dt partition could also be created dynamically.
Four: Practice
Let's first empty the data of the score table (3 partitions):
insert overwrite table score partition (openingtime=201507) select id, studentid, score from score where 1=0;
insert overwrite table score partition (openingtime=201508) select id, studentid, score from score where 1=0;
insert overwrite table score partition (openingtime=201509) select id, studentid, score from score where 1=0;
Insert the July and August data into score1:
load data local inpath '/data/tmp/score_7.txt' into table score1 partition (openingtime=201507);
load data local inpath '/data/tmp/score_8.txt' into table score1 partition (openingtime=201508);
Set the dynamic partition parameters:
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.max.dynamic.partitions.pernode=10000;
Import the data of score1 into score with dynamic partitioning:
insert overwrite table score partition (openingtime) select id, studentid, score, openingtime from score1;
Inserting values into a table with an SQL statement
One: Description
The INSERT ... VALUES statement can insert literal values directly into a table.
Two: Syntax
Standard syntax:
INSERT INTO TABLE tablename [PARTITION (partcol1[=val1], partcol2[=val2] ...)] VALUES values_row [, values_row ...]

where values_row is:
( value [, value ...] )
where a value is either NULL or any valid SQL literal
Three: Official website example
CREATE TABLE students (name VARCHAR(64), age INT, gpa DECIMAL(3,2))
  CLUSTERED BY (age) INTO 2 BUCKETS STORED AS ORC;

INSERT INTO TABLE students
  VALUES ('fred flintstone', 35, 1.28), ('barney rubble', 32, 2.32);

CREATE TABLE pageviews (userid VARCHAR(64), link STRING, came_from STRING)
  PARTITIONED BY (datestamp STRING) CLUSTERED BY (userid) INTO 256 BUCKETS STORED AS ORC;

INSERT INTO TABLE pageviews PARTITION (datestamp = '2014-09-23')
  VALUES ('jsmith', 'mail.com', 'sports.com'), ('jdoe', 'mail.com', null);

INSERT INTO TABLE pageviews PARTITION (datestamp)
  VALUES ('tjohnson', 'sports.com', 'finance.com', '2014-09-23'), ('tlee', 'finance.com', null, '2014-09-21');
Four: Practice
In the section on importing other tables' query results, we already created the new table score1 and inserted data into it with an SQL VALUES statement. Here we simply restate that step.
Insert data:
insert into table score1 partition (openingtime=201509) values (1,1,96),(2,2,94);
--------------------------------------------------------------------
This concludes the chapter.
Sample data file download
GitHub: https://github.com/sinodzh/HadoopExample/tree/master/2016/hive%20test%20file
Series Index
"Source" Self-Learning Hadoop series index from zero
"Source" self-learning Hadoop from zero: Hive data import and export, cluster data migration