"Source" self-learning Hadoop from zero: Hive data import and export, cluster data migration


Read Catalogue
    • Preface
    • Importing files into Hive
    • Importing query results from other tables into a table
    • Dynamic partition insertion
    • Inserting the values of an SQL statement into a table
    • Sample data file download
    • Series index

This article is copyrighted by Mephisto and shared on Blog Park. Reprinting is welcome, but you must retain this statement and give a link to the original. Thank you for your cooperation.

The article is written by Mephisto (source link).

Preface

In a previous article we gave a brief description and walkthrough of Hive table operations. In actual use you will often need to import and export data. Tools such as Sqoop can handle import and export for relational databases, but sometimes all you need is a simple way to move data in and out.

This article describes Hive data import, export, and cluster data migration.
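For the export direction, Hive's built-in mechanism is INSERT OVERWRITE DIRECTORY, which writes a query result out as files. A minimal sketch (the paths here are illustrative assumptions, not from the original post):

```sql
-- Export a query result to a directory on the local filesystem (path is illustrative)
INSERT OVERWRITE LOCAL DIRECTORY '/data/tmp/score_export'
SELECT * FROM score;

-- Export to an HDFS directory instead, e.g. as a staging step before
-- copying the files to another cluster with distcp
INSERT OVERWRITE DIRECTORY '/tmp/output/score_export'
SELECT * FROM score;
```

The exported files use the table's serialization format, so they can be loaded back with LOAD DATA on the target cluster.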

Importing files into Hive
One: Syntax
LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)]
Two: Importing from the local filesystem

Use the LOCAL keyword to import from the local filesystem; the filepath then refers to a path on the machine running the Hive client.

Three: Importing from the cluster

Omit the LOCAL keyword; the filepath then refers to an HDFS path.

Four: OVERWRITE

With this parameter, any existing data in the target table or partition is deleted and replaced; without it, the new files are simply added alongside the existing ones.
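A minimal sketch of the difference, using the score table and file names that appear later in this article:

```sql
-- With OVERWRITE: existing data in the 201507 partition is removed first
LOAD DATA LOCAL INPATH '/data/tmp/score_7.txt' OVERWRITE INTO TABLE score PARTITION (openingtime=201507);

-- Without OVERWRITE: the file is added alongside the partition's existing files
LOAD DATA LOCAL INPATH '/data/tmp/score_7.txt' INTO TABLE score PARTITION (openingtime=201507);
```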

Five: Practice

Based on the partitioned score table we created earlier, we first construct two text files, score_7 and score_8, for July and August respectively. The files are available for download in the appendix below.

Because no delimiter was specified when the table was built, these two text files must use Hive's default field delimiter (Ctrl+A, i.e. \001).

First put the files on the Linux host, under the path /data/tmp.

Importing local data:

LOAD DATA LOCAL INPATH '/data/tmp/score_7.txt' INTO TABLE score PARTITION (openingtime=201507);

We find that 001 has become 1. This is because the column's type in the table is int, so the value is converted to int.
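If the leading zeros need to be preserved, one option is to declare the column as string rather than int when creating the table. A sketch (the table name score_str is a made-up illustration):

```sql
-- studentid declared as string, so a value like '001' is stored verbatim
CREATE TABLE score_str (
  id        int,
  studentid string,
  score     double
)
PARTITIONED BY (openingtime string);
```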

Put score_8.txt onto the cluster:

su - hdfs
hadoop fs -put score_8.txt /tmp/input

Importing cluster data:

LOAD DATA INPATH '/tmp/input/score_8.txt' INTO TABLE score PARTITION (openingtime=201508);

Importing query results from other tables into a table
One: Syntax
Standard syntax:
INSERT OVERWRITE TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...) [IF NOT EXISTS]] select_statement1 FROM from_statement;
INSERT INTO TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...)] select_statement1 FROM from_statement;

Hive extension (multiple inserts):
FROM from_statement
INSERT OVERWRITE TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...) [IF NOT EXISTS]] select_statement1
[INSERT OVERWRITE TABLE tablename2 [PARTITION ... [IF NOT EXISTS]] select_statement2]
[INSERT INTO TABLE tablename2 [PARTITION ...] select_statement2] ...;
FROM from_statement
INSERT INTO TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...)] select_statement1
[INSERT INTO TABLE tablename2 [PARTITION ...] select_statement2]
[INSERT OVERWRITE TABLE tablename2 [PARTITION ... [IF NOT EXISTS]] select_statement2] ...;

Hive extension (dynamic partition inserts):
INSERT OVERWRITE TABLE tablename PARTITION (partcol1[=val1], partcol2[=val2] ...) select_statement FROM from_statement;
INSERT INTO TABLE tablename PARTITION (partcol1[=val1], partcol2[=val2] ...) select_statement FROM from_statement;

Two: OVERWRITE

With this parameter, the existing content of the target table or partition is replaced by the query result; without it, the result is appended.

Three: INSERT INTO

This syntax is available starting with Hive 0.8.0; it appends to the target table or partition, leaving the existing data intact.
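The multi-insert form in the syntax above reads the source table once and populates several targets. A sketch using the table names from this article's practice section (the partition values are illustrative):

```sql
-- One scan of score1 feeds two partitions of score:
-- OVERWRITE replaces the first partition, INTO appends to the second
FROM score1
INSERT OVERWRITE TABLE score PARTITION (openingtime=201507)
  SELECT id, studentid, score WHERE openingtime=201507
INSERT INTO TABLE score PARTITION (openingtime=201508)
  SELECT id, studentid, score WHERE openingtime=201508;
```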

Four: Practice

We construct a table score1 with the same structure as the score table:

CREATE TABLE score1 (
  id        int,
  studentid int,
  score     double
)
PARTITIONED BY (openingtime string);

Inserting data:

-- the exact values are illustrative
INSERT INTO TABLE score1 PARTITION (openingtime=201509) VALUES (1, 1, 90), (2, 2, 91);

We import the query results from table score1 into score, specifying the 201509 partition.

INSERT INTO TABLE score PARTITION (openingtime=201509) SELECT id, studentid, score FROM score1;

Dynamic partition Insertion
One: Description

Strictly speaking, dynamic partition insert is just another case of inserting query results from other tables, but the feature is useful enough to deserve its own section. It has been supported since Hive 0.6.

Two: Parameters

The dynamic partition parameters are set per session, so the SET commands that modify them are usually issued immediately before the import.

Property                                  Default   Note
hive.error.on.empty.partition             false     Whether to throw an exception if a dynamic partition insert generates an empty result
hive.exec.dynamic.partition               false     Needs to be set to true to enable dynamic partition inserts
hive.exec.dynamic.partition.mode          strict    In strict mode the user must specify at least one static partition, in case the user accidentally overwrites all partitions; in nonstrict mode all partitions are allowed to be dynamic
hive.exec.max.created.files               100000    Maximum number of HDFS files created by all mappers/reducers in a MapReduce job
hive.exec.max.dynamic.partitions          1000      Maximum number of dynamic partitions allowed to be created in total
hive.exec.max.dynamic.partitions.pernode  100       Maximum number of dynamic partitions allowed to be created in each mapper/reducer node
Three: Official website example

We can take a look at the hive website example.

FROM page_view_stg pvs
INSERT OVERWRITE TABLE page_view PARTITION (dt='2008-06-08', country)
  SELECT pvs.viewTime, pvs.userid, pvs.page_url, pvs.referrer_url, null, null, pvs.ip, pvs.cnt

Here the country partition is created dynamically from the value of the last column, pvs.cnt. Note that the dynamic partition column's name is not given in the SELECT list. In nonstrict mode, the dt partition could also be created dynamically.
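In nonstrict mode both partition columns may be dynamic. A sketch based on the same wiki tables (the exact source columns, pvs.dt and pvs.country, are assumptions):

```sql
SET hive.exec.dynamic.partition.mode=nonstrict;

-- Both dt and country are taken from the last two SELECT columns
INSERT OVERWRITE TABLE page_view PARTITION (dt, country)
  SELECT pvs.viewTime, pvs.userid, pvs.page_url, pvs.referrer_url,
         null, null, pvs.ip, pvs.dt, pvs.country
  FROM page_view_stg pvs;
```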

Four: Practice

Let's first empty the data for the score table (3 partitions)

INSERT OVERWRITE TABLE score PARTITION (openingtime=201507) SELECT id, studentid, score FROM score WHERE 1=0;
INSERT OVERWRITE TABLE score PARTITION (openingtime=201508) SELECT id, studentid, score FROM score WHERE 1=0;
INSERT OVERWRITE TABLE score PARTITION (openingtime=201509) SELECT id, studentid, score FROM score WHERE 1=0;

Inserting the July and August data into score1:

LOAD DATA LOCAL INPATH '/data/tmp/score_7.txt' INTO TABLE score1 PARTITION (openingtime=201507);
LOAD DATA LOCAL INPATH '/data/tmp/score_8.txt' INTO TABLE score1 PARTITION (openingtime=201508);

  

Set the dynamic partition parameters:

set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.max.dynamic.partitions.pernode=10000;

Import the score1 data into score, creating the partitions automatically:

INSERT OVERWRITE TABLE score PARTITION (openingtime) SELECT id, studentid, score, openingtime FROM score1;


Inserting the values of an SQL statement into a table
One: Description

The INSERT ... VALUES statement inserts literal values directly into a table.

Two: Syntax
Standard syntax:
INSERT INTO TABLE tablename [PARTITION (partcol1[=val1], partcol2[=val2] ...)] VALUES values_row [, values_row ...]

where values_row is: (value [, value ...])
and where a value is either NULL or any valid SQL literal
Three: Official website example
CREATE TABLE students (name VARCHAR(64), age INT, gpa DECIMAL(3,2))
  CLUSTERED BY (age) INTO 2 BUCKETS STORED AS ORC;

INSERT INTO TABLE students
  VALUES ('fred flintstone', 35, 1.28), ('barney rubble', 32, 2.32);

CREATE TABLE pageviews (userid VARCHAR(64), link STRING, came_from STRING)
  PARTITIONED BY (datestamp STRING) CLUSTERED BY (userid) INTO 256 BUCKETS STORED AS ORC;

INSERT INTO TABLE pageviews PARTITION (datestamp = '2014-09-23')
  VALUES ('jsmith', 'mail.com', 'sports.com'), ('jdoe', 'mail.com', null);

INSERT INTO TABLE pageviews PARTITION (datestamp)
  VALUES ('tjohnson', 'sports.com', 'finance.com', '2014-09-23'), ('tlee', 'finance.com', null, '2014-09-21');
Four: Practice

In the section on importing other tables' query results, we created a new table score1 and inserted data into it with an INSERT ... VALUES statement. Here we simply list that step again.

Inserting data:

-- the exact values are illustrative
INSERT INTO TABLE score1 PARTITION (openingtime=201509) VALUES (1, 1, 90), (2, 2, 91);

--------------------------------------------------------------------

With this, the content of this chapter is complete.

Sample data file download
GitHub: https://github.com/sinodzh/HadoopExample/tree/master/2016/hive%20test%20file
Series index

[Original] Learning Hadoop from Zero: series index

