"Source" self-learning Hadoop from zero: Hive data import and export, cluster data migration

Last Update:2016-01-19 Source: Internet

Author: User

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

Read Catalogue

Order
Write the results of the query to the file system
Cluster Data Migration One
Cluster Data Migration II
Series Index

This article is copyright Mephisto and Blog Park is shared, welcome reprint, but must retain this paragraph statement, and give the original link, thank you for your cooperation.

The article is written by elder brother (Mephisto), Sourcelink

Order

In the previous article, we introduced a variety of ways to import hive data, so that our hive has a data source, but sometimes we may need to pure export, or cluster hive data migration (different clusters, different versions), we can use these two chapters of knowledge to achieve.

Here we begin to describe the data export of hive and the migration of the cluster hive data.

Write the results of the query to the file system

One: Description

The results of the query can be written to the file system by simply modifying the import syntax from other tables in the previous article.

Two: Syntax:
Standard Syntax:INSERTOVERWRITE[LOCAL]DIRECTORY Directory1[ROW FORMAT Row_format] [STORED as File_format](Note: onlyAvailable starting withHive0.11.0)  SELECT... from ... Hive extension (multiple inserts): fromfrom_statementINSERTOVERWRITE[LOCAL]DIRECTORY directory1 select_statement1[INSERT OVERWRITE [LOCAL]DIRECTORY directory2 select_statement2] ... row_format:delimited[Fields TERMINATED by Char [escaped by Char]][COLLECTION ITEMS TERMINATED by Char]        [MAP KEYS TERMINATED by Char] [LINES TERMINATED by Char]        [NULL DEFINED as Char](Note: onlyAvailable starting withHive0.13)
Three: Write to local

If local is used, the data is written to the local

Four: Writing to the cluster

If you do not use local, the data is written to the specified HDFs, and if the full path is not written, Hadoop's configuration Entry fs.default.name (the Namenode URI) is used.

Five: Actual combat

Modify the TMP folder permissions (this is only a test, so use the maximum permissions)
chmod 777 TMP
Into Hive
sudo -u hdfs Hive
Export the score table data from the previous article to a local
Insert Overwrite local directory  '/data/tmp/score'select*   from score;
We can see that there are files in the/data/tmp/score/directory.
Cd/data/tmp/score
ll
This allows us to export the hive data locally.

Below we use the command without the local parameter to lead the Hive table data into HDFs
Insert Overwrite  directory  '/data/tmp/score'Select  * from score;
We use the LS command in HDFs to view
Hadoop FS-LS /data/tmp/score
There is only one file here, and the above is not the same, but the total content is the same, the same data above the export, and sometimes only one file. It's not a fancy place to be.

Cluster Data Migration One

One: Introduction

In the official website, we can see the export and import, the function from the Hive0.8 began to join.

Two: Export/import

The

Export command exports tables or partitions based on metadata, and the output location can be another Hadoop cluster or hive instance. Tables with partitions are supported. export meta data storage in target directory file stored in subdirectory < Span id= "nohighlight_0.14125266089104116".

The metadata store DBMS that imports the exported source and destination can be a different relational database.

Three: Export syntax

TABLE [PARTITION (part_column= "value" [, ...] ])]  to'export_target_path'

IV: Import Syntax

[[EXTERNAL]TABLE[PARTITION (part_column= "value" [, ... ] ]]] [from'source_path'[      Location ' Import_target_path ']

V: official Example

Simple Import Export

Table  to ' hdfs_exports_location/department '  from'hdfs_exports_location/department';

Rename Import Export

Table  to ' hdfs_exports_location/department '  tablefrom'hdfs_exports_location/department';

Partition Export

table Employee partition (emp_country="in", emp_state=to' Hdfs_exports_location/employee '  from'hdfs_exports_location/employee';

Partition Import

Table  to ' Hdfs_exports_location/employee '  Table Employee partition (emp_country="Us", emp_state=from'  hdfs_exports_location/employee';

Specify the import location

Table  to ' hdfs_exports_location/department '  tablefrom'hdfs_exports_location/department'           'import_target_location/department';

Import as external table

Table  to ' hdfs_exports_location/department '  tablefrom'hdfs_exports_location/department';

Cluster Data Migration II

One: Introduction

Although the official Export/import command is very powerful, but in actual use, it may be the version of the different, there will be unable to import the situation, oneself in this piece also pondering, summed up their own set of partitioned Hive table Data Migration scheme, The scheme was successfully migrated in Cloudera and Hontorworks clusters, and the hive version was inconsistent.

Two: Export data

This is used as a data source because the hive version of the Cloudera release version CDH-5.3.3 is less than 0.8.

Create a partitioned table score

Create Table score (  ID                int,  studentid       int,  score               Double   by (Openingtime string);

Import 7, August data based on the way the data was imported in the previous article

Load ' /data/tmp/score_7.txt '  into Table score PARTITION (openingtime=201507);

Refer to our above export to local or put under/data/tmp/score

Insert Overwrite local directory  '/data/tmp/score'Select *  from score;

Three: Migrating Data

Create a new/data/tmp directory in another cluster

mkdir  -p/data/tmp/score

Copy data

SCP /data/tmp/score/*  [email protected]:/data/tmp/score/

View

cd/data/tmp/Scorell

Four: Create a partitioned table and a temporary table without partitions

The imported cluster is the HDP-2.7.1 release version of Hortonworks.

The partition table is our final target table, which is overused when there are no partitioned temporary tables.

Into Hive

sudo -u hdfs Hive

Create a table with partitions

CREATE TABLE score (  ID                int,  studentid       int,  score               Double  string);

Create a temporary table with no partitions

CREATE TABLE Score1 (     IDint,     int,      Double,     string
);

V: Import data into a staging table

Load ' /data/tmp/score '  into table Score1;

We'll check the incoming data.

Select *  from Score1;

VI: Import from temp table to partition table

 set  hive. exec . Dynamic.partitiontrue;  set  hive. exec . Dynamic.partition.mode=  nonstrict;  set  hive. exec . max . Dynamic.partitions.pernode=  10000   insert  overwrite Span style= "color: #0000ff;" >table  score partition (openingtime )  *  from  score1;

Inquire

Select *  from score;

We'll look at the files in the hive in HDFs.

Hadoop FS-LS -r/apps/hive/warehouse/score

Can be clearly seen according to the Openingtime partition.

Seven: Delete temporary tables

Drop Table Score1

Eight: Delete temporary data

RM -rf/data/tmp/score

This way our hive cluster data migration is over.

--------------------------------------------------------------------

Here, the content of this chapter is complete.

Series Index

"Source" Self-Learning Hadoop series index from zero

This article is copyright Mephisto and Blog Park is shared, welcome reprint, but must retain this paragraph statement, and give the original link, thank you for your cooperation.

The article is written by elder brother (Mephisto), Sourcelink

"Source" self-learning Hadoop from zero: Hive data import and export, cluster data migration

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More

"Source" self-learning Hadoop from zero: Hive data import and export, cluster data migration

Contact Us

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support