"Source" self-learning Hadoop from zero: Hive data import and export, cluster data migration

Source: Internet
Author: User

Read Catalogue
    • Order
    • Write the results of the query to the file system
    • Cluster Data Migration One
    • Cluster Data Migration II
    • Series Index

This article is copyright Mephisto and Blog Park is shared, welcome reprint, but must retain this paragraph statement, and give the original link, thank you for your cooperation.

The article is written by elder brother (Mephisto), Sourcelink

Order

In the previous article, we introduced a variety of ways to import hive data, so that our hive has a data source, but sometimes we may need to pure export, or cluster hive data migration (different clusters, different versions), we can use these two chapters of knowledge to achieve.

Here we begin to describe the data export of hive and the migration of the cluster hive data.

Write the results of the query to the file system
One: Description

The results of the query can be written to the file system by simply modifying the import syntax from other tables in the previous article.

Two: Syntax:
Standard Syntax:INSERTOVERWRITE[LOCAL]DIRECTORY Directory1[ROW FORMAT Row_format] [STORED as File_format](Note: onlyAvailable starting withHive0.11.0)  SELECT... from ... Hive extension (multiple inserts): fromfrom_statementINSERTOVERWRITE[LOCAL]DIRECTORY directory1 select_statement1[INSERT OVERWRITE [LOCAL]DIRECTORY directory2 select_statement2] ... row_format:delimited[Fields TERMINATED by Char [escaped by Char]][COLLECTION ITEMS TERMINATED by Char]        [MAP KEYS TERMINATED by Char] [LINES TERMINATED by Char]        [NULL DEFINED as Char](Note: onlyAvailable starting withHive0.13)
Three: Write to local

If local is used, the data is written to the local

Four: Writing to the cluster

If you do not use local, the data is written to the specified HDFs, and if the full path is not written, Hadoop's configuration Entry fs.default.name (the Namenode URI) is used.

Five: Actual combat

Modify the TMP folder permissions (this is only a test, so use the maximum permissions)

chmod 777 TMP

Into Hive

sudo -u hdfs Hive

Export the score table data from the previous article to a local

Insert Overwrite local directory  '/data/tmp/score'select*   from score;

We can see that there are files in the/data/tmp/score/directory.

Cd/data/tmp/score
ll

This allows us to export the hive data locally.

Below we use the command without the local parameter to lead the Hive table data into HDFs

Insert Overwrite  directory  '/data/tmp/score'Select  * from score;

We use the LS command in HDFs to view

Hadoop FS-LS /data/tmp/score

There is only one file here, and the above is not the same, but the total content is the same, the same data above the export, and sometimes only one file. It's not a fancy place to be.

Cluster Data Migration One
One: Introduction

In the official website, we can see the export and import, the function from the Hive0.8 began to join.

Two: Export/import

The

Export command exports tables or partitions based on metadata, and the output location can be another Hadoop cluster or hive instance. Tables with partitions are supported. export meta data storage in target directory file stored in subdirectory < Span id= "nohighlight_0.14125266089104116".

The metadata store DBMS that imports the exported source and destination can be a different relational database.

Three: Export syntax
TABLE [PARTITION (part_column= "value" [, ...] ])]  to'export_target_path'
IV: Import Syntax
[[EXTERNAL]TABLE[PARTITION (part_column= "value" [, ... ] ]]] [from'source_path'[      Location ' Import_target_path ']
V: official Example

Simple Import Export

Table  to ' hdfs_exports_location/department '  from'hdfs_exports_location/department';

Rename Import Export

Table  to ' hdfs_exports_location/department '  tablefrom'hdfs_exports_location/department';

Partition Export

table Employee partition (emp_country="in", emp_state=to' Hdfs_exports_location/employee '  from'hdfs_exports_location/employee';

Partition Import

Table  to ' Hdfs_exports_location/employee '  Table Employee partition (emp_country="Us", emp_state=from'  hdfs_exports_location/employee';

Specify the import location

Table  to ' hdfs_exports_location/department '  tablefrom'hdfs_exports_location/department'           'import_target_location/department';

Import as external table

Table  to ' hdfs_exports_location/department '  tablefrom'hdfs_exports_location/department';

Cluster Data Migration II
One: Introduction

Although the official Export/import command is very powerful, but in actual use, it may be the version of the different, there will be unable to import the situation, oneself in this piece also pondering, summed up their own set of partitioned Hive table Data Migration scheme, The scheme was successfully migrated in Cloudera and Hontorworks clusters, and the hive version was inconsistent.

Two: Export data

This is used as a data source because the hive version of the Cloudera release version CDH-5.3.3 is less than 0.8.

Create a partitioned table score

Create Table score (  ID                int,  studentid       int,  score               Double   by (Openingtime string);

Import 7, August data based on the way the data was imported in the previous article

Load ' /data/tmp/score_7.txt '  into Table score PARTITION (openingtime=201507);

Refer to our above export to local or put under/data/tmp/score

Insert Overwrite local directory  '/data/tmp/score'Select *  from score;
Three: Migrating Data

Create a new/data/tmp directory in another cluster

mkdir  -p/data/tmp/score

Copy data

SCP /data/tmp/score/*  [email protected]:/data/tmp/score/

View

cd/data/tmp/Scorell

Four: Create a partitioned table and a temporary table without partitions

The imported cluster is the HDP-2.7.1 release version of Hortonworks.

The partition table is our final target table, which is overused when there are no partitioned temporary tables.

Into Hive

sudo -u hdfs Hive

Create a table with partitions

CREATE TABLE score (  ID                int,  studentid       int,  score               Double  string);

Create a temporary table with no partitions

CREATE TABLE Score1 (     IDint,     int,      Double,     string
);

V: Import data into a staging table

Load ' /data/tmp/score '  into table Score1;

We'll check the incoming data.

Select *  from Score1;

VI: Import from temp table to partition table
 set  hive. exec . Dynamic.partitiontrue;  set  hive. exec . Dynamic.partition.mode=  nonstrict;  set  hive. exec . max . Dynamic.partitions.pernode=  10000   insert  overwrite Span style= "color: #0000ff;" >table  score partition (openingtime )  *  from  score1; 

Inquire

Select *  from score;

We'll look at the files in the hive in HDFs.

Hadoop FS-LS -r/apps/hive/warehouse/score

Can be clearly seen according to the Openingtime partition.

Seven: Delete temporary tables
Drop Table Score1
Eight: Delete temporary data
RM -rf/data/tmp/score

This way our hive cluster data migration is over.

--------------------------------------------------------------------

Here, the content of this chapter is complete.

Series Index

"Source" Self-Learning Hadoop series index from zero

This article is copyright Mephisto and Blog Park is shared, welcome reprint, but must retain this paragraph statement, and give the original link, thank you for your cooperation.

The article is written by elder brother (Mephisto), Sourcelink

"Source" self-learning Hadoop from zero: Hive data import and export, cluster data migration

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.