How to quickly copy a partitioned table (including data) in Hive

Transferred from: http://lxw1234.com/archives/2015/09/484.htm

Keywords: Hive, copying tables

Sometimes you need to copy a table in Hive, meaning both its structure and its data.

For a non-partitioned table this is easy: use CREATE TABLE new_table AS SELECT * FROM old_table;

So what if it's a partitioned table?

The first approach that comes to mind might be:

First, create new_table with the same structure as old_table, including the partition columns. You can use CREATE TABLE new_table LIKE old_table;

Next, use dynamic partitioning to insert the data from old_table into new_table.

This approach certainly works, but it may not be the fastest.
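The dynamic-partition approach described above could be sketched roughly as follows. This is only an illustration: the helper name build_dynamic_copy_sql is made up for this example, pt is the partition column of the sample table used later in this article, and the two SET statements are the usual Hive settings for enabling non-strict dynamic partitioning. The script only prints the HiveQL; the line that would actually run it is left commented out.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the HiveQL for a dynamic-partition copy.
# $1 = source table, $2 = new table, $3 = partition column.
build_dynamic_copy_sql() {
  cat <<EOF
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
CREATE TABLE $2 LIKE $1;
INSERT OVERWRITE TABLE $2 PARTITION ($3)
SELECT * FROM $1;
EOF
}

# SELECT * returns the partition column last, which is where a
# dynamic-partition insert expects to find it.
sql="$(build_dynamic_copy_sql old_table new_table pt)"
printf '%s\n' "$sql"

# To execute for real (assumes the hive CLI is on PATH):
# hive -e "$sql"
```

Because the INSERT reads and rewrites every row through a Hive job, it tends to be slower than copying the files directly.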

You can actually do this instead:

1. CREATE TABLE new_table LIKE old_table;

2. Use the hadoop fs -cp command to copy the partition folders under old_table's HDFS directory into new_table's HDFS directory;

3. Run MSCK REPAIR TABLE new_table; to repair the partition metadata of the new table.
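The three steps above can be strung together in a small script. The sketch below is an assumption-laden illustration, not a finished tool: the run helper, the DRY_RUN variable and the /path/to/... directories are hypothetical, and the script defaults to dry-run mode so that it only prints the commands it would execute.

```shell
#!/usr/bin/env bash
set -e

# Hypothetical helper: print the command in dry-run mode (the default),
# execute it otherwise. Printing drops the original quoting.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# $1 = source table, $2 = new table,
# $3 = HDFS directory of the source table, $4 = HDFS directory of the new table.
copy_partitioned_table() {
  # Step 1: create an empty table with the same schema and partition columns.
  run hive -e "CREATE TABLE $2 LIKE $1;"
  # Step 2: copy the partition folders; the glob is quoted so that
  # hadoop, not the local shell, expands it against HDFS.
  run hadoop fs -cp "$3/*" "$4/"
  # Step 3: register the copied folders as partitions in the metastore.
  run hive -e "MSCK REPAIR TABLE $2;"
}

# Placeholder paths; substitute the real table locations.
copy_partitioned_table old_table new_table /path/to/old_table /path/to/new_table
```

Setting DRY_RUN=0 would execute the commands instead of printing them, which assumes the hive and hadoop CLIs are installed and that new_table does not exist yet.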

Here is an example:

There is a partitioned table t1 with only two partitions, each holding one row, as follows:

    hive> show partitions t1;
    OK
    pt=2015-09-11
    pt=2015-09-12
    Time taken: 0.11 seconds, Fetched: 2 row(s)
    hive> desc t1;
    OK
    id                      string
    pt                      string
    # Partition Information
    # col_name              data_type               comment
    pt                      string
    Time taken: 0.123 seconds, Fetched: 7 row(s)
    hive> select * from t1;
    OK
    X       2015-09-11
    Y       2015-09-12
    Time taken: 0.095 seconds, Fetched: 2 row(s)
    hive>

Create a new table t2 with the same structure:

    hive> create table t2 like t1;
    OK
    Time taken: 0.162 seconds
    hive> desc t2;
    OK
    id                      string
    pt                      string
    # Partition Information
    # col_name              data_type               comment
    pt                      string
    Time taken: 0.139 seconds, Fetched: 7 row(s)
    hive> show partitions t2;
    OK
    Time taken: 0.082 seconds

Use the hadoop fs -cp command to copy all the folders under t1's HDFS directory into t2's HDFS directory:

    $ hadoop fs -cp /hivedata/warehouse/liuxiaowen.db/t1/* /hivedata/warehouse/liuxiaowen.db/t2/
    $ hadoop fs -ls /hivedata/warehouse/liuxiaowen.db/t2/
    Found 2 items
    drwxr-xr-x   - liuxiaowen liuxiaowen          0 2015-09-11 17:17 /hivedata/warehouse/liuxiaowen.db/t2/pt=2015-09-11
    drwxr-xr-x   - liuxiaowen liuxiaowen          0 2015-09-11 17:17 /hivedata/warehouse/liuxiaowen.db/t2/pt=2015-09-12
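The warehouse paths in the listing above are specific to that cluster. If you do not know where a table lives, one way to find out is to read the Location line from DESCRIBE FORMATTED. The following is a minimal sketch: table_location is a made-up helper name, the sample input is an abridged, hypothetical excerpt (including the namenode:8020 address), and it assumes the location contains no spaces.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read DESCRIBE FORMATTED output on stdin and print
# the value of the first "Location:" line.
table_location() {
  awk '$1 == "Location:" { print $2; exit }'
}

# Demo on an abridged, made-up excerpt of DESCRIBE FORMATTED output.
printf 'Owner:\tliuxiaowen\nLocation:\thdfs://namenode:8020/hivedata/warehouse/liuxiaowen.db/t1\n' \
  | table_location
# prints hdfs://namenode:8020/hivedata/warehouse/liuxiaowen.db/t1

# Against a real cluster (assumes the hive CLI is on PATH):
# hive -S -e "DESCRIBE FORMATTED t1" | table_location
```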

Run MSCK REPAIR TABLE t2; in Hive to repair the partition metadata of the new table t2:

    hive> show partitions t2;
    OK
    Time taken: 0.082 seconds
    hive> msck repair table t2;
    OK
    Partitions not in metastore:    t2:pt=2015-09-11        t2:pt=2015-09-12
    Repair: Added partition to metastore t2:pt=2015-09-11
    Repair: Added partition to metastore t2:pt=2015-09-12
    Time taken: 0.249 seconds, Fetched: 3 row(s)
    hive> show partitions t2;
    OK
    pt=2015-09-11
    pt=2015-09-12
    Time taken: 0.068 seconds, Fetched: 2 row(s)
    hive> select * from t2;
    OK
    X       2015-09-11
    Y       2015-09-12
    Time taken: 0.123 seconds, Fetched: 2 row(s)
    hive>

The new table t2 has now been copied. It has the same table structure, partitions, and data as t1.
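For tables with many partitions, comparing the two partition lists by eye is impractical. One possible check, sketched below, is to save the SHOW PARTITIONS output of both tables to files and compare them. The same_partitions helper and the temporary files are illustrative assumptions; the sample values are the two partitions from this example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed when two files list the same set of
# partitions (one per line), regardless of order.
same_partitions() {
  [ "$(sort "$1")" = "$(sort "$2")" ]
}

# Demo with the two partitions from the example, in different order.
src_list="$(mktemp)"
dst_list="$(mktemp)"
printf 'pt=2015-09-11\npt=2015-09-12\n' > "$src_list"
printf 'pt=2015-09-12\npt=2015-09-11\n' > "$dst_list"

if same_partitions "$src_list" "$dst_list"; then
  echo "partitions match"
else
  echo "partitions differ"
fi
rm -f "$src_list" "$dst_list"

# Against a real cluster the lists could be produced with, for example:
# hive -S -e "SHOW PARTITIONS t1" > "$src_list"
# hive -S -e "SHOW PARTITIONS t2" > "$dst_list"
```

This only confirms that the partition lists agree; checking the data itself would need something like a row count per partition on both tables.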
