Transferred from: http://lxw1234.com/archives/2015/09/484.htm
Keywords: Hive replication table
There are times when the need to replicate tables is encountered in hive, which refers to duplicating table structures and data.
If it is for a non-partitioned table, it is easy to use the CREATE TABLE new_table as SELECT * from old_table;
So what if it's a partitioned table?
The first way to think of it might be:
First create a new_table with the same old_table structure, including the partition; You can use the CREATE table new_table like old_table;
Next, use dynamic partitioning to insert the old_table data into the new_table.
This method is certainly possible, but it may not be the quickest.
Can actually do this:
1. CREATE TABLE new_table like old_table;
2. Use the Hadoop FS-CP command to copy the folder of the old_table corresponding HDFs directory to the new_table corresponding HDFs directory;
3. Use msck REPAIR table new_table; fix partition metadata for new table;
See Example:
There is a partition table T1, only two partitions, each of which has one data, as follows:
- Hive> show Partitions t1;
- Ok
- PT=----
- PT=----
- Time taken: 0.11 seconds, fetched: 2 row(s)
- Hive> desc t1;
- Ok
- ID String
- PT String
- # Partition Information
- # col_name Data_type Comment
- PT String
- Time taken: 0.123 seconds, fetched: 7 Row(s)
- Hive> Select * from t1;
- Ok
- X -one- One
- Y ----
- Time taken: 0.095 seconds, fetched: 2 row(s )
- Hive>
Create a new table with the same table structure T2;
- Hive> CREATE table t2 like T1;
- Ok
- Time taken: 0.162 seconds
- Hive> desc T2;
- Ok
- ID String
- PT String
- # Partition Information
- # col_name Data_type Comment
- PT String
- Time taken: 0.139 seconds, fetched: 7 Row(s)
- Hive> show Partitions t2;
- Ok
- Time taken: 0.082 seconds
Use the Hadoop fs-cp command to copy all the folders T1 corresponding to the HDFs directory to the T2 corresponding HDFs directory:
- [[email protected] ~]$ Hadoop FS -CP /hivedata/warehouse/ Liuxiaowen. DB/T1/*/hivedata/warehouse/liuxiaowen.db/t2/
- [Email protected] ~]$ Hadoop fs-ls/hivedata/warehouse/liuxiaowen.db/t2/
- Found 2 Items
- Drwxr-xr-x-Liuxiaowen Liuxiaowen 0 2015-09-11 17:17/hivedata/warehouse/liuxiaowen.db/t2/pt=2015-09-11
- Drwxr-xr-x-Liuxiaowen Liuxiaowen 0 2015-09-11 17:17/hivedata/warehouse/liuxiaowen.db/t2/pt=2015-09-12
Use msck REPAIR table T2 in hive to fix partition metadata for new table T2;
- Hive> show Partitions t2;
- Ok
- Time taken: 0.082 seconds
- Hive> msck REPAIR TABLE T2;
- Ok
- Partitions not in metastore: T2:pt=--- One T2:pt=----
- Repair: Added partition to Metastore T2:pt=- ---
- Repair: Added partition to Metastore T2:pt=12
- Time taken: 0.249 seconds, fetched: 3 row(s)
- Hive> show Partitions t2;
- Ok
- PT=----
- PT=----
- Time taken: 0.068 seconds, fetched: 2 row(s )
- Hive> Select * from T2;
- Ok
- X -one- One
- Y ----
- Time taken: 0.123 seconds, fetched: 2 row(s )
- Hive>
OK, the new table T2 has been copied, it and T1 have the same table structure, partition structure, partition and data.
How to quickly copy a partitioned table (including data) in Hive