Implement incremental update in Hive and incremental update in Hive

Source: Internet
Author: User

Implement incremental update in Hive and incremental update in Hive
An insurance company has a table that records the customer information, including the customer's id, name, and age (only these fields are listed for demonstration ).
Create a Hive table:
Create table customer
(
Id int,
Age tinyint,
Name string
)
Partitioned by (dt string)
Row format delimited
Fields terminated by '|'
Stored as textfile;


Import initialization data:
Load data local inpath '/home/hadoop/hivetestdata/customer.txt 'into table customer partition (dt = '20140901 ');
Hive> select * from customer order by id;
Customer. id customer. agecustomer. name customer. dt
1 25 jiangshouzhuang201506
2 23 zhangyun201506
3 24 yiyi201506
4 32 mengmeng201506


For insurance companies, customers change every day. We use the temporary data table customer_temp to record customer information every day. The fields and attributes are consistent with the customer table,

Create table customer_temp like customer;

Load data local inpath '/home/hadoop/hivetestdata/customer_temp.txt' into table customer_temp partition (dt = '20140901 ');

Examples of included data are as follows:

Hive> select * from customer_temp;
Customer_temp.id customer_temp.agecustomer_temp.name customer_temp.dt
1 26 jiangshouzhuang201506
5 45 xiaosan201506


To incrementally update the customer table, we need to perform full outer join on the two tables to update the modified data in the customer_temp table to the customer table.
Hive (hive)> select * from customer_temp
> Union all
> Select a. * from customer
> Left outer join customer_temp B
> On a. id = B. id where B. id is null;
_ U1.id _ u1.age _ u1.name _ u1.dt
2 23 zhangyun201506
3 24 yiyi201506
4 32 mengmeng201506
1 26 jiangshouzhuang201506
5 45 xiaosan201506


I have seen that the following method is used on the Internet, and it seems that there is a problem:
Hive> select customer. id,
Coalesce (customer_temp.age, customer. age ),
Customer. name,
Coalesce (customer_temp.dt, customer. dt)
From customer_temp
Full outer join customer on customer_temp.id = customer. id;
The execution result is:
Customer. id _ c1customer. name _ c3
1 26 jiangshouzhuang201506
2 23 zhangyun201506
3 24 yiyi201506
4 32 mengmeng201506
NULL 45 NULL 201506


We can see that there is indeed a problem.

If you have any better optimization methods, please kindly advise. Thank you.

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.