Implementing incremental updates in hive

Source: Internet
Author: User

The insurance company has a table that records the customer's information, which includes the customer's id,name and age (to demonstrate that only these fields are listed).
To create a hive table:
CREATE TABLE Customer
(
ID int,
Age tinyint,
Name string
)
Partitioned by (DT string)
Row format delimited
Fields terminated by ' | '
stored as textfile;


Import Initialization data:
Load data local inpath '/home/hadoop/hivetestdata/customer.txt ' into table customer partition (dt = ' 201506 ');
Hive> SELECT * from the customer order by ID;
Customer.id Customer.ageCustomer.name Customer.dt
1 25 Jiangshouzhuang201506
2 23 Zhangyun201506
3 24 Yiyi201506
4 32 Mengmeng201506


For insurers, customers change every day, we use temporary data sheet customer_temp to record daily customer information, and the fields and attributes are consistent with the Customer table.

CREATE table customer_temp like customer;

Load data local inpath '/home/hadoop/hivetestdata/customer_temp.txt ' into table customer_temp partition (dt = ' 201506 ');

The included data examples are as follows:

Hive> select * from Customer_temp;
Customer_temp.id Customer_temp.ageCustomer_temp.name Customer_temp.dt
1 26 Jiangshouzhuang201506
5 45 Xiaosan201506


If you need to implement an incremental update of the Customer table, we need to perform a full outer join of the two tables and update the modified data in the Customer_temp table to the Customer table.
Hive (Hive) > select * from Customer_temp
> UNION ALL
> select a.* from Customer a
> LEFT OUTER join customer_temp b
> On a.id = b.id where b.id is null;
_u1.id _u1.age_u1.name _u1.dt
2 23 Zhangyun201506
3 24 Yiyi201506
4 32 Mengmeng201506
1 26 Jiangshouzhuang201506
5 45 Xiaosan201506


Before you see online there is a problem with the use of a method similar to the following:
Hive> Select Customer.id,
COALESCE (Customer_temp.age,customer.age),
Customer.name,
COALESCE (CUSTOMER_TEMP.DT,CUSTOMER.DT)
From Customer_temp
Full outer JOIN customer on customer_temp.id = Customer.id;
The result after execution is:
Customer.id _c1Customer.name _c3
1 26 Jiangshouzhuang201506
2 23 Zhangyun201506
3 24 Yiyi201506
4 32 Mengmeng201506
Null 45 Null 201506


It can be seen that there is a problem.

If friends have better methods of optimization please enlighten, thank you.

Implementing incremental updates in hive

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.