Hive implements Oracle's minus function

Source: Internet
Author: User

The main function of the minus operation in Oracle is to return records that are not the same as the second table/query results in the first table/query results when two tables or two query results are performed.

records with different results are in two cases: The contents of a row in a /b table are different and the data in table A does not exist in table B. In summary, the data for table A is returned.

there is no function to implement minus function in Hive , it has to be implemented in step.

First, find out the data in table A that does not exist in table B

Insert Overwrite table Tmp_a partition (name= 'a_innot_b ')

Select

A.*

From a a left outer joins b b on (a.id = b.id) where b.id is NULL;

Ii. find data that exists in both a and B tables, but with different contents

The UDF functions are as follows:

Public class Minus extends udf{

String a="";

String B="";

Public Text Evaluate (String ... strs) {

for (int i=0;i<strs. length/2;i++) {

a=a+strs[i];

}

for (int i=strs. length/2;i<strs. length; i++) {

b=b+strs[i];

}

if (a. Replace ("", ""). Equals (b. Replace ("", ""))) {

return New Text ("");

}Else{

return New Text (Strs[0].replace ("", ""));

}

}

The corresponding query is as follows:

Insert Overwrite table Tmp_a_diff

Select Iminus (

a.*,b.*

) from a a joins B b on (a.id=b.id);

Tmp_a_diff Stores the ID and some ""of data that are present in both a and B tables, but with different contents.

get each row of data by ID

Insert Overwrite table tmp_a partition (name= "A_in_b")

Select A.*

From Tmp_a_diff B joins a A on (a.id=b.id);

the data in the tmp_a partition a_innot_b and partition A_in_b is now in Oracle ( SELECT * from A)minus data (SELECT * from B) .


Hive implements Oracle's minus function

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.