The main function of the minus operation in Oracle is to return records that are not the same as the second table/query results in the first table/query results when two tables or two query results are performed.
records with different results are in two cases: The contents of a row in a /b table are different and the data in table A does not exist in table B. In summary, the data for table A is returned.
there is no function to implement minus function in Hive , it has to be implemented in step.
First, find out the data in table A that does not exist in table B
Insert Overwrite table Tmp_a partition (name= 'a_innot_b ')
Select
A.*
From a a left outer joins b b on (a.id = b.id) where b.id is NULL;
Ii. find data that exists in both a and B tables, but with different contents
The UDF functions are as follows:
Public class Minus extends udf{
String a="";
String B="";
Public Text Evaluate (String ... strs) {
for (int i=0;i<strs. length/2;i++) {
a=a+strs[i];
}
for (int i=strs. length/2;i<strs. length; i++) {
b=b+strs[i];
}
if (a. Replace ("", ""). Equals (b. Replace ("", ""))) {
return New Text ("");
}Else{
return New Text (Strs[0].replace ("", ""));
}
}
The corresponding query is as follows:
Insert Overwrite table Tmp_a_diff
Select Iminus (
a.*,b.*
) from a a joins B b on (a.id=b.id);
Tmp_a_diff Stores the ID and some ""of data that are present in both a and B tables, but with different contents.
get each row of data by ID
Insert Overwrite table tmp_a partition (name= "A_in_b")
Select A.*
From Tmp_a_diff B joins a A on (a.id=b.id);
the data in the tmp_a partition a_innot_b and partition A_in_b is now in Oracle ( SELECT * from A)minus data (SELECT * from B) .
Hive implements Oracle's minus function