UNION and JOIN are the two keywords that come up whenever more than one table has to be combined. I won't give the formal definitions here; look them up online if you need them, because I don't remember them precisely either.
First, the differences. A UNION of two tables stacks their rows, that is, it combines them vertically, which requires both tables to have the same fields (the schemas of both sides of the union should match). In other words, if table A has three rows and table B has two, then A UNION ALL B has five rows (a plain UNION has five only if no two rows are identical). That is the difference between UNION and UNION ALL: UNION merges identical records, while UNION ALL does not, so it returns every record from both inputs. For example, run the following statements under MySQL:
SELECT * FROM tmp_libingxue_a;
Name Number
Libingxue 1001
Yuwen 1002
SELECT * FROM tmp_libingxue_b;
Name Number
Libingxue 1001
Feiyao 1003
SELECT * FROM tmp_libingxue_a UNION SELECT * FROM tmp_libingxue_b;
Libingxue 1001
Yuwen 1002
Feiyao 1003
SELECT * FROM tmp_libingxue_a UNION ALL SELECT * FROM tmp_libingxue_b;
Libingxue 1001
Yuwen 1002
Libingxue 1001
Feiyao 1003
However, this does not work in Hive (at least in the version used here): executing SELECT * FROM tmp_libingxue_a UNION ALL SELECT * FROM tmp_libingxue_b; fails, because in Hive the UNION must sit inside a subquery. For example:
SELECT * FROM (SELECT * FROM tmp_yuwen_a UNION ALL SELECT * FROM tmp_yuwen_b) t1;
Note that it must be UNION ALL; with a bare UNION, Hive complains that ALL is missing. The trailing alias t1 is also mandatory: you can call it a or b instead, but it has to be there, and leaving it out is an error.
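The row counts in the MySQL example above can be checked without a database server. The following is only a minimal sketch: it assumes an in-memory SQLite database as a stand-in for MySQL, and the column types (TEXT, INTEGER) are my assumption, since the original table definitions are not shown. It also runs the subquery-wrapped form with an alias, which SQLite accepts just as Hive does.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for MySQL here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tmp_libingxue_a (name TEXT, number INTEGER);
    CREATE TABLE tmp_libingxue_b (name TEXT, number INTEGER);
    INSERT INTO tmp_libingxue_a VALUES ('Libingxue', 1001), ('Yuwen', 1002);
    INSERT INTO tmp_libingxue_b VALUES ('Libingxue', 1001), ('Feiyao', 1003);
""")

# UNION merges identical records, so ('Libingxue', 1001) appears once.
union_rows = conn.execute(
    "SELECT * FROM tmp_libingxue_a UNION SELECT * FROM tmp_libingxue_b"
).fetchall()

# UNION ALL keeps every record from both inputs, duplicates included.
union_all_rows = conn.execute(
    "SELECT * FROM tmp_libingxue_a UNION ALL SELECT * FROM tmp_libingxue_b"
).fetchall()

# The same UNION ALL wrapped in a subquery with the alias t1.
wrapped_rows = conn.execute(
    "SELECT * FROM (SELECT * FROM tmp_libingxue_a "
    "UNION ALL SELECT * FROM tmp_libingxue_b) t1"
).fetchall()

print(len(union_rows), len(union_all_rows), len(wrapped_rows))  # prints: 3 4 4
```

The wrapped query returns the same four rows as the plain UNION ALL; the subquery only changes where the UNION is allowed to appear, not what it produces.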
A JOIN leans toward combining tables horizontally; I say only "leans toward", for reasons described below. A JOIN is looser than a UNION: the two tables are not required to have the same fields. A JOIN with no condition equals the Cartesian product of the two tables, so in practice every join should be constrained, and a constrained join is a horizontal expansion: row pairs that satisfy the condition are kept, and those that do not are filtered out. The usage can be very flexible; here are two simple examples:
SELECT * FROM (SELECT * FROM tmp_yuwen_a) t1 JOIN (SELECT * FROM tmp_yuwen_b) t2;
SELECT * FROM tmp_yuwen_a t1 JOIN (SELECT * FROM tmp_yuwen_b) t2;
LEFT OUTER JOIN and RIGHT OUTER JOIN are used in much the same way. A left outer join keeps every row of the left-hand table; the fields of the right-hand table are filled in where the condition is met and left empty (NULL) where it is not, so the left-hand table is the reference. A right outer join does the same with the right-hand table as the reference. The differences among these three joins have been explained many times, and there are more detailed explanations on the web, so I won't repeat them here.
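To make the join behaviour described above concrete, here is a small sketch. Its assumptions: SQLite stands in for MySQL or Hive, the data from tmp_libingxue_a and tmp_libingxue_b is reused because the contents of the tmp_yuwen tables are not given, and joining on number is my choice for illustration.

```python
import sqlite3

# Assumption: in-memory SQLite as a stand-in; data reused from the UNION example.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tmp_libingxue_a (name TEXT, number INTEGER);
    CREATE TABLE tmp_libingxue_b (name TEXT, number INTEGER);
    INSERT INTO tmp_libingxue_a VALUES ('Libingxue', 1001), ('Yuwen', 1002);
    INSERT INTO tmp_libingxue_b VALUES ('Libingxue', 1001), ('Feiyao', 1003);
""")

# JOIN without a condition: every row of a paired with every row of b.
cross_rows = conn.execute(
    "SELECT * FROM tmp_libingxue_a t1 JOIN tmp_libingxue_b t2"
).fetchall()

# Constrained JOIN: only row pairs that satisfy the condition survive.
inner_rows = conn.execute(
    "SELECT t1.name, t1.number, t2.name "
    "FROM tmp_libingxue_a t1 JOIN tmp_libingxue_b t2 "
    "ON t1.number = t2.number"
).fetchall()

# LEFT OUTER JOIN: every row of the left table is kept; unmatched
# right-hand fields come back as NULL (None in Python).
left_rows = conn.execute(
    "SELECT t1.name, t1.number, t2.name "
    "FROM tmp_libingxue_a t1 LEFT OUTER JOIN tmp_libingxue_b t2 "
    "ON t1.number = t2.number ORDER BY t1.number"
).fetchall()

print(len(cross_rows))  # 2 x 2 = 4 rows
print(inner_rows)
print(left_rows)
```

With this data the unconstrained join returns four rows, the constrained join returns only the Libingxue row, and the left outer join additionally keeps Yuwen with an empty right-hand field.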
What they have in common: in certain cases a JOIN can do the job of a UNION ALL. This holds only under specific conditions; when it does, whether to use UNION ALL with GROUP BY or a JOIN depends on the situation, or on which of the two costs less. SQL has only a handful of keywords, but it is versatile and powerful; as long as you get the result you want, use whichever form you like. A simplified reproduction of the scenario follows:
drop table tmp_libingxue_resource;
create external table if not exists tmp_libingxue_resource (user_id string, shop_id string, auction_id string, search_time string) partitioned by (pt string) row format delimited fields terminated by '\t' lines terminated by '\n' stored as sequencefile;
drop table tmp_libingxue_result;
create external table if not exists tmp_libingxue_result (user_id string, shop_id string, auction_id string, search_time string) partitioned by (pt string) row format delimited fields terminated by '\t' lines terminated by '\n' stored as sequencefile;
insert overwrite table tmp_libingxue_result partition (pt=20041104) select user_id, shop_id, auction_id, search_time from tmp_libingxue_resource;
sudo -u taobao hadoop dfs -rmr /group/tbads/warehouse/tmp_libingxue_result/pt=20041104
sudo -u taobao hadoop jar /home/taobao/dataqa/framework/dailyreport.jar com.alimama.loganalyzer.tool.SeqFileLoader tmp_libingxue_resource.txt hdfs://v039182.sqa.cm4:54310/group/tbads/warehouse/tmp_libingxue_result/pt=20041104/part-00000
hive> select * from tmp_libingxue_resource;
OK
2001 0 20041104
2002 0 102 20041104
hive> select * from tmp_libingxue_result;
OK
2001 0 20041104
2002 0 20041104
select user_id, shop_id, max(auction_id), max(search_time)
from
(select * from tmp_libingxue_resource
union all
select * from tmp_libingxue_result) t1
group by user_id, shop_id;
select t1.user_id, t1.shop_id, t2.auction_id, t2.search_time
from
(select * from tmp_libingxue_resource) t1
join
(select * from tmp_libingxue_result) t2
on t1.user_id = t2.user_id and t1.shop_id = t2.shop_id;
2001 0
2002 0 104
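The condition under which the UNION ALL with GROUP BY query and the JOIN query agree can be sketched as follows. This is an illustration under stated assumptions, not a reproduction of the run above: SQLite stands in for Hive, the partition column pt is left out, and the row values are hypothetical (tmp_libingxue_resource holds only the keys, tmp_libingxue_result carries the values).

```python
import sqlite3

# Assumptions: in-memory SQLite instead of Hive, no partition column,
# and hypothetical row values chosen only to illustrate the comparison.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tmp_libingxue_resource
        (user_id TEXT, shop_id TEXT, auction_id TEXT, search_time TEXT);
    CREATE TABLE tmp_libingxue_result
        (user_id TEXT, shop_id TEXT, auction_id TEXT, search_time TEXT);
    INSERT INTO tmp_libingxue_resource VALUES
        ('2001', '0', NULL, NULL), ('2002', '0', NULL, NULL);
    INSERT INTO tmp_libingxue_result VALUES
        ('2001', '0', '101', '20041104'), ('2002', '0', '104', '20041104');
""")

UNION_SQL = """
    SELECT user_id, shop_id, MAX(auction_id), MAX(search_time)
    FROM (SELECT * FROM tmp_libingxue_resource
          UNION ALL
          SELECT * FROM tmp_libingxue_result) t1
    GROUP BY user_id, shop_id
    ORDER BY user_id, shop_id
"""

JOIN_SQL = """
    SELECT t1.user_id, t1.shop_id, t2.auction_id, t2.search_time
    FROM tmp_libingxue_resource t1
    JOIN tmp_libingxue_result t2
      ON t1.user_id = t2.user_id AND t1.shop_id = t2.shop_id
    ORDER BY t1.user_id, t1.shop_id
"""

# Every key occurs exactly once in each table: both queries agree.
same_before = conn.execute(UNION_SQL).fetchall() == conn.execute(JOIN_SQL).fetchall()

# Add a key that exists only in the resource table.
conn.execute("INSERT INTO tmp_libingxue_resource VALUES ('2003', '0', NULL, NULL)")
union_rows = conn.execute(UNION_SQL).fetchall()
join_rows = conn.execute(JOIN_SQL).fetchall()

print(same_before)                      # True
print(len(union_rows), len(join_rows))  # 3 2
```

Once a key is present in only one table the two forms diverge: the UNION ALL version still reports that key, with empty values, while the inner JOIN drops it. That is one reason the substitution is only valid in certain cases.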
As the preceding introduction shows, UNION operates on the result sets of queries, while JOIN connects multiple tables; the two are fundamentally different.
Below is an example that uses the UNION operator to combine the records of two tables.
A typical union operation for two-table records
Assume that there are two tables, Table1 and Table2, and that the columns and data they contain are shown below.
Table1 database table
Table2 database table
Table1 and Table2 have the same column structure, so the UNION operator can combine their record sets; the result is shown in the following table.
Using UNION to combine the records of Table1 and Table2
The implementation code for the above connection process can be expressed as follows:
SELECT *
FROM Table1
UNION
SELECT *
FROM Table2