hive-Table Connection
1) hive supports only equivalent connections, that is, the ON clause uses an equal sign connection and does not support non-equivalent connections.
2) If there is a WHERE clause in the connection statement, the JOIN clause is executed before the WHERE clause
3) Testing
Suppose you have the following test data
Table user data is as follows
user_id |
Name |
1 |
Tom |
2 |
John doe |
3 |
Harry |
The table job data is as follows
job_id |
Job |
user_id |
1 |
Engineer |
1 |
2 |
Art |
2 |
3 |
Art |
4 |
The build table statement is as follows
CREATE TABLE IF not EXISTS user (ID int., name string) ROW formatdelimited fields TERMINATED by ' \ t ';
CREATE TABLE IF not EXISTS job (id int, job string, user_id Int.) Rowformat delimited fields TERMINATED by ' \ t ';
Internal connection
An inner connection refers to a data query that conforms to the conditions of the two sides of the connection.
Execute the following statement
SELECT * from the user join job on user.id=job.user_id;
The results are as follows
Left outer connection
Left outer connection: left side, right if there is no data, then empty.
Execute the following statement
SELECT * from the user left outer join job on user.id=job.user_id;
You cannot use a let join, only the left outerjoin. The results are as follows
Right outer connection
Execute the following statement
SELECT * from the user right outer join job on user.id=job.user_id;
The results are as follows
Full outer connection
Execute the following statement
SELECT * from the user full outer join job on user.id=job.user_id;
The results are as follows
Left Half Connection
Executes the following statement, where the left half is used instead of the in operation or the exists operation.
SELECT * from the user left semi join job on user.id=job.user_id;
The results are as follows
The statement is equivalent to the following statement
SELECT * from user where ID in (select user_id from Job);
However, Hive does not support the in clause. So you can only work with the left semi clause.
Join in Hive