The table join operations in hive are broadly divided into the following four types:
1:join is an equivalent connection, and only one value exists in two tables to be retrieved.
2:The LEFT outer join is an outer join, and the value in the left table is output regardless of whether it exists in the right table, but the records in the right table are output only if they exist in the left table.
3:right outer join IS left OUTER JOIN, and the value in the right table will be output regardless of whether it exists, but the records in the left table are output only if they exist in the right (the opposite of outer join).
4:left Semi join, which is similar to exits.
Here are some examples of how these connections work:
#user表中的数据如下:
[Java]View PlainCopy
- Hive (Hive) > select * from User;
- Ok
- ID Name
- 1 Lavimer
- 2 Liaozhongmin
- 3 Liaozemin
- Time taken: 0.112 seconds
#post表中的数据如下:
[Java]View PlainCopy
- Hive (Hive) > select * from Post;
- Ok
- UID PID title
- 1 1 thinking in Java
- 1 2 thinking in Hadoop
- 2 3 thinking in C
- 4 4 Thinking in Hive
- 5 5 Thinking in HBase
- 5 6 Thinking in Pig
- 5 7 Thinking in Flume
- Time taken: 0.11 seconds
One: Equivalent connection
[Java]View PlainCopy
- Hive (Hive) > select S.id,s.name,t.pid,t.title from
- > (select Id,name from user) s
- > Join
- > (select Uid,pid,title from Post) t
- > on S.id=t.uid;
The results of the query are as follows:
[Java]View PlainCopy
- ID name PID Title
- 1 Lavimer 1 thinking in Java
- 1 Lavimer 2 thinking in Hadoop
- 2 Liaozhongmin 3 thinking in C
Two: Left outer connection
[Java]View PlainCopy
- Hive (Hive) >
- > select S.id,s.name,t.pid,t.title from
- > (select Id,name from user) s
- > LEFT OUTER JOIN
- > (select Uid,pid,title from Post) t
- > on S.id=t.uid;
The results of the query are as follows:
[Java]View PlainCopy
- ID name PID Title
- 1 Lavimer 1 thinking in Java
- 1 Lavimer 2 thinking in Hadoop
- 2 Liaozhongmin 3 thinking in C
- 3 liaozemin NULL NULL
Note: From the above results it can be seen that the UID of the post table will only output a record if it exists in the user table, otherwise the output is null.
Three: Right outer connection
[Java]View PlainCopy
- Hive (Hive) > select S.id,s.name,t.pid,t.title from
- > (select Id,name from user) s
- > Right OUTER JOIN
- > (select Uid,pid,title from Post) t
- > on S.id=t.uid;
The results of the query are as follows:
[Java]View PlainCopy
- ID name PID Title
- 1 Lavimer 1 thinking in Java
- 1 Lavimer 2 thinking in Hadoop
- 2 Liaozhongmin 3 thinking in C
- Null NULL 4 thinking in Hive
- Null NULL 5 thinking in HBase
- Null NULL 6 thinking in Pig
- Null NULL 7 thinking in Flume
Note: As can be seen from the above results, the ID in the user table will only output a record if it exists in the post table, otherwise the output is null.
Four: Left half connection
[Java]View PlainCopy
- Hive (Hive) > select S.id,s.name from
- > (select Id,name from user) s
- > Left Semi Join
- > (select Uid,pid,title from Post) t
- > on S.id=t.uid;
The results of the query are as follows:
[Java]View PlainCopy
- ID Name
- 1 Lavimer
- 2 Liaozhongmin
This left semi join is interesting because there is no in/exits clause in hive, but we do need to do so, so hive converts this type of clause to a semi join.
The above statement can be abbreviated to the following form:
[Java]View PlainCopy
- Hive (Hive) > select user.id,user.name from User
- > Left Semi Join
- > Post
- > On (USER.ID=POST.UID);
The HQL statement above is equivalent to the following SQL statement:
[Java]View PlainCopy
- Select Id,name from user
- where ID in
- (select UID from post);
Hive Table Connection