Hive Table Connection

Source: Internet
Author: User
Tags null null

The table join operations in hive are broadly divided into the following four types:

1:join is an equivalent connection, and only one value exists in two tables to be retrieved.

2:The LEFT outer join is an outer join, and the value in the left table is output regardless of whether it exists in the right table, but the records in the right table are output only if they exist in the left table.

3:right outer join IS left OUTER JOIN, and the value in the right table will be output regardless of whether it exists, but the records in the left table are output only if they exist in the right (the opposite of outer join).

4:left Semi join, which is similar to exits.

Here are some examples of how these connections work:

#user表中的数据如下:

[Java]View PlainCopy
    1. Hive (Hive) > select * from User;
    2. Ok
    3. ID Name
    4. 1 Lavimer
    5. 2 Liaozhongmin
    6. 3 Liaozemin
    7. Time taken: 0.112 seconds


#post表中的数据如下:

[Java]View PlainCopy
  1. Hive (Hive) > select * from Post;
  2. Ok
  3. UID PID title
  4. 1 1 thinking in Java
  5. 1 2 thinking in Hadoop
  6. 2 3 thinking in C
  7. 4 4 Thinking in Hive
  8. 5 5 Thinking in HBase
  9. 5 6 Thinking in Pig
  10. 5 7 Thinking in Flume
  11. Time taken: 0.11 seconds


One: Equivalent connection

[Java]View PlainCopy
    1. Hive (Hive) > select S.id,s.name,t.pid,t.title from
    2. > (select Id,name from user) s
    3. > Join
    4. > (select Uid,pid,title from Post) t
    5. > on S.id=t.uid;

The results of the query are as follows:

[Java]View PlainCopy
    1. ID name PID Title
    2. 1 Lavimer 1 thinking in Java
    3. 1 Lavimer 2 thinking in Hadoop
    4. 2 Liaozhongmin 3 thinking in C



Two: Left outer connection

[Java]View PlainCopy
    1. Hive (Hive) >
    2. > select S.id,s.name,t.pid,t.title from
    3. > (select Id,name from user) s
    4. > LEFT OUTER JOIN
    5. > (select Uid,pid,title from Post) t
    6. > on S.id=t.uid;

The results of the query are as follows:

[Java]View PlainCopy
    1. ID name PID Title
    2. 1 Lavimer 1 thinking in Java
    3. 1 Lavimer 2 thinking in Hadoop
    4. 2 Liaozhongmin 3 thinking in C
    5. 3 liaozemin NULL NULL

Note: From the above results it can be seen that the UID of the post table will only output a record if it exists in the user table, otherwise the output is null.

Three: Right outer connection

[Java]View PlainCopy
    1. Hive (Hive) > select S.id,s.name,t.pid,t.title from
    2. > (select Id,name from user) s
    3. > Right OUTER JOIN
    4. > (select Uid,pid,title from Post) t
    5. > on S.id=t.uid;

The results of the query are as follows:

[Java]View PlainCopy
  1. ID name PID Title
  2. 1 Lavimer 1 thinking in Java
  3. 1 Lavimer 2 thinking in Hadoop
  4. 2 Liaozhongmin 3 thinking in C
  5. Null NULL 4 thinking in Hive
  6. Null NULL 5 thinking in HBase
  7. Null NULL 6 thinking in Pig
  8. Null NULL 7 thinking in Flume

Note: As can be seen from the above results, the ID in the user table will only output a record if it exists in the post table, otherwise the output is null.


Four: Left half connection

[Java]View PlainCopy
    1. Hive (Hive) > select S.id,s.name from
    2. > (select Id,name from user) s
    3. > Left Semi Join
    4. > (select Uid,pid,title from Post) t
    5. > on S.id=t.uid;

The results of the query are as follows:

[Java]View PlainCopy
    1. ID Name
    2. 1 Lavimer
    3. 2 Liaozhongmin

This left semi join is interesting because there is no in/exits clause in hive, but we do need to do so, so hive converts this type of clause to a semi join.

The above statement can be abbreviated to the following form:

[Java]View PlainCopy
    1. Hive (Hive) > select user.id,user.name from User
    2. > Left Semi Join
    3. > Post
    4. > On (USER.ID=POST.UID);



The HQL statement above is equivalent to the following SQL statement:

[Java]View PlainCopy
      1. Select Id,name from user
      2. where ID in
      3. (select UID from post);

Hive Table Connection

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.