[Learn hive together] Nine-hive query statement select

Source: Internet
Author: User
viii. Query Statement select for hive

In all database systems, the SELECT statement is the most used, but also the most complex piece, the query in hive Select support syntax is certainly more complex, this article only try to introduce. 8.1 Basic Query Syntax

The Select base syntax in hive is basically consistent with the standard SQL syntax, which supports where, DISTINCT, GROUP by, ORDER by, have, LIMIT, subquery, and so on;
The syntax is as follows:

[With Commontableexpression (, commontableexpression) *] SELECT [All | DISTINCT] select_expr, select_expr, ... From Table_reference [WHERE where_condition] [GROUP by col_list] [CLUSTER by col_list | [Distribute by col_list] [SORT by Col_list]] [LIMIT number]

Here are some of the more specific query syntax in hive. 8.2 ORDER by and sort by

The order by is used for global ordering, which is to globally sort all the specified sort keys, using the ORDER BY query statement, and finally using a reduce task to complete the global ordering.
Sort by is used for intra-partition sorting, which is the sort within each reduce task.
Look at the following example:
The original table data is:

Hive> select * from lxw1234_com; OK 5 3 6 2 9 8 1

Using ORDER BY

Hive> SELECT * from lxw1234_com the order by ID; 1 2 3 5 6 8 9

Use Sort by

Hive> set mapred.reduce.tasks=2;
Hive> SELECT * from lxw1234_com sort by ID;
2
5
6
9
1
3
8

Set 2 reduce, from the results can be seen, each reduce within a sort.
If the reduce number is 1, then the result of order by and sort by is the same:

Hive> set Mapred.reduce.tasks=1; Hive> SELECT * from lxw1234_com sort by ID; 1 2 3 5 6 8 9

In a real business environment, most of our requirements need to be done using order by global ordering. 8.3 Distribute by and cluster by

Distribute by: Divides the data by the specified field or expression and outputs it to the corresponding reduce or file.
Cluster by: In addition to the ability to combine distribute by, it also features sort by sorting.

# #distribute by

Hive> set mapred.reduce.tasks=2; Hive>insert Overwrite LOCAL directory '/tmp/lxw1234/' SELECT ID from the lxw1234_com distribute by ID;

After execution, the 000000_0 and 000001_0 two files are generated in the local/tmp/lxw1234 directory:
Cat 000000_0
8
2
6
Cat 000001_0 # #id%2=1 Records
1
9
3
5

# #cluster by

Hive> set mapred.reduce.tasks=2; hive> INSERT Overwrite LOCAL directory '/tmp/lxw1234/' SELECT ID from the lxw1234_com CLUSTER by ID; Cat 000000_0 2 6 8 Cat 000001_0 1 3 5 9

Note: After using cluster by, the IDs in each file are sorted, and distribute by IS not. 8.4 Sub-query

Subqueries are basically consistent with subquery syntax and usage in standard SQL, and it is important to note that if a select query is made from a subquery in hive, the subquery must set an alias.

Select col from (select A+b as col from T1) T2

Also, starting with Hive0.13, subqueries are supported in the WHERE clause, such as:

SELECT * from A WHERE a.a in (select Foo from B); Select A from T1 where EXISTS (select B from T2 where T1. X = T2. Y

There is also a syntax for querying subqueries as a table, called Common table Expression (CTE):

With Q1 as (SELECT * from src where key= ' 5 '), Q2 as (select * from src s2 where key = ' 4 ') SELECT *

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.