[Learn hive together] Nine-hive query statement select

Last Update:2018-07-26 Source: Internet

Author: User

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

viii. Query Statement select for hive

In all database systems, the SELECT statement is the most used, but also the most complex piece, the query in hive Select support syntax is certainly more complex, this article only try to introduce. 8.1 Basic Query Syntax

The Select base syntax in hive is basically consistent with the standard SQL syntax, which supports where, DISTINCT, GROUP by, ORDER by, have, LIMIT, subquery, and so on;
The syntax is as follows:

[With Commontableexpression (, commontableexpression) *] SELECT [All | DISTINCT] select_expr, select_expr, ... From Table_reference [WHERE where_condition] [GROUP by col_list] [CLUSTER by col_list | [Distribute by col_list] [SORT by Col_list]] [LIMIT number]

Here are some of the more specific query syntax in hive. 8.2 ORDER by and sort by

The order by is used for global ordering, which is to globally sort all the specified sort keys, using the ORDER BY query statement, and finally using a reduce task to complete the global ordering.
Sort by is used for intra-partition sorting, which is the sort within each reduce task.
Look at the following example:
The original table data is:

Hive> select * from lxw1234_com; OK 5 3 6 2 9 8 1

Using ORDER BY

Hive> SELECT * from lxw1234_com the order by ID; 1 2 3 5 6 8 9

Use Sort by

Hive> set mapred.reduce.tasks=2;
Hive> SELECT * from lxw1234_com sort by ID;
2
5
6
9
1
3
8

Set 2 reduce, from the results can be seen, each reduce within a sort.
If the reduce number is 1, then the result of order by and sort by is the same:

Hive> set Mapred.reduce.tasks=1; Hive> SELECT * from lxw1234_com sort by ID; 1 2 3 5 6 8 9

In a real business environment, most of our requirements need to be done using order by global ordering. 8.3 Distribute by and cluster by

Distribute by: Divides the data by the specified field or expression and outputs it to the corresponding reduce or file.
Cluster by: In addition to the ability to combine distribute by, it also features sort by sorting.

# #distribute by

Hive> set mapred.reduce.tasks=2; Hive>insert Overwrite LOCAL directory '/tmp/lxw1234/' SELECT ID from the lxw1234_com distribute by ID;

After execution, the 000000_0 and 000001_0 two files are generated in the local/tmp/lxw1234 directory:
Cat 000000_0
8
2
6
Cat 000001_0 # #id%2=1 Records
1
9
3
5

# #cluster by

Hive> set mapred.reduce.tasks=2; hive> INSERT Overwrite LOCAL directory '/tmp/lxw1234/' SELECT ID from the lxw1234_com CLUSTER by ID; Cat 000000_0 2 6 8 Cat 000001_0 1 3 5 9

Note: After using cluster by, the IDs in each file are sorted, and distribute by IS not. 8.4 Sub-query

Subqueries are basically consistent with subquery syntax and usage in standard SQL, and it is important to note that if a select query is made from a subquery in hive, the subquery must set an alias.

Select col from (select A+b as col from T1) T2

Also, starting with Hive0.13, subqueries are supported in the WHERE clause, such as:

SELECT * from A WHERE a.a in (select Foo from B); Select A from T1 where EXISTS (select B from T2 where T1. X = T2. Y

There is also a syntax for querying subqueries as a table, called Common table Expression (CTE):

With Q1 as (SELECT * from src where key= ' 5 '), Q2 as (select * from src s2 where key = ' 4 ') SELECT *

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More

[Learn hive together] Nine-hive query statement select

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support

[Learn hive together] Nine-hive query statement select

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

Trending Topic

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support