Hive separator handling and order by / sort by / distribute by optimization


I. Hive semicolons

A semicolon marks the end of a SQL statement, and HiveQL uses it the same way. However, Hive's semicolon handling is not very intelligent. For example:

select concat(cookie_id, concat(';', 'zoo')) from c02_clickstat_fatdt1 limit 2;

Failed: Parse error: line 0:-1 cannot recognize input '<EOF>' in Function Specification

It appears that when Hive parses a statement, a semicolon is treated as the end of the statement even when it appears inside quotation marks.

The workaround is to escape the semicolon with its octal ASCII code, so the statement above should be written as:

select concat(cookie_id, concat('\073', 'zoo')) from c02_clickstat_fatdt1 limit 2;

Why an octal ASCII code?

I tried a hexadecimal ASCII code, but Hive treated it as a plain string and did not escape it; apparently only octal escapes are supported, for reasons unknown. This rule also applies to statements other than select. For example, if you need to define a separator in create table, non-printable characters must be escaped with their octal ASCII codes, as shown below.
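As a minimal sketch, assuming a hypothetical log table (demo_click_log and its columns are made up for illustration) whose fields are separated by the non-printable Ctrl-A character, the octal escape would be used like this:

create table if not exists demo_click_log (
  cookie_id string,
  page_id   string
)
row format delimited
fields terminated by '\001'   -- octal escape for the non-printable Ctrl-A separator
lines terminated by '\n'
stored as textfile;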

II. Inserting new data

Hive's insert syntax requires the overwrite keyword, so every insert replaces the table's existing data. How, then, can we append data to a table?

Suppose there is a table manbu in Hive:

hive> describe manbu;
id int
value int

hive> select * from manbu;
3 4
1 2
2 3

Add a record:

hive> insert overwrite table manbu
select id, value from (
  select id, value from manbu
  union all
  select 4 as id, 5 as value from manbu limit 1
) u;

The result is:

hive> select * from manbu;
3 4
4 5
2 3
1 2

The key is the union all keyword: the original data set is combined with the new record, and the result overwrites the table.

III. Initial values

When filling a table with insert overwrite table, the initial values of constant fields must match the types in the table definition. For example, to initialize a string field to NULL:

null as field_name                 -- wrong: the expression has type void, not string
cast(null as string) as field_name -- correct

Similarly, to initialize a bigint field to 0:

cast(0 as bigint) as field_name
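As a minimal sketch (the table names t_target and t_source and their columns are made up for illustration), both casts might be used together like this:

insert overwrite table t_target
select
  id,
  cast(null as string) as remark,    -- string column initialized to NULL
  cast(0 as bigint)    as click_cnt  -- bigint column initialized to 0
from t_source;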

IV. Optimization of order by, sort by, and distribute by

Hive's sorting keyword is sort by, deliberately named differently from the order by of traditional databases to emphasize the difference: sort by only sorts the data within each reducer (a single machine), so it does not produce a globally ordered result.

For example:

set mapred.reduce.tasks = 2;  (set the number of reducers to 2)

1. The original values (query without sorting):

select cookie_id, page_id, id from c02_clickstat_fatdt1
where cookie_id in ('1.193.131.218.1288611279693.0', '1.193.148.164.1288609861509.2');

1.193.148.164.1288609861509.2 113181412886099008861288609901078194082403 684000005

1.193.148.164.1288609861509.2 127001128860563972141288609859828580660473 684000015

1.193.148.164.1288609861509.2 113181412886099165721288609915890452725326 684000018

1.193.131.218.1288611279693.0 01c183da6e4bc50712881288611540109914561053 684000114

1.193.131.218.1288611279693.0 01c183da6e4bc22412881288611414343558272134 684000118

1.193.131.218.1288611279693.0 01c183da6e4bc50712881288611511781996667988 684000121

1.193.131.218.1288611279693.0 01c183da6e4bc22412881288611523640691739999 684000126

1.193.131.218.1288611279693.0 01c183da6e4bc50712881288611540109914561053 684000128

2. The same query with sort by:

select cookie_id, page_id, id from c02_clickstat_fatdt1
where cookie_id in ('1.193.131.218.1288611279693.0', '1.193.148.164.1288609861509.2')
sort by cookie_id, page_id;

Values after sort by:

1.193.131.218.1288611279693.0 684000118 01c183da6e4bc22412881288611414343558272134 684000118

1.193.131.218.1288611279693.0 684000114 01c183da6e4bc50712881288611540109914561053 684000114

1.193.131.218.1288611279693.0 684000128 01c183da6e4bc50712881288611540109914561053 684000128

1.193.148.164.1288609861509.2 684000005 113181412886099008861288609901078194082403 684000005

1.193.148.164.1288609861509.2 684000018 113181412886099165721288609915890452725326 684000018

1.193.131.218.1288611279693.0 684000126 01c183da6e4bc22412881288611523640691739999 684000126

1.193.131.218.1288611279693.0 684000121 01c183da6e4bc50712881288611511781996667988 684000121

1.193.148.164.1288609861509.2 684000015 127001128860563972141288609859828580660473 684000015

3. The same query with order by:

select cookie_id, page_id, id from c02_clickstat_fatdt1
where cookie_id in ('1.193.131.218.1288611279693.0', '1.193.148.164.1288609861509.2')
order by page_id, cookie_id;

Values after order by:

1.193.131.218.1288611279693.0 684000118 01c183da6e4bc22412881288611414343558272134 684000118

1.193.131.218.1288611279693.0 684000126 01c183da6e4bc22412881288611523640691739999 684000126

1.193.131.218.1288611279693.0 684000121 01c183da6e4bc50712881288611511781996667988 684000121

1.193.131.218.1288611279693.0 684000114 01c183da6e4bc50712881288611540109914561053 684000114

1.193.131.218.1288611279693.0 684000128 01c183da6e4bc50712881288611540109914561053 684000128

1.193.148.164.1288609861509.2 684000005 113181412886099008861288609901078194082403 684000005

1.193.148.164.1288609861509.2 684000018 113181412886099165721288609915890452725326 684000018

1.193.148.164.1288609861509.2 684000015 127001128860563972141288609859828580660473 684000015

As you can see, sort by and order by produce different results. Two reducers were specified at the start, so the data was split and each partition was sorted separately. The main reason the results differ is that the sort by query has no reduce key, so Hive generates a random number as the reduce key and the input records are distributed to the reducers at random. To guarantee that records with the same cookie_id never end up in different reducers, use the distribute by keyword to make cookie_id the distribution key.

select cookie_id, page_id, id from c02_clickstat_fatdt1
where cookie_id in ('1.193.131.218.1288611279693.0', '1.193.148.164.1288609861509.2')
distribute by cookie_id
sort by cookie_id, page_id;

1.193.131.218.1288611279693.0 684000118 01c183da6e4bc22412881288611414343558272134 684000118

1.193.131.218.1288611279693.0 684000126 01c183da6e4bc22412881288611523640691739999 684000126

1.193.131.218.1288611279693.0 684000121 01c183da6e4bc50712881288611511781996667988 684000121

1.193.131.218.1288611279693.0 684000114 01c183da6e4bc50712881288611540109914561053 684000114

1.193.131.218.1288611279693.0 684000128 01c183da6e4bc50712881288611540109914561053 684000128

1.193.148.164.1288609861509.2 684000005 113181412886099008861288609901078194082403 684000005

1.193.148.164.1288609861509.2 684000018 113181412886099165721288609915890452725326 684000018

1.193.148.164.1288609861509.2 684000015 127001128860563972141288609859828580660473 684000015

Example 2:

create table if not exists t_order (
  id int,           -- order id
  sale_id int,      -- sale id
  customer_id int,  -- customer id
  product_id int,   -- product id
  amount int        -- quantity
) partitioned by (ds string);

Query all records in the table, sorted by sale_id and amount:

set mapred.reduce.tasks = 2;

select sale_id, amount from t_order
sort by sale_id, amount;

This query may not return the order you expect. With two reducers specified, the data might be distributed as follows (each reducer sorted separately):

Reducer 1:
sale_id | amount
0 | 100
1 | 30
1 | 50
2 | 20

Reducer 2:
sale_id | amount
0 | 110
0 | 120
3 | 50
4 | 20

Use the distribute by keyword to make sale_id the distribution key. The rewritten HQL is as follows:

set mapred.reduce.tasks = 2;

select sale_id, amount from t_order
distribute by sale_id
sort by sale_id, amount;

This guarantees that all records with the same sale_id go to the same reducer and that the amounts within each sale_id are sorted correctly, but the sale_id values themselves are still not globally ordered, because Hive distributes the data with Hadoop's default HashPartitioner.

This brings us to the problem of total ordering. There are two solutions:

1) Do not distribute the data at all; use a single reducer:

set mapred.reduce.tasks = 1;

The drawback of this method is that the reduce side becomes a performance bottleneck, and with a large data volume the query generally cannot produce a usable result. In practice, however, it is still the most commonly used approach, because sorted queries are usually run to obtain a number of top-ranked results, so a limit clause can cut the data volume dramatically. With limit N, the number of records sent to the (single-machine) reduce side drops to N * (number of map tasks).
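As a minimal sketch of this approach on the t_order table above (the limit value of 100 is arbitrary):

set mapred.reduce.tasks = 1;

-- with a single reducer, sort by yields a total order;
-- limit keeps the data volume on that reducer manageable
select sale_id, amount from t_order
sort by amount desc
limit 100;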

2) Change the partitioner. This approach can produce a genuine total order. We can use the TotalOrderPartitioner that ships with Hadoop (contributed by Yahoo!), a partitioner that distributes data across reducers while keeping the reducers' key ranges ordered. It requires a SequenceFile that specifies the intervals used to distribute the data. Assuming we have already generated this file (stored at /tmp/range_key_list, with ranges for 100 reducers), the query above can be rewritten as:

set mapred.reduce.tasks = 100;
set hive.mapred.partitioner = org.apache.hadoop.mapred.lib.TotalOrderPartitioner;
set total.order.partitioner.path = /tmp/range_key_list;

select sale_id, amount from t_order
cluster by sale_id
sort by amount;

There are many ways to generate this interval file (for example, with the o.a.h.mapreduce.lib.partition.InputSampler tool that ships with Hadoop). Here we describe how to generate it with Hive itself, taking a t_sale table to be ordered by id as an example:

create table if not exists t_sale (
  id int,
  name string,
  loc string
);

The interval file distributed by sale_id can then be generated as follows:

create external table range_keys (sale_id int)
row format serde
  'org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe'
stored as
  inputformat 'org.apache.hadoop.mapred.TextInputFormat'
  outputformat 'org.apache.hadoop.hive.ql.io.HiveNullValueSequenceFileOutputFormat'
location '/tmp/range_key_list';

insert overwrite table range_keys
select distinct sale_id
from t_sale tablesample (bucket 100 out of 100 on rand()) s
sort by sale_id;

The generated file (under the /tmp/range_key_list directory) lets TotalOrderPartitioner distribute the data processed by the reducers in order of sale_id.

The main issue to consider when building the interval file is how evenly it distributes the data, which requires a thorough understanding of the data itself.
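As a hedged sketch of how one might eyeball that balance before choosing the range boundaries (using the t_order table from Example 2), a simple per-key frequency count can reveal skewed keys:

-- count rows per sale_id; heavily skewed keys mean the range
-- boundaries need to be picked with more care
select sale_id, count(*) as cnt
from t_order
group by sale_id
order by cnt desc
limit 20;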

