Problems with Hive SQL


This post records problems encountered while using Hive SQL at work, shared here for reference.

 

The query conditions seem to become invalid when one field has too many conditions.
 SELECT * FROM login WHERE dt = '20130101' AND (ver != '3.1' OR ver != '3.2' OR ver != '4.0' OR ver != '5.2');

Remarks: Thanks to Yang Qingrong's guidance, I now see that this is actually a mistake on my part. Rewriting the statement as the following SQL implements the intended function; the "bug" was purely the author's own error:

 SELECT * FROM login WHERE dt = '20130101' AND (ver != '3.1' AND ver != '3.2' AND ver != '4.0' AND ver != '5.2');

 

The intent of the statement is to query, from a user login table, the 20130101 login records of users whose version is not 3.1, 3.2, 4.0, or 5.2. However, when it is executed on Hive 0.9.0, all login user records for 20130101 are returned, as if the conditions after the date were ignored. At first this looked like too many != conditions joined with OR on the same field causing them to become invalid; in fact, as the remark above explains, the OR form is a tautology: any value of ver differs from at least one of the four listed versions, so the predicate is always true.

The statement is modified as follows:

 SELECT * FROM login WHERE dt = '20130101' AND !array_contains(split('3.1,3.2,4.0,5.2', ','), ver);

The statement returns the query result as expected.
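As an aside, newer Hive releases support IN/NOT IN on a column (to my knowledge this arrived around Hive 0.13, so it is not an option on the 0.9.x used above), which expresses the same filter more directly. A minimal sketch assuming the same login table:

 -- NOT IN states "version is none of these" directly.
 -- Assumes Hive 0.13+ and the same login(dt, ver, ...) table as above.
 SELECT *
 FROM login
 WHERE dt = '20130101'
   AND ver NOT IN ('3.1', '3.2', '4.0', '5.2');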

 

When a query contains a subquery, there is more than one WHERE clause in the statement, and only one of them takes effect.
 SELECT url, country, pv, ip
 FROM (
     SELECT url, country, COUNT(*) AS pv, COUNT(DISTINCT ip) AS ip
     FROM web_site
     WHERE dt = '20130101'
     GROUP BY url, country
     ORDER BY url, pv DESC
 ) order_area
 WHERE row_number(url) <= 100;

The intent is to query, from the website page-click detail table web_site, the top 100 regions by PV under each subdomain URL for 20130101. However, only the first WHERE clause (WHERE dt = '20130101') takes effect; the second one (WHERE row_number(url) <= 100) does not. Querying the statement's task plan with EXPLAIN confirms that the second WHERE clause is never executed. As a result, the PV of all regions under each domain name is returned.

This bug occurs in Hive 0.8.1; upgrading Hive to 0.9.1 solves the problem.
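Incidentally, the row_number(url) above is presumably a custom UDF, since Hive before 0.11 had no built-in windowing. On Hive 0.11 and later, the same top-100-per-URL intent can be written with the standard ROW_NUMBER() windowing function. This is a sketch assuming the same web_site schema, not the author's original statement:

 SELECT url, country, pv, ip
 FROM (
     SELECT url, country, pv, ip,
            ROW_NUMBER() OVER (PARTITION BY url ORDER BY pv DESC) AS rn
     FROM (
         SELECT url, country, COUNT(*) AS pv, COUNT(DISTINCT ip) AS ip
         FROM web_site
         WHERE dt = '20130101'
         GROUP BY url, country
     ) agg
 ) ranked
 WHERE rn <= 100;  -- keeps the 100 highest-PV regions per URL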

 

INSERT INTO overwrites data when Hive accesses HBase tables through external tables.

Hive can access HBase tables by creating external tables over them. However, when INSERT INTO is used several times to insert data from such external tables into a Hive internal table, the data is expected to be appended, yet it is overwritten each time.

 INSERT INTO TABLE login_user PARTITION (dt = '20130101') SELECT uid FROM pc_login_user_20130101;
 INSERT INTO TABLE login_user PARTITION (dt = '20130101') SELECT uid FROM mobile_login_user_20130101;

pc_login_user_20130101 and mobile_login_user_20130101 are both external tables through which Hive accesses HBase; they hold the users who logged in on a PC and on a mobile phone on 20130101. The second statement is expected to append the mobile login users to the internal table login_user. In practice, however, each INSERT INTO turns out to behave as an overwrite, so the result table ends up containing only the mobile phone login user data.

This problem was found on Hive 0.9.0 + HBase 0.94.1.
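One way to sidestep the broken append semantics, at least for this two-source case, is to load both external tables in a single statement so only one write ever happens. A hedged sketch using the same table names (Hive of this era requires UNION ALL to be wrapped in a subquery):

 -- Single write: combine both sources with UNION ALL inside one insert,
 -- so the missing append semantics never come into play.
 INSERT OVERWRITE TABLE login_user PARTITION (dt = '20130101')
 SELECT uid FROM (
     SELECT uid FROM pc_login_user_20130101
     UNION ALL
     SELECT uid FROM mobile_login_user_20130101
 ) u;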

 

Problems caused by too many dynamic partitions in Hive

Creating a large number of dynamic partitions may cause jobs to fail with exceptions such as the following:

 

org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on /tmp/hive-maintain/hive_2012-11-28_22-39-43_810_1689858262130334284/_task_tmp.-ext-10002/part=33436268/_tmp.000004_0 File does not exist. Holder DFSClient_attempt_201211250925_9859_m_000004_0 does not have any open files.

 

Solution:

 

Increase the limit on the number of dynamic partitions, with the following settings:

 

set hive.exec.max.dynamic.partitions=100000;
set hive.exec.max.dynamic.partitions.pernode=100000;
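For context, a dynamic partition load of the kind that can run into these limits looks roughly like the following. The table and column names are hypothetical; the two set statements enabling dynamic partitioning are standard Hive options:

 -- Enable dynamic partitioning; nonstrict allows all partition columns to be dynamic.
 set hive.exec.dynamic.partition=true;
 set hive.exec.dynamic.partition.mode=nonstrict;

 -- Each distinct value of dt in src creates (or fills) one partition of dest,
 -- so a high-cardinality dt column can exceed the partition limits above.
 INSERT OVERWRITE TABLE dest PARTITION (dt)
 SELECT uid, dt FROM src;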

 

 

An error occurs when dropping a Hive table that has many partitions.

When you drop a table that contains many partitions, Hive throws an OutOfMemory exception and the table cannot be deleted.

Searching around shows this to be a known bug, reportedly fixed in version 0.10.0; see https://issues.apache.org/jira/browse/HIVE-2907

Solution: My understanding is that this happens because Hive's default heap size is 256 MB. Raising the heap size to 512 MB when RunJar starts resolves the problem.
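If raising the heap is not an option, another approach that should keep any single metastore call small is to drop partitions in batches before dropping the table. A hedged sketch with hypothetical names (ALTER TABLE ... DROP PARTITION with an exact partition spec is long-standing Hive syntax):

 -- Drop partitions a few at a time so no single call touches them all.
 ALTER TABLE big_table DROP PARTITION (dt = '20130101');
 ALTER TABLE big_table DROP PARTITION (dt = '20130102');
 -- ...repeat for the remaining partitions (e.g. from a generated script),
 -- then drop the now nearly empty table:
 DROP TABLE big_table;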

  
