Handling of the hive configuration file and Null values in join

Last Update:2016-03-26 Source: Internet

Author: User

First, Hive parameter settings

Parameters can be set in three ways.

1. Configuration files

· User-defined configuration file: $HIVE_CONF_DIR/hive-site.xml

· Default configuration file: $HIVE_CONF_DIR/hive-default.xml

The user-defined configuration overrides the default configuration.

In addition, Hive reads the Hadoop configuration, because Hive is started as a Hadoop client. The Hadoop configuration files include:

· $HADOOP_CONF_DIR/hive-site.xml

· $HADOOP_CONF_DIR/hive-default.xml

The Hive configuration overrides the Hadoop configuration.

Configuration file settings apply to every Hive process started on the local machine.

2. Command-line parameters

bin/hive -hiveconf hive.root.logger=INFO,console

This setting applies to the session started by this invocation (or, when Hive is started in server mode, to all sessions that send requests to it).

3. Parameter statements

SET mapred.reduce.tasks=100;

This setting is also scoped to the session.
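The layering described above can be sketched as a chain of lookups. This is only an illustrative Python model, not Hive's own loader: the property values are made up, and the ordering of the command-line and SET layers above the files is an assumption (narrower scope wins) that the article implies but does not state outright.

```python
import xml.etree.ElementTree as ET
from collections import ChainMap


def parse_properties(xml_text):
    """Read <property><name>/<value> pairs from a Hadoop-style config document."""
    root = ET.fromstring(xml_text)
    return {p.findtext("name"): p.findtext("value") for p in root.iter("property")}


# Hypothetical contents standing in for hive-default.xml and hive-site.xml.
DEFAULT_XML = """<configuration>
  <property><name>hive.exec.compress.output</name><value>false</value></property>
  <property><name>mapred.reduce.tasks</name><value>-1</value></property>
</configuration>"""

SITE_XML = """<configuration>
  <property><name>mapred.reduce.tasks</name><value>10</value></property>
</configuration>"""

defaults = parse_properties(DEFAULT_XML)
site = parse_properties(SITE_XML)
command_line = {"hive.root.logger": "INFO,console"}  # from -hiveconf
session = {"mapred.reduce.tasks": "100"}             # from a SET statement

# ChainMap returns the value from the first mapping that contains the key,
# so the layers are listed from narrowest scope to widest.
effective = ChainMap(session, command_line, site, defaults)
before_set = ChainMap(command_line, site, defaults)

print(before_set["mapred.reduce.tasks"])       # value from the site file
print(effective["mapred.reduce.tasks"])        # value from the SET statement
print(effective["hive.exec.compress.output"])  # falls through to the default
```

Listing the layers explicitly makes it easy to see which source supplied a value when a setting does not behave as expected.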

Second, points to note when using Hive

1. Hive uses the UTF-8 character set by default. Hive has no function for converting character encodings.

The hive.exec.compress.output parameter defaults to false, but in practice it often seems to need setting explicitly anyway; otherwise the result may be compressed. If the output file will later be processed directly in Hadoop, it must not be compressed.

2. Semantic differences in handling NULL values in joins

The special logic here is that when Hive compares the join-key fields in a join, NULL = NULL is meaningful and evaluates to true. Consider the following query:

SELECT u.uid, COUNT(u.uid)

FROM t_weblog l JOIN t_user u ON (l.uid = u.uid) GROUP BY u.uid;

In this query, records in the t_weblog table whose uid is NULL are joined to records in the t_user table whose uid is NULL. In other words, l.uid = u.uid holds when both sides are NULL.

If you need semantics consistent with the SQL standard, rewrite the query to filter out NULL values manually:

SELECT u.uid, COUNT(u.uid)

FROM t_weblog l JOIN t_user u

ON (l.uid = u.uid AND l.uid IS NOT NULL AND u.uid IS NOT NULL)

GROUP BY u.uid;
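To see what the NULL filter changes, the two behaviors can be compared in a small Python sketch using SQLite as a stand-in engine. This does not run Hive. SQLite's `=` follows the standard semantics, and its `IS` operator (which treats two NULLs as equal) is used here only to emulate the NULL-matching join described above. The sample rows are made up, and `COUNT(*)` is used instead of `COUNT(u.uid)` so that joined NULL rows stay visible in the counts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t_weblog (uid INTEGER);
CREATE TABLE t_user (uid INTEGER);
INSERT INTO t_weblog VALUES (1), (1), (2), (NULL), (NULL);
INSERT INTO t_user VALUES (1), (2), (NULL);
""")


def run(on_clause):
    """Run the article's join with the given ON condition and return all rows."""
    sql = f"""
        SELECT u.uid, COUNT(*)
        FROM t_weblog l JOIN t_user u ON ({on_clause})
        GROUP BY u.uid
        ORDER BY u.uid
    """
    return conn.execute(sql).fetchall()


# Standard semantics: NULL = NULL is not true, so NULL keys never join.
standard = run("l.uid = u.uid")

# Emulated NULL-matching join: NULL keys on both sides pair up.
null_matching = run("l.uid IS u.uid")

# The manual filter from the rewritten query removes those pairs again.
filtered = run("l.uid IS u.uid AND l.uid IS NOT NULL AND u.uid IS NOT NULL")

print(standard)
print(null_matching)
print(filtered)
```

With the filter in place, the NULL-matching join returns the same rows as the standard one, which is the point of the rewrite.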

In practice, this semantic difference is also a frequent cause of data skew.
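A rough way to picture the skew: in a shuffle-based join, rows with equal keys are sent to the same reducer, so a large number of NULL keys all land on one reducer. The Python sketch below is a toy model under that assumption. The partition function, reducer count, and row counts are hypothetical and are not Hive's actual partitioner.

```python
from collections import Counter

NUM_REDUCERS = 4  # hypothetical cluster size


def reducer_for(key, num_reducers=NUM_REDUCERS):
    """Toy partitioner: equal keys always map to the same reducer."""
    if key is None:
        return 0  # every NULL key is treated as one and the same key
    return key % num_reducers


# Hypothetical log table: 600 rows without a uid, 400 rows with uids 1..400.
weblog_uids = [None] * 600 + list(range(1, 401))

# Rows per reducer when NULL keys take part in the join.
load = Counter(reducer_for(uid) for uid in weblog_uids)

# Rows per reducer when NULL keys are filtered out first.
filtered_load = Counter(reducer_for(uid) for uid in weblog_uids if uid is not None)

print(sorted(load.items()))
print(sorted(filtered_load.items()))
```

In this model one reducer receives most of the rows until the NULL keys are filtered out, after which the load is even.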

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
