"Programming Hive" Reading Notes (2): Hive Basics


Preface: the first read is a skim. The goal is to build an index of what exists; some of it may never be needed, so knowing that it is there is enough. The parts of interest can be studied in more depth.

The details are better read later, when they are actually needed, together with other material.



Chapter 3. Data Types and File Formats

Primitive data types and collection data types.

For data stored as text, the delimiter between columns can be specified.
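As a rough illustration of the point about delimiters, here is a minimal sketch of a table that mixes primitive and collection types and spells out its delimiters. The table and column names are hypothetical; the delimiter values shown are Hive's defaults for text files, written out only to show where they go.

```sql
-- Hypothetical table; all names are illustrative only.
CREATE TABLE IF NOT EXISTS employees (
  name         STRING,                 -- primitive type
  salary       FLOAT,                  -- primitive type
  subordinates ARRAY<STRING>,          -- collection type
  deductions   MAP<STRING, FLOAT>,     -- collection type
  address      STRUCT<street:STRING, city:STRING, zip:INT>
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\001'            -- between columns
  COLLECTION ITEMS TERMINATED BY '\002'  -- between array/struct/map entries
  MAP KEYS TERMINATED BY '\003'          -- between a map key and its value
  LINES TERMINATED BY '\n'               -- between rows
STORED AS TEXTFILE;
```

For comma-separated input, the same clause would use FIELDS TERMINATED BY ',' instead.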


Chapter 4. HiveQL: Data Definition

Creating databases, creating and altering tables, and partition operations.
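A short sketch of the kinds of statements this chapter covers. The database, table, and partition names are made up for illustration.

```sql
-- Hypothetical names throughout.
CREATE DATABASE IF NOT EXISTS notes_demo;
USE notes_demo;

-- A table partitioned by date; dt is a partition column, not a data column.
CREATE TABLE IF NOT EXISTS page_views (
  user_id STRING,
  url     STRING
)
PARTITIONED BY (dt STRING);

-- Modify the table: add a column.
ALTER TABLE page_views ADD COLUMNS (referrer STRING);

-- Partition operations: add, list, drop.
ALTER TABLE page_views ADD IF NOT EXISTS PARTITION (dt = '2014-11-15');
SHOW PARTITIONS page_views;
ALTER TABLE page_views DROP IF EXISTS PARTITION (dt = '2014-11-15');
```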


Chapter 5. HiveQL: Data Manipulation

1. Data can be loaded from, and exported to, both the local filesystem and HDFS.

2. Creating a table from a query, and inserting query results into a table.
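The two points above can be sketched roughly as follows. The table name and all paths are hypothetical.

```sql
-- Hypothetical table and paths, for illustration only.
CREATE TABLE IF NOT EXISTS page_views (user_id STRING, url STRING)
PARTITIONED BY (dt STRING);

-- 1a. Load from the local filesystem (LOCAL keyword).
LOAD DATA LOCAL INPATH '/tmp/page_views.txt'
OVERWRITE INTO TABLE page_views PARTITION (dt = '2014-11-15');

-- 1b. Load from HDFS (no LOCAL keyword).
LOAD DATA INPATH '/user/demo/page_views'
INTO TABLE page_views PARTITION (dt = '2014-11-16');

-- 1c. Export query results to a local directory ...
INSERT OVERWRITE LOCAL DIRECTORY '/tmp/page_views_export'
SELECT user_id, url FROM page_views WHERE dt = '2014-11-15';

-- 1d. ... or to an HDFS directory.
INSERT OVERWRITE DIRECTORY '/user/demo/page_views_export'
SELECT user_id, url FROM page_views WHERE dt = '2014-11-15';

-- 2a. Create a table directly from a query (CTAS).
CREATE TABLE page_views_snapshot AS
SELECT user_id, url FROM page_views WHERE dt = '2014-11-15';

-- 2b. Insert query results into an existing table.
INSERT INTO TABLE page_views_snapshot
SELECT user_id, url FROM page_views WHERE dt = '2014-11-16';
```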

Chapter 6. HiveQL: Queries

The various forms of SELECT, JOIN, CLUSTER BY, and so on.

WHERE supports pattern matching with LIKE and, for regular expressions, RLIKE.
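A small sketch of the difference, assuming a hypothetical stocks table with a STRING column named symbol:

```sql
-- Assumes a hypothetical table stocks(symbol STRING, ...).

-- LIKE uses SQL wildcards: % matches any run of characters, _ matches one.
SELECT DISTINCT symbol FROM stocks WHERE symbol LIKE 'AA%';

-- RLIKE uses Java regular expressions; a partial match is enough,
-- so anchors are needed to match the whole value.
SELECT DISTINCT symbol FROM stocks WHERE symbol RLIKE '^(AAPL|IBM)$';
```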

JOIN: in the Hive versions the book covers, ON conditions support only equality (equi-joins); non-equality conditions are not supported, and neither is OR.

1. JOIN first, then WHERE: rows are first filtered by the join condition, and the joined result is then filtered by the WHERE clause.

2. "The partition filters are ignored for OUTER JOINs. However, using such filter predicates in ON clauses for INNER JOINs does work!"

In an outer join, writing the partition condition in the ON clause has no effect. To use partition conditions to speed up the query, join subqueries that apply the filter first.
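The subquery approach might look like the sketch below. The schema is an assumption made for illustration: stocks(ymd, price_close) and dividends(ymd, dividend), both partitioned by symbol.

```sql
-- Assumed (hypothetical) schema:
--   stocks(ymd STRING, price_close FLOAT)  PARTITIONED BY (symbol STRING)
--   dividends(ymd STRING, dividend FLOAT)  PARTITIONED BY (symbol STRING)

-- Each side is restricted to one partition before the outer join runs,
-- so the filter does not depend on how the ON clause is handled.
SELECT s.ymd, s.symbol, s.price_close, d.dividend
FROM
  (SELECT * FROM stocks    WHERE symbol = 'AAPL') s
LEFT OUTER JOIN
  (SELECT * FROM dividends WHERE symbol = 'AAPL') d
ON s.ymd = d.ymd;
```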

Join types: INNER JOIN, LEFT OUTER JOIN, RIGHT OUTER JOIN, and LEFT SEMI JOIN (which does the job of IN; newer versions of Hive appear to support IN subqueries directly).
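A sketch of LEFT SEMI JOIN standing in for an IN subquery, assuming hypothetical stocks and dividends tables that both have a symbol column. As far as I know, IN subqueries in WHERE were added in Hive 0.13, so the second form is only an option on newer versions.

```sql
-- Assumes hypothetical tables stocks(ymd, symbol, price_close)
-- and dividends(ymd, symbol, dividend).

-- Rows of stocks whose symbol appears in dividends.
-- Columns of the right-hand table (d) may be used only in the ON clause,
-- not in SELECT or WHERE.
SELECT s.ymd, s.symbol, s.price_close
FROM stocks s
LEFT SEMI JOIN dividends d
  ON s.symbol = d.symbol;

-- The same intent written with IN, for Hive versions that support it.
SELECT s.ymd, s.symbol, s.price_close
FROM stocks s
WHERE s.symbol IN (SELECT d.symbol FROM dividends d);
```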

hive> SELECT * FROM stocks JOIN dividends
    > WHERE stocks.symbol = dividends.symbol AND stocks.symbol = 'AAPL';

In Hive, a query like this, with no ON clause, first computes the Cartesian product and only then applies the WHERE filter. From the book:

"In Hive, this query computes the full Cartesian product before applying the WHERE clause. It could take a very long time to finish. When the property hive.mapred.mode is set to strict, Hive prevents users from inadvertently issuing a Cartesian product query."
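A sketch of the safeguard and of the usual rewrite, assuming the hypothetical stocks and dividends tables both have a symbol column:

```sql
-- Assumes hypothetical tables stocks and dividends, each with a symbol column.
SET hive.mapred.mode=strict;

-- In strict mode a JOIN with no ON clause is rejected, for example:
--   SELECT * FROM stocks JOIN dividends
--   WHERE stocks.symbol = dividends.symbol AND stocks.symbol = 'AAPL';

-- Moving the join condition into ON turns it into an ordinary equi-join.
SELECT *
FROM stocks s
JOIN dividends d ON s.symbol = d.symbol
WHERE s.symbol = 'AAPL';
```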

When two tables are joined and one of them is small, a map-side join can speed up the query.

However, this optimization does not support right and full outer joins:

"Hive does not support the optimization for right- and full-outer joins."

Optimization: when certain conditions are met, it is turned on by setting the corresponding parameters.
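A sketch of the two usual ways to get a map-side join, assuming dividends is the small table. The table and column names are hypothetical; the property names are real Hive settings, but their defaults differ between versions, so treat the values as examples.

```sql
-- Assumes hypothetical tables stocks (large) and dividends (small).

-- Option 1: let Hive convert the join automatically when one side is small.
SET hive.auto.convert.join=true;
-- Size threshold in bytes below which a table counts as "small".
SET hive.mapjoin.smalltable.filesize=25000000;

SELECT s.ymd, s.symbol, s.price_close, d.dividend
FROM stocks s
JOIN dividends d ON s.ymd = d.ymd AND s.symbol = d.symbol
WHERE s.symbol = 'AAPL';

-- Option 2: the older hint syntax, naming the table to hold in memory.
-- Newer Hive versions may ignore this hint unless configured otherwise.
SELECT /*+ MAPJOIN(d) */ s.ymd, s.symbol, s.price_close, d.dividend
FROM stocks s
JOIN dividends d ON s.ymd = d.ymd AND s.symbol = d.symbol
WHERE s.symbol = 'AAPL';
```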

"The ORDER BY clause is familiar from other SQL dialects. It performs a total ordering of the query result set. This means that all the data is passed through a single reducer, which may take an unacceptably long time to execute for larger data sets."

ORDER BY sorts globally: all of the data is sorted by a single reducer.

"Because ORDER BY can result in excessively long run times, Hive will require a LIMIT clause with ORDER BY if the property hive.mapred.mode is set to strict. By default, it is set to nonstrict." (A safeguard.)
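A sketch of ORDER BY under strict mode, assuming a hypothetical stocks table partitioned by symbol. Strict mode also insists on a partition filter for partitioned tables, which the WHERE clause here happens to provide.

```sql
-- Assumes a hypothetical table
--   stocks(ymd STRING, price_close FLOAT) PARTITIONED BY (symbol STRING).
SET hive.mapred.mode=strict;

-- Global sort through a single reducer; LIMIT is mandatory in strict mode.
SELECT s.ymd, s.symbol, s.price_close
FROM stocks s
WHERE s.symbol = 'AAPL'
ORDER BY s.ymd ASC
LIMIT 100;
```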

SORT BY is a partial sort: the data is ordered within each reducer only, not across the whole result.

"DISTRIBUTE BY controls how map output is divided among reducers."

DISTRIBUTE BY specifies how map output is assigned to the reducers.

Typically, rows with the same value of a field are sent to the same reducer, which is somewhat like the idea behind GROUP BY.

It is often used together with SORT BY (DISTRIBUTE BY comes first), which gives the effect of grouping first and then sorting within each group.

CLUSTER BY has the same effect as DISTRIBUTE BY plus SORT BY on the same columns.
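The three clauses side by side, as a sketch over a hypothetical stocks table with columns ymd, symbol, and price_close:

```sql
-- Assumes a hypothetical table stocks(ymd, symbol, price_close).

-- All rows for one symbol go to the same reducer (DISTRIBUTE BY),
-- and each reducer sorts its own rows by symbol, then date (SORT BY).
SELECT s.ymd, s.symbol, s.price_close
FROM stocks s
DISTRIBUTE BY s.symbol
SORT BY s.symbol ASC, s.ymd ASC;

-- When the distribute and sort columns are the same (ascending),
-- CLUSTER BY is shorthand for the pair.
SELECT s.ymd, s.symbol, s.price_close
FROM stocks s
CLUSTER BY s.symbol;
```

Neither query gives a total order across reducers; only ORDER BY does that.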

Additional information:

http://www.cnblogs.com/ggjucheng/archive/2013/01/03/2843243.html

CAST(value AS TYPE)

When casting explicitly, check the converted values to make sure the result is what you need.
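One case worth checking is a string that is not a valid number: casting it yields NULL rather than an error. A sketch, assuming a hypothetical staging table stocks_raw in which every column was loaded as STRING:

```sql
-- Assumes a hypothetical table stocks_raw(symbol STRING, price_close STRING).

-- Strings that do not parse as numbers become NULL, so it can help to
-- look at the rows that failed to convert before relying on the result.
SELECT symbol, price_close
FROM stocks_raw
WHERE CAST(price_close AS FLOAT) IS NULL;

-- Use the converted value once the bad rows are understood.
SELECT symbol, CAST(price_close AS FLOAT) AS price
FROM stocks_raw
WHERE CAST(price_close AS FLOAT) > 100.0;
```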

Queries that Sample Data

Random sampling of data.

I did not understand part of this section; setting it aside for now.

Block Sampling

Input Pruning for Bucket Tables
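For later reference, a sketch of the three sampling forms named above. The tables numbers and numbers_bucketed are hypothetical, and the bucket counts are arbitrary.

```sql
-- Assumes a hypothetical table numbers(number INT).

-- Random sampling: rows are hashed into 10 buckets by rand(),
-- so a different sample comes back on each run.
SELECT * FROM numbers TABLESAMPLE(BUCKET 3 OUT OF 10 ON rand()) s;

-- Bucketing on a column instead gives the same rows on every run.
SELECT * FROM numbers TABLESAMPLE(BUCKET 3 OUT OF 10 ON number) s;

-- Block sampling: take roughly a percentage of the input by size.
SELECT * FROM numbers TABLESAMPLE(0.1 PERCENT) s;

-- Input pruning: if the table is already bucketed on the sampled column,
-- Hive can read just the matching bucket file instead of the whole table.
CREATE TABLE IF NOT EXISTS numbers_bucketed (number INT)
CLUSTERED BY (number) INTO 3 BUCKETS;

-- Needed on older Hive versions so the insert honors the bucketing.
SET hive.enforce.bucketing=true;

INSERT OVERWRITE TABLE numbers_bucketed
SELECT number FROM numbers;

SELECT * FROM numbers_bucketed TABLESAMPLE(BUCKET 2 OUT OF 3 ON number) s;
```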

UNION ALL combines the rows of two or more tables.
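A sketch with two hypothetical log tables of identical layout. The subqueries must produce the same number of columns with matching types. Wrapping the union in a subquery is the form older Hive versions require; newer versions also accept a top-level UNION ALL.

```sql
-- Assumes hypothetical tables log1 and log2, each with
-- columns (ymd STRING, level STRING, message STRING).

SELECT log.ymd, log.level, log.message, log.source
FROM (
  SELECT l1.ymd, l1.level, l1.message, 'log1' AS source FROM log1 l1
  UNION ALL
  SELECT l2.ymd, l2.level, l2.message, 'log2' AS source FROM log2 l2
) log
SORT BY log.ymd ASC;
```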



Author of this article: linger

This article link: http://blog.csdn.net/lingerlanlan/article/details/41153799


Copyright notice: This is an original article by the blog author and may not be reproduced without the author's consent.
