Data partitioning in Impala and Hive (1)

Last Update:2017-01-12 Source: Internet

Author: User


Partitioning data can greatly improve query efficiency, and with big data now so widely used, it is essential knowledge. So how do we create partitions, and how do we load data into them?

First, partitioning accounts by state in Impala/Hive

(1) Example: a non-partitioned accounts table

[Image: 11.png]
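As a minimal sketch, a non-partitioned accounts table might be defined as follows. The column names, the field delimiter, and the location path are assumptions for illustration, not taken from the original figure.

```sql
-- Hypothetical schema: column names, delimiter, and path are assumed.
CREATE EXTERNAL TABLE accounts (
  cust_id INT,
  fname   STRING,
  lname   STRING,
  address STRING,
  city    STRING,
  state   STRING,   -- an ordinary column here, stored in every data row
  zipcode STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/loudacre/accounts';  -- all data files sit in this one directory
```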

A table created this way stores all of its data in a single accounts directory. But what if most of Loudacre's analysis of the customer table is done by state? For example:

[Image: 22.png]
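A query of this kind might look like the sketch below. The table and column names are assumed for illustration; only the filter on the "NY" state value comes from the text.

```sql
-- Hypothetical query: analysis restricted to a single state.
-- On a non-partitioned table, every file in the table directory is read
-- and rows are filtered afterwards.
SELECT fname, lname
FROM accounts
WHERE state = 'NY';
```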

In this case, if the table is large, we can create partitions to avoid a full table scan. Without partitions, every query has to scan all files in the table's directory. Partitioning by state stores the data for each state in its own subdirectory, so a query filtered on the "NY" value scans only that subdirectory. Let's look at how partitions are created.

Second, partition creation

(1) Using PARTITIONED BY to create a partitioned table

[Image: 33.png]
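A minimal sketch of a table partitioned by state is shown below. The table name accounts_by_state, the column names, and the location path are assumptions for illustration.

```sql
-- Hypothetical partitioned table: names and path are assumed.
CREATE EXTERNAL TABLE accounts_by_state (
  cust_id INT,
  fname   STRING,
  lname   STRING,
  address STRING,
  city    STRING,
  zipcode STRING
  -- state is intentionally absent from the column list
)
PARTITIONED BY (state STRING)   -- state becomes the partition key
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/loudacre/accounts_by_state';

-- Resulting directory layout, one subdirectory per partition value:
--   /loudacre/accounts_by_state/state=NY/
--   /loudacre/accounts_by_state/state=CA/
```

Declaring state in both the column list and the PARTITIONED BY clause is rejected, which is why it moves from one to the other.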

Note that state is removed from the column list because it is now the partition field. The partition value is not stored in the data files themselves, so state no longer appears among the regular columns. In other words, a partition key is a virtual column rather than a column in the data. So how do we see the partition column? Does it still appear in the table structure? It does.

Third, view the partition column

Use DESCRIBE to display the partition column. It appears as the last column of the table structure, but it is a virtual column, not a column that physically exists in the data.

[Image: 44.png]
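As a sketch, the statements below inspect a partitioned table. Here accounts_by_state is a hypothetical name for a table partitioned by state. SHOW PARTITIONS is not mentioned in the text above; it is included as a related command that lists the partitions that currently exist.

```sql
-- Lists the regular columns followed by the partition column (state).
DESCRIBE accounts_by_state;

-- Lists the partitions that currently exist for the table.
SHOW PARTITIONS accounts_by_state;
```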

So far we have created a single partition column, but sometimes partitions need to be nested. How do we handle that?

Fourth, creating nested partitions

[Image: 55.png]
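Nested partitions are declared by listing more than one column in PARTITIONED BY. The sketch below assumes a second partition level on zipcode. That choice, like the table name and path, is hypothetical and may differ from the original figure.

```sql
-- Hypothetical nested partitioning: state first, then zipcode.
CREATE EXTERNAL TABLE accounts_by_state_zip (
  cust_id INT,
  fname   STRING,
  lname   STRING,
  address STRING,
  city    STRING
  -- both partition keys are left out of the column list
)
PARTITIONED BY (state STRING, zipcode STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/loudacre/accounts_by_state_zip';

-- Directory layout follows the order of the partition columns:
--   /loudacre/accounts_by_state_zip/state=NY/zipcode=10001/
```

The order of the columns in PARTITIONED BY determines the nesting of the subdirectories, so the column most often used for filtering usually goes first.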

Once the partitions are defined, how do we load data into them? There are two approaches: dynamic partitioning and static partitioning. With dynamic partitioning, Impala/Hive automatically adds new partitions as data is loaded and stores each row in the correct partition (subdirectory) based on its column values. With static partitioning, we define each partition in advance using ADD PARTITION and, when loading data, specify which partition the data goes into. So what are the characteristics of dynamic and static partitioning? I will share them in a follow-up article.
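The two loading styles can be sketched as follows. The table names (accounts as an unpartitioned source, accounts_by_state as a table partitioned by state), the column names, and the staging path are all assumptions for illustration.

```sql
-- Dynamic partitioning: partitions are created from the data itself.
-- The two SET lines are Hive session settings; omit them in Impala.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

INSERT OVERWRITE TABLE accounts_by_state
PARTITION (state)                 -- no value given: taken from each row
SELECT cust_id, fname, lname, address, city, zipcode,
       state                      -- partition column must come last
FROM accounts;

-- Static partitioning: name the partition first, then load into it.
ALTER TABLE accounts_by_state ADD PARTITION (state = 'NY');

-- Hypothetical staging path; the files must not contain a state field.
LOAD DATA INPATH '/loudacre/staging/accounts_ny'
INTO TABLE accounts_by_state
PARTITION (state = 'NY');
```

In the dynamic form the partition value comes from the last column of the SELECT list, while in the static form it is fixed by the PARTITION clause and applies to every row being loaded.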

Big data is worth actively embracing and learning. It does not yet have a mature, settled system and is still developing rapidly, so only continuous learning lets us keep up with its pace. I suggest that everyone study and exchange ideas regularly. I usually follow the "Big Data cn" public account, which I personally find very good and recommend.

This article is from the "11872756" blog, please be sure to keep this source http://11882756.blog.51cto.com/11872756/1891301

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
