Demystifying data partitions

Source: Internet
Author: User


Data partitioning comes in two types: dynamic partitioning and static partitioning. How is each type of partition created, and how is each used?

First, dynamic partitioning

1. Dynamically creating new partitions from existing data

[Image: 11.png]

2. Partitions are created automatically based on the value of the last column. If a partition does not exist, it is created; if it already exists, it is overwritten.
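
As a minimal HiveQL sketch of such a dynamic insert: the source table `customers`, the target table `customers_by_state`, and all column names are hypothetical and do not come from the original article. The two `SET` statements are Hive settings that permit partitions to be derived from column values.

```sql
-- Hypothetical tables: customers (source) and customers_by_state (target).
-- Allow Hive to create partitions from column values at insert time.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

CREATE TABLE customers_by_state (
  cust_id INT,
  fname   STRING,
  lname   STRING
)
PARTITIONED BY (state STRING);

-- The partition column (state) must come last in the SELECT list.
-- Each distinct value maps to one partition; OVERWRITE replaces the
-- contents of any partition that already exists and receives rows.
INSERT OVERWRITE TABLE customers_by_state
PARTITION (state)
SELECT cust_id, fname, lname, state
FROM customers;
```

Using `INSERT INTO` instead of `INSERT OVERWRITE` would append to existing partitions rather than replace them.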

Second, static partitioning

1. Static Partitioning Example: Partition call logs by day

Loudacre's customer service phone system generates detailed call logs. Analysts use this data to summarize the previous day's call volume, for example:

[Image: 22.png]
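
A query of this kind might look like the following sketch. The original query is not preserved, so the `event_type` column and the date literal are assumptions; the table name `call_logs` and the partition column `call_date` are taken from the warehouse path shown later in this article.

```sql
-- Hypothetical summary query; event_type is an assumed column name.
SELECT event_type, COUNT(*) AS num_events
FROM call_logs
WHERE call_date = '2014-10-01'
GROUP BY event_type;
```

Because `call_date` is the partition column, a filter on it lets the engine read only the matching partition subdirectory instead of scanning the whole table.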

The logs are generated daily, for example:

[Image: 33.png]

In the dynamic partitioning example above, the data was partitioned automatically based on column values. Here we use static partitioning instead, because the data files do not contain the partition value. The partitioned table is defined in the same way:

[Image: 44.png]
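
A minimal sketch of such a table definition follows. Only the table name `call_logs` and the partition column `call_date` appear in this article (in the warehouse path further down); the other column names and the comma delimiter are assumptions made for illustration.

```sql
-- Columns other than call_date, and the field delimiter, are hypothetical.
CREATE TABLE call_logs (
  call_time  STRING,
  phone      STRING,
  event_type STRING,
  details    STRING
)
PARTITIONED BY (call_date STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
```

Note that `call_date` is declared only in `PARTITIONED BY`, not in the column list: its value is stored in the partition directory name rather than in the data files.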

2. Loading data into a static partition

With static partitioning, you create new partitions as needed. For example, add a partition for each day's call log data:

[Image: 55.png]

This command adds the partition's metadata to the table and creates a subdirectory:
/user/hive/warehouse/call_logs/call_date=2014-10-02
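
Given the subdirectory name above, the statement was presumably along these lines. This is a sketch reconstructed from the path, not the original command:

```sql
-- Register the partition for 2 October 2014 in the table's metadata.
ALTER TABLE call_logs
ADD PARTITION (call_date = '2014-10-02');

-- List the table's partitions to confirm the new one is registered.
SHOW PARTITIONS call_logs;
```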

Then load the day's data into the correct partition:

[Image: 66.png]

This command moves the HDFS file Call-20141002.log into the partition subdirectory.
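
A sketch of such a load in Hive, assuming the file sits in the current user's HDFS home directory (Hive resolves a relative `INPATH` against that directory; the actual location is not given in the article):

```sql
-- The file name comes from the text; its location in HDFS is an assumption.
LOAD DATA INPATH 'Call-20141002.log'
INTO TABLE call_logs
PARTITION (call_date = '2014-10-02');
```

Since `LOAD DATA INPATH` moves the file rather than copying it, the file no longer exists at its original HDFS location afterwards.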

3. Overwriting all data in a partition

[Image: 77.png]
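
One way to do this in Hive is to add the `OVERWRITE` keyword to the load statement. The sketch below is not necessarily the command from the original screenshot, and the file name `Call-20141002-corrected.log` is hypothetical:

```sql
-- Hypothetical replacement file; OVERWRITE first removes the files
-- already in the partition, then moves the new file in.
LOAD DATA INPATH 'Call-20141002-corrected.log'
OVERWRITE INTO TABLE call_logs
PARTITION (call_date = '2014-10-02');
```

Only the named partition is affected; data in other `call_date` partitions is left untouched.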

That concludes the introduction to dynamic and static partitioning. How do Impala and Hive partition data in practice? A follow-up article will continue the discussion. Technology has a learning curve, so we need to keep learning and exchanging ideas in day-to-day work, drawing on other people's experience and knowledge to round out our own. Big data is still developing and many aspects are not yet mature, so it takes continued effort to avoid falling behind. The public account "big Data cn" is also worth following when you have time.


This article is from the "11872756" blog, please be sure to keep this source http://11882756.blog.51cto.com/11872756/1891680
