Data partitioning comes in two forms: dynamic partitioning and static partitioning. How is each kind of partition created, and how is each one used?
First, dynamic partitioning
1. New partitions are created dynamically from existing data.
2. Partitions are created automatically based on the value of the last column. If a partition does not exist, it is created;
if the partition already exists, it is overwritten.
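The dynamic case described above can be sketched in HiveQL roughly as follows. This is only an illustration: the table and column names (customers, customers_by_state, cust_id, fname, lname, state) are hypothetical and are not taken from the original figure. The two SET lines are Hive session settings that permit dynamic partitioning.

```sql
-- Hypothetical tables and columns, for illustration only.
-- Hive session settings that allow all partition columns to be dynamic.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- The partition column is declared in PARTITIONED BY,
-- not in the regular column list.
CREATE TABLE customers_by_state (
  cust_id INT,
  fname   STRING,
  lname   STRING
)
PARTITIONED BY (state STRING);

-- The last column in the SELECT (state) supplies the partition value.
-- Partitions that do not exist yet are created; existing partitions
-- that receive rows from this query are overwritten.
INSERT OVERWRITE TABLE customers_by_state
PARTITION (state)
SELECT cust_id, fname, lname, state
FROM customers;
```

Because no value is given for state in the PARTITION clause, each row is routed to a partition according to its own state value.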
Second, static partitioning
1. Static partitioning example: partitioning call logs by day
Loudacre's customer service phone system generates detailed call logs, which analysts use to summarize the previous day's call volume.
Logs are generated daily, and each file name includes the date (for example, Call-20141002.log).
In the dynamic partitioning example above, the data was partitioned automatically based on column values. Here we use static partitioning instead,
because the data files do not contain the partition value (the call date). The partitioned table itself is defined in the same way.
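A minimal sketch of such a table definition is shown here. The table name call_logs and the partition column call_date match the partition path used later in this article; the remaining column names and the comma delimiter are assumptions made for illustration.

```sql
-- Column names other than call_date, and the ',' delimiter, are assumptions.
CREATE TABLE call_logs (
  call_time  STRING,
  phone      STRING,
  event_type STRING,
  details    STRING
)
-- call_date is not stored in the log files; it is supplied
-- when each partition is added or loaded.
PARTITIONED BY (call_date STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
```

The DDL has the same shape as for a dynamically partitioned table; the difference lies in how the partitions are populated afterwards.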
2. Loading data into a static partition
With static partitioning, you create new partitions as needed, for example by adding a partition for each day's call log data.
An ALTER TABLE ... ADD PARTITION statement adds the partition's metadata to the table and creates a subdirectory:
/user/hive/warehouse/call_logs/call_date=2014-10-02
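For the subdirectory shown above, the statement would look roughly like this (a sketch, assuming the call_logs table is partitioned by a call_date STRING column):

```sql
-- Register the partition for 2 October 2014 in the table's metadata.
ALTER TABLE call_logs ADD PARTITION (call_date='2014-10-02');

-- List the table's partitions to confirm the new one is present.
SHOW PARTITIONS call_logs;
```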
Then load the day's data into the correct partition with a LOAD DATA INPATH statement, which moves the HDFS file Call-20141002.log into the partition subdirectory.
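A sketch of the load step follows. The HDFS directory /user/training is a hypothetical location for the log file; only the file name and the partition value come from the text above.

```sql
-- /user/training is a hypothetical HDFS directory.
-- LOAD DATA INPATH moves the file (it does not copy it) into
-- /user/hive/warehouse/call_logs/call_date=2014-10-02
LOAD DATA INPATH '/user/training/Call-20141002.log'
INTO TABLE call_logs
PARTITION (call_date='2014-10-02');

-- Filtering on the partition column lets the engine read
-- only that partition's subdirectory.
SELECT COUNT(*) FROM call_logs WHERE call_date='2014-10-02';
```

The partition value is written explicitly in the PARTITION clause, which is what makes this a static load.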
3. Overwriting all data in a partition
Adding the OVERWRITE keyword when loading replaces all existing data in the target partition.
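One way to do this, sketched under the assumption that a corrected log file is available (the path and file name below are hypothetical):

```sql
-- Hypothetical path and file name, for illustration only.
-- OVERWRITE removes the files currently in the partition
-- before moving the new file in.
LOAD DATA INPATH '/user/training/Call-20141002-corrected.log'
OVERWRITE INTO TABLE call_logs
PARTITION (call_date='2014-10-02');
```

Other partitions of call_logs are left untouched; only call_date=2014-10-02 is replaced.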
That covers dynamic partitioning and static partitioning. How do Impala and Hive partition data? A follow-up post will continue the topic. Technology has a learning curve, so we need to keep learning and exchanging ideas in practice, draw on other people's experience and knowledge, and round out our own understanding. Big data is still developing and many aspects of it are not yet mature, so it takes continued effort to avoid falling behind. I also recommend the public account "big Data cn", which is worth following when you have time.
This article is from the "11872756" blog. Please keep this source attribution: http://11882756.blog.51cto.com/11872756/1891680
Decrypting data partitions