Use and extension of hash-partitioned tables in Oracle


A hash-partitioned table applies a hash function to the partition key to decide which partition each row belongs to. What are the advantages of using hash partitions?

Hash partitions share the general advantages of partitioned tables: improved availability, easier administration, and better statement performance. In addition, because a hash-partitioned table places rows according to the hash of the partition key, and the hash of a given key value is fixed, the data is clustered by partition key value: rows with the same partition key always land in the same partition.
For example, in the securities industry we often query the K-line (price history) of a single stock. Assume the table structure is as follows:

The code is as follows:

create table equity
(
  id         number,
  trade_date date,
  ......
);

The equity table can be very large. It is usually queried by a specified id, retrieving the rows for one trade date or a range of trade dates. How should we partition the equity table in this case?
Looking only at the table structure, trade_date seems the natural choice for a range partition key. But if we partition that way, the query in the requirement above, fetching the trade history of a given id over some period, for example the K-line for one year, usually has to cross partitions. Cross-partition queries against a partitioned table often perform poorly, especially when the query has to touch many partitions.
You might say: isn't it enough to create a composite index on the id and trade_date columns? Think about what happens in that case. With range partitioning on trade_date, the data is physically clustered by trade_date, so rows with the same trade_date tend to share data blocks while rows with the same id are scattered. Even if the query above is driven through the index, the final table access reads widely scattered blocks, often one table block per row returned.
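For reference, the composite index discussed here would look roughly like the following sketch (the index name is illustrative, not part of the original example):

create index equity_id_dt_idx on equity (id, trade_date);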
If instead we create a hash-partitioned table, the data is clustered by the hash partition key, which suits this query pattern much better. Rows with the same id are guaranteed to be in the same partition, which also raises the probability that rows with the same id value fall into the same data blocks, thus "reducing I/O" to a certain extent.
The claim that hash partitioning "reduces I/O" is put in quotation marks because it is unrealistic to expect large I/O savings from hash partitioning alone. In particular, when the equity table covers a large number of stocks, rows for the same stock on different trade dates are still hard to cluster physically into the same data blocks. In fact, if we combine hash partitioning with an index-organized table (IOT) for the equity table, the query described above can be improved dramatically. IOTs are outside the scope of this article, so we will not discuss them further here.
Before deciding on a hash-partitioned table, we also need to confirm that the values of the chosen partition key are continuously distributed, or close to it, and that the number of partitions is an integer power of 2, for example 2, 4, 8, and so on. These requirements follow from the characteristics of the hash function and ensure that data is spread fairly evenly across the partitions.
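As a minimal sketch of what this looks like for the equity table (the column list is abbreviated, and the partition count of 8 is only an illustrative power of 2):

create table equity
(
  id         number,
  trade_date date
)
partition by hash (id)
partitions 8;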

Extending a hash-partitioned table:

Partitions are added to a hash-partitioned table with the add partition command. Oracle recommends keeping the number of partitions at a power of 2, for example 2, 4, 8, and so on, so that data stays evenly distributed across partitions. And, as mentioned above, the partition key values should be continuously distributed or close to it.
When a new partition is added, part of the data from one existing partition has to be moved into it. Which partition does that data come from?
The key rule is: if the partition being added is the Nth partition, and M is the smallest integer power of 2 greater than or equal to N, then the data for the new partition comes from partition N - M/2.
For example, suppose a hash-partitioned table has 100 partitions and we add one more, making it the 101st, so N = 101. The smallest power of 2 greater than or equal to 101 is 128, so M = 128, and the data for partition 101 comes from partition 101 - 128/2 = 37.
In other words, when we add the 101st partition, partition 37 must be locked, because part of its data is moved into the new partition 101.
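The arithmetic can be checked with a quick query; this is just an illustrative calculation, not part of the original example:

select n,
       power(2, ceil(log(2, n)))         as m,
       n - power(2, ceil(log(2, n))) / 2 as source_partition
  from (select 101 as n from dual);

-- For n = 101: m = 128 and source_partition = 37.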
Next, let us verify this with a real example and see what we need to pay attention to in practice.
The commodity table is a large table in our system. When the hash-partitioned table was created a few years ago, the DBA chose 100 partitions:

The code is as follows:

select table_name, partition_position, partition_name, num_rows
  from user_tab_partitions
 where table_name = 'COMMODITY'
 order by partition_position;

TABLE_NAME  PARTITION_POSITION  PARTITION_NAME   NUM_ROWS
----------  ------------------  --------------  ---------
COMMODITY                    1  COT_IND01_P1      4405650
COMMODITY                    2  COT_IND01_P2      5046650
COMMODITY                    3  COT_IND01_P3      5107550
......
COMMODITY                   36  COT_IND01_P36     5718800
COMMODITY                   37  COT_IND01_P37     9905200
COMMODITY                   38  COT_IND01_P38    10118400
COMMODITY                   39  COT_IND01_P39    10404950
COMMODITY                   40  COT_IND01_P40     9730850
COMMODITY                   41  COT_IND01_P41     9457300
COMMODITY                   42  COT_IND01_P42     9717950
COMMODITY                   43  COT_IND01_P43     9643900
COMMODITY                   44  COT_IND01_P44    11138000
COMMODITY                   45  COT_IND01_P45     9381300
COMMODITY                   46  COT_IND01_P46    10101150
COMMODITY                   47  COT_IND01_P47     8809950
COMMODITY                   48  COT_IND01_P48    10611050
COMMODITY                   49  COT_IND01_P49    10010600
COMMODITY                   50  COT_IND01_P50     8252600
COMMODITY                   51  COT_IND01_P51     9709900
COMMODITY                   52  COT_IND01_P52     8983200
COMMODITY                   53  COT_IND01_P53     9012750
COMMODITY                   54  COT_IND01_P54     9310650
COMMODITY                   55  COT_IND01_P55     8966450
COMMODITY                   56  COT_IND01_P56     8832650
COMMODITY                   57  COT_IND01_P57     9470600
COMMODITY                   58  COT_IND01_P58     8932450
COMMODITY                   59  COT_IND01_P59     9994850
COMMODITY                   60  COT_IND01_P60     9617450
COMMODITY                   61  COT_IND01_P61    10278850
COMMODITY                   62  COT_IND01_P62     9277600
COMMODITY                   63  COT_IND01_P63     8136300
COMMODITY                   64  COT_IND01_P64    10064600
COMMODITY                   65  COT_IND01_P65     3710900
......
COMMODITY                   99  COT_IND01_P99     5273800
COMMODITY                  100  COT_IND01_P100    5293350

100 rows selected.

Looking at the data distribution across partitions, we can see that the 28 partitions from partition 37 to partition 64 hold roughly twice as many rows as the others. Because 100 is not an integer power of 2, Oracle's hash function cannot distribute the data evenly. Now we add a new partition, COT_IND01_P101, to the table:

The code is as follows:

alter table nts_commodity_ts add partition COT_IND01_P101;

Table altered.
Elapsed: 00:06:58.52
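The num_rows figures below come from the optimizer statistics, so they have to be regathered after the partition is added. A hedged sketch of how that might be done with dbms_stats (the owner and table names here are illustrative):

begin
  dbms_stats.gather_table_stats(
    ownname     => user,
    tabname     => 'COMMODITY',
    granularity => 'PARTITION');
end;
/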

After gathering statistics, query the row counts of the partitions involved:

The code is as follows:

select table_name, partition_position, partition_name, num_rows
  from user_tab_partitions
 where table_name = 'COMMODITY'
   and partition_name in ('COT_IND01_P37', 'COT_IND01_P101');

TABLE_NAME  PARTITION_POSITION  PARTITION_NAME   NUM_ROWS
----------  ------------------  --------------  ---------
COMMODITY                   37  COT_IND01_P37     4905200
COMMODITY                  101  COT_IND01_P101    5107550

We can see that the data of the old partition 37 has been split roughly evenly between partitions 37 and 101.
Monitoring the locks held by the session during the operation, we find that two objects are locked in exclusive mode:

The code is as follows:

SQL> select * from v$lock where sid = 1239 and type = 'TM' and lmode = 6 order by sid, lmode;

ADDR              KADDR              SID  TY      ID1  ID2  LMODE  REQUEST  CTIME  BLOCK
----------------  ----------------  ----  --  -------  ---  -----  -------  -----  -----
FFFFFFFF7D764828  FFFFFFFF7D764888  1239  TM  4004126    0      6        0     72      2
FFFFFFFF7D764828  FFFFFFFF7D764888  1239  TM  4004063    0      6        0     72      2
What objects are they?
select object_name, subobject_name, object_id
  from user_objects
 where object_id in (4004126, 4004063);

OBJECT_NAME  SUBOBJECT_NAME  OBJECT_ID
-----------  --------------  ---------
COMMODITY    COT_IND01_P100    4004126
COMMODITY    COT_IND01_P37     4004063


As you can see, partitions 37 and 100 are locked. Locking partition 37 is expected, since data has to be moved out of it. But why lock partition 100, the last partition?
My understanding is this: the position of the new partition, 101, is determined by the current number of partitions, 100. If DDL on the last partition of the original table were allowed while the new partition is being added, for example a coalesce, then the new partition 101 might no longer take its data from partition 37; it would effectively have to become the new 100th partition, which would cause confusion. You might ask: by that reasoning, shouldn't the other partitions be locked as well? That is not necessary, because a hash-partitioned table does not support drop partition; the only comparable operation is coalesce, and coalesce can only shrink the table from the last partition.
What practical guidance does this locking behaviour give us when adding partitions to a hash-partitioned table?
As the example shows, partition 37 and the last partition, 100, are both locked exclusively, so those two partitions cannot accept DML while the new partition is being added: DML needs a row-exclusive TM lock (mode 3) on the partition, which the exclusive lock blocks. In other words, any application working on those two partitions will be affected.
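For reference, a hedged sketch of a query that could be used to watch for sessions waiting on the TM locks of those two partitions during the operation (the object ids are the ones from the example above):

select sid, type, id1, lmode, request, block
  from v$lock
 where type = 'TM'
   and id1 in (4004126, 4004063)
 order by request desc;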
Adding a partition to a hash-partitioned table is not as fast as adding one to, say, a range-partitioned table, because the operation has to perform I/O and move data into the new partition. But that is not even the dominant cost: since a hash-partitioned table assigns rows to partitions by the hash of the partition key, most of the time is spent computing hash values. In the test above, adding the new partition took 6 minutes 58 seconds, and the 10046 trace statistics below show that about 6 minutes of that was CPU time, which I believe is mainly the hash computation.

The code is as follows:

OVERALL TOTALS FOR ALL RECURSIVE STATEMENTS

call     count       cpu    elapsed       disk      query    current       rows
------- ------  -------- ---------- ---------- ---------- ----------  ---------
Parse      328      0.17       0.27          0          0        148          0
Execute   1520    360.14     396.30     456820   11416202      26357   11565252
Fetch     1767      5.42      21.18      21421      26540          0       2862
------- ------  -------- ---------- ---------- ---------- ----------  ---------
total     3615    365.73     417.76     478241   11442742      26505   11568114

In this test case, partition COT_IND01_P37 held nearly 10 million rows and the operation took close to 7 minutes; with 100 million rows in the source partition, it would likely take more than an hour. And if we follow Oracle's recommendation to keep the partition count at an integer power of 2, extending the table means doubling the number of partitions: with 128 existing partitions we would have to add another 128, each taking roughly the time measured above. Adding partitions to a hash-partitioned table is a daunting operation.
In short, hash partitioning has its advantages, but it also has serious drawbacks, such as the partition-extension problem described here, so the number of partitions must be chosen carefully at design time. Even so, as data volumes grow it is hard to avoid adding partitions eventually. The operation is resource-intensive, and its locks can block activity on some of the existing partitions. But if we shy away from extending the partitions because of these problems, the operation only becomes harder the longer we wait and the more the data grows.
