Use and Extension of Oracle Hash Partition Tables


A hash partition table applies a hash function to the partition key to determine which partition each row belongs to. What are the advantages of using hash partitioning?

Hash partition tables share the usual advantages of partitioned tables: greater data availability, lighter administration, and better statement performance. In addition, because a hash partition table assigns rows to partitions by hashing the partition key, and the hash value of any given key is fixed, the data in a hash partition table is clustered by partition key value: rows with the same partition key are guaranteed to land in the same partition.
For example, in the securities industry we often query the K-line (candlestick) data of a particular stock.
Suppose the table is structured as follows:


CREATE TABLE equity
(
  id         NUMBER,
  trade_date DATE,
  ......
);

The equity table can be very large, and queries against it usually specify an ID and retrieve information for one trading day or a period of time. How should we choose a partitioning scheme for the equity table in this case?
Judging from the table structure alone, the trade_date column looks like the natural choice for a range partition key. But if we partition that way, the query described above (specify an ID and fetch its trading records over a range, say the K-line for one year) will usually have to cross partitions. As we know, cross-partition queries on partitioned tables often perform poorly, especially when a query is likely to span many partitions.
You might suggest building an index on the (id, trade_date) columns. But consider: the data in the equity table is clustered by trade_date value, so rows with the same trade_date tend to sit together in the same data blocks. For the query described above, even index access ends up reading scattered data blocks; in the worst case, each record fetched requires reading a separate table block.
If we instead build a hash partition table, the data is clustered by the hash partition key, which suits this query much better: rows with the same ID are guaranteed to be in the same partition, and the probability that rows with the same ID fall in the same data block also increases, reducing I/O "to a certain extent".
The quotes around "to a certain extent" are there because a hash partition table alone cannot deliver a large reduction in I/O, especially when the equity table records a very large number of stocks: records of the same stock from different trading days are still physically hard to cluster into the same data blocks. In fact, if we build the equity table as an index-organized table (IOT) on top of the hash partitioning, the query described above can be improved dramatically. IOTs are outside the scope of this article, so we will not discuss them further here.
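To make the access pattern concrete, the query discussed above looks roughly like this (the ID and date values are purely illustrative, not from the original system):

-- Fetch one year of trading records for a single stock.
-- With hash partitioning on id, this touches exactly one partition.
SELECT *
  FROM equity
 WHERE id = 600001
   AND trade_date BETWEEN DATE '2012-01-01' AND DATE '2012-12-31';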
Before deciding to use a hash partition table, we also need to make sure that the values of the chosen partition key are distributed continuously, or nearly continuously, and that the number of partitions is an integer power of 2, such as 2, 4, 8, and so on. These requirements stem from the characteristics of the hash function, and they ensure that the data is spread as evenly as possible across the partitions.
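As a sketch, a hash-partitioned version of the equity table could be created as follows (the partition count of 8 is just an example of the recommended power of 2; tablespace and storage clauses are omitted):

-- Hypothetical DDL: hash-partition the equity table on id,
-- using 8 partitions (an integer power of 2, as recommended).
CREATE TABLE equity
(
  id         NUMBER,
  trade_date DATE
)
PARTITION BY HASH (id)
PARTITIONS 8;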

Extending a hash partition table:

Partitions are added to a hash partition table with the ALTER TABLE ... ADD PARTITION command. Oracle recommends keeping the number of partitions at a power of 2 (2, 4, 8, ...) to ensure that data is distributed evenly across partitions, and, as mentioned earlier, the partition key values should be continuously, or nearly continuously, distributed.
When a new partition is added, part of the data must be moved out of an existing partition into the new one. So which partition does that data come from?
The rule is this: if the partition being added is the Nth partition, and M is the smallest integer power of 2 greater than or equal to N, then the new partition's data comes from partition N - M/2.
For example, suppose a hash partition table currently has 100 partitions and we add one more, making it the 101st partition. In the formula above N is 101, the smallest power of 2 greater than or equal to 101 is 128, so M is 128, and the data for partition 101 comes from partition 101 - 128/2 = 37.
Correspondingly, while the 101st partition is being added, partition 37 must be locked, because part of its data is being moved into the new partition 101.
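The arithmetic can be checked with a simple query (illustrative only; the literal 101 is the new partition number from this example):

-- For new partition N = 101: M = 2^ceil(log2(101)) = 128,
-- so the source partition is 101 - 128/2 = 37.
SELECT 101 - POWER(2, CEIL(LOG(2, 101))) / 2 AS source_partition
  FROM dual;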
Below, we verify this with an example and see what needs attention in practice.
The commodity table is a large table in our system. When it was created as a hash partition table a few years ago, the DBA at the time chose 100 partitions:


SELECT table_name, partition_position, partition_name, num_rows
  FROM user_tab_partitions
 WHERE table_name = 'COMMODITY'
 ORDER BY partition_position;
TABLE_NAME     PARTITION_POSITION PARTITION_NAME         NUM_ROWS
-------------- ------------------ ---------------------- ----------
COMMODITY                       1 COT_IND01_P1              4405650
COMMODITY                       2 COT_IND01_P2              5046650
COMMODITY                       3 COT_IND01_P3              5107550
......
COMMODITY                      36 COT_IND01_P36             5718800
COMMODITY                      37 COT_IND01_P37             9905200
COMMODITY                      38 COT_IND01_P38            10118400
COMMODITY                      39 COT_IND01_P39            10404950
COMMODITY                      40 COT_IND01_P40             9730850
COMMODITY                      41 COT_IND01_P41             9457300
COMMODITY                      42 COT_IND01_P42             9717950
COMMODITY                      43 COT_IND01_P43             9643900
COMMODITY                      44 COT_IND01_P44            11138000
COMMODITY                      45 COT_IND01_P45             9381300
COMMODITY                      46 COT_IND01_P46            10101150
COMMODITY                      47 COT_IND01_P47             8809950
COMMODITY                      48 COT_IND01_P48            10611050
COMMODITY                      49 COT_IND01_P49            10010600
COMMODITY                      50 COT_IND01_P50             8252600
COMMODITY                      51 COT_IND01_P51             9709900
COMMODITY                      52 COT_IND01_P52             8983200
COMMODITY                      53 COT_IND01_P53             9012750
COMMODITY                      54 COT_IND01_P54             9310650
COMMODITY                      55 COT_IND01_P55             8966450
COMMODITY                      56 COT_IND01_P56             8832650
COMMODITY                      57 COT_IND01_P57             9470600
COMMODITY                      58 COT_IND01_P58             8932450
COMMODITY                      59 COT_IND01_P59             9994850
COMMODITY                      60 COT_IND01_P60             9617450
COMMODITY                      61 COT_IND01_P61            10278850
COMMODITY                      62 COT_IND01_P62             9277600
COMMODITY                      63 COT_IND01_P63             8136300
COMMODITY                      64 COT_IND01_P64            10064600
COMMODITY                      65 COT_IND01_P65             3710900
......
COMMODITY                      99 COT_IND01_P99             5273800
COMMODITY                     100 COT_IND01_P100            5293350

100 rows selected.

Looking at the data distribution across the partitions, we can see that the 28 partitions from partition 37 through partition 64 hold roughly twice as many rows as the other partitions. Because 100 is not an integer power of 2, Oracle's hash function cannot distribute the data evenly. Now we add a new partition cot_ind01_p101 to this table:


ALTER TABLE nts_commodity_ts ADD PARTITION cot_ind01_p101;

Table altered.

Elapsed: 00:06:58.52

After gathering statistics, query the row counts of the partitions involved:


SELECT table_name, partition_position, partition_name, num_rows
  FROM user_tab_partitions
 WHERE table_name = 'COMMODITY'
   AND partition_name IN ('COT_IND01_P37', 'COT_IND01_P101');

TABLE_NAME     PARTITION_POSITION PARTITION_NAME        NUM_ROWS
-------------- ------------------ --------------------- ----------
COMMODITY                      37 COT_IND01_P37            4905200
COMMODITY                     101 COT_IND01_P101           5107550

At this point we can see that the data originally in partition 37 has been split roughly evenly between partitions 37 and 101.
Monitoring the session's locks while the partition was being added, we found two objects locked in exclusive mode:


SQL> SELECT * FROM v$lock WHERE sid = 1239 AND type = 'TM' AND lmode = 6 ORDER BY sid, lmode;

ADDR             KADDR                   SID TY        ID1        ID2      LMODE    REQUEST      CTIME      BLOCK
---------------- ---------------- ---------- -- ---------- ---------- ---------- ---------- ---------- ----------
ffffffff7d764828 ffffffff7d764888       1239 TM    4004126          0          6          0         72          2
ffffffff7d764828 ffffffff7d764888       1239 TM    4004063          0          6          0         72          2
Which objects are these?
SELECT object_name, subobject_name, object_id
  FROM user_objects
 WHERE object_id IN (4004126, 4004063);

OBJECT_NAME           SUBOBJECT_NAME                  OBJECT_ID
--------------------- ------------------------------ ----------
COMMODITY             COT_IND01_P100                    4004126
COMMODITY             COT_IND01_P37                     4004063


As you can see, partitions 37 and 100 are locked. Locking partition 37 is no surprise, since data must be moved out of it. But why lock partition 100, the last partition of the table?
My understanding is this: the position of the new partition, 101, is determined by the original table's partition count of 100. If DDL were allowed on the original last partition (partition 100) during the add, for example a coalesce operation, the new partition might no longer correctly draw its data from partition 37; the partition being added should instead have become the new 100th partition, causing confusion. You might ask: by this reasoning, shouldn't other partitions be locked as well? In fact, a hash partition table does not support the drop partition operation; the only comparable operation it supports is coalesce, and coalesce can only shrink the table from the last partition.
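For reference, the coalesce operation mentioned above (which removes the last hash partition and redistributes its rows back into the partition it was originally split from) has this form; the table name is simply the one used in this example:

-- Shrink the hash partition table by one partition (always the last one).
ALTER TABLE commodity COALESCE PARTITION;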
What practical guidance does this lock information give us when adding partitions to a hash partition table?
Continuing the example above: since partition 37 and the last partition, 100, are held with exclusive locks, no DML can run against those two partitions while the new partition is being added, because DML requires a shared lock on the partition (mode 3). In other words, any application that operates on these two partitions is affected.
Adding a partition to a hash partition table is also not as fast as for other partition types, such as range partitions, because adding a partition involves I/O: data has to be moved into the new partition. In fact, that is not even the dominant cost. Since a hash partition table assigns rows by the hash value of the partition key, most of the time spent adding a partition goes into computing hash values. In the test above, adding the new partition took 6 minutes 58 seconds, and the 10046 trace statistics below show that about 6 minutes of that was CPU time, presumably mostly hash computation.

OVERALL TOTALS FOR ALL RECURSIVE STATEMENTS

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse      328      0.17       0.27          0          0        148           0
Execute   1520    360.14     396.30     456820   11416202      26357    11565252
Fetch     1767      5.42      21.18      21421      26540          0        2862
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total     3615    365.73     417.76     478241   11442742      26505    11568114

Partition cot_ind01_p37 in this test held nearly 10 million rows, and the operation took nearly 7 minutes; if a partition held 100 million rows, it should take more than an hour. And if, following Oracle's recommendation, the partition count is kept at a power of 2, extending the table means doubling it: with 128 partitions, an extension requires adding another 128 partitions, multiplying the total time accordingly. Adding partitions to a large hash partition table can be a frightening operation.
In short, hash partitioning has its advantages, but also serious drawbacks, such as the partition-extension problem described here. So at design time we must choose the number of partitions carefully. As data grows, however, adding partitions is hard to avoid, and it is a very resource-intensive operation whose locking affects access to some partitions during the operation. But if, fearing these problems, we keep putting the extension off, it only becomes harder: the more the data grows, the more expensive the operation becomes.
