One, Zoning Concepts
partitioning allows multiple parts of a single table to be allocated across the file system according to the specified rules. Different parts of the table are stored as separate tables in different locations. MySQL supports partition starting from 5.1.3.
Partitioning and manual sub-table comparison
Manual sub-table |
Partition |
Multiple data tables |
A data sheet |
Risk of data duplication |
No risk of duplication of data |
Write more than one table |
Write a table |
No Unified constraint Restrictions |
Constraint restrictions imposed |
MySQL supports the Range,list,hash,key partition type, which is most commonly used with range:
- Range – This mode allows data to be divided into different ranges. For example, you can divide a table into several partitions by year.
- Hash (hash) – This mode allows the calculation of the hash key of one or more columns of the table, and finally partitions the data regions of the hash code with different values. For example, you can create a table that partitions the primary key of a table.
- Key (key value)-an extension of the above hash mode, where the hash key is generated by the MySQL system.
- List (pre-defined list) – This mode allows the system to split the data through predefined list values.
- Composite (composite mode) – The combination of the above modes is used
Second, what the partition can do
- Logical Data Segmentation
- Increase the speed of single write and read applications
- Increase the speed of a partitioned range of read queries
- Split data can have multiple different physical file paths
- Save your historical data efficiently
- Constraint checking on a table
- Different master-Slave server partitioning policies, such as master by hash partition, slave by range partition
third, zoning restrictions ( as of 5.1.44 version )
• You can only partition an integer column of a data table, or a data column can be converted to an integer column by a partition function
• The maximum number of partitions cannot exceed 1024
• If a unique index or primary key is included, the partition column must be contained in all unique indexes or primary keys
• Foreign keys are not supported
• Full-text indexing is not supported (fulltext)
- Partitioning by date is very good, because many date functions can be used. But not too many partitioning functions are appropriate for a string
Iv. when to use partitions
• Massive data sheets
• Historical tables are fast queries that can be used in a archive+partition way.
• Data table index is larger than server valid memory
• For large tables, especially if the index is much larger than the server's valid memory, the partitioning efficiency is more efficient when the index is not available.
Five, partition experiment
Experiment One:
Data published using US Bureau of Transportation Statistics (CSV format). Currently, includes 113 million records, 7.5 GB data 5.2 GB index. Time from 1987 to 2007.
The server uses 4GB of memory so that both the data and the index are larger than the memory size. Set to 4GB because the size of the data warehouse is much larger than the size of the possible memory and can be up to several terabytes. For a normal OLTP database, the index is cached in memory and can be retrieved quickly. If the data exceeds the memory size, you need to use a different approach.
Create a table with a primary key, because the table will usually have a primary key. The table's primary key is too large to allow the index to read into memory, which is generally not efficient, meaning that to access the disk frequently, the speed of access depends entirely on your disk and processor. There is a common practice in designing large data warehouses to not use indexes. Therefore, there is also a performance comparison with and without a primary key.
Test method:
Use three kinds of data to MyISAM, InnoDB, Archive.
For each type of breaking, create an unpartitioned table with a primary key (except archive) and two partition tables, one per month, one by year. Partition table partitioning is as follows:
CREATE TABLE By_year (
D DATE
)
PARTITION by RANGE (year (d))
(
PARTITION P1 VALUES less THAN (2001),
PARTITION P2 VALUES less THAN (2002),
PARTITION P3 VALUES less THAN (2003),
PARTITION P4 VALUES less THAN (MAXVALUE)
)
CREATE TABLE By_month (
D DATE
)
PARTITION by RANGE (To_days (d))
(
PARTITION P1 VALUES Less THAN (to_days (' 2001-02-01′)),-january
PARTITION P2 VALUES Less THAN (to_days (' 2001-03-01′)),-february
PARTITION P3 VALUES Less THAN (to_days (' 2001-04-01′)),-march
PARTITION P4 VALUES less THAN (MAXVALUE)
)
Each one is tested on a separate instance of the MySQL server, with only one table per instance. Each type of breaking starts the service, runs the query and logs the results, and then closes the service. The service instance is created through the MySQL sandbox.
The data is loaded in the following cases:
Id |
Introduction |
Whether to partition |
Data |
Size |
Note |
Load Time (*) |
1 |
MyISAM |
None |
113 million |
GB |
With PK |
Notoginseng min |
2 |
MyISAM |
By month |
113 million |
8 GB |
Without PK |
Min |
3 |
MyISAM |
By year |
113 million |
8 GB |
Without PK |
Min |
4 |
InnoDB |
None |
113 million |
+ GB |
With PK |
Min. |
5 |
InnoDB |
By month |
113 million |
Ten GB |
Without PK |
Min |
6 |
InnoDB |
By year |
113 million |
Ten GB |
Without PK |
Min |
7 |
Archive |
None |
113 million |
1.8 GB |
No keys |
Min |
8 |
Archive |
By month |
113 million |
1.8 GB |
No keys |
Min |
9 |
Archive |
By year |
113 million |
1.8 GB |
No keys |
Min |
* On the Dual-xeon server
To compare the effect of partitioning on large and small datasets, 9 additional instances were created, each containing slightly less than 2GB of data.
There are two types of query statements
SELECT COUNT (*)
From table_name
WHERE date_column between Start_date and end_date
- Specifying record queries
SELECT column_list
From table_name
WHERE column1 = x and Column2 = y and Column3 = Z
For the first type of query, create a statement of different date ranges. For each range, create a set of additional queries for the same range of dates. The first query for each date range is a cold query, which means that the first hit, followed by the query in the same range is the warm query, which means at least partially cached. The query statement is on the Forge.
Results:
1 partitioned table with primary key
The first Test uses a composite primary key, just as the original data table used. The primary key index file reaches 5.5 GB. As you can see, the partitioning not only improves performance, but also slows down the operation of the primary key. Because if you use a primary key index query, and the index does not read into memory, it behaves poorly. It is helpful to note that partitioning is useful, but it must be used properly.
+--–+ ————— –+ ————— –+ ————— –+
| Status | MyISAM No Partitioning | MyISAM Month Division | MyISAM Year Division |
+--–+ ————— –+ ————— –+ ————— –+
| Cold | 2.6574570285714 | 2.9169642 | 3.0373419714286 |
| Warm | 2.5720722571429 | 3.1249698285714 | 3.1294000571429 |
+--–+ ————— –+ ————— –+ ————— –+
Archive
+--–+ —————-+ ————— –+ ————— –+
| Status | Archive No Partitioning | Archive Month Division | Archive Year Division |
+--–+ —————-+ ————— –+ ————— –+
| Cold | 249.849563 | 1.2436211111111 | 12.632532527778 |
| Warm | 235.814442 | 1.0889786388889 | 12.600520777778 |
+--–+ —————-+ ————— –+ ————— –+
Note that the response time of the archive partition is better than the use of MyISAM.
2 partition table with no primary key
Because if the primary key is larger than the available key buffer, or even all memory, all queries using the primary key will use disk. The new method uses only the partition, not the primary key. Performance has been significantly improved.
The 70%-90% performance was improved by the monthly partition table.
+--–+ —————— + —————— + —————— +
| Status | MyISAM No Partitioning | MyISAM Month Division | MyISAM Year Division |
+--–+ —————— + —————— + —————— +
| Cold | 2.6864490285714 | 0.64206445714286 | 2.6343286285714 |
| Warm | 2.8157905714286 | 0.18774977142857 | 2.2084743714286 |
+--–+ —————— + —————— + —————— +
To make the distinction more obvious, I used two large-scale queries that could take advantage of partition-cancellation capabilities.
# Query-Yearly statistics
SELECT year (flightdate) as Y, COUNT (*)
From FlightStats
WHERE flightdate between "2001-01-01″and" 2003-12-31″
GROUP by Y
# Query-Monthly statistics
Select Date_format (flightdate, "%y-%m") as M, COUNT (*)
From FlightStats
WHERE flightdate between "2001-01-01″and" 2003-12-31″
GROUP by M
The results show that the monthly partition table has 30%-60% and the 15%-30% performance is improved by the year partition table.
+ ———-+ ——— –+ ——— –+ ——— –+
| query_id | No points | Monthly Points | Years of |
+ ———-+ ——— –+ ——— –+ ——— –+
| 1 | 97.779958 | 36.296519 | 82.327554 |
| 2 | 69.61055 | 47.644986 | 47.60223 |
+ ———-+ ——— –+ ——— –+ ——— –+
Processor factors
When the above tests are tested on the home machine (Intel Dual Core 2.3 MHz CPU). For the original dual Xeon 2.66 MHz, discover new servers faster!.
Repeating the above test is surprising:
+--–+ ——————-+ ————-+ ————— –+
| state | MyISAM non-partitioned |myisam month Partition | MyISAM Year Division |
+--–+ ——————-+ ————-+ ————— –+
| Cold | 0.051063428571429 | 0.6577062 | 1.6663527428571 |
| Warm | 0.063645485714286 | 0.1093724 | 1.2369152285714 |
+--–+ ——————-+ ————-+ ————— –+
MyISAM tables with primary keys are faster than partitioned tables. The partition table behaves the same as it did, but the performance of the unpartitioned table is improved, making partitioning unnecessary. Now that the server seems to take full advantage of the index, I have indexed the partitioning column on the partitioned table.
# Original Table
CREATE TABLE FlightStats (
Airlineid int NOT NULL,
Uniquecarrier char (3) NOT NULL,
Carrier char (3) NOT NULL,
Flightdate date NOT NULL,
Flightnum char (5) NOT NULL,
Tailnum Char (8) NOT NULL,
Arrdelay double NOT NULL,
Arrtime datetime NOT NULL,
Depdelay double NOT NULL,
Deptime datetime NOT NULL,
Origin char (3) NOT NULL,
Dest char (3) NOT NULL,
Distance int NOT NULL,
Cancelled char (1) Default ' n ',
Primary KEY (Flightdate, Airlineid, Carrier, Uniquecarrier, Flightnum, Origin, Deptime, Dest)
)
# partition Table
CREATE TABLE FlightStats (
Airlineid int NOT NULL,
Uniquecarrier char (3) NOT NULL,
Carrier char (3) NOT NULL,
Flightdate date NOT NULL,
Flightnum char (5) NOT NULL,
Tailnum Char (8) NOT NULL,
Arrdelay double NOT NULL,
Arrtime datetime NOT NULL,
Depdelay double NOT NULL,
Deptime datetime NOT NULL,
Origin char (3) NOT NULL,
Dest char (3) NOT NULL,
Distance int NOT NULL,
Cancelled char (1) Default ' n ',
KEY (Flightdate)
)
PARTITION by RANGE ...
The result is satisfying, getting 35% performance improvements.
+--–+ ——————-+ ——————-+ ——————-+
| state | MyISAM non-partitioned |myisam month Partition | MyISAM Year Division |
+--–+ ——————-+ ——————-+ ——————-+
| Cold | 0.075289714285714 | 0.025491685714286 | 0.072398542857143 |
| Warm | 0.064401257142857 | 0.031563085714286 | 0.056638085714286 |
+--–+ ——————-+ ——————-+ ——————-+
Conclusion:
1. Using table partitioning is not a guarantee of performance improvements. It relies on the following factors:
- The partition uses the column used for partitioning;
- Partition function If the original field is not of type int;
- Server speed;
- The amount of memory.
2. run benchmarks and performance tests before applying to production systems
Depending on the purpose of your database, you may get a huge performance boost or nothing. If you are not careful, you may even degrade performance.
For example, a table that uses a month partition can get excellent speed when queries are always performed on a date range. However, if there is no date query, then a full table scan will be performed.
Partitioning is a key tool for improving the performance of massive data. What is a huge amount of data depends on the hardware being deployed. The blind use of partitioning does not guarantee improved performance, but can be a perfect solution with the help of pre-benchmark and performance testing.
3. the Archive table can be a good compromise
Archive table partitioning can be a huge performance boost. Of course, it depends on your purpose, and any queries are full-table scans when there are no partitions. If you have historical data that you do not need to change, and you have to analyze statistics by time, the use of archive is an excellent choice. It uses 10-20% 's original space and has better performance than the MYISAM/INNODB table for clustered queries.
While a well-optimized partition MyISAM table performance may be better than the corresponding archive table, it takes 10 times times more space.
Experiment Two:
1. Build two tables, one by Time field partition, one non-partitioned.
CREATE TABLE Part_tab
(
C1 int default NULL,
C2 varchar (+) default NULL,
C3 Date Default NULL
) Engine=myisam
PARTITION by RANGE (year (C3)) (PARTITION p0 VALUES less THAN (1995),
PARTITION P1 values less THAN (1996), PARTITION P2 values less THAN (1997),
PARTITION P3 Values less THAN (1998), PARTITION P4 values less THAN (1999),
PARTITION P5 values less THAN (+), PARTITION P6 values less THAN (2001),
PARTITION P7 values less THAN (2002), PARTITION P8 values less THAN (2003),
PARTITION p9 Values less THAN (2004), PARTITION P10 values less THAN (2010),
PARTITION p11 VALUES less THAN MAXVALUE);
CREATE TABLE No_part_tab
(C1 int (one) default NULL,
C2 varchar (+) default NULL,
C3 date Default NULL) Engine=myisam;
2. Build a stored procedure that uses this process to insert 8 million different data into two tables.
Delimiter//
CREATE PROCEDURE Load_part_tab ()
Begin
Declare v int default 0;
While v < 8000000
Do
INSERT INTO Part_tab
Values (V, ' testing partitions ', adddate (' 1995-01-01′, (rand (v) *36520) mod 3652));
Set V = v + 1;
End while;
End
//
And then execute
Mysql> delimiter;
Mysql> call Load_part_tab ();
Query OK, 1 row affected (8 min 17.75 sec)
mysql> INSERT INTO No_part_tab select * from Part_tab;
Query OK, 8000000 rows Affected (51.59 sec)
records:8000000 duplicates:0 warnings:0
3. Start with a simple range query for the data in these two tables. and shows the execution process parsing:
Mysql> Select COUNT (*) from No_part_tab where C3 > Date ' 1995-01-01′and c3 < date ' 1995-12-31′;
+ ———-+
| COUNT (*) |
+ ———-+
| 795181 |
+ ———-+
1 row in Set (38.30 sec)
Mysql> Select COUNT (*) from Part_tab where C3 > Date ' 1995-01-01′and c3 < date ' 1995-12-31′;
+ ———-+
| COUNT (*) |
+ ———-+
| 795181 |
+ ———-+
1 row in Set (3.88 sec)
Mysql> Explain select COUNT (*) from No_part_tab where C3 > Date ' 1995-01-01′and c3 < date ' 1995-12-31′\g
1. Row ***************************
Id:1
Select_type:simple
Table:no_part_tab
Type:all
Possible_keys:null
Key:null
Key_len:null
Ref:null
rows:8000000
Extra:using where
1 row in Set (0.00 sec)
Mysql> Explain partitions select COUNT (*) from Part_tab where
-C3 > Date ' 1995-01-01′and c3 < date ' 1995-12-31′\g
1. Row ***************************
Id:1
Select_type:simple
Table:part_tab
Partitions:p1
Type:all
Possible_keys:null
Key:null
Key_len:null
Ref:null
rows:798458
Extra:using where
1 row in Set (0.00 sec)
As can be seen from the above results, using table partitioning reduces the response time by 90% compared to non-partitioning. The command resolution explain program shows that only the first partition is scanned during a query to a partitioned table, and the rest is skipped. Further testing:
– Increase date Range
Mysql> Select COUNT (*) from No_part_tab where C3 > Date ' -01-01′and c3 < date ' 1997-12-31′;
+ ———-+
| COUNT (*) |
+ ———-+
| 2396524 |
+ ———-+
1 row in Set (5.42 sec)
Mysql> Select COUNT (*) from Part_tab where C3 > Date ' -01-01′and c3 < date ' 1997-12-31′;
+ ———-+
| COUNT (*) |
+ ———-+
| 2396524 |
+ ———-+
1 row in Set (2.63 sec)
– Add non-indexed field queries
Mysql> Select COUNT (*) from Part_tab where C3 > Date ' -01-01′and c3 < date
' 1996-12-31′and c2= ' hello ';
+ ———-+
| COUNT (*) |
+ ———-+
| 0 |
+ ———-+
1 row in Set (0.75 sec)
Mysql> Select COUNT (*) from No_part_tab where C3 > Date ' -01-01′and c3 < Da
Te ' 1996-12-31′and c2= ' hello ';
+ ———-+
| COUNT (*) |
+ ———-+
| 0 |
+ ———-+
1 row in Set (11.52 sec)
Conclusion :
- Partition and unpartitioned occupy approximately the same file space (data and index files)
- If there are no indexed fields in the query statement, the partitioning time is much better than the unpartitioned time
- If the fields in the query statement are indexed, the difference between the partition and the unpartitioned is reduced, and the partition is slightly better than the unpartitioned.
- For large data volumes, it is recommended that you use the partitioning feature.
- Remove unnecessary fields
- According to the manual, adding myisam_max_sort_file_size will increase partition performance
[go] MySQL partition performance preliminary study