[Turn] MySQL partition performance research

Last Update:2015-03-07 Source: Internet

Author: User

Tags ranges

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

One, Zoning Concepts

partitioning allows multiple parts of a single table to be allocated across the file system according to the specified rules. Different parts of the table are stored as separate tables in different locations. MySQL supports partition starting from 5.1.3.

Partitioning and manual sub-table comparison

Manual sub-table	Partition
Multiple data tables	A data sheet
Risk of data duplication	No risk of duplication of data
Write more than one table	Write a table
No Unified constraint Restrictions	Constraint restrictions imposed

MySQL supports the Range,list,hash,key partition type, which is most commonly used with range:

Range – This mode allows data to be divided into different ranges. For example, you can divide a table into several partitions by year.
Hash (hash) – This mode allows the calculation of the hash key of one or more columns of the table, and finally partitions the data regions of the hash code with different values. For example, you can create a table that partitions the primary key of a table.
Key (key value)-an extension of the above hash mode, where the hash key is generated by the MySQL system.
List (pre-defined list) – This mode allows the system to split the data through predefined list values.
Composite (composite mode) – The combination of the above modes is used

Second, what the partition can do

Logical Data Segmentation
Increase the speed of single write and read applications
Increase the speed of a partitioned range of read queries
Split data can have multiple different physical file paths
Save your historical data efficiently
Constraint checking on a table
Different master-Slave server partitioning policies, such as master by hash partition, slave by range partition

third, zoning restrictions ( as of 5.1.44 version )

• You can only partition an integer column of a data table, or a data column can be converted to an integer column by a partition function

• The maximum number of partitions cannot exceed 1024

• If a unique index or primary key is included, the partition column must be contained in all unique indexes or primary keys

• Foreign keys are not supported

• Full-text indexing is not supported (fulltext)

Partitioning by date is very good, because many date functions can be used. But not too many partitioning functions are appropriate for a string

Iv. when to use partitions

• Massive data sheets

• Historical tables are fast queries that can be used in a archive+partition way.

• Data table index is larger than server valid memory

• For large tables, especially if the index is much larger than the server's valid memory, the partitioning efficiency is more efficient when the index is not available.

Five, partition experiment

Experiment One:

Data published using US Bureau of Transportation Statistics (CSV format). Currently, includes 113 million records, 7.5 GB data 5.2 GB index. Time from 1987 to 2007.

The server uses 4GB of memory so that both the data and the index are larger than the memory size. Set to 4GB because the size of the data warehouse is much larger than the size of the possible memory and can be up to several terabytes. For a normal OLTP database, the index is cached in memory and can be retrieved quickly. If the data exceeds the memory size, you need to use a different approach.

Create a table with a primary key, because the table will usually have a primary key. The table's primary key is too large to allow the index to read into memory, which is generally not efficient, meaning that to access the disk frequently, the speed of access depends entirely on your disk and processor. There is a common practice in designing large data warehouses to not use indexes. Therefore, there is also a performance comparison with and without a primary key.

Test method:

Use three kinds of data to MyISAM, InnoDB, Archive.
For each type of breaking, create an unpartitioned table with a primary key (except archive) and two partition tables, one per month, one by year. Partition table partitioning is as follows:

CREATE TABLE By_year (

D DATE

)

PARTITION by RANGE (year (d))

(

PARTITION P1 VALUES less THAN (2001),

PARTITION P2 VALUES less THAN (2002),

PARTITION P3 VALUES less THAN (2003),

PARTITION P4 VALUES less THAN (MAXVALUE)

)

CREATE TABLE By_month (

D DATE

)

PARTITION by RANGE (To_days (d))

(

PARTITION P1 VALUES Less THAN (to_days (' 2001-02-01′)),-january

PARTITION P2 VALUES Less THAN (to_days (' 2001-03-01′)),-february

PARTITION P3 VALUES Less THAN (to_days (' 2001-04-01′)),-march

PARTITION P4 VALUES less THAN (MAXVALUE)

)

Each one is tested on a separate instance of the MySQL server, with only one table per instance. Each type of breaking starts the service, runs the query and logs the results, and then closes the service. The service instance is created through the MySQL sandbox.

The data is loaded in the following cases:

Id	Introduction	Whether to partition	Data	Size	Note	*Load Time ()**
1	MyISAM	None	113 million	GB	With PK	Notoginseng min
2	MyISAM	By month	113 million	8 GB	Without PK	Min
3	MyISAM	By year	113 million	8 GB	Without PK	Min
4	InnoDB	None	113 million	+ GB	With PK	Min.
5	InnoDB	By month	113 million	Ten GB	Without PK	Min
6	InnoDB	By year	113 million	Ten GB	Without PK	Min
7	Archive	None	113 million	1.8 GB	No keys	Min
8	Archive	By month	113 million	1.8 GB	No keys	Min
9	Archive	By year	113 million	1.8 GB	No keys	Min

* On the Dual-xeon server

To compare the effect of partitioning on large and small datasets, 9 additional instances were created, each containing slightly less than 2GB of data.

There are two types of query statements

Aggregate query

SELECT COUNT (*)

From table_name

WHERE date_column between Start_date and end_date

Specifying record queries

SELECT column_list

From table_name

WHERE column1 = x and Column2 = y and Column3 = Z

For the first type of query, create a statement of different date ranges. For each range, create a set of additional queries for the same range of dates. The first query for each date range is a cold query, which means that the first hit, followed by the query in the same range is the warm query, which means at least partially cached. The query statement is on the Forge.

Results:

1 partitioned table with primary key

The first Test uses a composite primary key, just as the original data table used. The primary key index file reaches 5.5 GB. As you can see, the partitioning not only improves performance, but also slows down the operation of the primary key. Because if you use a primary key index query, and the index does not read into memory, it behaves poorly. It is helpful to note that partitioning is useful, but it must be used properly.

+--–+ ————— –+ ————— –+ ————— –+

+--–+ ————— –+ ————— –+ ————— –+

| Cold | 2.6574570285714 | 2.9169642 | 3.0373419714286 |

| Warm | 2.5720722571429 | 3.1249698285714 | 3.1294000571429 |

+--–+ ————— –+ ————— –+ ————— –+

Archive

+--–+ —————-+ ————— –+ ————— –+

+--–+ —————-+ ————— –+ ————— –+

| Cold | 249.849563 | 1.2436211111111 | 12.632532527778 |

| Warm | 235.814442 | 1.0889786388889 | 12.600520777778 |

+--–+ —————-+ ————— –+ ————— –+

Note that the response time of the archive partition is better than the use of MyISAM.

2 partition table with no primary key

Because if the primary key is larger than the available key buffer, or even all memory, all queries using the primary key will use disk. The new method uses only the partition, not the primary key. Performance has been significantly improved.

The 70%-90% performance was improved by the monthly partition table.

+--–+ —————— + —————— + —————— +

+--–+ —————— + —————— + —————— +

| Cold | 2.6864490285714 | 0.64206445714286 | 2.6343286285714 |

| Warm | 2.8157905714286 | 0.18774977142857 | 2.2084743714286 |

+--–+ —————— + —————— + —————— +

To make the distinction more obvious, I used two large-scale queries that could take advantage of partition-cancellation capabilities.

# Query-Yearly statistics

SELECT year (flightdate) as Y, COUNT (*)

From FlightStats

WHERE flightdate between "2001-01-01″and" 2003-12-31″

GROUP by Y

# Query-Monthly statistics

Select Date_format (flightdate, "%y-%m") as M, COUNT (*)

From FlightStats

WHERE flightdate between "2001-01-01″and" 2003-12-31″

GROUP by M

The results show that the monthly partition table has 30%-60% and the 15%-30% performance is improved by the year partition table.

+ ———-+ ——— –+ ——— –+ ——— –+

+ ———-+ ——— –+ ——— –+ ——— –+

| 1 | 97.779958 | 36.296519 | 82.327554 |

| 2 | 69.61055 | 47.644986 | 47.60223 |

+ ———-+ ——— –+ ——— –+ ——— –+

Processor factors

When the above tests are tested on the home machine (Intel Dual Core 2.3 MHz CPU). For the original dual Xeon 2.66 MHz, discover new servers faster!.

Repeating the above test is surprising:

+--–+ ——————-+ ————-+ ————— –+

+--–+ ——————-+ ————-+ ————— –+

| Cold | 0.051063428571429 | 0.6577062 | 1.6663527428571 |

| Warm | 0.063645485714286 | 0.1093724 | 1.2369152285714 |

+--–+ ——————-+ ————-+ ————— –+

MyISAM tables with primary keys are faster than partitioned tables. The partition table behaves the same as it did, but the performance of the unpartitioned table is improved, making partitioning unnecessary. Now that the server seems to take full advantage of the index, I have indexed the partitioning column on the partitioned table.

# Original Table

CREATE TABLE FlightStats (

Airlineid int NOT NULL,

Uniquecarrier char (3) NOT NULL,

Carrier char (3) NOT NULL,

Flightdate date NOT NULL,

Flightnum char (5) NOT NULL,

Tailnum Char (8) NOT NULL,

Arrdelay double NOT NULL,

Arrtime datetime NOT NULL,

Depdelay double NOT NULL,

Deptime datetime NOT NULL,

Origin char (3) NOT NULL,

Dest char (3) NOT NULL,

Distance int NOT NULL,

Cancelled char (1) Default ' n ',

Primary KEY (Flightdate, Airlineid, Carrier, Uniquecarrier, Flightnum, Origin, Deptime, Dest)

)

# partition Table

CREATE TABLE FlightStats (

Airlineid int NOT NULL,

Uniquecarrier char (3) NOT NULL,

Carrier char (3) NOT NULL,

Flightdate date NOT NULL,

Flightnum char (5) NOT NULL,

Tailnum Char (8) NOT NULL,

Arrdelay double NOT NULL,

Arrtime datetime NOT NULL,

Depdelay double NOT NULL,

Deptime datetime NOT NULL,

Origin char (3) NOT NULL,

Dest char (3) NOT NULL,

Distance int NOT NULL,

Cancelled char (1) Default ' n ',

KEY (Flightdate)

)

PARTITION by RANGE ...

The result is satisfying, getting 35% performance improvements.

+--–+ ——————-+ ——————-+ ——————-+

+--–+ ——————-+ ——————-+ ——————-+

| Cold | 0.075289714285714 | 0.025491685714286 | 0.072398542857143 |

| Warm | 0.064401257142857 | 0.031563085714286 | 0.056638085714286 |

+--–+ ——————-+ ——————-+ ——————-+

Conclusion:

1. Using table partitioning is not a guarantee of performance improvements. It relies on the following factors:

The partition uses the column used for partitioning;
Partition function If the original field is not of type int;
Server speed;
The amount of memory.

2. run benchmarks and performance tests before applying to production systems

Depending on the purpose of your database, you may get a huge performance boost or nothing. If you are not careful, you may even degrade performance.

For example, a table that uses a month partition can get excellent speed when queries are always performed on a date range. However, if there is no date query, then a full table scan will be performed.

Partitioning is a key tool for improving the performance of massive data. What is a huge amount of data depends on the hardware being deployed. The blind use of partitioning does not guarantee improved performance, but can be a perfect solution with the help of pre-benchmark and performance testing.

3. the Archive table can be a good compromise

Archive table partitioning can be a huge performance boost. Of course, it depends on your purpose, and any queries are full-table scans when there are no partitions. If you have historical data that you do not need to change, and you have to analyze statistics by time, the use of archive is an excellent choice. It uses 10-20% 's original space and has better performance than the MYISAM/INNODB table for clustered queries.

While a well-optimized partition MyISAM table performance may be better than the corresponding archive table, it takes 10 times times more space.

Experiment Two:

1. Build two tables, one by Time field partition, one non-partitioned.

CREATE TABLE Part_tab

(

C1 int default NULL,

C2 varchar (+) default NULL,

C3 Date Default NULL

) Engine=myisam

PARTITION by RANGE (year (C3)) (PARTITION p0 VALUES less THAN (1995),

PARTITION P1 values less THAN (1996), PARTITION P2 values less THAN (1997),

PARTITION P3 Values less THAN (1998), PARTITION P4 values less THAN (1999),

PARTITION P5 values less THAN (+), PARTITION P6 values less THAN (2001),

PARTITION P7 values less THAN (2002), PARTITION P8 values less THAN (2003),

PARTITION p9 Values less THAN (2004), PARTITION P10 values less THAN (2010),

PARTITION p11 VALUES less THAN MAXVALUE);

CREATE TABLE No_part_tab

(C1 int (one) default NULL,

C2 varchar (+) default NULL,

C3 date Default NULL) Engine=myisam;

2. Build a stored procedure that uses this process to insert 8 million different data into two tables.

Delimiter//

CREATE PROCEDURE Load_part_tab ()

Begin

Declare v int default 0;

While v < 8000000

INSERT INTO Part_tab

Values (V, ' testing partitions ', adddate (' 1995-01-01′, (rand (v) *36520) mod 3652));

Set V = v + 1;

End while;

End

And then execute

Mysql> delimiter;

Mysql> call Load_part_tab ();

Query OK, 1 row affected (8 min 17.75 sec)

mysql> INSERT INTO No_part_tab select * from Part_tab;

Query OK, 8000000 rows Affected (51.59 sec)

records:8000000 duplicates:0 warnings:0

3. Start with a simple range query for the data in these two tables. and shows the execution process parsing:

Mysql> Select COUNT (*) from No_part_tab where C3 > Date ' 1995-01-01′and c3 < date ' 1995-12-31′;

+ ———-+

| COUNT (*) |

+ ———-+

| 795181 |

+ ———-+

1 row in Set (38.30 sec)

Mysql> Select COUNT (*) from Part_tab where C3 > Date ' 1995-01-01′and c3 < date ' 1995-12-31′;

+ ———-+

| COUNT (*) |

+ ———-+

| 795181 |

+ ———-+

1 row in Set (3.88 sec)

Mysql> Explain select COUNT (*) from No_part_tab where C3 > Date ' 1995-01-01′and c3 < date ' 1995-12-31′\g

1. Row ***************************

Id:1

Select_type:simple

Table:no_part_tab

Type:all

Possible_keys:null

Key:null

Key_len:null

Ref:null

rows:8000000

Extra:using where

1 row in Set (0.00 sec)

Mysql> Explain partitions select COUNT (*) from Part_tab where

-C3 > Date ' 1995-01-01′and c3 < date ' 1995-12-31′\g

1. Row ***************************

Id:1

Select_type:simple

Table:part_tab

Partitions:p1

Type:all

Possible_keys:null

Key:null

Key_len:null

Ref:null

rows:798458

Extra:using where

1 row in Set (0.00 sec)

As can be seen from the above results, using table partitioning reduces the response time by 90% compared to non-partitioning. The command resolution explain program shows that only the first partition is scanned during a query to a partitioned table, and the rest is skipped. Further testing:

– Increase date Range

Mysql> Select COUNT (*) from No_part_tab where C3 > Date ' -01-01′and c3 < date ' 1997-12-31′;

+ ———-+

| COUNT (*) |

+ ———-+

| 2396524 |

+ ———-+

1 row in Set (5.42 sec)

Mysql> Select COUNT (*) from Part_tab where C3 > Date ' -01-01′and c3 < date ' 1997-12-31′;

+ ———-+

| COUNT (*) |

+ ———-+

| 2396524 |

+ ———-+

1 row in Set (2.63 sec)

– Add non-indexed field queries

Mysql> Select COUNT (*) from Part_tab where C3 > Date ' -01-01′and c3 < date

' 1996-12-31′and c2= ' hello ';

+ ———-+

| COUNT (*) |

+ ———-+

| 0 |

+ ———-+

1 row in Set (0.75 sec)

Mysql> Select COUNT (*) from No_part_tab where C3 > Date ' -01-01′and c3 < Da

Te ' 1996-12-31′and c2= ' hello ';

+ ———-+

| COUNT (*) |

+ ———-+

| 0 |

+ ———-+

1 row in Set (11.52 sec)

Conclusion :

Partition and unpartitioned occupy approximately the same file space (data and index files)
If there are no indexed fields in the query statement, the partitioning time is much better than the unpartitioned time
If the fields in the query statement are indexed, the difference between the partition and the unpartitioned is reduced, and the partition is slightly better than the unpartitioned.
For large data volumes, it is recommended that you use the partitioning feature.
Remove unnecessary fields
According to the manual, adding myisam_max_sort_file_size will increase partition performance

[go] MySQL partition performance preliminary study

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More