A detailed introduction of _php based on MySQL partitioning performance example

Source: Internet
Author: User
Tags hash one table

One, Zoning Concepts  

partitions allow multiple portions of a single table to be allocated across the file system according to the specified rules. Different parts of the table are stored as separate tables in different places. MySQL has been supporting partition from 5.1.3.

Partitioning and manual table comparisons

Manual sub-table   Partition
Multiple data tables A data table
Risk of duplicate data No risk of duplication of data
Write more than one table Write a table
No uniform constraints Enforced constraint restrictions

MySQL supports the Range,list,hash,key partition type, where the range is most commonly used:

    • Range – This pattern allows the data to be divided into different ranges. For example, you can divide a table into several partitions by year.
    • Hash (hash) – This pattern allows partitioning of the hash key of one or more columns of a table, and finally through the data region corresponding to the hash code's different values. For example, you can create a table that partitions the primary key of a table.
    • Key value-an extension of the hash pattern above, where the hash key is generated by the MySQL system.
    • List (predefined list) – This pattern allows the system to segment data by the values of predefined lists.
    • Composite (composite mode) – Combined use of the above patterns

Second, what can zoning do

    • Logical Data Partitioning
    • Increase the speed of single write and read applications
    • Increase the speed of a partitioned range read query
    • Split data can have multiple different physical file paths
    • Efficient preservation of historical data
    • Constraint checking on a table
    • Different master-Slave server partitioning policies, such as master by hash partition, slave by range

third, zoning restrictions ( as of 5.1.44 Edition )

• You can only partition an integral column of a datasheet, or a data column can be converted to an integer column by a partition function

• Maximum number of partitions cannot exceed 1024

• If a unique index or primary key is included, the partitioning column must be contained in all unique indexes or primary keys

• Foreign key not supported

• Full-text indexing is not supported (fulltext)

    • Partitioning by date is ideal because many date functions can be used. But the proper partitioning function for strings is not too much

Four, when to use the partition

• Mass data tables

• History Table Quick Query, you can use Archive+partition way.

• Data table index is greater than server active memory

• For large tables, especially when the index is much larger than the server's available memory, you can use the index without indexing, and partitioning efficiency is more effective at this point.

Five, zoning experiment

Experiment One:

Data published using the US Bureau of Transportation Statistics (CSV format). Currently, includes 113 million records, 7.5 GB data 5.2 GB index. Time from 1987 to 2007.

The server uses 4GB of memory, so that both the data and the index are larger than the memory size. This is set to 4GB because the data warehouse size is much larger than the size of the possible memory and may be several terabytes. For ordinary OLTP databases, index caching is in memory and can be quickly retrieved. If the data exceeds the memory size, you need to use a different approach.

Create a table with a primary key, because the table will usually have a primary key. The table's primary key is too large to cause the index to read into memory, which is generally not efficient, meaning that you have to access the disk frequently, and the speed of access depends entirely on your disk and processor. At present, in the design of a large data warehouse, there is a common practice is not to use the index. So there is also the ability to compare and not have a primary key.

Test method:

Use three kinds of data to guide the MyISAM, InnoDB, Archive.
For each type of introduction, create an unpartitioned table with a primary key (except archive) and two partitioned tables, one per month per year. Partition table partitioning is as follows:

CREATE TABLE By_year (

D DATE

)

PARTITION by RANGE (year (d))

(

PARTITION P1 VALUES less THAN (2001),

PARTITION P2 VALUES less THAN (2002),

PARTITION P3 VALUES less THAN (2003),

PARTITION P4 VALUES less THAN (MAXVALUE)

)

CREATE TABLE By_month (

D DATE

)

PARTITION by RANGE (To_days (d))

(

PARTITION P1 VALUES Less THAN (to_days (' 2001-02-01′)),-january

PARTITION P2 VALUES Less THAN (to_days (' 2001-03-01′)),-february

PARTITION P3 VALUES Less THAN (to_days (' 2001-04-01′)),-march

PARTITION P4 VALUES less THAN (MAXVALUE)

)

Each is tested on a separate instance on the MySQL server, with only one table per instance. Each of these will start the service, run the query and record the results, and then close the service. Service instances are created through MySQL sandbox.

The data is loaded as follows:

Id to lead the brake Whether to partition Data Size Note Load Time (*)
1 MyISAM None 113 million GB With PK Panax min
2 MyISAM By month 113 million 8 GB Without PK Min
3 MyISAM By year 113 million 8 GB Without PK Min
4 InnoDB None 113 million GB With PK Min
5 InnoDB By month 113 million Ten GB Without PK Min
6 InnoDB By year 113 million Ten GB Without PK Min
7 Archive None 113 million 1.8 GB No keys Min
8 Archive By month 113 million 1.8 GB No keys Min
9 Archive By year 113 million 1.8 GB No keys Min

* On the Dual-xeon server

To contrast the effects of partitions on large and small datasets, 9 additional instances were created, each containing slightly less than 2GB of data.

There are two types of query statements

    • Clustered Query

SELECT COUNT (*)

From table_name

WHERE date_column BETWEEN start_date and end_date

    • Specify a record query

SELECT column_list

From table_name

WHERE column1 = x and Column2 = y and Column3 = Z

For the first query, create a different date range statement. For each range, create a set of additional queries for the same range date. The first query for each date range is a cold query that means the first hit, followed by a warm query in the same range, meaning that the least part is cached. The query statement is on the Forge.

Results:

1 partition table with primary key

The first Test uses a composite primary key, just as the original data table uses. Primary key index file reaches 5.5 GB. As you can see, not only does partitioning not improve performance, but primary keys also slow down operations. Because if you use a primary key index query and the index does not read into memory, it behaves poorly. It is helpful to suggest that partitioning is useful, but it must be used properly.

+--–+ ————— –+ ————— –+ ————— –+

| Status |   MyISAM No Partitions |  MyISAM Month Division | MyISAM-Year Division |

+--–+ ————— –+ ————— –+ ————— –+

| Cold |       2.6574570285714 | 2.9169642 | 3.0373419714286 |

| Warm | 2.5720722571429 | 3.1249698285714 | 3.1294000571429 |

+--–+ ————— –+ ————— –+ ————— –+

Archive the breaking of the guide

+--–+ —————-+ ————— –+ ————— –+

| Status |   Archive No Partitions |   Archive Month Division | Archive-Year Division |

+--–+ —————-+ ————— –+ ————— –+

|     Cold | 249.849563 | 1.2436211111111 | 12.632532527778 |

|     Warm | 235.814442 | 1.0889786388889 | 12.600520777778 |

+--–+ —————-+ ————— –+ ————— –+

Note that the response time of archive to the partition is better than that of using MyISAM.

2 partition table with no primary key

Because if the primary key is larger than the available key buffer or even all memory, all queries using the primary key will use the disk. The new way is to use only partitions, not primary keys. There is a significant improvement in performance.

The 70%-90% performance improvement was achieved by the monthly partitioned table.

+--–+ —————— + —————— + —————— +

| Status |   MyISAM No Partitions |  MyISAM Month Division | MyISAM-Year Division |

+--–+ —————— + —————— + —————— +

|  Cold | 2.6864490285714 |  0.64206445714286 | 2.6343286285714 |

|  Warm | 2.8157905714286 |  0.18774977142857 | 2.2084743714286 |

+--–+ —————— + —————— + —————— +

To make the distinction more obvious, I used two large queries to take advantage of partition elimination.

# Query in the annual statistics

SELECT year (flightdate) as Y, COUNT (*)

From FlightStats

WHERE flightdate BETWEEN "2001-01-01″and" 2003-12-31″

GROUP by Y

# Query 2– Monthly statistics

SELECT Date_format (flightdate, "%y-%m") as M, COUNT (*)

From FlightStats

WHERE flightdate BETWEEN "2001-01-01″and" 2003-12-31″

GROUP by M

The results show that the monthly partition table has 30%-60%, and the 15%-30% performance is improved by the annual partition table.

+ ———-+ ——— –+ ——— –+ ——— –+

| query_id | No points |   Monthly Score | Annual minutes |

+ ———-+ ——— –+ ——— –+ ——— –+

| 1 | 97.779958 | 36.296519 | 82.327554 |

|  2 | 69.61055 |  47.644986 | 47.60223 |

+ ———-+ ——— –+ ——— –+ ——— –+

Processor Factor

When the above test is tested on the home machine (Intel Dual Core 2.3 MHz CPU). For the original for dual Xeon 2.66 MHz, find new server faster!.

Repeating the test above, it's amazing:

+--–+ ——————-+ ————-+ ————— –+

| state |  MyISAM |myisam Month Partition | MyISAM-Year Division |

+--–+ ——————-+ ————-+ ————— –+

| Cold |   0.051063428571429 | 0.6577062 | 1.6663527428571 |

| Warm |   0.063645485714286 | 0.1093724 | 1.2369152285714 |

+--–+ ——————-+ ————-+ ————— –+

MyISAM a table with a primary key is faster than a partitioned table. The partition table behaves the same as it did, but the partitioned table performance improves, making partitioning unnecessary. Now that the server seems to be taking advantage of the benefits of indexing, I have indexed the partitioned columns on the partitioned table.

# Original Table

CREATE TABLE FlightStats (

Airlineid int NOT NULL,

Uniquecarrier char (3) NOT NULL,

Carrier char (3) NOT NULL,

Flightdate date NOT NULL,

Flightnum char (5) NOT NULL,

Tailnum Char (8) NOT NULL,

Arrdelay double NOT NULL,

Arrtime datetime NOT NULL,

Depdelay double NOT NULL,

Deptime datetime NOT NULL,

Origin char (3) NOT NULL,

Dest char (3) NOT NULL,

Distance int NOT NULL,

Cancelled char (1) Default ' n ',

Primary KEY (Flightdate, Airlineid, Carrier, Uniquecarrier, Flightnum, Origin, Deptime, Dest)

)

# partition Table

CREATE TABLE FlightStats (

Airlineid int NOT NULL,

Uniquecarrier char (3) NOT NULL,

Carrier char (3) NOT NULL,

Flightdate date NOT NULL,

Flightnum char (5) NOT NULL,

Tailnum Char (8) NOT NULL,

Arrdelay double NOT NULL,

Arrtime datetime NOT NULL,

Depdelay double NOT NULL,

Deptime datetime NOT NULL,

Origin char (3) NOT NULL,

Dest char (3) NOT NULL,

Distance int NOT NULL,

Cancelled char (1) Default ' n ',

KEY (Flightdate)

)

PARTITION by RANGE ...

The result is satisfying, with 35% performance improvements.

+--–+ ——————-+ ——————-+ ——————-+

| state |  MyISAM |myisam Month Partition | MyISAM-Year Division |

+--–+ ——————-+ ——————-+ ——————-+

| Cold | 0.075289714285714 | 0.025491685714286 | 0.072398542857143 |

| Warm | 0.064401257142857 | 0.031563085714286 | 0.056638085714286 |

+--–+ ——————-+ ——————-+ ——————-+

Conclusion:

1. Using table partitioning is not a guarantee of performance improvement. It depends on the following factors:

    • The column used by the partition used for partitioning;
    • Partition function If the original field is not int;
    • Server speed;
    • The amount of memory.

2. run benchmark and performance tests before applying to production systems

Depending on the purpose of your database, you may get a huge performance boost or you may gain nothing. If you are not careful, you may even be able to degrade performance.

For example, a table that uses a month partition can get an excellent speed when it is always a date range query. However, if there are no date queries, a full table scan is performed.

Partitioning is a key tool for improving the performance of massive data. What is massive data depends on the hardware that is deployed. Using partitions blindly does not guarantee improved performance, but can be a perfect solution with the help of benchmarking and performance testing.

3. the Archive table can be a good compromise

Archive table partitions can achieve significant performance improvements. Of course, depending on your purpose, any query without partitioning is a full table scan. If you have historical data that you don't need to change, and you have to analyze statistics by time, using archive is an excellent choice. It uses the original space of the 10-20% and has better performance than the MYISAM/INNODB table for clustered queries.

Although a good optimized partition MyISAM table performance may be better than the corresponding archive table, it requires 10 times times the space.


Experiment Two:

1. Build two tables, one by Time field partition, one not partition.

CREATE TABLE Part_tab

(

C1 int default NULL,

C2 varchar () default NULL,

C3 Date Default NULL

) Engine=myisam

PARTITION by RANGE (C3) (PARTITION p0 VALUES less THAN (1995),

PARTITION P1 values less THAN (1996), PARTITION P2 values less THAN (1997),

PARTITION P3 Values less THAN (1998), PARTITION P4 values less THAN (1999),

PARTITION P5 values less THAN (Watts), PARTITION P6 values less THAN (2001),

PARTITION P7 values less THAN (2002), PARTITION P8 values less THAN (2003),

PARTITION p9 Values less THAN (in), PARTITION P10 values less THAN (2010),

PARTITION p11 VALUES less THAN MAXVALUE);

CREATE TABLE No_part_tab

(C1 int (one) default NULL,

C2 varchar () default NULL,

C3 date Default NULL) Engine=myisam;

2. Build a stored procedure that uses this process to insert 8 million different data into two tables.

Delimiter//

CREATE PROCEDURE Load_part_tab ()

Begin

Declare v int default 0;

While v < 8000000

Todo

INSERT INTO Part_tab

Values (V, ' testing partitions ', adddate (' 1995-01-01′, (rand (v) *36520) mod 3652));

Set V = v + 1;

End while;

End

//

And then execute

Mysql> delimiter;

Mysql> call Load_part_tab ();

Query OK, 1 row affected (8 min 17.75 sec)

mysql> INSERT INTO No_part_tab select * from Part_tab;

Query OK, 8000000 rows Affected (51.59 sec)

records:8000000 duplicates:0 warnings:0

3. Start with a simple scope query for the data in these two tables. and displays the execution process resolution:

Mysql> Select COUNT (*) from No_part_tab where C3 > Date ' 1995-01-01′and c3 < date ' 1995-12-31′;

+ ———-+

| COUNT (*) |

+ ———-+

| 795181 |

+ ———-+

1 row in Set (38.30 sec)

Mysql> Select COUNT (*) from Part_tab where C3 > Date ' 1995-01-01′and c3 < date ' 1995-12-31′;

+ ———-+

| COUNT (*) |

+ ———-+

| 795181 |

+ ———-+

1 row in Set (3.88 sec)

Mysql> Explain select COUNT (*) from No_part_tab where C3 > Date ' 1995-01-01′and c3 < date ' 1995-12-31′\g

1. Row ***************************

Id:1

Select_type:simple

Table:no_part_tab

Type:all

Possible_keys:null

Key:null

Key_len:null

Ref:null

rows:8000000

Extra:using where

1 row in Set (0.00 sec)

Mysql> Explain partitions select COUNT (*) from Part_tab where

-> c3 > Date ' 1995-01-01′and c3 < date ' 1995-12-31′\g

1. Row ***************************

Id:1

Select_type:simple

Table:part_tab

Partitions:p1

Type:all

Possible_keys:null

Key:null

Key_len:null

Ref:null

rows:798458

Extra:using where

1 row in Set (0.00 sec)

As can be seen from the above results, the use of table partitioning is less than 90% of the response time for partitions. The command resolution explain program can see that only the first partition was scanned during the query of the partitioned table, and the rest skipped. Further testing:

-Increase date Range

Mysql> Select COUNT (*) from No_part_tab where C3 > Date ' -01-01′and c3 < date ' 1997-12-31′;

+ ———-+

| COUNT (*) |

+ ———-+

| 2396524 |

+ ———-+

1 row in Set (5.42 sec)

Mysql> Select COUNT (*) from Part_tab where C3 > Date ' -01-01′and c3 < date ' 1997-12-31′;

+ ———-+

| COUNT (*) |

+ ———-+

| 2396524 |

+ ———-+

1 row in Set (2.63 sec)

– Add an indexed field query

Mysql> Select COUNT (*) from Part_tab where C3 > Date ' -01-01′and c3 < date

' 1996-12-31′and c2= ' hello ';

+ ———-+

| COUNT (*) |

+ ———-+

| 0 |

+ ———-+

1 row in Set (0.75 sec)

Mysql> Select COUNT (*) from No_part_tab where C3 > Date ' -01-01′and c3 < Da

Te ' 1996-12-31′and c2= ' hello ';

+ ———-+

| COUNT (*) |

+ ———-+

| 0 |

+ ———-+

1 row in Set (11.52 sec)

Conclusion :

    • Partition and unpartitioned occupy approximately the same file space (data and index files)
    • Remove unnecessary fields
    • According to the manual, add myisam_max_sort_file_size Increases partition Performance

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.