Author: skate
Time: 2013/05/14
MySQL table sharding principles
When MySQL holds large volumes of data under heavy access, performance suffers. The table sharding standards below are introduced to improve performance in that situation.
Environment:
Business type: OLTP
Hardware:
CPU: 8 × 2.4 GHz
Mem: 48 GB
Disk: RAID 5, 6 × SAS
Which tables need to be split: whether a table needs to be split is determined by its size, its row count, and its access patterns.
I. Splitting criteria:
1. The table is larger than 2 GB or has more than 10 million rows (1000 W), and data is accessed in simple forms such as single-table primary key lookups. In this case, table sharding is required.
2. The table is larger than 2 GB or has more than 5 million rows (500 W), and data is accessed through two-table joins and small queries (result sets smaller than 100 rows). In this case, table sharding is required.
3. The table is larger than 2 GB or has more than 2 million rows (200 W), and data is accessed in complex forms such as multi-table joins, range queries, ORDER BY, and GROUP BY, at high frequency, especially with DML. In this case, table sharding is required.
4. The table contains large columns such as TEXT or VARCHAR(500) and longer that are rarely used. Split these columns out into a child table and keep the remaining columns in the parent table. This split can be combined with the sharding described in the preceding items.
5. The data expires over time. In this case, archive the expired data into separate tables.
If any one of the above criteria is met, table sharding is required.
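The criteria above can be expressed as a small check. The following is only a sketch under stated assumptions: the `TableProfile` fields, the access-pattern labels, and the function name are hypothetical names chosen for illustration, and sizes use decimal gigabytes.

```python
from dataclasses import dataclass
from typing import List

GB = 10**9  # decimal gigabytes (assumption; the article does not say which unit it uses)
SIZE_LIMIT_BYTES = 2 * GB

# Row-count limits per access pattern, taken from criteria 1 to 3.
# The dictionary keys are hypothetical labels, not MySQL terms.
ROW_LIMITS = {
    "primary_key": 10_000_000,    # simple single-table primary key access
    "two_table_join": 5_000_000,  # two-table joins with small result sets
    "complex": 2_000_000,         # multi-table joins, range queries, heavy DML
}


@dataclass
class TableProfile:
    size_bytes: int
    row_count: int
    access_pattern: str  # one of the ROW_LIMITS keys
    has_rarely_used_large_columns: bool = False
    data_expires: bool = False


def sharding_reasons(table: TableProfile) -> List[str]:
    """Return every criterion the table meets; an empty list means no split is needed."""
    reasons = []
    if table.size_bytes > SIZE_LIMIT_BYTES:
        reasons.append("size exceeds 2 GB")
    if table.row_count > ROW_LIMITS[table.access_pattern]:
        reasons.append("row count exceeds the limit for " + table.access_pattern)
    if table.has_rarely_used_large_columns:
        reasons.append("rarely used large columns: split into parent and child tables")
    if table.data_expires:
        reasons.append("data expires: archive into separate tables")
    return reasons


if __name__ == "__main__":
    profile = TableProfile(size_bytes=1 * GB, row_count=3_000_000, access_pattern="complex")
    print(sharding_reasons(profile))
```

The same 3 million rows would not trigger a split under the "two_table_join" pattern, which shows how the access pattern changes the row threshold.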
II. Table sharding methods:
1. Hot/cold data sharding: suitable for low traffic where the cold data is seldom used
1.1 A single table has many columns. Separate the frequently used integer columns from the rarely used or large columns and place them in two tables.
1.2 The table data expires over time. Move expired data to a history table, or partition the table by time range.
2. Horizontal table sharding: suitable for high traffic
2.1 Sharding by hash or by the remainder of a number (modulo). The advantage is even data distribution; the disadvantage is that it is hard to expand.
2.2 Sharding by ranges of the auto-increment primary key ID, for example one table per fixed block of IDs. The advantage is easy expansion; the disadvantage is uneven load.
2.3 Sharding by date, for example one table per day, per month, or per year. The advantage is easy expansion; the disadvantage is uneven load.
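The three horizontal methods differ mainly in how a row is routed to its table. Below is a minimal sketch of that routing, assuming a hypothetical `orders` table family and a hypothetical block size of 1,000,000 IDs per table; neither comes from the article. The `moved_fraction` helper illustrates why modulo sharding is hard to expand: changing the shard count reassigns most rows.

```python
from datetime import date
from typing import Iterable


def shard_by_modulo(row_id: int, shard_count: int) -> str:
    """Method 2.1: route by remainder, which spreads sequential IDs evenly."""
    return f"orders_{row_id % shard_count}"


def shard_by_id_range(row_id: int, rows_per_shard: int = 1_000_000) -> str:
    """Method 2.2: each table holds one contiguous block of IDs."""
    return f"orders_{row_id // rows_per_shard}"


def shard_by_month(day: date) -> str:
    """Method 2.3: one table per calendar month."""
    return f"orders_{day:%Y%m}"


def moved_fraction(ids: Iterable[int], old_count: int, new_count: int) -> float:
    """Share of IDs whose modulo shard changes when the shard count changes."""
    id_list = list(ids)
    moved = sum(1 for i in id_list if i % old_count != i % new_count)
    return moved / len(id_list)


if __name__ == "__main__":
    print(shard_by_modulo(10, 4))              # orders_2
    print(shard_by_id_range(2_500_000))        # orders_2
    print(shard_by_month(date(2013, 5, 14)))   # orders_201305
    print(moved_fraction(range(1000), 4, 5))   # 0.8
```

With range or date routing, new data simply goes to a new table and existing rows never move, but the newest table receives most of the traffic, which is the uneven load noted above.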
Notes:
1. How to estimate the table volume
CREATE TABLE `td_skate` (
  `valid` BIGINT(20) NOT NULL AUTO_INCREMENT COMMENT 'value id',
  `propertyid` BIGINT(20) NULL DEFAULT NULL COMMENT 'property id',
  `text` VARCHAR(400) NULL DEFAULT NULL,
  -- The source repeats the column name `text` here; `text2` is a placeholder for the lost name.
  `text2` VARCHAR(400) NULL DEFAULT NULL,
  `picurl` VARCHAR(200) NULL DEFAULT NULL COMMENT 'the attribute value is shown as an image; stores the relative address of the image',
  `isother` BIGINT(20) NULL DEFAULT NULL COMMENT 'whether this is the other value: 0 no, 1 yes',
  `createtime` DATETIME NULL DEFAULT NULL COMMENT 'creation time',
  `createuser` BIGINT(20) NULL DEFAULT NULL COMMENT 'creating user',
  `lastmodify` DATETIME NULL DEFAULT NULL COMMENT 'last modification time',
  `updatetimeuser` BIGINT(20) NULL DEFAULT NULL COMMENT 'last modifier',
  `deletetime` DATETIME NULL DEFAULT NULL COMMENT 'deletion time',
  `deleteuser` BIGINT(20) NULL DEFAULT NULL COMMENT 'deleting user',
  `description` VARCHAR(4000) NULL DEFAULT NULL COMMENT 'product description',
  `isdelete` INT(11) NULL DEFAULT '0',
  PRIMARY KEY (`valid`),
  INDEX `fk_td_prodline_attrval_td_prodline_attr` (`propertyid`),
  CONSTRAINT `fk_td_prodline_attrval_td_prodline_attr` FOREIGN KEY (`propertyid`) REFERENCES `td_prodline_attr` (`propertyid`)
)
COLLATE='utf8_general_ci'
ENGINE=InnoDB
AUTO_INCREMENT=2491650;
Add up the bytes occupied by all columns in the table and multiply the sum by the estimated number of rows to get the table size. For example, if the table above is expected to hold 10 million rows (1000 W), its size is
(8 + 8 + 400 + 400 + 200 + 8 + 8 + 8 + 8 + 8 + 8 + 8 + 4000 + 4) × 10,000,000 = 50.76 GB ≈ 50.8 GB. This shows that the table design is unreasonable; it can be modified as follows:
Replace BIGINT with INT
Replace DATETIME with TIMESTAMP
Replace the status flag isdelete with TINYINT
Consider moving the VARCHAR(4000) column into a child table, depending on business characteristics
After optimization, the table size is (4 + 4 + 400 + 400 + 200 + 4 + 4 + 4 + 4 + 4 + 4 + 4 + 1) × 10,000,000 = 10.37 GB. To improve performance further, drop the foreign keys and shard the table so that every single table stays below 2 GB.
When the description is needed, look it up in the child table by joining on the primary key. Only the matching child-table rows are read, so performance improves greatly.
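The arithmetic in this estimate is easy to script. The sketch below repeats it under the same simplification the text uses (each column counted at its declared width, with no row overhead or indexes), so it is a rough planning figure rather than a measurement. The dictionary labels and function names are hypothetical, the INT(11) column isdelete is counted as 4 bytes, and sizes use decimal gigabytes.

```python
import math

GB = 10**9            # decimal gigabytes (assumption)
ROWS = 10_000_000     # estimated row count used in the example
SHARD_LIMIT = 2 * GB  # target upper bound for a single table

# Declared width in bytes per column of the original table.
ORIGINAL = {
    "valid": 8, "propertyid": 8, "text": 400, "text (second)": 400,
    "picurl": 200, "isother": 8, "createtime": 8, "createuser": 8,
    "lastmodify": 8, "updatetimeuser": 8, "deletetime": 8,
    "deleteuser": 8, "description": 4000, "isdelete": 4,
}

# After the changes: INT and TIMESTAMP at 4 bytes, TINYINT at 1 byte,
# and the description column moved out to a child table.
OPTIMIZED = {
    "valid": 4, "propertyid": 4, "text": 400, "text (second)": 400,
    "picurl": 200, "isother": 4, "createtime": 4, "createuser": 4,
    "lastmodify": 4, "updatetimeuser": 4, "deletetime": 4,
    "deleteuser": 4, "isdelete": 1,
}


def table_bytes(columns: dict, rows: int) -> int:
    """Estimated table size: sum of column widths times the row count."""
    return sum(columns.values()) * rows


def shards_needed(total_bytes: int, limit: int = SHARD_LIMIT) -> int:
    """Smallest number of equal shards that keeps each one at or below the limit."""
    return math.ceil(total_bytes / limit)


if __name__ == "__main__":
    for name, cols in (("original", ORIGINAL), ("optimized", OPTIMIZED)):
        size = table_bytes(cols, ROWS)
        print(f"{name}: {size / GB:.2f} GB, {shards_needed(size)} shards")
```

For the optimized layout this gives 10.37 GB and 6 shards, each about 1.73 GB, which satisfies the 2 GB target.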
2. Estimating the number of rows in a table is straightforward: base the estimate on the business characteristics and the access volume.
------- End ------