How to create and optimize the index organizational structure in MySQL

Last Update:2018-06-11 Source: Internet

Author: User

The content of this article is published under a Creative Commons (CC) license and may be freely reproduced, provided that the original source, the author information, and this copyright statement URL are indicated in the form of a hyperlink: http://www.penglixun.com/tech/database/think_about_mysql_create_and_optimize_index.html

Link: http://www.mysqlops.com/2011/05/23/mysql%E4%B8%AD%E5%88%9B%E5%BB%BA%E5%8F%8A%E4%BC%98%E5%8C%96%E7%B4%A2%E5%BC%95%E7%BB%84%E7%BB%87%E7%BB%93%E6%9E%84%E7%9A%84%E6%80%9D%E8%B7%AF.html

[Guide]
Based on the data access requirements of a real production environment, this article analyzes how to design the storage structure, how to manipulate the stored data, and how to reduce the cost of each operation so that system overhead is minimized. It also aims to help beginners understand how table indexes are organized in storage, and we hope it can serve as a reference template.

Test Case Description
The test case comes from the B2C domain: a product order information table that stores the purchased items. Some other fields have been removed for testing, and the data items are not described in detail; the meaning of each field is given in its column comment.

USE `test`;
DROP TABLE IF EXISTS `test`.`goods_order`;
CREATE TABLE `goods_order` (
  `order_id` INT UNSIGNED NOT NULL COMMENT 'order number',
  `goods_id` INT UNSIGNED NOT NULL DEFAULT '0' COMMENT 'item number',
  `order_type` TINYINT UNSIGNED NOT NULL DEFAULT '0' COMMENT 'order type',
  `order_status` TINYINT UNSIGNED NOT NULL DEFAULT '0' COMMENT 'order status',
  `color_id` SMALLINT UNSIGNED NOT NULL DEFAULT '0' COMMENT 'color id',
  `size_id` SMALLINT UNSIGNED NOT NULL DEFAULT '0' COMMENT 'size id',
  `goods_number` MEDIUMINT UNSIGNED NOT NULL DEFAULT '0' COMMENT 'quantity',
  `depot_id` INT UNSIGNED NOT NULL DEFAULT '0' COMMENT 'warehouse id',
  `packet_id` INT UNSIGNED NOT NULL DEFAULT '0' COMMENT 'storage location code',
  `gmt_create` TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT 'creation time',
  `gmt_modify` TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT 'update time',
  PRIMARY KEY (`order_id`, `goods_id`)
) ENGINE = InnoDB DEFAULT CHARACTER SET = utf8 COLLATE = utf8_general_ci;

The primary key is PRIMARY KEY (order_id, goods_id). Why are its columns ordered order_id, goods_id rather than goods_id, order_id? The reason is simple. In the order table, goods_id values repeat far more often than order_id values; in other words, order_id has the higher filtering rate, so leading with it reduces the number of index records scanned and makes lookups more efficient. In addition, several of the SQL statements listed below have only order_id in the WHERE clause, and a composite index can only be searched by a leftmost prefix of its columns. Therefore order_id must be the first column of the composite primary key and goods_id the last.
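To make the leftmost-prefix argument concrete, here is a minimal Python sketch (a model, not how InnoDB is implemented) that treats an index as a sorted list of key tuples, like the leaf level of a B+tree. With keys ordered (order_id, goods_id), all rows for one order_id form a contiguous run found by binary search; with the alternative order (goods_id, order_id), the same predicate has no usable prefix and every entry must be examined. The sample rows and helper names are hypothetical.

```python
from bisect import bisect_left, bisect_right
from typing import List, Tuple

Key = Tuple[int, int]

# Hypothetical sample rows (order_id, goods_id); values are illustrative only.
ORDER_ROWS: List[Key] = [
    (40918986, 4248),
    (40717328, 4248),
    (40717328, 17),
    (30923040, 99),
    (40918986, 5),
]

def build_index(keys: List[Key]) -> List[Key]:
    """Model the leaf level of a B+tree index as a sorted list of key tuples."""
    return sorted(keys)

def lookup_by_first_column(index: List[Key], value: int) -> List[Key]:
    """Leftmost-prefix search: keys whose first column equals value are one
    contiguous run, located with two binary searches."""
    lo = bisect_left(index, (value,))                # (v,) sorts before every (v, x)
    hi = bisect_right(index, (value, float("inf")))  # (v, inf) sorts after every (v, x)
    return index[lo:hi]

def scan_by_second_column(index: List[Key], value: int) -> Tuple[List[Key], int]:
    """No usable prefix: every entry is examined. Returns (matches, entries examined)."""
    return [k for k in index if k[1] == value], len(index)

pk_order_goods = build_index(ORDER_ROWS)                       # PRIMARY KEY (order_id, goods_id)
pk_goods_order = build_index([(g, o) for o, g in ORDER_ROWS])  # alternative (goods_id, order_id)

print(lookup_by_first_column(pk_order_goods, 40717328))
# → [(40717328, 17), (40717328, 4248)]
print(scan_by_second_column(pk_goods_order, 40717328))
# → ([(17, 40717328), (4248, 40717328)], 5)
```

The run for one order_id stays short because order_id values rarely repeat, which is the filtering-rate argument made above.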

Summary of data storage table design:
To design a table structure for storing data, you must first know which data items it contains, that is, the data flows the business generates and reads, and the attributes of each data item, such as its data type, value range and length, and integrity requirements, so that each item's attribute definition can be determined. Once the stored data items are known, perform the following three steps:
- First, determine which data item or combination of items can uniquely identify a record;
- Second, determine the operations performed on the records and the frequency of each; for websites and similar applications, distinguish foreground from background operations, that is, operations performed by external users from those performed by internal users;
- Finally, analyze the filtering rate of each data item used as a condition in those operations, that is, the ratio of the number of distinct values of the item to the total number of records. The closer the ratio is to 1, the better the item filters rows and the more evenly its values are distributed;
Weighing all of the above, and giving data-modifying operations priority over read-only operations, you can create an index structure that meets the requirements and performs well.
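The filtering rate in the third step is simply distinct values divided by total rows; in MySQL it can be measured with a query such as SELECT COUNT(DISTINCT goods_id) / COUNT(*) FROM goods_order. The Python sketch below computes the same ratio for one column or a column combination over a small, made-up sample; the function name filtering_rate is an illustrative assumption.

```python
from typing import Dict, Iterable, List

def filtering_rate(rows: List[Dict[str, int]], columns: Iterable[str]) -> float:
    """Distinct values of the column combination divided by the total number of rows.

    1.0 means every row has a unique value (like order_id); values near 0 mean
    the column(s) hardly narrow a lookup down (like a 0/1 status flag).
    """
    cols = tuple(columns)
    if not rows:
        return 0.0
    distinct = {tuple(row[c] for c in cols) for row in rows}
    return len(distinct) / len(rows)

# Hypothetical sample of goods_order rows.
SAMPLE = [
    {"order_id": 1, "goods_id": 4248, "order_status": 0},
    {"order_id": 2, "goods_id": 4248, "order_status": 1},
    {"order_id": 3, "goods_id": 17, "order_status": 0},
    {"order_id": 4, "goods_id": 99, "order_status": 0},
]

for cols in (["order_id"], ["goods_id"], ["order_status"], ["goods_id", "order_status"]):
    print(cols, filtering_rate(SAMPLE, cols))
```

Combining columns can only raise the ratio, which is why a composite index filters better than its weakest column on its own.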
Designing data access draws on a very important body of knowledge: the fundamentals of relational databases and the normal forms of relational theory. It is recommended to learn the differences between 1NF, 2NF, 3NF, and BCNF and to be clear about the problems and anomalies each one avoids. In a real working environment, however, do not rely rigidly on the normal forms for every storage design. As a Buddhist saying aptly puts it: form is emptiness, and emptiness is form.

Stored Procedure Used to Generate Test Data
Index design cannot be separated from the real data stored in the table. The following stored procedure simulates data from a real production environment, so that you can conveniently verify these ideas in your own test environment.
Stored procedure code:

DELIMITER $$
DROP PROCEDURE IF EXISTS `usp_make_data` $$
CREATE PROCEDURE `usp_make_data`()
BEGIN
  DECLARE iv_goods_id INT UNSIGNED DEFAULT 5000;
  DECLARE iv_depot_id INT UNSIGNED DEFAULT 10;
  DECLARE iv_packet_id INT UNSIGNED DEFAULT 20;
  -- 5000 goods x 10 warehouses x 20 locations; one transaction per goods_id
  WHILE iv_goods_id > 0 DO
    START TRANSACTION;
    WHILE iv_depot_id > 0 DO
      WHILE iv_packet_id > 0 DO
        -- INSERT IGNORE skips the occasional random collision on PRIMARY KEY (order_id, goods_id)
        -- SUBSTRING(RAND(), 3, n) takes n digits after the leading '0.' of RAND()
        INSERT IGNORE INTO goods_order (order_id, goods_id, order_type, order_status, color_id, size_id,
                                        goods_number, depot_id, packet_id, gmt_create, gmt_modify)
        VALUES (SUBSTRING(RAND(), 3, 8), iv_goods_id, SUBSTRING(RAND(), 3, 1), SUBSTRING(RAND(), 5, 1) % 2,
                SUBSTRING(RAND(), 3, 3), SUBSTRING(RAND(), 4, 3), SUBSTRING(RAND(), 5, 2),
                iv_depot_id, SUBSTRING(RAND(), 4, 2) * iv_packet_id,
                DATE_ADD(NOW(), INTERVAL -SUBSTRING(RAND(), 3, 3) DAY),
                DATE_ADD(NOW(), INTERVAL -SUBSTRING(RAND(), 3, 2) DAY));
        SET iv_packet_id = iv_packet_id - 1;
      END WHILE;
      SET iv_packet_id = 20;
      SET iv_depot_id = iv_depot_id - 1;
    END WHILE;
    COMMIT;
    SET iv_depot_id = 10;
    SET iv_goods_id = iv_goods_id - 1;
  END WHILE;
END $$
DELIMITER ;

CALL usp_make_data();
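For readers who want to reason about the generated data without a MySQL instance, the following Python generator mirrors the procedure's three nested loops under stated assumptions. It reproduces only the key columns, approximates each SUBSTRING(RAND(), ...) expression with random.randint, and skips repeated (order_id, goods_id) pairs the way INSERT IGNORE does. The full procedure produces at most 5000 × 10 × 20 = 1,000,000 rows. make_rows is a hypothetical helper, not part of MySQL.

```python
import random
from typing import Dict, Iterator, Optional

def make_rows(goods: int = 5000, depots: int = 10, packets: int = 20,
              seed: Optional[int] = None) -> Iterator[Dict[str, int]]:
    """Python model of usp_make_data's loops (not a MySQL API).

    Yields up to goods * depots * packets rows; a random order_id that repeats
    for the same goods_id is skipped, like INSERT IGNORE on
    PRIMARY KEY (order_id, goods_id).
    """
    rng = random.Random(seed)
    for goods_id in range(goods, 0, -1):               # WHILE iv_goods_id > 0
        used_order_ids = set()                         # PK collisions can only occur within one goods_id
        for depot_id in range(depots, 0, -1):          # WHILE iv_depot_id > 0
            for packet in range(packets, 0, -1):       # WHILE iv_packet_id > 0
                order_id = rng.randint(0, 99_999_999)  # roughly SUBSTRING(RAND(), 3, 8)
                if order_id in used_order_ids:
                    continue
                used_order_ids.add(order_id)
                yield {
                    "order_id": order_id,
                    "goods_id": goods_id,
                    "order_type": rng.randint(0, 9),           # one random digit
                    "order_status": rng.randint(0, 9) % 2,     # a digit % 2, so 0 or 1
                    "depot_id": depot_id,
                    "packet_id": rng.randint(0, 99) * packet,  # two digits * loop counter
                }

rows = list(make_rows(goods=50, seed=42))
print(len(rows))  # at most 50 * 10 * 20 = 10,000
```

Shrinking goods keeps experiments fast while preserving the shape of the data: 10 warehouses, 20 location multipliers, and a 0/1 order_status.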

Business Logic Description
- Non-registered users, and registered users who have not logged in, can also purchase items; in that case the user UID associated with the generated order number takes the default value;
- Orders are associated with the user UID, descriptions, and other information, which are stored in other tables and linked by order number;
- A user's order information can be modified before payment but not after payment;
- Paid orders are automatically passed to the logistics department for subsequent processing; after processing, the storage location information of the items in the order is updated;
- Part of the data is regularly exported to the data warehouse analysis system for statistical analysis;
- Individual orders are queried from both the front end and the back office;
- Shopping records are queried and displayed;

SQL statements that manipulate the data according to the business rules above:
(1). SELECT * FROM goods_order WHERE order_id = 40918986;
(2). SELECT * FROM goods_order WHERE order_id IN (40918986, 40717328, 30923040 ...) ORDER BY gmt_modify DESC;
(3). UPDATE goods_order SET gmt_modify = NOW(), ... WHERE order_id = 40717328 AND goods_id = 4248;
(4). SELECT COUNT(*) FROM goods_order WHERE depot_id = 0 ORDER BY gmt_modify DESC LIMIT 0, 50;
(5). SELECT * FROM goods_order WHERE depot_id = 6 AND packet_id = 0 ORDER BY gmt_modify DESC LIMIT 0, 50;
(6). SELECT COUNT(*) FROM goods_order WHERE goods_id = 4248 AND order_status = 0 AND order_type = 1;
(7). SELECT * FROM goods_order WHERE goods_id = 4248 AND order_status = 0 AND order_type = 1 ORDER BY gmt_modify DESC LIMIT ...;
(8). SELECT * FROM goods_order WHERE gmt_modify >= '04-06-06';
The 8 SQL statements, categorized by what triggers their execution:
- Statements executed when a front-end (external) user triggers an operation: (1), (2), (3);
- Statements executed when a back-office (internal) user triggers an operation: (1), (2), (3), (4), (5), (6), (7);
- Tasks the back-office system runs automatically on a schedule: (4), (5), (6), (7). They run every 15 minutes during normal working hours to check for orders that have been paid but not yet picked, and for orders that have been picked but not yet shipped;
- The statement the statistical analysis system runs periodically to export data: (8), once every 24 hours;
The SQL statements above fall into two categories: read operations (SELECT) and modification operations (UPDATE and DELETE). Their condition fields, numbered by statement, are as follows.
Fields in the WHERE, GROUP BY, ORDER BY, and HAVING clauses of the SELECT statements:
(1). order_id
(2). order_id + gmt_modify
(4). depot_id + gmt_modify
(5). depot_id + packet_id + gmt_modify
(6). goods_id + order_status + order_type
(7). goods_id + order_status + order_type + gmt_modify
(8). gmt_modify
Condition fields in the WHERE clause of the modification operation:
(3). order_id + goods_id

We already have the primary key index PRIMARY KEY (order_id, goods_id). In addition, given that operations on this table are mainly SELECT and INSERT, and given the SQL statements above, we can tentatively determine the indexes to create:
ALTER TABLE goods_order
  ADD INDEX idx_goodsID_orderType_orderStatus_gmtmodify (goods_id, order_type, order_status, gmt_modify),
  ADD INDEX idx_depotID_packetID_gmtmodify (depot_id, packet_id, gmt_modify);
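A quick way to check which statements each index can serve is to apply a simplified leftmost-prefix rule. The Python sketch below (a model, not the MySQL optimizer; it ignores range predicates, IN lists, and cost estimates) counts how many leading index columns are matched by equality conditions and whether ORDER BY gmt_modify can be read from the index instead of requiring a separate sort. The function index_usage and the CHECKS table are illustrative assumptions.

```python
from typing import Dict, Optional, Sequence, Set, Tuple

INDEXES: Dict[str, Tuple[str, ...]] = {
    "PRIMARY": ("order_id", "goods_id"),
    "idx_goodsID_orderType_orderStatus_gmtmodify": ("goods_id", "order_type", "order_status", "gmt_modify"),
    "idx_depotID_packetID_gmtmodify": ("depot_id", "packet_id", "gmt_modify"),
}

def index_usage(index: Sequence[str], equal_cols: Set[str],
                order_col: Optional[str] = None) -> Tuple[int, bool]:
    """Simplified leftmost-prefix rule (a model, not the MySQL optimizer).

    Returns (number of leading index columns matched by equality predicates,
    whether ORDER BY order_col still needs a separate sort). The ORDER BY is
    served by the index only if order_col is the next column after the prefix.
    """
    matched = 0
    while matched < len(index) and index[matched] in equal_cols:
        matched += 1
    if order_col is None:
        return matched, False
    served = matched < len(index) and index[matched] == order_col
    return matched, not served

# (statement, index, equality columns, ORDER BY column)
CHECKS = [
    ("(1)", "PRIMARY", {"order_id"}, None),
    ("(3)", "PRIMARY", {"order_id", "goods_id"}, None),
    ("(4)", "idx_depotID_packetID_gmtmodify", {"depot_id"}, "gmt_modify"),
    ("(5)", "idx_depotID_packetID_gmtmodify", {"depot_id", "packet_id"}, "gmt_modify"),
    ("(7)", "idx_goodsID_orderType_orderStatus_gmtmodify",
     {"goods_id", "order_status", "order_type"}, "gmt_modify"),
]

for stmt, name, eq, order in CHECKS:
    matched, needs_sort = index_usage(INDEXES[name], eq, order)
    print(f"{stmt} {name}: {matched} column(s) matched, sort needed: {needs_sort}")
```

Under this model, statement (4) matches only depot_id and still needs a sort, while (5) and (7) read rows already ordered by gmt_modify. No index begins with gmt_modify, so statement (8) cannot use an index prefix and falls back to a table scan. To confirm the real plans, run EXPLAIN on each statement against the test data.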

Summary:
The article has already explained why the columns of the composite primary key are ordered order_id, goods_id. In addition, the columns that make up a composite primary key should ideally have these properties: their values do not change once written, their values are short, and preferably they are numeric;
For SQL statement (8): this read runs once a day and filters on the update date. It deliberately uses a full table scan, sacrificing read performance to avoid the index maintenance overhead that an index on gmt_modify would cause, because that column's value changes on every update;
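For SQL statements (4) and (5): each run reads only the latest 50 records, and that data is mostly hot data. We therefore accept a performance cost for one statement, (4), which can use only the depot_id prefix of idx_depotID_packetID_gmtmodify and must sort the matching rows by gmt_modify, in exchange for not maintaining an additional index and the I/O its upkeep would require;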
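The trade-off behind (4), (5), and (8) can be put into rough numbers. Statement (3) sets gmt_modify = NOW() on every update, and InnoDB does not update secondary index records in place: the old record is delete-marked and a new one is inserted. The Python sketch below counts those operations per updated row for the two indexes created above and for a hypothetical set that also adds idx_gmtmodify and idx_depotID_gmtmodify (names invented for illustration). It is a simplified cost model, not a measurement.

```python
from typing import Dict, Iterable, Tuple

def secondary_entries_rewritten(indexes: Dict[str, Tuple[str, ...]],
                                changed_cols: Iterable[str]) -> int:
    """Rough count of secondary-index operations for one updated row.

    Model: when an indexed column changes, the old secondary index entry is
    delete-marked and a new one is inserted, i.e. 2 operations per affected
    index. The clustered index is ignored because its key does not change.
    """
    changed = set(changed_cols)
    return sum(2 for cols in indexes.values() if changed & set(cols))

CHOSEN = {
    "idx_goodsID_orderType_orderStatus_gmtmodify": ("goods_id", "order_type", "order_status", "gmt_modify"),
    "idx_depotID_packetID_gmtmodify": ("depot_id", "packet_id", "gmt_modify"),
}
# Hypothetical extra indexes that the design deliberately does NOT create.
WITH_EXTRAS = dict(CHOSEN, idx_gmtmodify=("gmt_modify",), idx_depotID_gmtmodify=("depot_id", "gmt_modify"))

# Statement (3) changes gmt_modify on every update.
print(secondary_entries_rewritten(CHOSEN, ["gmt_modify"]))       # 4
print(secondary_entries_rewritten(WITH_EXTRAS, ["gmt_modify"]))  # 8
```

Each extra index containing gmt_modify adds work to every order update, while the reads it would speed up either run once a day or touch only 50 hot rows.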
For SQL statements (6) and (7), create a composite index, paying special attention to the order of its columns: idx_goodsID_orderType_orderStatus_gmtmodify (goods_id, order_type, order_status, gmt_modify), where:
- The goods_id column filters better than order_type and order_status, and gmt_modify appears only in the ORDER BY clause. Therefore goods_id is placed at the head of the composite index to improve its filtering rate, which makes the index more efficient and reduces logical and physical reads;
- order_status has only the values 0 and 1, while order_type has many values, so according to the SQL statements order_type must be placed closer to the head of the composite index than order_status;
- gmt_modify appears in the ORDER BY clause, so it must be placed at the end of the composite index;
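Why gmt_modify goes last can be seen in a small Python model of idx_goodsID_orderType_orderStatus_gmtmodify (sample rows and helper names are hypothetical, and gmt_modify is represented as an integer timestamp). Equality on goods_id, order_type, and order_status selects one contiguous range of the sorted keys, so the COUNT(*) in statement (6) is just the size of that range. Inside it the keys are already ordered by gmt_modify, so statement (7) can read the range backwards for ORDER BY gmt_modify DESC without a separate sort.

```python
from bisect import bisect_left, bisect_right
from typing import Dict, List, Tuple

Key = Tuple[int, int, int, int]  # (goods_id, order_type, order_status, gmt_modify)

def build(rows: List[Dict[str, int]]) -> List[Key]:
    """Model idx_goodsID_orderType_orderStatus_gmtmodify as a sorted list of keys."""
    return sorted((r["goods_id"], r["order_type"], r["order_status"], r["gmt_modify"]) for r in rows)

def _range(index: List[Key], goods_id: int, order_type: int, order_status: int) -> Tuple[int, int]:
    """Bounds of the contiguous run matching the three equality columns."""
    prefix = (goods_id, order_type, order_status)
    return bisect_left(index, prefix), bisect_right(index, prefix + (float("inf"),))

def count_range(index: List[Key], goods_id: int, order_type: int, order_status: int) -> int:
    """Like statement (6): the size of one contiguous key range."""
    lo, hi = _range(index, goods_id, order_type, order_status)
    return hi - lo

def latest_in_range(index: List[Key], goods_id: int, order_type: int,
                    order_status: int, limit: int) -> List[int]:
    """Like statement (7): walk the range backwards; gmt_modify is already
    sorted inside it, so ORDER BY gmt_modify DESC needs no extra sort."""
    lo, hi = _range(index, goods_id, order_type, order_status)
    return [key[3] for key in reversed(index[lo:hi])][:limit]

# Hypothetical rows; gmt_modify values are made-up integer timestamps.
SAMPLE_ORDERS = [
    {"goods_id": 4248, "order_type": 1, "order_status": 0, "gmt_modify": 300},
    {"goods_id": 4248, "order_type": 1, "order_status": 0, "gmt_modify": 100},
    {"goods_id": 4248, "order_type": 1, "order_status": 1, "gmt_modify": 500},
    {"goods_id": 4248, "order_type": 2, "order_status": 0, "gmt_modify": 900},
    {"goods_id": 17, "order_type": 1, "order_status": 0, "gmt_modify": 800},
    {"goods_id": 4248, "order_type": 1, "order_status": 0, "gmt_modify": 200},
]

idx = build(SAMPLE_ORDERS)
print(count_range(idx, 4248, 1, 0))         # 3
print(latest_in_range(idx, 4248, 1, 0, 2))  # [300, 200]
```

If gmt_modify were placed before any of the equality columns, the matching rows would no longer be one ordered run, and a sort step would be needed again.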

Finally, to sort out the requirements, design the storage structure, write the SQL statements, and create the index structure, follow these steps:
- Sort out the data flows that the business generates and reads;
- Clarify the attributes of each data item in those data flows;
- Analyze business indicators and estimate the volume of data to be stored (using GB as the capacity unit);
- Select the hardware and database architecture that may be used to support the service;
- Clarify all the conditions and operation types that may manipulate the data;
- Analyze the filtering rate of each condition field;
- Weigh the performance and I/O volume of each SQL statement, that is, decide which operations carry a higher weight and which a lower weight;
- Create the index organization structure;
- Collect feedback from the test and production environments and optimize the index structure;
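For the capacity-estimation step, a back-of-the-envelope Python sketch can turn the table definition into a size figure. It assumes the fixed storage sizes of MySQL's integer and TIMESTAMP types and an InnoDB COMPACT per-row overhead of 18 bytes (5-byte record header, 6-byte transaction ID, 7-byte roll pointer) for a row with no nullable or variable-length columns. It ignores page overhead, free space, and secondary indexes, so treat the result as a lower bound. The row count used, 1,000,000, is roughly what the test-data procedure generates.

```python
from typing import List, Tuple

# Storage size in bytes of each fixed-length MySQL type used by goods_order
# (TIMESTAMP without fractional seconds).
TYPE_BYTES = {"TINYINT": 1, "SMALLINT": 2, "MEDIUMINT": 3, "INT": 4, "TIMESTAMP": 4}

GOODS_ORDER: List[Tuple[str, str]] = [
    ("order_id", "INT"), ("goods_id", "INT"),
    ("order_type", "TINYINT"), ("order_status", "TINYINT"),
    ("color_id", "SMALLINT"), ("size_id", "SMALLINT"),
    ("goods_number", "MEDIUMINT"), ("depot_id", "INT"), ("packet_id", "INT"),
    ("gmt_create", "TIMESTAMP"), ("gmt_modify", "TIMESTAMP"),
]

# Assumed InnoDB COMPACT row overhead for an all-NOT-NULL, fixed-length row:
# 5-byte record header + 6-byte transaction ID + 7-byte roll pointer.
ROW_OVERHEAD = 5 + 6 + 7

def row_bytes(columns: List[Tuple[str, str]]) -> int:
    """Column data plus the assumed per-row overhead."""
    return sum(TYPE_BYTES[col_type] for _, col_type in columns) + ROW_OVERHEAD

def estimate_gb(row_count: int, columns: List[Tuple[str, str]]) -> float:
    """Lower-bound data size in GB (1024**3 bytes): ignores page headers,
    free space, fragmentation, and secondary indexes."""
    return row_count * row_bytes(columns) / 1024 ** 3

print(row_bytes(GOODS_ORDER))                         # 51
print(round(estimate_gb(1_000_000, GOODS_ORDER), 3))  # 0.047
```

Scaling the row count to the expected business volume, then adding headroom for indexes and growth, gives the GB figure to plan hardware around.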

Note:
I intended to run a simulation script in a test environment to show more intuitively how the same SQL statements, at the same frequencies, behave under different index organizations, and to compare the database server's processing capacity and load. Unfortunately, the only available server could not be used, so I had to give this up. Analyzing the logical and physical I/O that the same SQL statement requires with and without an index is another approach; it is not covered this time, but you can experiment with it yourself if needed. Beginners are also encouraged to read the relevant section of the MySQL manual carefully: 7.2.6. Index Merge Optimization.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
