An optimization process and other related query SQL statements

Last Update:2017-05-12 Source: Internet

Author: User

Tags mysql manual

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

As described in the previous blog posts, the instance information after the process is completed can be queried through a unified entry, that is, advanced query (an excel file can be exported or an interface for generating various reports is reserved. However, we also provide a dedicated query module for some special workflows, such as turning positive, leaving, and attendance. For example, the resignation module described in this article is divided into three sub-modules: New resignation information, resigned upon approval, and terminated resignation. The new resignation information function is mainly for passive resignation, that is, new resignation information for the organization to persuade, dismiss, or unilaterally cancel the contract. once such resignation is saved, it can be deemed that the resignation has ended, therefore, unlike the separation query logic in the approval process, it is clear that multiple tables need to be associated to query the termination process. During the test in the test system, we found that the SQL statement for the query of terminated termination was executed directly. when the data volume is 17 records, it is about 1 s, which is actually slow but acceptable. After the system was officially launched, there were about 400 outgoing data records. The user simply timed out on the front end, and it took about 10 seconds to wait. the user experience was already poor. Take out the query SQL as follows:

SELECT *FROM (SELECT DISTINCT leaveinfo.id, f_sqrgh, f_sqrbm, f_sqr, f_sqbmbm, f_sqbm, f_lxdhfj, f_sjhm, f_sqrq, f_rzrq, f_ndlzrq, f_qrlzrq, f_zw, f_gw, f_gwlx, f_gwcj, f_szdq, f_gzdd, f_lzyy, f_lzyyzs, f_yggxbmtjl, f_lzlx, f_inputtype, belongCompany, postDirection, techLevel, idCard, staffinfo.sex, staffinfo.birthday, exec.id AS 'processExecutionId', exec.status AS 'processExecutionStatus', exec.formDefineId, exec.processDefineId, exec.processInstanceId, exec.tableName, process.`name` AS 'processDefineName'FROM T_DYMC_20140625100255 leaveinfo LEFT JOIN t_per_staffinfo staffinfo ON staffinfo.staffId = leaveinfo.f_sqrgh LEFT JOIN t_bpm_process_execution exec ON exec.pkValue = leaveinfo.id LEFT JOIN t_bpm_process_define process ON process.id = exec.processDefineIdWHERE leaveinfo.f_sqrgh = staffinfo.staffIdAND (exec.`status` = 2AND leaveinfo.f_inputtype = 'FLOW'OR leaveinfo.f_inputtype = 'MANUAL')) allData LEFT JOIN t_sys_user sysUser ON allData.f_sqrgh = sysUser.staffId

This is a paging query. the number of all results is displayed as follows:

SELECT COUNT(DISTINCT allData.id)FROM (SELECT DISTINCT leaveinfo.id, leaveinfo.f_sqrghFROM T_DYMC_20140625100255 leaveinfo LEFT JOIN t_per_staffinfo staffinfo ON staffinfo.staffId = leaveinfo.f_sqrgh LEFT JOIN t_bpm_process_execution exec ON exec.pkValue = leaveinfo.id LEFT JOIN t_bpm_process_define process ON process.id = exec.processDefineIdWHERE leaveinfo.f_sqrgh = staffinfo.staffIdAND (exec.`status` = 2AND leaveinfo.f_inputtype = 'FLOW'OR leaveinfo.f_inputtype = 'MANUAL')) allData LEFT JOIN t_sys_user sysUser ON allData.f_sqrgh = sysUser.staffId

In the test system, we tested the two SQL statements with 17 data records, which took less than seconds. However, in the formal system, when there are 398 data records in the test, the first execution time is about 9.313 s, and the second takes about 4.341 s.

Obviously, the query of 398 pieces of data only exceeds the user's endurance by 10 s, which greatly affects the system performance and compromises the user experience.

First, let's sort out the SQL statement. taking the first one as an example, we have joined multiple tables to query, and whether the multiple tables are necessary and there is a possibility of logical optimization.

The primary table we query is the resignation information table, which is associated with three tables: archive, running, and process definition. at last, we added the data permission restriction previously proposed and associated with the user table. We want to query the information of the departing employee through the associated file, and the associated running table to query the information of the current processor and the current processing stage, the process definition table is the name of the process definition to be queried. After analysis, we first found that this SQL was copied by engineers from the advanced query, because the advanced query is applied to all processes, and the process name needs to be queried through processDefineId, while our departure query, it is the query exit process and does not need to be associated with a table for query. We remove this association and directly return the "resignation process" as processDefineName.

With this association removed, the SQL efficiency has been improved, but the improvement is not obvious. From the logic point of view, we have no room for optimization. Therefore, we hope to optimize the database technology. Before proceeding with optimization, let's take a look at the optimization technology that has been used by the current statement (for non-professional DBAs, index is the first optimization that can be thought ), in mysql, the explicit statement is provided to query how mysql uses indexes to process select statements and connect tables. Next, let's take a look at the optimization techniques used in the query statement before optimization. Before optimization, we have an index for the staffid of the two tables of the archive and the user. the SQL statement for querying the index is as follows:

show index from t_per_staffinfo

For example:

650) this. width = 650; "src =" http://www.68idc.cn/help/uploads/allimg/151111/12151G336-0.jpg "title =" indexfromstaffinfo.jpg "alt =" wKioL1UGsNrTFuaTAAC-1t1e3sA450.jpg "/>

And user table indexes:

650) this. width = 650; "src =" http://www.68idc.cn/help/uploads/allimg/151111/12151M4N-1.jpg "title =" userindex.jpg "alt =" inline "/> The query statement contains two tables: t_bpm_process_define and t_bpm_process_execution, we want to improve the query efficiency after creating an index for it:

ALTER TABLE t_bpm_process_execution ADD INDEX pkValue_index (pkValue);

Similarly, we added indexes for status and t_bpm_process_define.

Now let's use explain to look at our current query statements, such:

650) this. width = 650; "src =" http://www.68idc.cn/help/uploads/allimg/151111/12151HU9-2.jpg "title =" indexed.jpg "alt =" wKiom1UGsbrSJiaDAAG_GOGJWuI014.jpg "/> Based on the meaning of each column in the information, as the name suggests, let's take a look, table indicates the name and type of the table to be queried. the connection types from good to bad are const, eq_reg, ref, range, index, and all) possible_key indicates the indexes that may be used in this table, and key indicates the indexes actually used in this query. (if the value is null, no index is used, mysql uses unoptimized indexes in rare cases, but you can also use using idex to forcibly use indexes.) key_len indicates the index length (without compromising accuracy ). The shorter the length, the better) and ref indicates which column uses the index, rows is the length of data returned by the request that MySQL considers to need to be checked, and Extra indicates Extra information about the resolution query. By analyzing Extra, we can see which indexes need to be optimized and how to optimize them.

Extra values include Distinct, Not Exist, Range cheched for each record, using filesort, using temporary, Using index, where used, system, const, eq_ref, ref, index, and all. When using filesort (requires additional steps for sorting) and using temporary (requires temporary tables to store intermediate results) appear, the query needs to be optimized.

We can see from the figure that some indexes still need to be further optimized, but our query speed has been reduced from nearly 10 s to 0.088 s. This optimization is successful for non-professional DBAs. This is the end of optimization. I will further communicate with the DBA about the further optimization of using temporary to final the optimization.

650) this. width = 650; "src =" http://www.68idc.cn/help/uploads/allimg/151111/12151K350-3.jpg "title =" time.png "alt =" wkiol1ugs0tra3x1aavx7w%eay259.jpg "/>

Next, let's talk about the basic theory of query, the improvement of index and the basic knowledge of index.

MySQL manual (7.2.1) describes the query mechanism of MySQL as follows:

The tables are listed in the output in the order that MySQL wocould read them while processing the query. mySQL resolves all joins using a single-sweep multi-join method. this means that MySQL reads a row from the first table, then finds a matching row in the second table, then in the third table, and so on. when all tables are processed, MySQL outputs the selected columns and backtracks through the table list until a table is found for which there are more matching rows. the next row is read from this table and the process continues with the next table.

Let's start with the third sentence: Mysql reads the first row of data from the first table, searches for matching rows in the second table, searches for the third table, and so on. After all the tables are processed, Mysql outputs the selected columns and traces back the list of tables until the tables that can match more rows are displayed. Read the next row from this table and continue to query the next table. The key to this Associated query process is to query the content of the current table from the previous table.

After learning how to query the current table from the previous table, we create an index to show MySQL how to directly query the data of the next table, and how to join the next table in the order required.

This article also introduces the statements for viewing and creating indexes. For more information about other operations, see basic indexing knowledge.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More