One instance explains MySQL Query Optimization (3) instance analysis
The following is an example of a manufacturing company to illustrate how to optimize queries. The database of the manufacturing company contains three tables. the mode is as follows:
1. part table
Parts part description other columns
(Part_num) (part_desc) (other column)
102,032 Seageat 30G disk ......
500,049 Novel 10 M network card ......
......
Instance analysis
The following is an example of a manufacturing company to illustrate how to optimize queries. The database of the manufacturing company contains three tables. the mode is as follows:
1. part table
Parts part description other columns
(Part_num) (part_desc) (other column)
102,032 Seageat 30G disk ......
500,049 Novel 10 M network card ......
......
2. vendor table
Vendor ID, vendor name, other columns
(Vendor _ num) (vendor_name) (other column)
910,257 Seageat Corp ......
523,045 IBM Corp ......
......
3. parven table
Part No. manufacturer No. part quantity
(Part_num) (vendor_num) (part_amount)
102,032 910,257 3,450,000
234,423 321,001 4,000,000
......
The following query runs regularly on these tables and generates a report on the quantity of all parts:
SELECT part_desc, vendor_name, part_amount
FROM part, vendor, parven
WHERE part. part_num = parven. part_num
AND parven. vendor_num = vendor. vendor_num
Order by part. part_num
If no index is created, the overhead of the preceding query code will be huge. Therefore, we create an index on the part number and the manufacturer number. Index creation avoids repeated scanning in nesting. Statistics on tables and indexes are as follows:
Table row size number of rows per page number of data pages
(Table) (row size) (Row count) (Rows/Pages) (Data Pages)
Part 150 10,000 25 400
Vendor 150 1,000 25 40
Parven 13 15,000 300 50
Index key size per page key quantity page quantity
(Indexes) (Key Size) (Keys/Page) (Leaf Pages)
Part 4, 500, 20
Vendor 4 500 2
Parven 8 250 60
It seems to be a relatively simple 3-table join, but its query overhead is very high. You can see from the system table that there is a cluster index on part_num and vendor_num, so the index is stored in the physical order. The parven table does not have a specific storage order. The size of these tables indicates that the success rate of non-sequential access from the buffer page is very small. The optimal query plan for this statement is: first read 400 pages from the part sequence, and then access the parven table unordered for 10 thousand times, 2 pages each time (one index page and one data page), a total of 20 thousand disk pages, and 15 thousand non-sequential access to the vendor table, combined with 30 thousand disk pages. It can be seen that the cost of disk access on this index is 50.4 thousand times.
In fact, we can improve the query efficiency by using a temporary table in three steps:
1. read data from the parven table in the order of vendor_num:
SELECT part_num, vendor_num, price
FROM parven
Order by vendor_num
INTO temp pv_by_vn
This statement reads parven (50 pages) sequentially, writes a temporary table (50 pages), and sorts it. Assume that the sorting overhead is 200 pages, which is 300 pages in total.
2. connect the temporary table to the vendor table, output the result to a temporary table, and sort by part_num:
SELECT pv_by_vn, * vendor. vendor_num
FROM pv_by_vn, vendor
WHERE pv_by_vn.vendor_num = vendor. vendor_num
Order by pv_by_vn.part_num
Into tmp pvvn_by_pn
Drop table pv_by_vn
This query reads pv_by_vn (50 pages) and uses indexes to access the vendor table for 15 thousand times. However, due to the vendor_num order, in fact, the vendor table is read in an indexed order (40 + 2 = 42 pages). the output table contains about 95 rows on each page, with a total of 160 pages. Writing and accessing these pages triggers 5*160 = 800 reads and writes, and the index reads and writes 892 pages.
3. connect the output and the part to get the final result:
SELECT pvvn_by_pn. *, part. part_desc
FROM pvvn_by_pn, part
WHERE pvvn_by_pn.part_num = part. part_num
Drop table pvvn_by_pn
In this way, the query reads pvvn_by_pn sequentially (160 pages) and reads the part Table 15 thousand times through the index. as the index is built, 1772 disk reads and writes are actually performed. the optimized ratio is. In Informix Dynamic
When doing the same experiment on Sever, we found that the time consumption optimization ratio is (if the data volume is increased, the proportion may be larger ).
Summary
20% of the code takes 80% of the time, which is a naming law in programming, as well as in database applications. Our optimization focuses on SQL execution efficiency for database applications. The key aspect of query optimization is to make the database server read less data from the disk and read pages in sequence rather than in an unordered manner.