Discuz hot posts paging database mysql optimization

Source: Internet
Author: User

In our environment, the MySQL-5.6.6 version is used.

When viewing a post and turning pages, an SQL statement similar to the following will be generated:

Mysql> desc SELECT * FROM pre_forum_post WHERE
Tid = 8201301 AND 'invisable' IN ('0', '-2') order by dateline desc limit 15 \ G
* *************************** 1. row ***************************
Id: 1
Select_type: SIMPLE
Table: pre_forum_post
Type: ref
Possible_keys: tid, displayorder, first
Key: displayorder
Key_len: 3
Ref: const
Rows: 593371
Extra: Using index condition; Using where; Using filesort

The cost of executing this SQL statement is:

-- According to the number of index access Row Records, the overall status is better.
| Handler_read_key | 16 |

-- The number of times that the next row of records is accessed according to the index order, usually because of the index range scan or full index scan, which is generally a relatively good status.
| Handler_read_next | 329881 |

-- The total number of times a row record is read in a certain order. If you want to sort the results, the value is usually relatively large. This value is also relatively large when an index cannot be used in a full table scan or multi-table join operation.
| Handler_read_rnd | 15 |

When you encounter a hot post that needs to go back to many pages, for example:

Mysql> desc SELECT * FROM pre_forum_post WHERE
Tid = 8201301 AND 'invisable' IN ('0', '-2') order by dateline LIMIT 129860, 15 \ G
* *************************** 1. row ***************************
Id: 1
Select_type: SIMPLE
Table: pre_forum_post
Type: ref
Possible_keys: displayorder
Key: displayorder
Key_len: 3
Ref: const
Rows: 593371
Extra: Using where; Using filesort

The execution cost of this SQL statement is changed to (Handler_read_key and Handler_read_rnd are much larger ):

| Handler_read_key | 129876 | -- because many rows need to be skipped before
| Handler_read_next | 329881 | -- same as above
| Handler_read_rnd | 129875 | -- sort a large result set first.

It can be seen that when encountering hot posts, the cost of this SQL will be very high. If this hot Post is frequently accessed by a large number of historical replies, or when the search engine repeatedly requests and replies to history pages, it is easy to crush the database server directly.

Summary: The reason why the 'displayorder' index sorting cannot be used in this SQL statement is that the second index column 'invisable' adopts RANGE query ), as a result, the union index cannot be used to sort the 'Dateline 'field (if it is WHERE tid =? AND invisible IN (?, ?) AND dateline =? In this case, the entire joint index can be used completely. Pay attention to the difference between the two ).

After knowing the cause, the corresponding optimization solution is clear:
Create a new index idx_tid_dateline, which only includes the tid and dateline columns (according to the statistics of other indexes, the base of item_type and item_id is too low, so it is not included in the Union index. Of course, you can also consider adding ).

Let's take a look at the execution plan after the new index is used:

Mysql> desc SELECT * FROM pre_forum_post WHERE
Tid = 8201301 AND 'invisable' IN ('0', '-2') order by dateline LIMIT 15 \ G
* *************************** 1. row ***************************
Id: 1
Select_type: SIMPLE
Table: pre_forum_post
Type: ref
Possible_keys: tid, displayorder, first, idx_tid_dateline
Key: idx_tid_dateline
Key_len: 3
Ref: const
Rows: 703892
Extra: Using where

As you can see, the existing Using filesort disappears and can be sorted directly through the index.

However, if this hot Post turns to an older history reply, the corresponding SQL still cannot use the new index:

Mysql> desc SELECT * FROM pre_forum_post WHERE
Tid = 8201301 AND 'invisable' IN ('0', '-2') order by dateline LIMIT 129860,15 \ G
* *************************** 1. row ***************************
Id: 1
Select_type: SIMPLE
Table: pre_forum_post
Type: ref
Possible_keys: tid, displayorder, first, idx_tid_dateline
Key: displayorder
Key_len: 3
Ref: const
Rows: 593371
Extra: Using where; Using filesort

Compare the execution plan of the optimizer if it is recommended to use a new index:

Mysql> desc SELECT * FROM pre_forum_post use index (idx_tid_dateline) WHERE
Tid = 8201301 AND 'invisable' IN ('0', '-2') order by dateline LIMIT 129860,15 \ G
* *************************** 1. row ***************************
Id: 1
Select_type: SIMPLE
Table: pre_forum_post
Type: ref
Possible_keys: idx_tid_dateline
Key: idx_tid_dateline
Key_len: 3
Ref: const
Rows: 703892
Extra: Using where

As you can see, because the query optimizer thinks that the number of rows to be scanned by the latter is more than 0.11 million, the former is more efficient.

In fact, in this example, sorting is more costly, so we need to prioritize sorting, so we should force the use of new indexes, that is, using the subsequent execution plan, specify the index in the corresponding program.

Finally, let's take a look at the profiling statistics comparison of the two execution plans when the hot posts turn to very old history replies:

1. Use the old Index (displayorder ):

Mysql> SELECT * FROM pre_forum_post WHERE
Tid = 8201301 AND 'invisable' IN ('0', '-2') order by dateline LIMIT 129860,15;

# View profiling results
| Startling | 0.020203 |
| Check permissions | 0.000026 |
| Opening tables | 0.000036 |
| Init | 1, 0.000099 |
| System lock | 0.000092 |
| Optimizing | 0.000038 |
| Statistics | 0.000123 |
| Preparing | 0.000043 |
| Sorting result | 0.000025 |
| Executing | 0.000023 |
| Sendingdata | 0.000045 |
| Creating sort indexes | 0.941434 |
| End| 0.000077 |
| Query end | 0.000044 |
| Closing tables | 0.000038 |
| Freeing items | 0.000056 |
| Cleaningup | 0.000040 |

2. If a new index (idx_tid_dateline) is used ):

Mysql> SELECT * FROM pre_forum_post use index (idx_tid_dateline) WHERE
Tid = 8201301 AND 'invisable' IN ('0', '-2') order by dateline LIMIT 129860,15;

# View profiling results
| Startling | 0.000151 |
| Check permissions | 0.000033 |
| Opening tables | 0.000040 |
| Init | 1, 0.000105 |
| System lock | 0.000044 |
| Optimizing | 0.000038 |
| Statistics | 0.000188 |
| Preparing | 0.000044 |
| Sorting result | 0.000024 |
| Executing | 0.000023 |
| Sendingdata | 0.917035 |
| End| 0.000074 |
| Query end | 0.000030 |
| Closing tables | 0.000036 |
| Freeing items | 0.000049 |
| Cleaningup | 0.000032 |

As you can see, the efficiency has been improved, but it is not obvious, because it does need to scan a larger amount of data, so the Sending data stage takes more time.

Then you can rewrite the SQL statement as follows:

Mysql> explain select * FROM pre_forum_post t1 inner join (
SELECT id FROM pre_forum_post use index (idx_tid_dateline) WHERE
Tid = 8201301 AND 'invisable' IN ('0', '-2') ORDER
Dateline LIMIT 129860,15) t2
USING (id) \ G
* *************************** 1. row ***************************
Id: 1
Select_type: PRIMARY
Table:
Type: ALL
Possible_keys: NULL
Key: NULL
Key_len: NULL
Ref: NULL
Rows: 129875
Extra: NULL
* *************************** 2. row ***************************
Id: 1
Select_type: PRIMARY
Table: t1
Type: eq_ref
Possible_keys: PRIMARY
Key: PRIMARY
Key_len: 4
Ref: t2.id
Rows: 1
Extra: NULL
* *************************** 3. row ***************************
Id: 2
Select_type: DERIVED
Table: pre_forum_post
Type: ref
Possible_keys: idx_tid_dateline
Key: idx_tid_dateline
Key_len: 3
Ref: const
Rows: 703892
Extra: Using where

Let's take a look at the profiling statistics of this SQL statement:

| Startling | 0.000209 |
| Check permissions | 0.000026 |
| Check permissions | 0.000026 |
| Opening tables | 0.000101 |
| Init | 1, 0.000062 |
| System lock | 0.000049 |
| Optimizing | 0.000025 |
| Optimizing | 0.000037 |
| Statistics | 0.000106 |
| Preparing | 0.000059 |
| Sorting result | 0.000039 |
| Statistics | 0.000048 |
| Preparing | 0.000032 |
| Executing | 0.000036 |
| Sendingdata | 0.000045 |
| Executing | 0.000023 |
| Sendingdata | 0.225356 |
| End| 0.000067 |
| Query end | 0.000028 |
| Closing tables | 0.000023 |
| Removing tmp table | 1, 0.000029 |
| Closing tables | 0.000044 |
| Freeing items | 0.000048 |
| Cleaningup | 0.000037 |

As you can see, the efficiency has been improved by more than one time, which is quite good.

Finally, this problem will only appear when the hot posts are turning pages. Generally, there is no problem if only the posts replied on pages 1 and 2 still adopt the original execution plan.

Therefore, we recommend discuz! Officially modify or add a new index, and in the code to determine whether the hot posts flip pages, if yes, force the use of new indexes to avoid performance problems.

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.