How can I use multi-threaded LRU flushing to break through the MySQL bottleneck?

Guide

Translation Team: Zhishutang Zang Jingge project-tianyge

Team members: Tianyi Ge-Leng Feng, Tianyi Ge-JACK, Tianyi Ge-Oracle

Proofreading: Ye Shifu

Source: Percona Server 5.7: multi-threaded LRU flushing, see: https://www.percona.com/blog/2016/05/05/percona-server-5-7-multi-threaded-lru-flushing/

Authors: Laurynas Biveinis and Alexey Stroganov

In this blog post, we'll discuss how to use multi-threaded LRU flushing to prevent bottlenecks in MySQL.

In the previous post ("MySQL 5.7: initial flushing analysis and why Performance Schema data is incomplete"), we saw that InnoDB 5.7 performs a lot of single-page LRU flushes, which in turn are serialized by the shared doublewrite buffer. Based on our 5.6 experience, we decided to attack the single-page flush issue first.

Let's start by describing a single-page flush. If the working set of a database instance is bigger than the available buffer pool, existing data pages will have to be evicted, or flushed and then evicted, to make room for queries reading in new pages. InnoDB tries to anticipate this by maintaining a list of free pages per buffer pool instance; these are the pages that can be used immediately for placing newly read data pages. The target length of the free page list is governed by the innodb_lru_scan_depth parameter, and the cleaner threads are tasked with refilling this list by performing LRU batch flushing. If for some reason the free page demand exceeds the cleaner threads' flushing capability, the server might find itself with an empty free list. In an attempt not to stall the query thread asking for a free page, it will then execute a single-page LRU flush (buf_LRU_get_free_block calling buf_flush_single_page_from_LRU in the source code), which is performed in the context of the query thread itself.
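
The fallback described above can be illustrated with a toy model. This is only a sketch under stated assumptions: the class and method names are hypothetical, pages are plain integers, and nothing here reflects InnoDB's real data structures or locking. It shows the free list being served first and the query thread falling back to evicting from the LRU tail itself once the free list is empty.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class BufferPoolInstance:
    """Toy model of one buffer pool instance (hypothetical, not InnoDB code)."""
    free_list: deque = field(default_factory=deque)
    lru_list: deque = field(default_factory=deque)  # right end = oldest page
    single_page_flushes: int = 0

    def get_free_block(self):
        """Serve from the free list; otherwise do a single-page LRU flush."""
        if self.free_list:
            return self.free_list.popleft()
        if not self.lru_list:
            raise RuntimeError("no page available to evict")
        # Free list is empty: the query thread itself flushes and evicts
        # the oldest page instead of waiting for a cleaner thread.
        self.single_page_flushes += 1
        return self.lru_list.pop()

    def cleaner_refill(self, lru_scan_depth):
        """Cleaner-thread batch flush: refill the free list up to its target."""
        while len(self.free_list) < lru_scan_depth and self.lru_list:
            self.free_list.append(self.lru_list.pop())


pool = BufferPoolInstance(lru_list=deque(range(10)))
pool.cleaner_refill(lru_scan_depth=2)
# Four requests arrive before the cleaner runs again: two are served from
# the free list, the other two fall back to single-page flushes.
blocks = [pool.get_free_block() for _ in range(4)]
```

In this model the number of single-page flushes is exactly the demand that the cleaner did not anticipate, which is the situation the rest of the post tries to avoid.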

The problem with this flushing mode is that it iterates over the LRU list of a buffer pool instance while holding the buffer pool mutex in InnoDB (or the finer-grained LRU list mutex in XtraDB). Thus, a server whose cleaner threads are not able to keep up with the LRU flushing demand will see further increased mutex pressure, which can further contribute to the cleaner thread troubles. Finally, once the single-page flusher finds a page to flush, it might have trouble getting a free doublewrite buffer slot (as shown previously). That suggested to us that single-page LRU flushes are never a good idea. The flame graph below demonstrates this:

Note how a big part of the server run time is attributed to a flame rooted at JOIN::optimize, whose run time in turn is almost fully taken by buf_dblwr_write_single_page in two branches.

The easiest way to avoid a single-page flush is, well, simply not to do it! Wait until a cleaner thread finally provides some free pages for the query thread to use. This is what we did in XtraDB 5.6 with the innodb_empty_free_list_algorithm server option (which has a "backoff" default). This is also present in XtraDB 5.7, and resolves the issues of increased contention for the buffer pool (LRU list) mutex and doublewrite buffer single-page flush slots. This approach handles the empty free page list better.
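
As a rough illustration of the "wait instead of flush" idea, here is a minimal sketch. The function name, the `try_get_free_page` callable, and the doubling delay schedule are all assumptions made for this example; the text above does not describe the actual backoff schedule used by XtraDB.

```python
import time


def get_free_page_with_backoff(try_get_free_page, sleep=time.sleep,
                               initial_delay_ms=1, max_delay_ms=100):
    """Wait for a cleaner thread to supply a free page; never flush ourselves.

    try_get_free_page is a hypothetical callable that returns a page, or
    None when the free list is empty. The delay values are illustrative.
    """
    delay_ms = initial_delay_ms
    while True:
        page = try_get_free_page()
        if page is not None:
            return page
        sleep(delay_ms / 1000)                      # back off, no single-page flush
        delay_ms = min(delay_ms * 2, max_delay_ms)  # assumed doubling schedule


# Usage: the free list is empty for three attempts, then a cleaner thread
# has refilled it. Sleeps are recorded instead of actually waiting.
attempts = iter([None, None, None, "page-42"])
recorded_sleeps = []
page = get_free_page_with_backoff(lambda: next(attempts),
                                  sleep=recorded_sleeps.append)
```

The query thread takes no LRU list mutex and no doublewrite slot while it waits, which is why this removes the contention described earlier, at the cost of the query stalling until pages appear.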

Even with this strategy it's still a bad situation to be in, as it causes query stalls when page cleaner threads aren't able to keep up with the free page demand. To understand why this happens, let's look into a simplified scheme of InnoDB 5.7 multi-threaded LRU flushing:

The key takeaway from the picture is that LRU batch flushing does not necessarily happen when it's needed the most. All buffer pool instances have their LRU lists flushed first (for free pages), and their flush lists flushed second (for checkpoint age and buffer pool dirty page percentage targets). If a flush list flush is in progress, LRU flushing will have to wait until the next iteration.

Further, all flushing is synchronized once per second-long iteration by the coordinator thread waiting for everything to complete. This one-second mark may well become a thirty-second or longer mark if one of the workers is stalled (with the telltale sign "InnoDB: page_cleaner: 1000 ms intended loop took 49812ms" in the server error log). So if we have a very hot buffer pool instance, everything else will have to wait for it. And it's long been known that buffer pool instances are not used uniformly (some are hotter and some are colder).
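
The effect of that synchronization can be shown with a small piece of arithmetic. This is a deliberately simplified sketch: it assumes one worker per buffer pool instance, the function name is hypothetical, and all flush times other than the 49812 ms figure quoted from the log message are invented for illustration.

```python
def iteration_time_ms(per_instance_flush_ms, intended_loop_ms=1000):
    """The coordinator waits for every instance, so the slowest sets the pace."""
    return max(intended_loop_ms, max(per_instance_flush_ms))


# Three instances finish quickly, one hot instance is stalled.
flush_ms = [120, 80, 49812, 200]
loop_ms = iteration_time_ms(flush_ms)
# The cold instances cannot start their next LRU flush until loop_ms has
# passed, even though each of them finished in well under a second.
```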

A fix should:

Decouple the "LRU list flushing" from "flush list flushing" so that the two can happen in parallel if needed.
Recognize that different buffer pool instances require different amounts of flushing, and remove the synchronization between the instances.

We developed a design based on the above criteria, where each buffer pool instance has its own private LRU flusher thread. That thread monitors the free list length of its instance, flushes, and sleeps until the next free list length check. The sleep time is adjusted depending on the free list length: thus a hot instance's LRU flusher may not sleep at all in order to keep up with the demand, while a cold instance's flusher might only wake up once per second for a short check.
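
The adaptive sleep can be sketched as a small function. The thresholds and step sizes below are assumptions chosen for illustration, not the heuristics used in Percona Server; only the two end points (no sleep for a hot instance, about one second for a cold one) come from the description above.

```python
MAX_SLEEP_MS = 1000  # "once per second" upper bound from the description


def next_sleep_ms(current_sleep_ms, free_list_len, target_free_len):
    """Pick the next sleep time for one instance's LRU flusher.

    Hypothetical policy: an empty free list means no sleep at all, a free
    list below half its target halves the sleep, and otherwise the sleep
    grows by 50 ms until it reaches one second.
    """
    if free_list_len == 0:
        return 0                                     # hot instance: keep flushing
    if free_list_len < target_free_len // 2:
        return current_sleep_ms // 2                 # falling behind: wake up sooner
    return min(current_sleep_ms + 50, MAX_SLEEP_MS)  # keeping up: relax


# Usage: one instance goes from cold to hot and back again.
sleep_ms = 1000
trace = []
for free_len in [1024, 300, 100, 0, 0, 600, 1024]:
    sleep_ms = next_sleep_ms(sleep_ms, free_len, target_free_len=1024)
    trace.append(sleep_ms)
```

Because each instance runs this loop independently, a hot instance never makes a cold one wait, and LRU flushing no longer depends on the progress of flush list flushing.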

The LRU flushing scheme now looks as follows:

This has been implemented in the Percona Server 5.7.10-3 RC release, and this design simplified the code as well. LRU flushing heuristics are simple, and all LRU flushing is now removed from the legacy cleaner coordinator/worker threads, enabling more efficient flush list flushing as well. LRU flusher threads are the only threads that can flush a given buffer pool instance, enabling further simplification: for example, InnoDB recovery writer threads simply disappear.

Are we done then? No. With the single-page flushes and single-page flush doublewrite bottleneck gone, we hit the doublewrite buffer again. We'll cover that in the next post.

Additional reading (translations in the same series by Tianyi Ge @ Zhishutang):

1. What performance improvements does Percona Server 5.7 have?

2. Analysis on the causes of incomplete P_S data

3. Percona Server 5.7 parallel doublewrite feature

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
