How to break through MySQL bottlenecks with multi-threaded LRU flushing
Editor's note
Translation team: 知數堂藏經閣 project - 天一閣
Team members: 天一閣-冷鋒, 天一閣-JACK, 天一閣-神諭
Proofreading: 葉師傅
Original post: "Percona Server 5.7: multi-threaded LRU flushing", see: https://www.percona.com/blog/2016/05/05/percona-server-5-7-multi-threaded-lru-flushing/
Original authors: Laurynas Biveinis and Alexey Stroganov
In this blog post, we’ll discuss how to use multi-threaded LRU flushing to prevent bottlenecks in MySQL.
In the previous post ("MySQL 5.7: initial flushing analysis and why Performance Schema data is incomplete", see Further reading below), we saw that InnoDB 5.7 performs a lot of single-page LRU flushes, which in turn are serialized by the shared doublewrite buffer. Based on our 5.6 experience, we decided to attack the single-page flush issue first.
Let’s start with describing a single-page flush. If the working set of a database instance is bigger than the available buffer pool, existing data pages will have to be evicted or flushed (and then evicted) to make room for queries reading in new pages. InnoDB tries to anticipate this by maintaining a list of free pages per buffer pool instance;
these are the pages that can be immediately used for placing the newly-read data pages. The target length of the free page list is governed by the innodb_lru_scan_depth parameter, and the cleaner threads are tasked with refilling this list by performing LRU batch flushing. If for some reason the free page demand exceeds the cleaner thread flushing capability, the server might find itself with an empty free list. In an attempt to not stall the query thread asking for a free page, it will then execute a single-page LRU flush ( buf_LRU_get_free_block calling buf_flush_single_page_from_LRU in the source code), which is performed in the context of the query thread itself.
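The free-list mechanism and its single-page-flush fallback can be sketched as follows. This is a minimal Python model, not the actual InnoDB C++ code; only the function names `buf_LRU_get_free_block` and `buf_flush_single_page_from_LRU` mentioned in the comments come from the real source, everything else is a simplification for illustration:

```python
from collections import deque

class BufferPoolInstance:
    """Toy model of one buffer pool instance (not real InnoDB code)."""
    def __init__(self, lru_scan_depth=1024):
        self.lru_scan_depth = lru_scan_depth   # target free-list length
        self.free_list = deque()               # pages ready for reuse
        self.lru_list = deque(range(100))      # page ids in LRU order

    def lru_batch_flush(self):
        """Cleaner thread: refill free list up to innodb_lru_scan_depth."""
        while len(self.free_list) < self.lru_scan_depth and self.lru_list:
            self.free_list.append(self.lru_list.popleft())

    def get_free_block(self):
        """Query thread path, modeled after buf_LRU_get_free_block."""
        if self.free_list:
            return self.free_list.popleft()
        # Free list is empty: the query thread itself flushes one page
        # (buf_flush_single_page_from_LRU), in its own context, and is
        # then serialized by the shared doublewrite buffer.
        return self.single_page_flush()

    def single_page_flush(self):
        # Flush and evict a single page from the LRU tail (simplified).
        return self.lru_list.popleft()
```

In the model, a query thread that finds `free_list` empty pays the flush cost itself, which is exactly the situation the rest of the post sets out to eliminate.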
The problem with this flushing mode is that it will iterate over the LRU list of a buffer pool instance, while holding the buffer pool mutex in InnoDB (or the finer-grained LRU list mutex in XtraDB). Thus, a server whose cleaner threads are not able to keep up with the LRU flushing demand will have further increased mutex pressure – which can further contribute to the cleaner thread troubles. Finally, once the single-page flusher finds a page to flush it might have trouble in getting a free doublewrite buffer slot (as shown previously). That suggested to us that single-page LRU flushes are never a good idea. The flame graph below demonstrates this:
Note how a big part of the server run time is attributed to a flame rooted at JOIN::optimize, whose run time in turn is almost fully taken by buf_dblwr_write_single_page in two branches.
The easiest way to avoid a single-page flush is, well, simply not to do it! Wait until a cleaner thread finally provides some free pages for the query thread to use. This is what we did in XtraDB 5.6 with the innodb_empty_free_list_algorithm server option (whose default is "backoff"). The option is also present in XtraDB 5.7, and resolves the issues of increased contention for the buffer pool (LRU list) mutex and the doublewrite buffer single-page flush slots. This approach handles the empty free page list better.
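The "backoff" idea can be sketched like this. This is a hypothetical simplification using a condition variable, not the actual XtraDB implementation of innodb_empty_free_list_algorithm; all names here are illustrative:

```python
import threading
import time

def get_free_block_backoff(free_list, free_list_cv, max_wait=10.0):
    """Instead of doing a single-page flush itself, the query thread
    backs off and waits for the cleaner thread to refill the free list."""
    deadline = time.monotonic() + max_wait
    with free_list_cv:
        while not free_list:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                raise TimeoutError("cleaner thread could not keep up")
            # Sleep until the cleaner signals that pages were freed.
            free_list_cv.wait(timeout=remaining)
        return free_list.pop()

def cleaner_refill(free_list, free_list_cv, pages):
    """Cleaner thread side: add freed pages and wake any waiters."""
    with free_list_cv:
        free_list.extend(pages)
        free_list_cv.notify_all()
```

The trade-off is visible in the sketch: no query thread ever touches the LRU list or the doublewrite buffer directly, but a query can stall for as long as the cleaner takes to produce a page, which motivates the next section.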
Even with this strategy it’s still a bad situation to be in, as it causes query stalls when page cleaner threads aren’t able to keep up with the free page demand. To understand why this happens, let’s look into a simplified scheme of InnoDB 5.7 multi-threaded LRU flushing:
The key takeaway from the picture is that LRU batch flushing does not necessarily happen when it’s needed the most. All buffer pool instances have their LRU lists flushed first (for free pages), and flush lists flushed second (for checkpoint age and buffer pool dirty page percentage targets). If the flush list flush is in progress, LRU flushing will have to wait until the next iteration.
Further, all flushing is synchronized once per second-long iteration by the coordinator thread waiting for everything to complete. This one second mark may well become a thirty or more second mark if one of the workers is stalled (with the telltale sign: “InnoDB: page_cleaner: 1000ms intended loop took 49812ms”) in the server error log. So if we have a very hot buffer pool instance, everything else will have to wait for it. And it’s long been known that buffer pool instances are not used uniformly (some are hotter and some are colder).
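The coordinator pattern described above can be sketched with a few threads. This is a simplified model of the synchronization behavior only, not InnoDB code; it shows why one hot buffer pool instance gates every other instance's next flushing iteration:

```python
import threading
import time

def cleaner_iteration(instance_work):
    """One coordinator iteration (sketch): flush every buffer pool
    instance in a worker thread, then wait for ALL of them to finish
    before the next iteration can start. The slowest (hottest)
    instance therefore gates everyone else -- this is where the
    '1000ms intended loop' can overrun to 30+ seconds."""
    workers = [threading.Thread(target=work) for work in instance_work]
    start = time.monotonic()
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return time.monotonic() - start

# Seven cold instances finish instantly; one hot instance takes 0.2 s.
cold = [lambda: None] * 7
hot = [lambda: time.sleep(0.2)]
elapsed = cleaner_iteration(cold + hot)   # everyone waited ~0.2 s
```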
A fix should:
Decouple the “LRU list flushing” from “flush list flushing” so that the two can happen in parallel if needed.
Recognize that different buffer pool instances require different amounts of flushing, and remove the synchronization between the instances.
We developed a design based on the above criteria, where each buffer pool instance has its own private LRU flusher thread. That thread monitors the free list length of its instance, flushes, and sleeps until the next free list length check. The sleep time is adjusted depending on the free list length: thus a hot instance LRU flusher may not sleep at all in order to keep up with the demand, while a cold instance flusher might only wake up once per second for a short check.
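The per-instance flusher loop with its adaptive sleep might look like the following. The scaling formula here is a hypothetical illustration, not the exact heuristic shipped in Percona Server; the thresholds and names are assumptions:

```python
import time

def adaptive_sleep_ms(free_len, target_len, max_sleep_ms=1000):
    """Sleep-time heuristic for a per-instance LRU flusher (hypothetical
    formula): the shorter the free list relative to its target, the less
    the flusher sleeps."""
    if free_len >= target_len:
        return max_sleep_ms              # cold instance: nap a full second
    if free_len < target_len // 20:
        return 0                         # hot instance: do not sleep at all
    return max_sleep_ms * free_len // target_len   # scale in between

def lru_flusher_loop(flush_once, free_len_fn, target_len, iterations):
    """One private flusher thread per buffer pool instance (sketch):
    flush, then sleep for a time derived from this instance's own
    free-list length -- no synchronization with other instances."""
    for _ in range(iterations):
        flush_once()
        sleep_ms = adaptive_sleep_ms(free_len_fn(), target_len)
        time.sleep(sleep_ms / 1000.0)
```

Because each instance's loop depends only on its own free-list length, a hot instance spins at full speed while a cold one wakes about once per second, matching the behavior described above.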
The LRU flushing scheme now looks as follows:
This has been implemented in the Percona Server 5.7.10-3 RC release, and the design simplified the code as well. The LRU flushing heuristics are simple, and all LRU flushing is now removed from the legacy cleaner coordinator/worker threads – enabling more efficient flush list flushing as well. LRU flusher threads are the only threads that can flush a given buffer pool instance, which enables further simplification: for example, the InnoDB recovery writer threads simply disappear.
Are we done then? No. With the single-page flushes and single-page flush doublewrite bottleneck gone, we hit the doublewrite buffer again. We’ll cover that in the next post.
Further reading (same series of translations by 天一閣 @ 知數堂):
1. What performance improvements does Percona Server 5.7 bring?
2. Why Performance Schema data is incomplete
3. Percona Server 5.7: parallel doublewrite