從MySQL暫存資料表談到filesort

來源:互聯網
上載者:User

內部暫存資料表的類型和產生時機相關,翻譯自:http://dev.mysql.com/doc/refman/5.6/en/internal-temporary-tables.html

In some cases, the server creates internal temporary tables while processing queries. Such a table can be held in memory and processed by the MEMORY storage engine, or stored on disk and processed by the MyISAM storage engine. The server may create a temporary table initially as an in-memory table, then convert it to an on-disk table if it becomes too large. Users have no direct control over when the server creates an internal temporary table or which storage engine the server uses to manage it.

有時候資料庫伺服器在執行某些查詢的時候會產生內部暫存資料表,這些暫存資料表有可能是產生在記憶體裡的由MEMORY引擎處理的,也有可能是產生在磁碟上由MyISAM引擎處理的。如果說在記憶體中的暫存資料表大小超過限制,伺服器則會將暫存資料表儲存成磁碟暫存資料表。使用者無法直接控制這些內部暫存資料表和管理這些暫存資料表的資料庫引擎。

Temporary tables can be created under conditions such as these:

內部暫存資料表產生的時機有以下幾種:

  • If there is an ORDER BY clause and a different GROUP BY clause, or if the ORDER BY or GROUP BY contains columns from tables other than the first table in the join queue, a temporary table is created.
  • 使用 ORDER BY 子句和一個不一樣的 GROUP BY 子句(經過筆者實驗,應該是GROUP BY一個無索引列,就會產生暫存資料表),或者 ORDER BY 或 GROUP BY 的列不是來自JOIN語句序列的第一個表,就會產生暫存資料表(經筆者實驗,應該是使用JOIN時, GROUP BY 任何列都會產生暫存資料表)
  • DISTINCT combined with ORDER BY may require a temporary table.
  • DISTINCT 和 ORDER BY 一起使用時可能需要暫存資料表(筆者實驗是只要用了DISTINCT(非索引列),都會產生暫存資料表)
  • If you use the SQL_SMALL_RESULT option, MySQL uses an in-memory temporary table, unless the query also contains elements (described later) that require on-disk storage.
  • 用了 SQL_SMALL_RESULT, mysql就會用記憶體暫存資料表。定義:SQL_BIG_RESULT or SQL_SMALL_RESULT can be used with GROUP BY or DISTINCT to tell the optimizer that the result set has many rows or is small, respectively. For SQL_BIG_RESULT, MySQL directly uses disk-based temporary tables if needed, and prefers sorting to using a temporary table with a key on the GROUP BY elements. For SQL_SMALL_RESULT, MySQL uses fast temporary tables to store the resulting table instead of using sorting. This should not normally be needed. 

 

To determine whether a query requires a temporary table, use EXPLAIN and check the Extra column to see whether it says Using temporary. See Section 8.8.1, “Optimizing Queries with EXPLAIN”.

可以用EXPLAIN來查看Extra欄位判斷是否使用了暫存資料表

Some conditions prevent the use of an in-memory temporary table, in which case the server uses an on-disk table instead:

有些情況伺服器會直接使用磁碟暫存資料表

  • Presence of a BLOB or TEXT column in the table

  • 表裡存在BLOB或者TEXT的時候(這是因為MEMORY引擎不支援這兩種資料類型,這裡筆者補充一下,並非只要查詢裡含有BLOB和TEXT類型的列就會產生磁碟暫存資料表,按照高效能MYSQL裡的話,應該這麼說:“Because the Memory storage engine doesn't support the BLOB and TEXT types, queries that use BLOB or TEXT columns and need an implicit temporary table will have to use on-disk MyISAM temporry tables, even for only a few rows.”也就是說如果我們的查詢中包含了BLOB和TEXT的列,而且又需要暫存資料表,這時候暫存資料表就被強制轉成使用磁碟暫存資料表,所以此書一直在提醒我們,如果要對BLOB和TEXT排序,應該使用SUBSTRING(column, length)將這些列截斷變成字串,這樣就可以使用in-memory暫存資料表了)
  • Presence of any column in a GROUP BY or DISTINCT clause larger than 512 bytes

  • GROUP BY 或者 DISTINCT 子句大小超過 512 Bytes
  • Presence of any column larger than 512 bytes in the SELECT list, if UNION or UNION ALL is used

  • 使用了UNION 或 UNION ALL 並且 SELECT 的列裡有超過512 Bytes的列

If an internal temporary table is created initially as an in-memory table but becomes too large, MySQL automatically converts it to an on-disk table. The maximum size for in-memory temporary tables is the minimum of the tmp_table_size and max_heap_table_size values. This differs from MEMORY tables explicitly created with CREATE TABLE: For such tables, the max_heap_table_size system variable determines how large the table is permitted to grow and there is no conversion to on-disk format.

如果內建記憶體暫存資料表建立後變得太大,MySQL會自動將它轉換成磁碟暫存資料表。記憶體暫存資料表的大小取決與 tmp_table_size參數和max_heap_table_size參數的值。用 CREATE TABLE 產生的記憶體暫存資料表的大小取決與 max_heap_table_size來決定是否要將其轉換成磁碟暫存資料表

When the server creates an internal temporary table (either in memory or on disk), it increments theCreated_tmp_tables status variable. If the server creates the table on disk (either initially or by converting an in-memory table) it increments the Created_tmp_disk_tables status variable.

當伺服器產生一個記憶體暫存資料表,Created_tmp_tables狀態變數值會增加,當伺服器建立了一個磁碟暫存資料表時,Created_tmp_disk_tables狀態變數值會增加。(這幾個變數可以通過 show status命令查看得到)

Tips: internal temporaray table 的大小受限制的是tmp_table_size和max_heap_table_size的最小值;而 user-created temporary table的大小隻受限與max_heap_table_size,而與tmp_table_size無關。以下是文檔原文,注意粗體部分

 tmp_table_size

Command-Line Format --tmp_table_size=#
Option-File Format tmp_table_size
Option Sets Variable Yes, tmp_table_size
Variable Name tmp_table_size
Variable Scope Global, Session
Dynamic Variable Yes
  Permitted Values
Type numeric
Default system dependent
Range 1024 .. 4294967295

The maximum size of internal in-memory temporary tables. (The actual limit is determined as the minimum oftmp_table_size and max_heap_table_size.) If an in-memory temporary table exceeds the limit, MySQL automatically converts it to an on-disk MyISAM table. Increase the value of tmp_table_size (andmax_heap_table_size if necessary) if you do many advanced GROUP BY queries and you have lots of memory. This variable does not apply to user-created MEMORY tables.

You can compare the number of internal on-disk temporary tables created to the total number of internal temporary tables created by comparing the values of the Created_tmp_disk_tables andCreated_tmp_tables variables.

See also Section 8.4.3.3, “How MySQL Uses Internal Temporary Tables”.

 max_heap_table_size

Command-Line Format --max_heap_table_size=#
Option-File Format max_heap_table_size
Option Sets Variable Yes, max_heap_table_size
Variable Name max_heap_table_size
Variable Scope Global, Session
Dynamic Variable Yes
  Permitted Values
Platform Bit Size 32
Type numeric
Default 16777216
Range 16384 .. 4294967295
  Permitted Values
Platform Bit Size 64
Type numeric
Default 16777216
Range 16384 .. 1844674407370954752

This variable sets the maximum size to which user-created MEMORY tables are permitted to grow. The value of the variable is used to calculate MEMORY table MAX_ROWS values. Setting this variable has no effect on any existingMEMORY table, unless the table is re-created with a statement such as CREATE TABLE or altered with ALTER TABLE or TRUNCATE TABLE. A server restart also sets the maximum size of existing MEMORY tables to the globalmax_heap_table_size value.

This variable is also used in conjunction with tmp_table_size to limit the size of internal in-memory tables. SeeSection 8.4.3.3, “How MySQL Uses Internal Temporary Tables”.

max_heap_table_size is not replicated. See Section 16.4.1.21, “Replication and MEMORY Tables”, andSection 16.4.1.34, “Replication and Variables”, for more information.

————————————————————————————————————————————————————————————————————

 

下面來說說filesort。什麼是filesort?翻譯一篇來自 Baron Schwartz的blog, 此君是誰?他是 High Performance MySQL的第一作者,呵呵了個呵呵

If you were interviewing to work at Percona, and I asked you “what does Using filesort mean in EXPLAIN,” what would you say?

I have asked this question in a bunch of interviews so far, with smart people, and not one person has gotten it right. So I consider it to be a bad interview question, and I’m going to put the answer here. If anyone gets it wrong from now on, I know they don’t read this blog!

 這段在吹牛比。

The usual answer is something like “rows are being placed into a temporary table which is too big to fit in memory, so it gets sorted on disk.” Unfortunately, this is not the same thing. First of all, this is Using temporary. Secondly, temporary tables may go to disk if they are too big, but EXPLAIN doesn’t show that. (If I interview you, I might ask you what “too big” means, or I might ask you the other reason temporary tables go to disk!)

一般人的回答是: “當行資料太大,導致記憶體無法容下這些資料產生的暫存資料表時,他們就會被放入磁碟中排序。”  很不幸,這個答案是錯的。首先,這個叫做 Using temporary (參見Expain 或者 DESC裡的Extra欄位);第二,暫存資料表在太大的時候確實會到磁碟離去,但是EXPLAIN不會顯示這些。 (Bala bala bala...)

The truth is, filesort is badly named. Anytime a sort can’t be performed from an index, it’s a filesort. It has nothing to do with files. Filesort should be called “sort.” It is quicksort at heart.

那麼事實是, filesort 這個名字取得太搓逼了。 filesort的意思是只要一個排序無法使用索引來排序,就叫filesort。他和file沒半毛錢關係。filesort應該叫做sort。(筆者補充一下:意思是說如果無法用已有index來排序,那麼就需要資料庫伺服器額外的進行資料排序,這樣其實是會增加效能開銷的。)

If the sort is bigger than the sort buffer, it is performed a bit at a time, and then the chunks are merge-sorted to produce the final sorted output. There is a lot more to it than this. I refer you to Sergey Petrunia’s article on How MySQL executes ORDER BY.You can also read about it in our book, but if you read Sergey’s article you won’t need to.

如果說sort資料比sort buffer還大,那麼排序會分解成多部分,每次排序一小部分,最後將各部分合并後輸出(你就是想說合并排序吧)。這裡面還有很多東西可說哦。推薦你去看xxxxx,或者是看我們的書(High Performance MySQL,不得不說確實是好書)。bala bala。。。

OK。現在瞭解神碼是filesort了,實際上確實這玩意和暫存資料表和檔案沒半毛錢關係。大家可以試試用 order by 一個無索引的列,Extra裡就會出現 Using filesort了

。。。額。。。我是為毛把暫存資料表和filesort放在一起來了?

聯繫我們

該頁面正文內容均來源於網絡整理,並不代表阿里雲官方的觀點,該頁面所提到的產品和服務也與阿里云無關,如果該頁面內容對您造成了困擾,歡迎寫郵件給我們,收到郵件我們將在5個工作日內處理。

如果您發現本社區中有涉嫌抄襲的內容,歡迎發送郵件至: info-contact@alibabacloud.com 進行舉報並提供相關證據,工作人員會在 5 個工作天內聯絡您,一經查實,本站將立刻刪除涉嫌侵權內容。

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.