MYSQL Order by的實現

最後更新：2018-12-05 來源：互聯網

上載者：User

創建阿里雲帳戶，並獲得超過 40 款產品的免費試用版；而企業帳戶則可以享有總值 $1200 的免費試用版。立即註冊！

本文使用的mysql版本為5.1.48

select sql_no_cache * from one where id< 20 order by id;（id no key）

#0 my_qsort2 (base_ptr=0x1b2176c0, count=4, size=8, cmp=0x869220<ptr_compare_1>,cmp_argument=0x4670ea38) at mf_qsort.c:115#1 0x00000000008690ea in my_string_ptr_sort (base=0x1b2176c0"Xw!\033", items=4, size=5)at mf_sort.c:37#2 0x00000000005c12bb in filesort (thd=0x1b1ef980, table=0x1b23a880,sortorder=0x1b23ccf0,   s_length=1,select=0x1b23cb50, max_rows=18446744073709551615, sort_positions=false,     examined_rows=0x4670ef18) atfilesort.cc:1035#3 0x000000000053c648 in create_sort_index (thd=0x1b1ef980, join=0x1b2198e0,order=0x1b23bfa0,filesort_limit=18446744073709551615, select_limit=18446744073709551615,is_order_by=false)    atsql_select.cc:13832#4 0x000000000054846a in JOIN::exec (this=0x1b2198e0) at sql_select.cc:2257#5 0x00000000005492be in mysql_select (thd=0x1b1ef980,rref_pointer_array=0x1b1f1a68,   tables=0x1b23b868, wild_num=1, fields=<value optimized out>,conds=0x1b23bd30, og_num=1,order=0x1b23bfa0, group=0x0, having=0x0,proc_param=0x0, select_options=2147764736, result=0x1b23c070, unit=0x1b1f1470,select_lex=0x1b1f1898) at sql_select.cc:2509#6 0x00000000005497e9 in handle_select (thd=0x1b1ef980, lex=0x1b1f13d0,result=0x1b23c070,    setup_tables_done_option=0) at sql_select.cc:269#7 0x00000000004c97b7 in execute_sqlcom_select (thd=0x1b1ef980,all_tables=0x1b23b868)   at sql_parse.cc:5075#8 0x00000000004d516a in mysql_execute_command (thd=0x1b1ef980) atsql_parse.cc:2271#9 0x00000000004d5dc4 in mysql_parse (thd=0x1b1ef980, inBuf=0x1b23b650 "select sql_no_cache * from one where id < 20order by id", length=56,found_semicolon=0x46710c30)at sql_parse.cc:5994

通過該bt我們可以知道，filesort是在JOIN的過程中，但又在do_select之前[對於不使用temptabel]，使用qsort排序演算法。

1. 單表的order by

在JOIN::exec的create_sort_index中進行排序操作，首先它判斷是否能夠直接通過index獲得所需的有序資料(create_sort_index)；如果不行的話則進行filesort，即真正的排序過程，在這個函數裡，首先調用find_all_keys來獲得所有滿足條件的資料（通過create_sort_index來test cond），獲得的資料首先儲存在sort_keys裡，這個大小受系統內容變數sort_buff_size的影響，當要排序的資料大於該buff的時候，它們會被write_keys寫到tempfile（一個io_cache結構並不一定是一個實體檔案），此時它們會先被排完序，即每個io_cache裡的資料是有序的。當所有的資料都能夠儲存到一個sort_keys裡（即sort_buff_size足夠大），那麼它們就會在find_all_keys結束後，由filesort調用save_index進行排序，如果是前一種情況（被分割到多個io_cache）則調用merge_many_buff來完成最終的歸併。真正的排序演算法（函數）是由my_string_ptr_sort實現的，它又有兩種策略：當size
<= 20 && items >= 1000 && items <100000（每條記錄的大小小於20Byte，並且記錄資料[1000，100000]）的使用radixsort_for_str_ptr【Radixsort for pointers to fixed length strings. A very quick sort for not to long (< 20char) strings. Neads a extra buffers ofnumber_of_elements pointers
but is 2-3 times faster than quicksort】這是源碼中給的注釋。
對於其它的情況則使用my_qsort2演算法，該演算法在內容會進行一個判斷，當記錄數小於THRESHOLD_FOR_INSERT_SORT（預設為10）時使用插入排序，否則就是快速排序。
上面就是排序的整個過程，它們最終被儲存到table->sort. record_pointers或者sort.io_cache裡。排完序後就是輸出。

#0 evaluate_join_record (join=0x1b214c10, join_tab=0x1b23cb78, error=0) atsql_select.cc:11414#1 0x000000000052dc39 in sub_select (join=0x1b214c10, join_tab=0x1b23cb78,end_of_records=<value optimized out>) at sql_select.cc:11384#2 0x0000000000542f8a in do_select (join=0x1b214c10, fields=0x1b1f19a0,table=0x0, procedure=0x0) at sql_select.cc:11140#3 0x0000000000548613 in JOIN::exec (this=0x1b214c10) at sql_select.cc:2314#4 0x00000000005492be in mysql_select (thd=0x1b1ef980,rref_pointer_array=0x1b1f1a68, tables=0x1b2146e0, wild_num=1, fields=<valueoptimized out>, conds=0x0, og_num=1, order=0x1b214b20, group=0x0,having=0x0, proc_param=0x0, select_options=2147764736, result=0x1b214bf0,unit=0x1b1f1470, select_lex=0x1b1f1898) at sql_select.cc:2509

從該bt可以看出在order by之後(2257)2314，調用了我們在MYSQLJOIN過程裡分析的do_select操作，此時只有一層的sub_select,evaluate_join_record，因為這裡是單表。這裡sub_select獲得的資料不再是從plugin裡去讀而是調用rr_unpack_from_buffer；rr_unpack_from_tempfile（Read aresult set record from a buffer after sorting）從record_pointers或io_cache中讀，而且在evaluate_join_record裡也不會再去test(select_cond->val_int())，因為select_cond，現在被置為空白[這些操作都是在filesort之後賦值的]，最後evaluate_join_record調用end_send發送給用戶端。

2. 兩個表的join，並且order by為驅動表

explain select * from titles t joinemployees e on t.emp_no=e.emp_no where t.title='Engineer' order bye.first_name;

對於這種情況，其實進行的操作跟一個表的是一樣的，只是最後在send data的執行do_select的時候（sql_select.cc:2314），此時就是真正的兩個表進行join了，也就是它是兩層的sub_select----evaluate_join_record。只是對於驅動表在sub_select的時候是從sort.record_pointers或者sort.io_cache獲得資料的，其它的就與我們介紹MYSQL JOIN過程的一樣了。

3. 兩個表的join，並且order by為非驅動表

explain select * from titles t joinemployees e on t.emp_no=e.emp_no where t.title='Engineer' order by t.from_date;

與相比這裡多了一個using temporary[網上有文章說為什麼這個顯示在第一行，其實它顯示在哪一行都不合適，因為using temporary，using filesort都是在兩者join之後]，也就是說以對於這種情況需要一個temporary table來輔助實現，下面我們來分析一下這個temp tab是什麼作用。這裡的代碼如下：

#0 sub_select (join=0x1b2270a0, join_tab=0x1b21a618,end_of_records=<value optimized out>)   at sql_select.cc:11401#1 0x000000000052da0e in evaluate_join_record (join=0x1b2270a0,join_tab=0x1b21a3c0, error=<value optimized out>) at sql_select.cc:11511#2 0x000000000052dc39 in sub_select (join=0x1b2270a0, join_tab=0x1b21a3c0, end_of_records=<value optimized out>) at sql_select.cc:11384#3 0x0000000000542f8a in do_select (join=0x1b2270a0, fields=0x0,table=0x1b23d670, procedure=0x0)    atsql_select.cc:11140#4 0x0000000000546e8c in JOIN::exec (this=0x1b2270a0) at sql_select.cc:1907#5 0x00000000005492be in mysql_select (thd=0x1b1ef980,rref_pointer_array=0x1b1f1a68,   tables=0x1b23b900, wild_num=1, fields=<value optimized out>,conds=0x1b23c978, og_num=1,order=0x1b23cc18, group=0x0, having=0x0,proc_param=0x0, select_options=2147764736, result=0x1b23ccf0, unit=0x1b1f1470,select_lex=0x1b1f1898) at sql_select.cc:2509

可以看出這裡進行do_select(1907)操作是在2257:create_sort_index()之前，這與我們之前的join文件剖析的過程不一樣，即join的過程提前了，因為在前面有一個if (need_tmp)判斷，這個need_tmp就是為order或group使用。並且當join到一條滿足條件的語句時執行的join_tab->next_select為：end_write[兩層evaluate_join_recordtest(select_cond->val_int())後]。而這時寫的位置為join->tmp_table；它的初始化為do_select的開頭，即把進入do_select時指定的need_tmp的temp_table賦值給它。即把join後的所有記錄儲存到一個temp
table裡[join->tmp_table]。
接下來執行change_to_use_tmp_fields，就是讓order by使用temp fields。並把temp table的資訊拷貝到join裡。
再執行create_sort_index，執行sort操作，因為現在傳給filesort的select為空白，所以在find_all_keys的時候不會再進行cond判斷。（即select->skip_record不會被執行）最後再send data，此時還會執行一次do_select【如果沒有tmp_table的話就會只執行一次這個do_select，如單個表的order by】，此時join只有一個temp table，所以這裡只有一層sub_select與evaluate_join_record。最後的join_tab->next_select為end_send。

4. 為什麼MYSQL認為使用temp table會更優？

首先我們看幾個資料：
select count(*) from employees：300024
select count(*) from titles：443308
select count(*) from titles wheretitle='Engineer'：115003
e表比t表小，所以它全表掃描的時候會快一點，但是t表有一個where條件經過test cond之後只剩下115003了，所以按這個資料看MS使用t表去join e表可能更快。而且前者還使用了temp table，實在是想不明白為什麼MYSQL會這樣做，所以就通過如下實驗對比一下兩者的效能。

A:explain select sql_no_cache * from titlest JOIN employees e on t.emp_no=e.emp_no wheret.title='Engineer' order by t.from_date;

執行結果：115003 rows in set (3.09 sec)

B:explain select sql_no_cache * from titlest STRAIGHT_JOIN employees e on t.emp_no=e.emp_nowhere t.title='Engineer' order by t.from_date;

執行結果：115003 rows in set (1.40 sec)
實驗結果如同我們上面分析的：使用t作為驅動表比使用e（MYSQL預設的最佳化結果）效能更好。
其實這個對於不使用order by的情況應該是一樣的，所以下面我們又實驗了這兩種情況：
select sql_no_cache * from titles t JOIN employees e on t.emp_no=e.emp_no wheret.title='Engineer';
115003 rows in set (2.50 sec)
select sql_no_cache * from titles t STRAIGHT_JOIN employees e on t.emp_no=e.emp_no wheret.title='Engineer';
執行結果：115003 rows in set (1.05 sec)

通過上面的實驗我們可以看到MYSQL的最佳化其實並不一定是可靠的，一個原因可能是它無法去判斷where cond後的資料資訊。而這些資訊對於join操作是可能有較大的影響。

5. 總結

整個過程可以簡單的描述為如果能夠對單個表進行order的話就直接先對它的過濾集進行order by操作。如果不能，即會在join後才進行order by，此時會產生一個中間的temporary table做為最後的order by對象（也可以這樣說order by只會對一個表進行操作，只是這個表可能就是本身存在的一個實體表或者是由多個表join後產生的一個暫存資料表）。另外排序演算法還有兩種選擇策略：一種是只取出要order的欄位與一個可以返回一個記錄的指標，對這個把它們放到sort_buff中排序，排完後再按順序取出每個記錄的相應欄位；另一種是把order欄位以及所有要返回的欄位都一次性取出。這兩種策略的選擇是由系統內容變數max_length_for_sort_data（get_addon_fields）決定的，當要取得的欄位類型總和大於max_length_for_sort_data時就使用第一種，否則使用第二種。顯然第一種需要多一次IO，但記憶體使用量更少。所以當order
by效能差的時候可以考慮適當的提高max_length_for_sort_data，或者sort_buffer_size，前者減少IO，後者減少meger。

參考文獻：

簡朝陽：http://isky000.com/database/mysql_order_by_implement

http://www.cnblogs.com/phper7/archive/2010/05/26/1744062.html

本文章原先以中文撰寫並發佈於 aliyun.com，亦設英文版本，僅作資訊用途。本網站不對文章的準確性，完整性或可靠性或其任何翻譯作出任何明示或暗示的陳述或保證。如對該文章有任何疑慮或投訴，請傳送電郵至 info-contact@alibabacloud.com 並提供相關疑慮或投訴的詳細說明。職員會於 5 個工作天內與您聯絡，一經驗證之後，即會刪除該侵權內容。

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More