This article is to MySQL in the delete table when I/O error causes analysis, first of all to observe the problem phenomenon and then make the related problem investigation, you can conclude: the main thread to get Ibuf (space,page) process and delete operation execution process and do not lock to ensure mutual exclusion, only async i/ o the merge operation after completion is mutually exclusive with the delete operation. All the explanations in this article are described in detail. Apache PHP MySQL
begin!
Problem phenomenon
Recently using Sysbench to test MySQL, because of the long test time, wrote a script in the Prepare->run->cleanup order in the background running. After running the log to find a problem, the MySQL service error log in a number of errors similar to the following information:
[ERROR] Innodb:trying to do I/O to a tablespace which does not exist. I/O type:read, page: [Page id:space=32, page number=57890], I/O length:16384 bytes.
It looks like I am having an I/O error, but the MySQL process does not crash and the Sysbench client has no error.
Discover the problem process
Based on the time history of the error and the point-in-time comparison of the various stages of the script output, the command executed by the script at that time is:
Sysbench--tables=100--table-size=4000000--threads=50--mysql-db=sbtest--time=300 oltp_delete cleanup
Re-execute the use case again and again without the same situation. But this error message can still be found with script execution. The initial suspicion is that the run and cleanup cannot be spaced too long before triggering the problem. Because the time to perform a 100G of data is longer and the cost of reproduction is large, try to reduce the amount of use case data first. The-table-size=4000000 is modified to 2000000, the execution of the script at this time does not trigger the problem, and finally the-table-size=3000000 can be stable trigger and reduce the partial recurrence time. To confirm that the interval is too long to be reproducible, modify the script to sleep 10 seconds between the two stages of run and cleanup, and it does not trigger this error message. Modified to sleep 5 seconds can also trigger, but the number of errors has been reduced.
Problem investigation
Look at the corresponding version mysql5.7.22 code, found that this error only one location: fil0fil.cc file in line No. 5578 fil_io () function. Use GDB debugging directly, add breakpoints at this location, and execute a reproducible script to get the following stacks:
(GDB) bt#0 fil_io (type= ..., Sync=sync@entry=false, page_id= ..., page_size= ..., byte_offset=byte_offset@entry=0, len= 16384, buf=0x7f9ead544000, message=message@entry=0x7f9ea8ce9c78) at mysql-5.7.22/storage/innobase/fil/fil0fil.cc : 5580#1 0x00000000010f99fa in Buf_read_page_low (err=0x7f9ddaffc72c, sync=<optimized out>, Type=0, mode=< Optimized Out>, page_id= ..., page_size= ..., unzip=true) at mysql-5.7.22/storage/innobase/buf/buf0rea.cc:195#2 0x00000000010fc5fa in Buf_read_ibuf_merge_pages (Sync=sync@entry=false, SPACE_IDS=SPACE_IDS@ENTRY=0X7F9DDAFFC7E0, PAGE_NOS=PAGE_NOS@ENTRY=0X7F9DDAFFC7A0, n_stored=2) at mysql-5.7.22/storage/innobase/buf/buf0rea.cc:834#3 0x0000000000f3a86c in Ibuf_merge_pages (N_pages=n_pages@entry=0x7f9ddaffce30, sync=sync@entry=false) at mysql-5.7.22 /storage/innobase/ibuf/ibuf0ibuf.cc:2552#4 0x0000000000f3a94a in Ibuf_merge (Sync=false, Sync=false, n_pages= 0X7F9DDAFFCE30) at Mysql-5.7.22/storage/innobase/ibuf/ibuf0ibuf.cc:2656#5 ibuf_merge_in_backgRound (Full=full@entry=false) at Mysql-5.7.22/storage/innobase/ibuf/ibuf0ibuf.cc:2721#6 0x000000000102bcf4 in srv_ Master_do_active_tasks () at Mysql-5.7.22/storage/innobase/srv/srv0srv.cc:2132#7 Srv_master_thread (arg=< Optimized out>) at Mysql-5.7.22/storage/innobase/srv/srv0srv.cc:2383#8 0x00007fa003eeddc5 in Start_thread () from/ Lib64/libpthread.so.0#9 0x00007fa002aab74d in Clone () from/lib64/libc.so.6
Obviously this is a background thread doing the insert buffer merge operation. At this point, Space->stop_new_ops is found to be true, that is, the space to be processed by the page is being deleted. Why would you want to operate a space that is being deleted? This requires investigating the Insert Buffer feature, the process of insert buffer merge, and the process of deleting the table.
Insert Buffer Background knowledge
Insert buffer is a special data structure (b + tree) that caches changes when the secondary index page is not in the buffer pool, and later merges when the page is loaded into the buffer pool by another read operation. When MySQL first introduced this feature, it could only cache insert operations, so it's called insert buffer, and now these operations can be insert, UPDATE, or DELETE (DML), so it's called change. Buffer (This is still described in insert buffer), but the source code is still marked with IBUF. This feature caches several updates to the same page, merges them into a one-time update operation, reduces IO, and translates random io into sequential io, which avoids the performance loss of random IO and improves the write performance of the database.
Related Insert buffer merge logic
The Insert buffer merge occurs when the buffer page is read into buffer pool. There are several main scenarios where the merge process will occur:
When the page is read into the buffer pool, the ibuf merge is done after the read, and then the page is available;
The merge operation executes as a background task. The innodb_io_capacity parameter sets the maximum number of pages per merge process for InnoDB background tasks;
During a crash recovery, the insert buffer merge for the corresponding page is executed when the index page is read into the buffer pool;
Insert buffer is persistent and system crashes do not cause it to fail. After rebooting, the insert buffer merge operation will return to normal;
You can use-innodb-fast-shutdown = 0 to force a full merge of the ibuf when the server shuts down.
Our question this time is clearly in the second case. The InnoDB main thread (Svr_master_thread) will actively perform the insert buffer merge operation every second. First determine if the server has been active in the past 1s (inserting tuples into the page, row operations on the undo table, etc.), and if so, the maximum number of pages in the merge is 5% of the innodb_io_capacity set. If not, the maximum number of pages for the merge is the value set by Innodb_io_capacity.
The main thread of the InnoDB main thread (svr_master_thread) merge is as follows:
The main thread reads the page number and space number from the leaf node of the ibuf tree and logs it into a two-tuple array (unlocked);
The main thread detects if the space in the two-tuple is in the tablespace cache, if not, the description has been deleted, deleting the corresponding IBUF record;
The main thread determines whether an asynchronous read operation is performed on a space being deleted, if it is, an error, and deletes the corresponding IBUF record, go to procedure 2 to continue the next array element judgment;
If everything is normal, the main thread issues an async IO request, and Async reads the index page that needs to be merge;
I/O handler thread, after receiving the completion of the async I/O, the merge operation;
When the merge is made, call Fil_space_acquire to increment the space->n_pending_ops. Avoid the deletion of the operation concurrency;
Call Fil_space_release to decrement the space->n_pending_ops after execution is complete.
Logic for related Delete tables
Lock the Fil_system->mutex, set sp->stop_new_ops = True, Mark space is being deleted, do not allow new operations on it, and then unlock the Fil_system->mutex;
Lock on Fil_system->mutex, detect space->n_pending_ops, unlock Fil_system->mutex. If the detection is greater than 0, it means that there are still dependent operations not completed, sleep 20ms and retry;
Lock on Fil_system->mutex, detect space->n_pending_flushes and (*node)->n_pending, unlock Fil_system->mutex. If the detection is greater than 0, meaning there is still dependent I/O is not completed, sleep 20ms and retry;
At this point it is considered that there is no conflict operation, brush out all dirty pages or delete all the given table space of the page;
Deletes the record of the specified space from the tablespace cache;
Delete the corresponding data file.
Conclusion of the question
It is clear that the process of getting ibuf (Space,page) from the main thread is not guaranteed to be mutually exclusive with the removal operation, only the merge operation after the completion of async I/O is mutually exclusive to the delete operation. This error message occurs if the background thread starts IBUF merge and has performed the 2nd step of detection, but has not yet performed to step 3rd, when the user thread starts to do the delete table and sets the stop_new_ops tag but has not performed to the 5th step to delete the tablespace cache. The interaction of the two threads is as follows:
In the case of an accident, it is necessary for the thread to execute the delete operation of the corresponding table at the point of interruption. Sure enough we can find the following stacks:
Thread 118 (thread 0x7f9de0111700 (LWP 5234)): #0 0x00007fa003ef1e8e in pthread_cond_broadcast@ @GLIBC_2.3.2 () from/lib64 /libpthread.so.0#1 0x0000000000f82f41 in Broadcast (THIS=0XD452EF8) at mysql-5.7.22/storage/innobase/os/os0event.cc : 184#2 Set (THIS=0XD452EF8) at Mysql-5.7.22/storage/innobase/os/os0event.cc:75#3 Os_event_set (EVENT=0XD452EF8) at Mysql-5.7.22/storage/innobase/os/os0event.cc:483#4 0x00000000010ec8a4 in signal (this=<optimized out>) at Mysql-5.7.22/storage/innobase/include/ut0mutex.ic:105#5 exit (this=<optimized out>) at mysql-5.7.22/storage/ Innobase/include/ib0mutex.h:690#6 exit (this=<optimized out>) at mysql-5.7.22/storage/innobase/include/ Ib0mutex.h:961#7 Buf_flush_yield (bpage=<optimized out>, buf_pool=<optimized out>) at mysql-5.7.22/ Storage/innobase/buf/buf0lru.cc:405#8 Buf_flush_try_yield (processed=<optimized out>, bpage=<optimized out Buf_pool=<optimized out>) at mysql-5.7.22/storage/innobase/buf/buf0Lru.cc:449#9 buf_flush_or_remove_pages (trx=<optimized out>, flush=<optimized Out>, observer=< Optimized out>, id=<optimized out>, buf_pool=<optimized out>) at mysql-5.7.22/storage/innobase/buf/ Buf0lru.cc:632#10 buf_flush_dirty_pages (buf_pool=<optimized out>, id=<optimized Out>, observer=< Optimized out>, flush=<optimized out>, trx=<optimized out>) at mysql-5.7.22/storage/innobase/buf/ Buf0lru.cc:693#11 0x00000000010f6de7 in Buf_lru_remove_pages (trx=0x0, Buf_remove=buf_remove_flush_no_write, id=55, Buf_pool=0x31e55e8) at Mysql-5.7.22/storage/innobase/buf/buf0lru.cc:893#12 buf_lru_flush_or_remove_pages (id= Id@entry=55, Buf_remove=buf_remove@entry=buf_remove_flush_no_write, trx=trx@entry=0x0) at mysql-5.7.22/storage/ Innobase/buf/buf0lru.cc:951#13 0x000000000114e488 in Fil_delete_tablespace (id=id@entry=55, buf_remove=buf_ Remove@entry=buf_remove_flush_no_write) at Mysql-5.7.22/storage/innobase/fil/fil0fil.cc:2800#14 0x0000000000fE77BD in Row_drop_single_table_tablespace (trx=0x0, Is_encrypted=false, Is_temp=false, filepath=0x7f9d7c209f38 "./ Sbtest/sbtest25.ibd ", Tablename=0x7f9d7c209dc8" Sbtest/sbtest25 ", space_id=55) at Mysql-5.7.22/storage/innobase/row /row0mysql.cc:4189#15 row_drop_table_for_mysql (name=name@entry=0x7f9de010e020 "Sbtest/sbtest25", trx=trx@entry= 0x7f9ff9515750, drop_db=<optimized out>, nonatomic=<optimized Out>, Nonatomic@entry=true, handler= handler@entry=0x0) at Mysql-5.7.22/storage/innobase/row/row0mysql.cc:4741#16 0x0000000000f092f3 in Ha_innobase:: Delete_table (this=<optimized out>, name=0x7f9de010f5e0 "./sbtest/sbtest25") at mysql-5.7.22/storage/innobase/ Handler/ha_innodb.cc:12539#17 0x0000000000801a30 in Ha_delete_table (thd=thd@entry=0x7f9d7c1f6910, table_type=table _type@entry=0x2ebd100, path=path@entry=0x7f9de010f5e0 "./sbtest/sbtest25", db=db@entry=0x7f9d7c00e560 "Sbtest", Alias=0x7f9d7c00df98 "Sbtest25", generate_warning=generate_warning@entry=true) at mysql-5.7.22/sql/handler.cc:2586#18 0x0000000000d0a6af in Mysql_rm_table_no_locks (thd=thd@entry=0x7f9d7c1f6910, tables= Tables@entry=0x7f9d7c00dfe0, If_exists=true, Drop_temporary=false, Drop_view=drop_view@entry=false, Dont_log_query =dont_log_query@entry=false) at mysql-5.7.22/sql/sql_table.cc:2546#19 0x0000000000d0ba58 in mysql_rm_table (thd= thd@entry=0x7f9d7c1f6910, TABLES=TABLES@ENTRY=0X7F9D7C00DFE0, if_exists=<optimized out>, drop_temporary=< Optimized out>) at Mysql-5.7.22/sql/sql_table.cc:2196#20 0x0000000000c9d90b in Mysql_execute_command (thd= thd@entry=0x7f9d7c1f6910, first_level=first_level@entry=true) at mysql-5.7.22/sql/sql_parse.cc:3589#21 0x0000000000ca1edd in Mysql_parse (thd=thd@entry=0x7f9d7c1f6910, parser_state=parser_state@entry=0x7f9de01107a0) at Mysql-5.7.22/sql/sql_parse.cc:5582#22 0x0000000000ca2a20 in Dispatch_command (thd=thd@entry=0x7f9d7c1f6910, Com_ Data=com_data@entry=0x7f9de0110e00, Command=com_query) at mysql-5.7.22/sql/sql_parse.cc:1458#23 0x0000000000ca43Do_command (thd=thd@entry=0x7f9d7c1f6910) at mysql-5.7.22/sql/sql_parse.cc:999#24 0x0000000000d5ed00 in Handle_ Connection (arg=arg@entry=0x10b8e910) at mysql-5.7.22/sql/conn_handler/connection_handler_per_thread.cc:300#25 0x0000000001223d74 in Pfs_spawn_thread (ARG=0X10C48F40) at mysql-5.7.22/storage/perfschema/pfs.cc:2190#26 0X00007FA003EEDDC5 in Start_thread () from/lib64/libpthread.so.0#27 0x00007fa002aab74d in Clone () from/lib64/libc.so.6
Solutions
Add a parameter Ignore_missing_space for buf_read_ibuf_merge_pages, Buf_read_page_low, Fil_io. Indicates that the space being deleted is ignored, and the default is False, which is set to True when Ibuf_merge_pages is called. In the Fil_io the error is added to determine if the parameter is true, then the other process is not reported.
Or pass in the iorequest::ignore_missing parameter directly when Buf_read_ibuf_merge_pages calls Buf_read_page_low.
Specific code reference MARIADB commit:8edbb1117a9e1fd81fbd08b8f1d06c72efe38f44
Impact version
View the relevant information, which is introduced when modifying bug#19710564 to delete a tablespace version.
MySQL Community Server 5.7.6 introduced, version 5.7.22 has not been repaired, version 8.0.0 has been repaired.
MariaDB Server 10.2 is affected. MariaDB Server 10.2.9, 10.3.2 fixed
Optimization recommendations
Can optimize performance: In the Buf_read_ibuf_merge_pages record the space ID of the error, the loop when the next page to determine the space ID, if the space ID is the same, Directly delete the corresponding IBUF record (the currently assigned maximum space ID is recorded in the system tablespace, Space ID is 4 bytes, less than 0xfffffff0ul, allocated when reading the values saved by the system tablespace, and then added one, with uniqueness).
End: for the knowledge point I introduced here, writing a little fast, there may be shortcomings, but also hope that many exchanges, hoping to help everyone.
Related articles:
mysql1064 error Causes and solutions
MySQL FAQs and Solutions
Related videos:
Ajax cross-domain solution: JSONP Video Tutorial