A dba reported that a large slow log file was deleted in the daily environment before MySQLLogs was deleted (assuming the file size is more than 10 GB ), then execute flush slow logs in MySQL and you will find mysqld hang.
I tried to reproduce this problem today. here I will briefly analyze the cause.
Reproduction steps:
1. construct the slow log (set long_query_time to 0 );
2. observe the tps/qps changes when rm slow log is deleted;
3. observe the changes in tps/qps when the flush slow logs is executed;
4. record the call stack made by pstack during execution of flush slow logs;
The first step is nothing to say.
Step 2: tps/qps remains unchanged.
Step 3: tps/qps instantly drops to 0, as shown below:
Mysql command line shows that the execution time of flush slow logs is about 3 seconds.
Step 4: Let's take a look at the pstack output result and record only the relevant:
Thread 2 is executing the flush slow logs operation. other threads are waiting for the LOCK_log lock.
The reason is actually very simple. when executing the rm slow log operation in shell, the file is not deleted because mysqld still has a file handle to open the file. Execute the flush logs operation. the actual execution is 1) close; 2) open operation (logger. flush_slow_log-> mysql_slow_log.reopen_file). when the close operation is executed, the file system actually deletes the file, and the thread occupies the LOCK_log lock.
During deletion, the system will execute dirty clicks (of course, I constructed this scenario very extreme, basically all the content of slow log files is in the file system cache), which will consume a lot of time, for example, it takes 3 s to execute this statement. During this period, if the statements sent by the connection need to be logged (the server layer log: slow log/binlog/general log has a total of LOCK_log locks), they will be in the waiting state, then the system's external response is hang.
The time required for calling and executing close in flush slow logs is related to the file size and the ratio of dirty pages in the file system cache. for example, I used sysctl vm before executing flush slow logs. drop_caches = 3 clear
If the file system cache is used, the execution time of flush slow logs operations of the same size is 0.42 s, and the corresponding blocking time will be reduced a lot.
You can consider executing the posix_fadvise call on the slow logs file handle to avoid caching large log file content (slow logs do not need to be cached). This is a master's article, refer to posix_fadvise's misunderstanding and improvement measures for clearing the cache.
In addition, peter discussed this issue in. Be careful rotating MySQL logs recommended that you first perform the mv file, then flush logs, and then delete the file, let the real deletion behavior be done by yourself rather than mysqld. Unfortunately, seven years have passed, and the LOCK_log has not completely solved the lock problem.
PS:
At the end of this article, remember to delete a 10 GB file by using the close/rm operation. drop_caches = 3 after the cache is cleared, this operation takes hundreds of milliseconds (200 ms + on my machine ), you need to find out the kernel group's colleagues.