Before the May 1 holiday, a DBA colleague reported that in the day-to-day environment, if you delete a large slow log file (say, larger than 10G) and then execute flush slow logs in MySQL, mysqld appears to hang.
I tried to reproduce the problem today; here is a brief analysis of the cause.
3. Observe how TPS/QPS changes at the instant flush slow logs executes;
4. While flush slow logs is executing, capture the call stacks with pstack;
The first step needs no comment.
In the second step, TPS/QPS does not change.
In the third step, you will find that TPS/QPS drops to 0 for a moment.
On the MySQL command line, flush slow logs took about 3s to execute.
In the fourth step, we look at the pstack output and record only the relevant part:
You will find that thread 2 is performing the flush slow logs operation, and the other threads are waiting for the lock LOCK_log.
The reason is actually very simple. When you rm the slow log in the shell, the file is not really deleted, because mysqld still holds an open file handle on it. The flush logs operation actually performs two steps, 1) close and 2) open (logger.flush_slow_log -> mysql_slow_log.reopen_file). Only when the close is executed does the file system actually delete the file, and at that point the thread is holding the LOCK_log lock.
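The "deleted but still open" behaviour is easy to see outside MySQL. Below is a minimal sketch, assuming a Linux/POSIX file system; the function name and the 1 MiB size are my own choices for illustration and have nothing to do with mysqld itself.

```python
import os
import tempfile

def unlink_while_open(size_bytes=1024 * 1024):
    """Delete a file that is still open and report what the open handle sees."""
    fd, path = tempfile.mkstemp()
    try:
        with open(path, "wb") as f:
            f.write(b"x" * size_bytes)        # fill the file through a second handle
        os.unlink(path)                       # same effect as `rm` in the shell
        name_gone = not os.path.exists(path)  # the directory entry is gone
        st = os.fstat(fd)                     # but the inode behind fd is still alive
        return name_gone, st.st_nlink, st.st_size
    finally:
        os.close(fd)                          # last handle closed: blocks are freed here

name_gone, links, size = unlink_while_open()
print(name_gone, links, size)
```

The name disappears immediately, but the data stays reachable through the descriptor until the final close, which is where the real deletion cost is paid.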
The deletion has to flush dirty pages (of course, the scenario I constructed is extreme: basically all of the slow log file's contents are in the file system cache), which is time-consuming; in my case the statement took 3s. During this period, any connection whose statement needs to write a log will be left waiting (the server-layer logs, that is slow log, binlog and general log, share this one LOCK_log lock), so from the outside the system appears to hang.
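The blocking pattern can be imitated with a toy model. This is only a sketch under my own assumptions: a single Python lock stands in for LOCK_log, a sleep stands in for the slow close, and the function names are hypothetical rather than anything from the MySQL source.

```python
import threading
import time

LOCK_log = threading.Lock()  # stand-in for the shared server-layer log lock

def flush_slow_logs(close_seconds, holding):
    """Hold the log lock while a simulated slow close() runs."""
    with LOCK_log:
        holding.set()               # tell the test harness the lock is taken
        time.sleep(close_seconds)   # simulated cost of close() on the deleted file

def log_statement(waits):
    """A connection that needs to log must take the same lock first."""
    start = time.monotonic()
    with LOCK_log:
        pass                        # the log write itself would happen here
    waits.append(time.monotonic() - start)

def simulate(close_seconds=0.3, connections=4):
    waits, holding = [], threading.Event()
    flusher = threading.Thread(target=flush_slow_logs, args=(close_seconds, holding))
    flusher.start()
    holding.wait()                  # make sure the flusher owns the lock
    workers = [threading.Thread(target=log_statement, args=(waits,))
               for _ in range(connections)]
    for w in workers:
        w.start()
    for w in workers + [flusher]:
        w.join()
    return waits

waits = simulate()
print(["%.2fs" % w for w in waits])
```

Every "connection" stalls for roughly as long as the close takes, which matches the momentary drop of TPS/QPS to 0.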
The time taken by the close call inside flush slow logs depends on the file size and on the proportion of that file's pages that are dirty in the file system cache. For example, if I run sysctl vm.drop_caches=3 to empty the file system cache before executing flush slow logs, the same operation on a file of the same size takes 0.42s, and the corresponding blocking time is much shorter.
One option is to call posix_fadvise on the slow log file handle so that the contents of a very large log file are not kept in the cache (the slow log has no need to be cached anyway). There is an excellent article on this topic, "posix_fadvise clear cache misunderstanding and improvement measures", which is worth reading.
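To make the idea concrete, here is a rough sketch of what such a call looks like, assuming Linux. It is not how mysqld writes its logs; the helper name and the temporary file are hypothetical. POSIX_FADV_DONTNEED only drops clean pages, so the sketch writes the data back with fsync first.

```python
import os
import tempfile

def append_without_caching(path, data):
    """Append data, write it back, then ask the kernel to drop the cached pages."""
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o640)
    try:
        written = os.write(fd, data)
        os.fsync(fd)  # dirty pages are not dropped by DONTNEED, so flush them first
        # offset 0 and length 0 mean "the whole file"
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
        return written
    finally:
        os.close(fd)

with tempfile.TemporaryDirectory() as tmp:
    log_path = os.path.join(tmp, "slow.log")  # hypothetical log file
    written = append_without_caching(log_path, b"# Query_time: 3.0\n")
    final_size = os.path.getsize(log_path)
print(written, final_size)
```

The fsync has its own cost, so a real implementation would more likely do this periodically or in chunks rather than on every write.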
In addition, Peter discussed this issue back in 2007 in "Be careful rotating MySQL logs". His recommendation is to mv the file first, then flush logs, and only then delete the file, so that the real deletion is done by you rather than by mysqld. Regrettably, seven years later, the problem of the LOCK_log lock has still not been completely resolved.
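The mv / flush / rm order can be sketched as follows. FakeServer is a hypothetical stand-in for mysqld that simply keeps the log open and reopens it on flush; against a real server, step 2 would be the flush slow logs statement.

```python
import os
import tempfile

class FakeServer:
    """Hypothetical stand-in for mysqld: holds the slow log open until flushed."""
    def __init__(self, log_path):
        self.log_path = log_path
        self.fh = open(log_path, "a")

    def write(self, line):
        self.fh.write(line + "\n")
        self.fh.flush()

    def flush_slow_logs(self):
        self.fh.close()                      # cheap: the old file still has a name
        self.fh = open(self.log_path, "a")   # creates a fresh file at the old path

def rotate(server, log_path):
    old_path = log_path + ".old"
    os.rename(log_path, old_path)   # 1. mv the file; the server keeps writing to it
    server.flush_slow_logs()        # 2. flush logs; close() does not delete anything
    os.remove(old_path)             # 3. the expensive delete runs in our process

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "slow.log")
    server = FakeServer(path)
    server.write("old entry")
    rotate(server, path)
    server.write("new entry")
    server.fh.close()
    with open(path) as f:
        remaining = f.read().splitlines()
    old_exists = os.path.exists(path + ".old")
print(remaining, old_exists)
```

Because the file still has a name when the server closes it, the close inside the lock is cheap, and the slow deletion happens outside the server.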
PS:
A note to self at the end of the article: when a 10G file is deleted through close/rm, the operation still takes on the order of hundreds of milliseconds (200ms+ on my machine) even after running sysctl vm.drop_caches=3 to empty the cache. What is going on behind this is a question for colleagues in the kernel group.