Locating and Analyzing High I/O Usage on a Server

Source: Internet
Author: User

Background: The platform team reported a problem. While some data was being imported into the production database, client requests timed out. The initial diagnosis was excessive disk I/O on the server: I/O utilization soared to 100% during the import, which made many database queries slow and, in turn, caused the client timeouts. I had to stop the import script temporarily and postpone putting that data into production and testing. The next day I went back to the office to track down the root cause.

Environment Description:

  • Operating System


  • File System


  • Database


First of all, the largest table in our database is big, but its row count is still well within what MySQL handles normally. In addition, client concurrency is only about 1K, and the database's TPS is low. I checked with top and then with iostat, and disk I/O utilization had indeed hit its limit. Finally, iotop pointed to the I/O culprit: jbd2.
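The %util column that iostat reports can be approximated from /proc/diskstats: the kernel keeps a per-device counter of milliseconds during which I/O was in flight, and %util is how much that counter grew relative to the elapsed wall time. The Python sketch below is a simplified illustration of that idea, not iostat's exact algorithm; the helper names are our own, and the device name `sda` used in the note afterwards is a hypothetical example.

```python
import time


def io_ticks_ms(diskstats_line):
    """Return the 'time spent doing I/Os' counter (ms) from one /proc/diskstats line.

    Layout: major, minor, device name, then the I/O statistics.
    The busy-time counter is the 10th statistic, i.e. index 12 after split().
    """
    fields = diskstats_line.split()
    return int(fields[12])


def util_percent(line_before, line_after, interval_s):
    """Approximate iostat's %util: share of wall time the device had I/O in flight."""
    busy_ms = io_ticks_ms(line_after) - io_ticks_ms(line_before)
    # Clamp at 100%, since sampling jitter can push the ratio slightly above 1.
    return min(100.0, 100.0 * busy_ms / (interval_s * 1000))


def read_diskstats_line(device):
    """Return the /proc/diskstats line for a device such as 'sda' (Linux only)."""
    with open("/proc/diskstats") as f:
        for line in f:
            if line.split()[2] == device:
                return line
    raise ValueError(f"device {device!r} not found in /proc/diskstats")


def sample_util(device, interval_s=1.0):
    """Take two samples interval_s apart and return the device's %util."""
    before = read_diskstats_line(device)
    time.sleep(interval_s)
    after = read_diskstats_line(device)
    return util_percent(before, after, interval_s)
```

On a Linux host, `sample_util("sda", 1.0)` returns the busy percentage of the hypothetical device `sda` over one second; values pinned near 100 during the import would match what iostat showed.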

Overview
The journaling block device (jbd) provides a filesystem-independent interface for filesystem journaling. ext3, ext4 and ocfs2 are known to use jbd. ocfs2 starting from Linux 2.6.28 and ext4 use a fork of jbd called jbd2.

So jbd2's main job is writing the filesystem journal, which means the high I/O pressure must come from very frequent filesystem operations. But jbd2 is a kernel process: does that mean the problem lies in the system itself?
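To check whether a kernel thread such as jbd2 or a user process such as mysqld is doing most of the writing, one rough stand-in for iotop's per-process view is to compare the write_bytes counter in /proc/<pid>/io between two snapshots. The following is a minimal Python sketch under that assumption; the function names are hypothetical, and reading other users' /proc/<pid>/io generally requires root.

```python
import os
import time


def parse_proc_io(text):
    """Parse the 'key: value' lines of /proc/<pid>/io into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            stats[key.strip()] = int(value)
    return stats


def snapshot_write_bytes():
    """Return {pid: (comm, write_bytes)} for every process we are allowed to read."""
    snap = {}
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/comm") as f:
                comm = f.read().strip()
            with open(f"/proc/{entry}/io") as f:
                io = parse_proc_io(f.read())
        except OSError:
            continue  # process exited meanwhile, or permission denied
        snap[int(entry)] = (comm, io.get("write_bytes", 0))
    return snap


def top_writers(before, after, n=5):
    """Rank processes present in both snapshots by bytes written in between."""
    deltas = []
    for pid, (comm, written_after) in after.items():
        if pid in before:
            deltas.append((written_after - before[pid][1], pid, comm))
    deltas.sort(reverse=True)
    return deltas[:n]


def print_top_writers(interval_s=5.0, n=5):
    """Print the n processes that wrote the most bytes during interval_s seconds."""
    first = snapshot_write_bytes()
    time.sleep(interval_s)
    for delta, pid, comm in top_writers(first, snapshot_write_bytes(), n):
        print(f"{pid:>7}  {comm:<16} {delta / interval_s / 1024:10.1f} KiB/s")
```

Running `print_top_writers()` as root lists the heaviest writers over five seconds. Keep in mind that journal writes are typically accounted to the jbd2 thread even when another process's fsync triggered them, so jbd2 can top such a list while an application is the real source of the pressure.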

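At the moment, the only application on this server that performs heavy I/O is the MySQL database. Could it be the cause? With that question in mind, I searched Google for MySQL together with jbd2 and found two MySQL configuration options as clues: sync_binlog and innodb_flush_log_at_trx_commit. That rang a bell, so I turned to the replication chapter of High Performance MySQL (as the environment description shows, we use MySQL replication) and looked up sync_binlog. In short, sync_binlog controls how often MySQL syncs the binary log to disk: with 0, MySQL never syncs it itself and leaves flushing to the operating system; with 1, it syncs on every transaction commit, which is the safest and most I/O-intensive setting; with a larger value N, it syncs roughly once every N commits (newer versions count binary log commit groups rather than individual commits).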

What about innodb_flush_log_at_trx_commit?

  • If innodb_flush_log_at_trx_commit is set to 0, the log buffer is written to the log file, and the log file is flushed to disk, once per second. In this mode, committing a transaction does not trigger any disk write.

  • If innodb_flush_log_at_trx_commit is set to 1, MySQL writes the log buffer to the log file and flushes it to disk every time a transaction commits.

  • If innodb_flush_log_at_trx_commit is set to 2, MySQL writes the log buffer to the log file every time a transaction commits, but does not flush it to disk at that point. Instead, the flush to disk happens once per second.
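The practical difference between the three settings is how often InnoDB calls fsync on the redo log. The toy Python model below counts writes and fsyncs for a list of commit timestamps under each setting. It is a sketch under stated simplifications (no group commit, no flushes triggered by a full log buffer or by checkpoints, and the once-per-second flush is assumed to do work only in seconds that had commits), not a description of InnoDB internals.

```python
from dataclasses import dataclass


@dataclass
class RedoIO:
    writes: int  # log buffer -> log file (lands in the OS page cache)
    fsyncs: int  # log file -> physical disk


def model_redo_io(commit_times, mode):
    """Toy model of redo-log I/O for innodb_flush_log_at_trx_commit = mode.

    commit_times: non-negative commit timestamps in seconds.
    Assumption: the background once-per-second flush only does work in
    seconds that contained at least one commit.
    """
    if mode not in (0, 1, 2):
        raise ValueError("innodb_flush_log_at_trx_commit must be 0, 1 or 2")
    commits = len(commit_times)
    busy_seconds = len({int(t) for t in commit_times})
    if mode == 1:
        # Write and fsync at every commit.
        return RedoIO(writes=commits, fsyncs=commits)
    if mode == 2:
        # Write at every commit, fsync about once per second.
        return RedoIO(writes=commits, fsyncs=busy_seconds)
    # Mode 0: nothing at commit time; write and fsync about once per second.
    return RedoIO(writes=busy_seconds, fsyncs=busy_seconds)
```

With 500 commits spread evenly over five seconds, mode 1 yields 500 fsyncs in this model, while modes 0 and 2 yield 5 each. The trade-off is durability: with 0, a mysqld crash can lose up to about a second of committed transactions; with 2, only an operating system crash or power failure can.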

Because of the nature of our business data, our reliability requirements are not as strict as those of financial or order systems. So we set sync_binlog to 500 (sync the binary log to disk once every 500 commits) and innodb_flush_log_at_trx_commit to 2. Checking with iotop and similar tools afterwards showed that system I/O had dropped dramatically. The culprit behind the I/O spikes had finally been found and dealt with.
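A back-of-the-envelope Python estimate of explicit fsync calls per second helps show why this change cut I/O so sharply. The comparison baseline (both settings at 1) and the 1000 TPS figure are assumptions for illustration, since the article gives neither the previous settings nor an exact TPS; group commit, which lets concurrent transactions share one fsync, is ignored.

```python
def estimated_fsyncs_per_sec(tps, sync_binlog, flush_log_at_trx_commit):
    """Rough count of explicit fsync() calls per second for binlog + redo log.

    Simplified: ignores group commit and any flushing done by the OS itself.
    """
    if tps < 0 or sync_binlog < 0 or flush_log_at_trx_commit not in (0, 1, 2):
        raise ValueError("invalid arguments")
    # sync_binlog = 0: MySQL never fsyncs the binlog itself; the OS decides.
    binlog = 0.0 if sync_binlog == 0 else tps / sync_binlog
    # innodb_flush_log_at_trx_commit = 1: one redo fsync per commit;
    # 0 or 2: a background fsync about once per second.
    if flush_log_at_trx_commit == 1:
        redo = float(tps)
    else:
        redo = 1.0 if tps > 0 else 0.0
    return binlog + redo
```

Under these assumptions, `estimated_fsyncs_per_sec(1000, 1, 1)` gives 2000.0 and `estimated_fsyncs_per_sec(1000, 500, 2)` gives 3.0. The price is that an OS crash or power loss can now lose up to about a second of InnoDB commits plus any binary log events not yet synced, which was judged acceptable for this data.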

Note: Two side stories came up along the way.

  1. A manager pulled several people together to discuss the issue. The guesses ranged from insufficient server resources to a faulty import script to poor database efficiency, and so on. Personally, I strongly dislike guessing without performance data, monitoring, and analysis to back it up. To everyone who tries to locate problems by imagination: please do not speculate casually about things you do not understand, because others will not see you as an expert; they will simply ignore you.

  2. After identifying jbd2, I found forum posts claiming that, because this is a Linux kernel bug, the right fix is to upgrade the kernel or change kernel configuration options. Even if that would solve the problem, it would be very costly in my situation. When you run into a problem, make use of online resources, but analyze your own situation further and choose the solution that fits it.

References:
http://unix.stackexchange.com/questions/86875/determining-specific-file-responsible-for-high-i-o
http://serverfault.com/questions/363355/io-wait-causing-so-much-slowdown-ext4-jdb2-at-99-io-during-mysql-commit
http://blog.itpub.net/22664653/viewspace-1063134/
