MySQL Common Fault Diagnosis

Last Update:2020-06-02 Source: Internet

Author: User

Keywords mysql fault mysql common fault mysql common fault diagnosis

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

As a mysql fault warning, the problem should be strangled in the cradle as far as possible. When the problem occurs, it should be dealt with in time. Otherwise, when the server is down, then go to the "fire fighting" work, often cannot be diagnosed accurately in the first time. Therefore, a monitoring process should be established to be able to pre-warn the threshold in advance, immediately check the problem, and reduce the risk.

Factors affecting performance
Disk I/O

The most important factor affecting the performance of the mysql innodb engine is disk I/O. The most widely used hard disks are mechanical hard disks. When the mechanical hard disk is in operation, the disk needs to be addressed. The random read and write speed greatly increases the sequential read and write. During each read and write operation, there may be a large offset in the track, and the delay of the disk addressing time will cause the I/O performance to decline sharply. Therefore, in the production environment, RAID5 and RAID10 disk arrays are often used for parallel reading and writing, or SSD solid state drives are used to greatly increase the reading and writing speed.

RAM

The Innodb engine creates a buffer pool innodb buffer pool in memory, reads and writes data and indexes in memory, and the database performance is very high. Therefore, the size of the innodb buffer pool directly affects the performance of mysql. In the production environment, the size of the innodb buffer pool is adjusted to about 80% of the host memory.

Slow SQL

The mysql optimizer does not optimize all SQL, SQL can not reasonably use the index, or a large number of aggregation operations are often responsible for the failure.

System performance evaluation
Factors affecting system performance
CPU performance
Memory performance and size
Disk I/O
System performance evaluation index
CPU performance index
cpu usage
cpu load queue load
cpu idle idleness
cpu wait I/O wait
Memory performance index
used internal memory
Resource cache
File buffer
Free memory
Virtual swap memory
Disk performance index
Disk I/O wait time
I/O queue, the queue is too high, indicating that there is a bottleneck in the disk processing capacity
Average waiting time
Read and write speed per second

Performance monitoring tools
There are many well-known performance viewing tools under Linux. Most tools collect performance by reading various files in the system directory /proc.

The tools I often use are top, sar, dstat, iostat, mpstat, etc. Among them, dstat is rich in functions and supports viewing CPU, memory, disk, swap memory, network, TCP, mysql and other indicators. It is a versatile performance viewing tool.

The dstat tool uses:

# dstat [option] [interval interval] [count times]
-c: view the usage of cpu
-m: check memory
-d: View disk IO read and write status
-n: network access speed
-l: cpu load
-t: View TCP connection statistics
-s: usage of virtual swap memory
-p: view the running status of the process

sar tool, you can view the performance statistics of each time period of the day.

# sar [option] [interval interval] [count times]
-A: View all information summary
-q: cpu load
-p: cpu usage
-r: memory usage
-d: View disk IO read and write status
-n: network traffic

Faults and treatment
Too many connections
There are many reasons for the excessive number of connections. Slow SQL causes subsequent SQL blocking, disk I/O, and the CPU is busy, causing the database to crash.

The most direct way is to increase the size of max_connections and increase the maximum number of database connections. However, maintaining a session connection requires memory. Too many connections, occupying master memory, is a permanent cure.
The Mysql database will automatically close the idle session connection after processing a SQL. The idle session closing time depends on the value of the parameter wait_timeout (unit: seconds). Try not to adjust it too much, causing waste of resources. According to the actual situation of the production environment, the appropriate adjustment is 100-300 seconds.
If the business visits are very large, but sometimes need to restart due to failures. After restarting, the memory cache will be released, so that when a large number of users access, the first query data will be much slower than normal because there is no cache, there will be a peak performance in the short term, and even the number of connections will be full. So how to solve this problem? Is it possible to let mysql automatically load data into memory? Mysql innodb provides a warm-up function to load data into the cache in advance. my.cnf configures the following parameters:
# Export memory cache to disk during shutdown
innodb_buffer_pool_dump_at_shutdown = 1
# Load disk cache into memory at startup
innodb_buffer_pool_dump_at_startup = 1
# Manually export the cache to disk
innodb_buffer_pool_dump_now = 1
# Manually load the cache into memory
innodb_buffer_pool_load_now = 1

When using the Innodb engine, it is also necessary to ensure that the size of innodb_buffer_pool_size is sufficient. The larger the ratio of innodb_buffer_pool_size/data_size, the better. If it is too small, it will increase the disk I/O pressure and reduce the efficiency of SQL execution, resulting in too many connections.
Slow SQL
Slow SQL will take up a lot of system resources, such as the heavy use of subqueries and aggregate functions. The performance optimization will be explained in detail later.

Sudden increase in transaction SQL volume
Sometimes due to a sudden increase in business volume, a transaction caused by an increase in the number of executions of a SQL causes excessive pressure on the database. At this time, you need to analyze the surge SQL by looking at the log to optimize the development.

Insufficient virtual swap memory
Reasonably increase the size of swap memory to avoid crashing when memory is exhausted.

Monitor the memory, be able to trade early warnings, and make treatments.

ibdata data file accidentally deleted
The data file was accidentally deleted due to operational errors. At this time, the mysqld process must not be shut down, the process is killed, the memory cache, and the fd file is released and cannot be recovered.

Find the Mysql process pid
Enter the directory /proc/pid/fd
ls -l |grep ib_, you will see several fd link files, that is, the data files that were deleted before, and are still in memory at this time.
Perform flush tables with read lock in the database to lock the database to prevent data from being written for recovery
Check show master status and wait for the file and pos points to change
Copy the several files from step 3 to the database directory with the cp command
Restart the mysql service, recovery is complete.
Master-slave replication troubleshooting
Errors often occur during master-slave replication. Most people skip directly when facing errors. In fact, they may cause inconsistent master-slave data and cause potential risks for future failovers. It takes too much time to export and import data and re-master-slave synchronization. Therefore, the following is a detailed summary of how to deal with this issue.

Failure to delete data on the master
After deleting the data on the master, the slave reports an error because it cannot find the corresponding record. In this case, the record has been deleted from the master library, so the slave library can also be deleted, and the error can be skipped directly.

stop slave; set global sql_slave_skip_counter=1; start slave;
1
Primary key repeat
The primary key repeats the record to be inserted, which already exists on the secondary library. You can delete this record, export the information from the master library, insert it into the slave library table, and restart the synchronization process.

stop slave; set global sql_slave_skip_counter=1; start slave;

Update record on master, slave cannot find record
Export the information from the master library, insert it into the slave library table, and restart the synchronization process.

stop slave; set global sql_slave_skip_counter=1; start slave;

Slave down, causing relay-log damage and synchronization error
If the slave crashes, it may report an error initializing relay log position: I/O reading...

This is because the relayl-log is damaged. After MySQL 5.5, configure the relay-log-recover=1 parameter to automatically resume log synchronization.

Too much delay behind slave
Sometimes when you check the synchronization status of the slave database, you will find an indicator. The value of seconds_behind_master is large. This value represents the number of seconds behind the master database. It is usually caused by slow SQL and slow transactions. Therefore, avoiding the use of large time-consuming transactions and optimizing SQL in time is a measure to ensure that the slave library is not dragged across.

Slave automatically skips errors
The error codes 1023 (the record does not exist) and 1062 (the primary key is repeated) that often appear during synchronization are irrelevant. Handling each time will be tedious and waste unnecessary time. We can set it to skip automatically.
Add in the configuration file:

# Automatically skip errors
slave_skip_errors=1023,1062

Errors caused by inconsistent master and slave database versions
If the master and slave versions are inconsistent, it may cause synchronization errors. Because the lower version of mysql is not compatible with some syntax or features in the higher version, the master library version is high. When the slave library version is low, the slave library will report an error, and the master library version is low. When the slave library version is high, synchronization is normal. In a production environment, you should keep the database version consistent to avoid increasing hidden troubles.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More