Comparison of several MySQL environment problems

Tags: mysql, import

When environment problems come up, comparison is often a good method. If the comparison is set up properly, you can avoid repeating the same work and infer likely causes from the differences.

If the comparison is set up badly, you are likely to draw a wrong conclusion. Here are a few simple examples.

MySQL restart comparison

A while ago there was a hardware fault on a standby server. Fortunately it was only the standby, which was supposed to host a slave database; but it turned out that this slave was not actually replicating from the master, which was rather chilling. So we set out to rebuild the slave, only to find that the binlog was not even enabled on the master, which left no way forward without a restart. After evaluation, the same problem was found in another environment, so I applied for a maintenance window to restart both. During this window some database-level parameter optimization was also considered.

The restart therefore involved two environments, one running 5.5 and the other 5.6. To be safe, the 5.6 instance kept the original 5.5-style configuration and GTID was not enabled. The restart itself involved nothing special technically, but afterwards some warnings appeared in the database log. The warning information is as follows:

07:42:23 26782 [Warning] Aborted connection 1238 to db: 'unconnected' user: 'unauthenticated' host: 'gate_app_4.172' (Got an error reading communication packets)

07:42:30 26782 [Warning] Aborted connection 1242 to db: 'unconnected' user: 'unauthenticated' host: 'gate_app_131.41' (Got an error reading communication packets)

This came as quite a surprise. Looking at it from the perspective of comparison, there are the following scenarios.

Comparison scenario 1: is the difference between 5.5 and 5.6 relevant, is some 5.6 parameter setting responsible, or is this a 5.6 bug? The problem, after all, occurred on the 5.6 MySQL server.

Obviously not: apart from enabling the binlog, I did not modify any other parameter in the 5.6 configuration. The original configuration was deliberately left untouched to be safe, and the change made to the two environments was identical.
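A quick sanity check can confirm that the binlog was indeed the only change that took effect. Something like the following sketch works (the connection options are placeholders):

# Verify that the binary log is enabled and which file it is currently writing to
mysql -uroot -p -e "SHOW VARIABLES LIKE 'log_bin'; SHOW MASTER STATUS;"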

Scenario 2: did something go wrong specifically when the 5.6 environment was restarted?

This can also be ruled out, because both servers were restarted and the other one shows no similar problem.

Scenario 3: should we check whether persistent connections are being held and not released on the application side?

This was also investigated. Nothing abnormal was found on the application side, and quite a few application environments were involved.
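On the database side, a rough way to gauge whether clients keep dropping connections abnormally is to watch the aborted-connection counters, roughly like this (the connection options are placeholders):

# Aborted_clients counts connections closed without a proper quit; Aborted_connects counts failed connection attempts
mysql -uroot -p -e "SHOW GLOBAL STATUS LIKE 'Aborted%';"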

Scenario 4: there have been some recent network changes; could they, or related DNS changes, have had an impact?

The system group was also asked for help, but no related logs were found.

Scenario 5: compare the logs before and after the restart.

The idea is to check whether the log content differs much before and after the restart. Because the log showed page after page of the same warnings, I had not paged back any further at first; after going back four or five pages it was still the same warnings, and when I finally looked at the log from before the restart, the same problem had already been occurring there.

So this comparison needs a proper baseline. Comparing with other environments can also yield some conclusions, but comparing the same environment before and after its restart reflects the root cause much better: since the problem already existed before the restart, it is a historical issue, and the applications that keep trying to connect can simply be asked to stop.
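Establishing that baseline can be as simple as grepping the error log for the same warning on both sides of the restart window, roughly like this sketch (the log path is an assumption and will differ per installation):

# Count the aborted-connection warnings and peek at the earliest occurrences
grep -c "Aborted connection" /data/mysql/error.log
grep "Aborted connection" /data/mysql/error.log | head -20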

MySQL import dump

A while ago I also ran into several problems during a MySQL data migration based on ECS instances.

Because the data volume was small, mysqldump was used for a logical export and the data was then imported directly into the target environment. Since it was a new environment, most databases imported without any problem, but one database kept aborting partway through the import.

The error message is:

ERROR 2013 (HY000) at line 8441: Lost connection to MySQL server during query
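For reference, the export and import were done roughly as in the sketch below; the credentials, database name, and file name are all placeholders:

# Logical export on the source, then a direct import into the target environment
mysqldump -uroot -p --single-transaction --routines testdb > testdb.dmp
mysql -uroot -p testdb < testdb.dmp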

Of course, we went through several comparison scenarios to get to the bottom of this problem as well.

First, some background: the server has 16 GB of memory, and there are three dump files of roughly 10 GB, 20 GB, and 30 GB. To save trouble, I started three nohup processes to import them concurrently, each into a different database.
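The three concurrent imports were started roughly like this (database names, file names, and the password are hypothetical):

# Three background imports into different databases, each writing its own log
nohup mysql -uroot -pxxxx db1 < dump_10g.dmp > imp_10g.log 2>&1 &
nohup mysql -uroot -pxxxx db2 < dump_20g.dmp > imp_20g.log 2>&1 &
nohup mysql -uroot -pxxxx db3 < dump_30g.dmp > imp_30g.log 2>&1 &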

Scenario 1: concurrent import of the three dump files; the import failed.

Scenario 2: import the data serially instead. The import still failed. Because the logs were only checked afterwards, it was not even certain at that point which of the imports had completed successfully.

Scenario 3: since the system was still in the joint-debugging phase, there was time for more tests, so I imported just the 20 GB dump and found that it still failed.

Scenario 4: by comparison, it is the 30 GB dump that cannot be imported, and the failure always happens at a partitioned table inside that dump.

Scenario 5: try importing the partitioned table from the 30 GB dump on its own; the problem persists. Fortunately, at this point I started looking at the system logs and found that the failure was actually caused by the oom-killer. That is closely related to the small amount of available memory, and of course also to swap.
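The oom-killer evidence shows up in the kernel log; a check along these lines is usually enough (the log path varies by distribution, /var/log/messages being typical on CentOS):

# Look for oom-killer events and which process was killed
dmesg | grep -i "killed process"
grep -i "oom" /var/log/messages | tail -20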

Scenario 6: no swap was configured on these ECS instances. After adding swap, the 10 GB dump imported successfully.
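Adding swap on such an instance can be done roughly as follows; the 16 GB size and the /home/swapfile path simply follow the values mentioned in this case:

# Create a 16 GB swap file, format it, and enable it
dd if=/dev/zero of=/home/swapfile bs=1M count=16384
chmod 600 /home/swapfile
mkswap /home/swapfile
swapon /home/swapfile
free -m    # confirm the new swap space is visible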

Scenario 7: the 20 GB dump also imported successfully, but swap usage sat at around 10 GB of the 16 GB of configured swap. Why roughly 10 GB? This is related to the default swap configuration: the default swappiness value is 60, which for 16 GB of swap works out to roughly 9.6 GB, consistent with the 10 GB observed. Why was so much swap consumed at all? My initial suspicion was the online import: the business had started joint debugging and the application could not be stopped, so the data had to be imported while the system was live.
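The 60 mentioned here is the default vm.swappiness value. Checking it, and lowering it if desired, looks roughly like this (the value 10 is only an illustrative choice):

# Show the current swappiness, then tell the kernel to prefer keeping pages in RAM
cat /proc/sys/vm/swappiness
sysctl -w vm.swappiness=10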

Scenario 8: would the 30 GB dump now import successfully as well? Unfortunately it still failed: the importing thread was killed by the oom-killer, after which the swap was released completely and swap usage dropped to about 5 MB at once.

Scenario 9: trying to import the 30 GB dump again at this point worked without any problem. However, because it was an online import, there were some lock waits, resource consumption was quite high, and swap usage again reached about 10 GB.
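Those lock waits can be spotted while the import is running with a query along these lines (works on 5.5 and 5.6; the connection options are placeholders):

# List transactions that are currently waiting for locks
mysql -uroot -p -e "SELECT trx_id, trx_state, trx_started, trx_query FROM information_schema.innodb_trx WHERE trx_state = 'LOCK WAIT';"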

Scenario 10: the dump had been imported successfully, so why was the swap not released? One option is to disable and re-enable the swap space, but frustratingly there was not enough free memory for that, and the following error was reported:

# swapoff -a

swapoff: /home/swapfile: swapoff failed: Cannot allocate memory
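The failure makes sense: swapoff has to move everything currently in swap back into physical memory, so it cannot succeed while free memory is smaller than the swap in use. That can be confirmed before retrying, for example:

# Compare free physical memory with the amount of swap in use
free -m
swapon -s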

So how should this be handled? At that point, restarting seemed to be the only effective solution. I contacted the application team to arrange a restart, but it could not be coordinated, so the matter was put off for several days.

Scenario 11: a few days later I checked again and found that the swap usage had been reset on its own. The reason was, once more, the oom-killer: it appears a connection was forcibly terminated, the oom-killer was triggered, and the swap was completely released.

Running through all of these seemingly trivial scenarios, one problem keeps recurring: memory is always insufficient. Beyond the swap configuration there had to be another reason, and in the end the cause was that the InnoDB buffer pool was far too large: on a server with only 16 GB of memory, buffer_pool_size had been set to 24 GB. How did such a basic mistake happen? Tracing back, the original configuration template only validated for Red Hat, not CentOS, yet this server runs CentOS, so 24 GB was set directly during parameter initialization. The template itself clearly has problems; its validation logic is not rigorous enough.
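A cross-check of the configured buffer pool against physical memory would have caught this immediately; a minimal sketch (credentials are placeholders):

# Configured InnoDB buffer pool size in GB versus the physical memory on the host
mysql -uroot -p -e "SELECT @@innodb_buffer_pool_size/1024/1024/1024 AS buffer_pool_gb;"
free -g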

Looking back at all these comparisons, we can see that comparison sometimes helps us analyze a problem and sometimes misleads us. And of course, if what we do produces no output or conclusion at all, it has no practical significance either.
