Summary of a web service migration fault in a production environment (taken from the internal BBS platform of Old Boy Training)
This article is a production case written up by a student who personally took part in solving the problem. It is well worth studying, for these reasons:
1) It shows how to troubleshoot when an emergency occurs. <= The student paid close attention; such careful observation is rare.
2) It shows the habit of summarizing and reflecting after a problem has been solved. <= The student keeps up this habit, which is also rare.
These two points reflect three of the six core concepts of Old Boy Linux training: thinking, habits, and summarizing.
In addition, the student summarized the experience and shared it with fellow students, senior and junior alike.
These three rare qualities are all essential to personal achievement. Anyone who has them will excel, even in another profession.
Fault cause:
The company was reorganizing its servers: a new server was to be set up with a LAMP environment to take over from the current web servers, after which the existing servers would be re-planned.
Fault symptom:
After the code was synchronized to the new server with rsync, the Apache service was started, and the DNS change took effect, access was fine at first. But in less than one minute the website became very slow to open; sometimes the homepage took several minutes, which made the site almost inaccessible. The Apache error log showed no obvious errors.
Solution process:
1. First, Teacher Old Boy asked me to switch DNS to the new server and bring the traffic in (it was not peak access time, so temporary fluctuations on the website were acceptable);
2. Teacher Old Boy asked me to install Firebug and similar tools to see which part of the website was loading slowly;
3. One minute later, website access began to slow down. Checking the load time of the homepage as described in step 2, we found that the first request for the homepage took more than 20 s;
4. Teacher Old Boy used wget to fetch the site on the server itself, and then fetched the intranet address with wget from other servers on the same LAN in the IDC. The result was the same: slow. (At this point a network problem was almost ruled out.)
5. Teacher Old Boy had me create a static page under the root directory of the website. Access to it was just as slow. (This almost ruled out a slow database connection; the problem was probably in Apache.)
6. Teacher Old Boy then stopped Apache and started it again; the access speed did not change;
7. He stopped Apache again, used the killall command to kill all remaining httpd processes, and then started it. Access immediately became fast, but slowed down again one minute later;
8. Comparing steps 6 and 7: step 7 had the extra action of killing the httpd processes, and after it the access speed recovered. The difference is that step 6 kept the connections that were already established, while step 7 dropped all of them. (Teacher Old Boy said there might be too many connections.)
9. We checked the number of concurrent connections at that moment, which was about 600. He then asked me which mode Apache was running in (prefork mode);
10. We opened the Apache configuration file and found the parameter settings of the prefork module:
<IfModule mpm_prefork_module>
    StartServers 10
    MinSpareServers 20
    MaxSpareServers 30
    ServerLimit 5000
    MaxClients 5000
    MaxRequestsPerChild 10000
</IfModule>
At first glance the parameter values looked fine. However, when Teacher Old Boy opened the Apache error log and went through it, he found a message hinting at a problem with the number of connections (the record was cleared at the time, so it is not reproduced here).
11. Teacher Old Boy confirmed that the parameter settings for prefork mode were wrong and had not taken effect at all, so the default limit of 256 was in use. The number of concurrent connections was already above 256, so later visitors had to wait, which is why access became especially slow after one minute;
12. In our earlier standard documentation, ServerLimit was configured on the first line of the block, but on this server it was on the fourth line. He moved the ServerLimit line to the first line, then stopped Apache completely and started it again;
13. When I accessed the website again, everything was normal and the problem was solved!
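The reasoning in steps 9 to 11 can be sketched in a few lines of Python. This is only an illustration, not the commands used during the incident. It assumes `netstat -ant` style text as input, and the function names and sample addresses are hypothetical. The figures of 256 (the prefork default for ServerLimit), 600 concurrent connections, and 5000 come from the steps above.

```python
from collections import Counter

# Default ServerLimit of the prefork MPM, as described in step 11.
# MaxClients cannot exceed the ServerLimit that is actually in effect.
DEFAULT_SERVER_LIMIT = 256


def count_states(netstat_output, port=80):
    """Count TCP connection states for a local port in `netstat -ant` style text."""
    states = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Expected columns: Proto Recv-Q Send-Q Local-Address Foreign-Address State
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local_address = fields[3]
        if local_address.rsplit(":", 1)[-1] == str(port):
            states[fields[5]] += 1
    return states


def effective_max_clients(max_clients, server_limit=DEFAULT_SERVER_LIMIT):
    """MaxClients is capped by the ServerLimit that really applies."""
    return min(max_clients, server_limit)


def waiting_clients(concurrent, max_clients, server_limit=DEFAULT_SERVER_LIMIT):
    """Connections beyond the effective limit have to wait for a free worker."""
    return max(0, concurrent - effective_max_clients(max_clients, server_limit))


if __name__ == "__main__":
    # Hypothetical sample lines; real input would come from the server itself.
    sample = (
        "tcp 0 0 10.0.0.8:80 10.0.0.21:51234 ESTABLISHED\n"
        "tcp 0 0 10.0.0.8:80 10.0.0.22:40022 ESTABLISHED\n"
        "tcp 0 0 10.0.0.8:22 10.0.0.5:53311 ESTABLISHED\n"
    )
    print(count_states(sample))
    # ServerLimit 5000 ignored, so the default of 256 applies.
    print(waiting_clients(600, 5000))
    # ServerLimit 5000 in effect.
    print(waiting_clients(600, 5000, server_limit=5000))
```

With the configured ServerLimit ignored, 600 concurrent connections leave several hundred visitors waiting for a worker; once the limit of 5000 applies, none of them wait. This matches the behaviour observed before and after step 12.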
Based on how Teacher Old Boy solved the problem, I summarized his approach as follows:
1. When a problem occurs, first analyze the fault and define its broad scope. (Access was slow while the server was under low load, so Teacher Old Boy judged that it was a service configuration problem and had little to do with the server hardware.)
2. Locate the problem by eliminating candidates one by one, gradually narrowing down the cause (server hardware, network factors, service configuration):
2.1. Run the top command on the server --> check whether the server hardware is the bottleneck;
2.2. Access via the intranet --> rule out network factors;
2.3. Create a static page --> rule out configuration problems in services other than Apache;
2.4. Check the Apache configuration file --> locate the problem and solve it.
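Checks 2.2 and 2.3 can be approximated with a small timing probe. The sketch below is a stand-in for the wget calls used during the incident, not a record of them. The URLs, the function names, and the 5-second threshold are all assumptions for illustration.

```python
import time
import urllib.request


def time_fetch(url, timeout=60.0):
    """Return (seconds, bytes_read) for a single GET of `url`."""
    start = time.monotonic()
    with urllib.request.urlopen(url, timeout=timeout) as response:
        body = response.read()
    return time.monotonic() - start, len(body)


def classify(static_seconds, dynamic_seconds, slow_threshold=5.0):
    """Rough reading of the two timings, following the reasoning in 2.3."""
    if static_seconds >= slow_threshold:
        # Even a plain file is slow, so PHP and the database are not the
        # cause; suspect the web server itself.
        return "web server"
    if dynamic_seconds >= slow_threshold:
        return "application or database"
    return "ok"


if __name__ == "__main__":
    # Hypothetical intranet URLs; replace with the real host and pages.
    static_s, _ = time_fetch("http://192.168.1.10/test.html")
    dynamic_s, _ = time_fetch("http://192.168.1.10/index.php")
    print(classify(static_s, dynamic_s))
```

Running the probe from another machine on the same LAN keeps the public network out of the measurement, which is the purpose of check 2.2.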
PS: This time I witnessed Teacher Old Boy's troubleshooting process first-hand. Every step he took had a clear purpose. He had always told us that the way of thinking matters more than learning any single thing. I did not understand this before, but after watching him resolve this fault, besides envying his skill, I came to appreciate deeply how important the way of thinking is. With the right approach you can look at a problem at the macro level, and the solution and steps for any fault become clear.
This article is from the "Old Boy Linux O&M" blog. For more information, contact the author.