I. Symptoms of the failure
A colleague on the business side phoned to say that one of the servers could no longer be reached over SSH, and a downtime SMS alert had arrived as well. The host still answered ping, but connecting to the SSH port returned: ssh_exchange_identification: Connection closed by remote host. Logging in through the management port showed the error: "login: failure forking: Cannot allocate memory /etc/initscript: fork: Cannot allocate memory". Screenshot below:
II. Fault analysis and handling
1. Forcing memory reclamation
Judging from the symptoms (processes cannot fork, memory cannot be allocated), a reboot would have cleared the problem directly, but I wanted to find the root cause first. The second attempt to log in through the management port succeeded. After logging in, however, commands such as ps auxf ran for a long time without producing output, and top showed a large number of postdrop processes.
I ran the following three commands to force memory reclamation and end the business processes, hoping this would make normal system commands usable again.
Manually force memory reclamation:
echo 3 > /proc/sys/vm/drop_caches
This had no obvious effect, so I force-killed some business processes:
killall java
killall postdrop
After forcing these three commands through, some commands produced output again, but ps auxf still returned nothing. Listing the PIDs under /proc showed a large number of processes. I tried to write that listing to a file with ls /proc > /mnt/proc.txt, and found that the file was not created.
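When fork fails, external tools such as ps cannot start, but shell builtins may still work. The following is a minimal sketch, assuming a Linux /proc and a POSIX shell. count_procs is a hypothetical helper name and was not part of the original troubleshooting session.

```shell
#!/bin/sh
# count_procs NAME (hypothetical helper)
# Counts processes whose command name equals NAME by reading /proc directly.
# Only shell builtins are used (for, read, [, echo), so no child process is
# forked. Note that /proc/<pid>/comm truncates names to 15 characters.
count_procs() {
    want=$1
    n=0
    for d in /proc/[0-9]*; do
        # A process may exit between the glob and the read; skip it then.
        if [ -r "$d/comm" ] && { read -r name < "$d/comm"; } 2>/dev/null; then
            if [ "$name" = "$want" ]; then
                n=$((n + 1))
            fi
        fi
    done
    echo "$n"
}

# Example (call it directly; wrapping it in $(...) would fork a subshell):
#   count_procs postdrop
```

Calling the function directly at the prompt matters here, because command substitution itself needs a fork.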
2. Checking disk space usage
The failure to create that file suggested that the process backlog was caused by a full disk: in-memory data could not be flushed to disk. So I checked the disk usage of each mount point with df.
df showed that /var was full. I emptied the files under /var/log that were likely to be large, such as info, but the space on /var was still not released. I then went into /var/spool/postfix/ and ran du -sh maildrop, which produced no result even after a long wait.
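du has to stat every file, so it can take a very long time on a directory holding a huge number of small files. The sketch below shows one way to check both block and inode usage and to count directory entries without sorting them. It assumes a find that supports -maxdepth (GNU, BSD and BusyBox do). show_usage and count_entries are hypothetical helper names.

```shell
#!/bin/sh
# show_usage MOUNTPOINT (hypothetical helper)
# Prints block usage and inode usage; many tiny files can exhaust either.
show_usage() {
    df -h "$1"
    df -i "$1"
}

# count_entries DIR (hypothetical helper)
# Counts regular files directly inside DIR. find streams entries without
# sorting them, so it starts producing results sooner than ls or du.
count_entries() {
    find "$1" -maxdepth 1 -type f | wc -l | tr -d ' '
}

# Example:
#   show_usage /var
#   count_entries /var/spool/postfix/maildrop
```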
3. Rebooting or entering single-user mode
At this point you can either reboot or enter single-user mode. For problems caused by disk exhaustion, a reboot generally lets you get back into the system normally. After running sync several times, I rebooted the host and got into the system. Checking /var/spool/postfix/maildrop confirmed that the directory occupied about 17G of space, which matched the guess.
After the reboot, watching the processes on the host showed that whenever a cron job ran, sendmail was invoked, which in turn triggered postdrop. As follows:
[Screenshot: cron-sendmail.png]
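To confirm which process launches sendmail, one option is to walk from a sendmail PID up to its parent. This is a small sketch that assumes a Linux /proc. ppid_of is a hypothetical helper name, and it reads the PPid field of /proc/<pid>/status.

```shell
#!/bin/sh
# ppid_of PID (hypothetical helper)
# Prints the parent PID of PID, taken from the "PPid:" line of
# /proc/PID/status. Returns non-zero if the field cannot be read.
ppid_of() {
    while read -r key value; do
        if [ "$key" = "PPid:" ]; then
            echo "$value"
            return 0
        fi
    done < "/proc/$1/status"
    return 1
}

# Example: for a sendmail process with PID 12345 (a made-up number),
#   ppid_of 12345
# prints the PID of whatever started it, which can be inspected in turn.
```

Where ps is usable, ps -eo pid,ppid,user,args gives the same parent information for all processes at once.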
Looking at the script that cron executed, I found no mail-related calls in it.
chkconfig --list | egrep -i 'sendmail|postfix' showed that the postfix service was disabled at boot. Viewing the crontab configuration showed the following:
[root@361way.com ~]# cat /etc/crontab
SHELL=/bin/bash
PATH=/sbin:/bin:/usr/sbin:/usr/bin
MAILTO=root
HOME=/
# For details see man 4 crontabs
# Example of job definition:
# .---------------- minute (0 - 59)
# |  .------------- hour (0 - 23)
# |  |  .---------- day of month (1 - 31)
# |  |  |  .------- month (1 - 12) OR jan,feb,mar,apr ...
# |  |  |  |  .---- day of week (0 - 6) (Sunday=0 or 7) OR sun,mon,tue,wed,thu,fri,sat
# |  |  |  |  |
# *  *  *  *  * user-name command to be executed
III. Cause of the failure
1. Cause
From the analysis above, the cause of the failure is as follows. Every time a crontab job runs, cron mails a report to root. This invokes sendmail and then postdrop, and the mail files are stored in the /var/spool/postfix/maildrop directory. Because the postfix service itself was not running, nothing picked these files up, so they were never removed. The logcollect user's crontab turned out to contain hundreds of jobs (a rather silly way to run a business workload), so files piled up in the maildrop directory faster and faster. Once the partition was full, the sendmail and postdrop processes could not finish and were not released for a long time. Processes accumulated until resources ran out and SSH connections failed.
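To gauge how much mail a crontab can generate, it helps to count its job lines. The following is a rough sketch. count_cron_jobs is a hypothetical helper, and it assumes that blank lines, comment lines and VAR=value lines are the only non-job lines in the file.

```shell
#!/bin/sh
# count_cron_jobs FILE (hypothetical helper)
# Counts job lines in a crontab file by dropping blank lines, comments,
# and environment assignments such as MAILTO=root.
count_cron_jobs() {
    grep -Ev '^[[:space:]]*(#|$)|^[[:space:]]*[A-Za-z_][A-Za-z0-9_]*[[:space:]]*=' "$1" \
        | wc -l | tr -d ' '
}

# Example (as root):
#   crontab -l -u logcollect > /tmp/logcollect.cron
#   count_cron_jobs /tmp/logcollect.cron
```

Each job that writes anything to stdout or stderr produces one mail per run while MAILTO is set, so the count multiplied by the run frequency gives a rough upper bound on the number of queue files created.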
2. Solution
Either disable crontab's MAILTO feature, or add a crontab job that automatically cleans up the /var/spool/postfix/maildrop directory. The former is recommended: having crontab send this mail is a waste of resources, unless there is a real need to report job results to root. The steps to disable crontab's MAILTO function are:
Run vi /etc/crontab, replace 'MAILTO=root' with 'MAILTO=', then run service crond restart. If there is no such line, run crontab -e and add MAILTO="" as the first line.
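The same edit can be scripted. Below is a minimal sketch that works on any crontab-format file passed as an argument. disable_cron_mail is a hypothetical helper name. It sets MAILTO="" if a MAILTO line exists and otherwise inserts one as the first line. Try it on a copy first.

```shell
#!/bin/sh
# disable_cron_mail FILE (hypothetical helper)
# Rewrites every line starting with MAILTO= to MAILTO="", or prepends
# MAILTO="" when the file has no such line.
disable_cron_mail() {
    file=$1
    tmp=$(mktemp) || return 1
    if grep -q '^MAILTO=' "$file"; then
        sed 's/^MAILTO=.*/MAILTO=""/' "$file" > "$tmp"
    else
        { echo 'MAILTO=""'; cat "$file"; } > "$tmp"
    fi
    # Overwrite the contents in place so the owner and mode are preserved.
    cat "$tmp" > "$file"
    rm -f "$tmp"
}

# Example (as root):
#   cp /etc/crontab /etc/crontab.bak
#   disable_cron_mail /etc/crontab
#   service crond restart
```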