CPU overload caused by too many Linux processes
1. A batch of monitoring alerts arrived in my mailbox; their content is shown below. The CPU is clearly overloaded, and disk I/O is a problem as well:
Host: bwebser2 _ 10.253.5.198
Time: 15:25:17
Status: PROBLEM
Level: Warning
Alarm reason: Processor load is too high on bwebser2
Content: Processor load (1 min average per core): value = 52.53
Original event ID: 30605
Host: bwebser2 _ 10.253.5.198
Time: 2015.11.18 15:42:23
Status: PROBLEM
Level: Warning
Alarm reason: Disk I/O is overloaded on bwebser2
Content: CPU iowait time: value = 68.7%
Original event ID: 30812
2. Check the processes with top: there are nearly 2000 of them.
[root@bwebser2 ~]# top
top - 10:00:32 up 184 days, 19:55, 2 users, load average: 49.39, 52.06, 53.04
Tasks: 1826 total, 1 running, 1825 sleeping, 0 stopped, 0 zombie
Cpu(s): 22.5%us, 3.8%sy, 0.0%ni, 31.7%id, 41.3%wa, 0.7%hi, 0.0%si, 0.0%st
Mem: 8058056k total, 7631808k used, 426248k free, 718780k buffers
Swap: 0k total, 0k used, 0k free, 358720k cached
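
With close to 2000 tasks, top alone does not show which program is multiplying. A minimal sketch, assuming a procps-style ps is available, is to group processes by command name and list the most frequent ones. The helper name top_commands is hypothetical and not part of the original session.

```shell
# Hypothetical helper: reads one command name per line on stdin and prints
# the most frequent names with their counts, highest count first.
top_commands() {
    # $1 = how many names to show (defaults to 10)
    sort | uniq -c | sort -rn | head -n "${1:-10}"
}

# Typical use on a live system ("comm=" prints only the command name,
# without a header line):
#   ps -e -o comm= | top_commands 5
```

On a host in the state described here, sendmail and postdrop would be expected near the top of that list.
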
3. My guess was that the problem is related to sendmail. Checking maillog shows a continuous stream of warnings: No space left on device
[root@bwebser2 ~]# tail -f /var/log/maillog
Nov 19 10:12:15 bwebser2 postfix/postdrop[19470]: warning: mail_queue_enter: create file maildrop/878633.19470: No space left on device
Nov 19 10:12:15 bwebser2 postfix/postdrop[27287]: warning: mail_queue_enter: create file maildrop/900082.27287: No space left on device
Nov 19 10:12:15 bwebser2 postfix/postdrop[12347]: warning: mail_queue_enter: create file maildrop/919377.12347: No space left on device
Nov 19 10:12:15 bwebser2 postfix/postdrop[21222]: warning: mail_queue_enter: create file maildrop/937001.21222: No space left on device
Nov 19 10:12:16 bwebser2 postfix/postdrop[25028]: warning: mail_queue_enter: create file maildrop/956095.25028: No space left on device
Nov 19 10:12:16 bwebser2 postfix/postdrop[28123]: warning: mail_queue_enter: create file maildrop/980022.28123: No space left on device
Nov 19 10:12:16 bwebser2 postfix/postdrop[26680]: warning: mail_queue_enter: create file maildrop/999360.26680: No space left on device
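
Rather than watching the log scroll by, the warnings can be tallied per program. The sketch below assumes the traditional syslog layout seen above, where the fifth whitespace-separated field is the program tag followed by "[pid]:". The function name enospc_by_program is hypothetical.

```shell
# Hypothetical helper: counts "No space left on device" warnings per
# program tag in syslog-formatted input read from stdin.
enospc_by_program() {
    awk '/No space left on device/ {
             prog = $5
             sub(/\[.*/, "", prog)      # strip the "[pid]:" suffix
             count[prog]++
         }
         END { for (p in count) print count[p], p }' | sort -rn
}

# Typical use:
#   enospc_by_program < /var/log/maillog
```
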
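4. Use lsof to gauge how many sendmail and postdrop processes there are. lsof prints one line per open file rather than one per process, so the counts below (more than 24,000 each) are open-file entries, not process counts; either way, there are far too many. Why are there so many?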
[root@bwebser2 ~]# lsof |grep sendmail |wc -l
24682
[root@bwebser2 ~]# lsof |grep postdrop |wc -l
24108
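
Because lsof emits one line per open file, a line count overstates the number of processes. A minimal sketch, assuming the default lsof column layout (COMMAND first, PID second), reports both the number of matching lines and the number of distinct PIDs. The function name count_lsof is hypothetical.

```shell
# Hypothetical helper: reads lsof output on stdin and prints two numbers
# for the given command name: matching lines and distinct PIDs.
count_lsof() {
    # $1 = command name to match exactly in the COMMAND column
    awk -v name="$1" '
        $1 == name { lines++; if (!seen[$2]++) procs++ }
        END { print lines + 0, procs + 0 }'
}

# Typical use:
#   lsof | count_lsof postdrop
```

The second number is the one to compare against the task count reported by top.
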
5. Check inode usage with df -i: the inodes on the root filesystem are exhausted:
[root@bwebser2 log]# df -i
Filesystem       Inodes   IUsed    IFree IUse% Mounted on
/dev/xvda1      1310720 1310720        0  100% /
tmpfs           1007257       1  1007256    1% /dev/shm
/dev/xvdb1     13107200    6142 13101058    1% /u01
For comparison, df -Th shows that disk space itself is far from full, so the "No space left on device" errors come from the missing inodes rather than from missing blocks:
[root@cwebser3 statistics]# df -Th
Filesystem     Type   Size  Used Avail Use% Mounted on
/dev/xvda1     ext4    20G  4.1G   15G  22% /
tmpfs          tmpfs  3.9G     0  3.9G   0% /dev/shm
/dev/xvdb1     ext3   197G   18G  170G  10% /u01
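
The write-up does not show how the directory consuming the inodes was located. One possible approach, sketched here under the assumption that find supports -mindepth and -xdev (GNU and BSD find do), is to count filesystem entries under each immediate child of a directory, since every file and directory uses one inode. The function name inode_usage is hypothetical, and paths containing newlines are not handled.

```shell
# Hypothetical helper: for each immediate child of a directory, prints the
# number of entries found at or below it (the child itself is counted),
# largest first.
inode_usage() {
    # $1 = directory to inspect (defaults to the current directory)
    dir=${1:-.}
    case $dir in
        /) base=/ ;;
        *) dir=${dir%/}; base=$dir/ ;;
    esac
    # -xdev keeps find on the filesystem that contains $dir
    find "$dir" -mindepth 1 -xdev 2>/dev/null |
        awk -v base="$base" '
            { rest = substr($0, length(base) + 1)   # path relative to $dir
              split(rest, parts, "/")
              count[parts[1]]++ }
            END { for (d in count) print count[d], d }' |
        sort -rn
}

# Typical use, drilling down one level at a time:
#   inode_usage / | head
#   inode_usage /home | head
```
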
6. Delete old zookeeper monitoring logs to free up inodes on the root filesystem:
cd /home/zookeeper/monitor
[root@bwebser2 monitor]# ll
total 8
drwxrwxr-x 163 zookeeper zookeeper 4096 Nov 12 00:16 charts
drwxrwxr-x 167 zookeeper zookeeper 4096 Nov 18 17:31 statistics
[root@bwebser2 monitor]# cd charts
rm -rf *
[root@bwebser2 monitor]# cd ../statistics/
[root@bwebser2 statistics]# rm -rf 201506*
[root@bwebser2 statistics]# rm -rf 201507*
[root@bwebser2 statistics]# rm -rf 201508*
[root@bwebser2 statistics]# rm -rf 201509*
[root@bwebser2 statistics]# rm -rf 201510*
7. Kill all sendmail and postdrop processes:
[root@bwebser2 ~]# ps -ef|grep sendmail | grep -v grep | awk '{print "kill -9 " $2}' |sh
[root@bwebser2 ~]# ps -ef|grep postdrop | grep -v grep | awk '{print "kill -9 " $2}' |sh
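
The grep-based pipeline above matches any command line that merely contains the word, which is why it needs grep -v grep. A slightly stricter sketch selects PIDs by exact command name instead. The function name pids_named is hypothetical, and xargs -r (skip the command when there is no input) is a GNU extension.

```shell
# Hypothetical helper: reads "PID COMMAND" pairs on stdin and prints the
# PIDs whose command name equals $1 exactly.
pids_named() {
    awk -v name="$1" '$2 == name { print $1 }'
}

# Typical use (two separate -o options, because everything after "=" in a
# single -o argument would be taken as header text):
#   ps -e -o pid= -o comm= | pids_named postdrop | xargs -r kill -9
```

Where procps is installed, pkill -9 -x postdrop should have the same effect in a single command.
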
8. Check lsof again. Both counts are now 0.
[root@bwebser2 ~]# lsof |grep sendmail |wc -l
0
[root@bwebser2 ~]# lsof |grep postdrop |wc -l
0
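9. Edit the sysstat file under /etc/cron.d so that the jobs' output is discarded. cron mails any output a job produces to the job's owner, which is the likely source of the sendmail and postdrop processes; redirecting the output to /dev/null avoids that mail: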
[root@bwebser2 cron.d]# cd /etc/cron.d/
[root@bwebser2 cron.d]# ll
total 12
-rw-r--r--. 1 root root 113 Nov 23 2013 0hourly
-rw-r--r--. 1 root root 108 Apr 7 2014 raid-check
-rw-r--r--. 1 root root 235 Nov 23 2013 sysstat
[root@bwebser2 cron.d]# vi sysstat
Append &>/dev/null to each job line:
# run system activity accounting tool every 10 minutes
*/10 * * * * root /usr/lib/sa/sa1 1 1 &>/dev/null
# generate a daily summary of process accounting at 23:53
53 23 * * * root /usr/lib/sa/sa2 -A &>/dev/null
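
Other cron files on the host may have the same issue. As a rough sketch, the helper below lists job lines that contain no output redirection at all. It is only a heuristic (any ">" on the line counts as a redirection), and the function name jobs_without_redirect is hypothetical.

```shell
# Hypothetical helper: prints job lines from the given cron files that do
# not redirect their output anywhere.
jobs_without_redirect() {
    awk '
        /^[ \t]*(#|$)/ { next }                          # comments, blank lines
        /^[ \t]*[A-Za-z_][A-Za-z0-9_]*[ \t]*=/ { next }  # VAR=value settings
        !/>/ { print FILENAME ": " $0 }                  # no ">" on the line
    ' "$@"
}

# Typical use:
#   jobs_without_redirect /etc/cron.d/*
```

Cron implementations derived from Vixie cron also accept MAILTO="" in a crontab to suppress mail for the jobs that follow it, which may be simpler than editing every line.
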
10. Run top again: there are now just over 100 processes. The monitoring alarms have cleared and the problem is solved!
[root@bwebser2 cron.d]# service sendmail restart
sendmail: unrecognized service
[root@cwebser3 cron.d]# top
top - 10:43:12 up 184 days, 20:37, 2 users, load average: 1.03, 1.54, 14.15
Tasks: 105 total, 1 running, 104 sleeping, 0 stopped, 0 zombie
Cpu(s): 43.4%us, 1.3%sy, 0.0%ni, 47.9%id, 7.0%wa, 0.3%hi, 0.0%si, 0.0%st
Mem: 8058056k total, 6762996k used, 1295060k free, 1422060k buffers
Swap: 0k total, 0k used, 0k free, 381392k cached