Background
Performance monitoring showed that one core of a production server's CPU had reached 100% utilization, and that one of our key services was responsible. Fortunately, because the service's work is shared among several identical worker threads, nothing was affected apart from the high CPU usage. Last time we tracked down the culprit devouring IO; this time our target is a mole hiding among our own agents, which makes the hunt even more thrilling!
System Environment
With the top command, it is easy to find which process is using the most CPU.
Take our business process (Imdevserver) as an example. Why call it a mole? Because it is a multi-threaded process, and the smallest unit that actually consumes CPU is the thread, so one or more of its threads must be consuming too much CPU. The top -H -p PID command shows the percentage of CPU consumed by each thread in the process.
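As a rough sketch of where these per-thread numbers come from, the same information can be read from /proc/PID/task/TID/stat on Linux. The helper names below (parse_thread_stat, read_thread_ticks, busiest_threads, sample) are hypothetical and are not part of top or of Imdevserver; this is one possible way to rank threads, assuming a Linux /proc filesystem.

```python
import os
import time


def parse_thread_stat(stat_line):
    """Return (comm, cpu_ticks) from one /proc/PID/task/TID/stat line.

    The thread name sits in parentheses and may itself contain spaces or
    parentheses, so split on the LAST ')' before reading numeric fields.
    """
    comm = stat_line[stat_line.index("(") + 1:stat_line.rindex(")")]
    fields = stat_line[stat_line.rindex(")") + 2:].split()
    # fields[0] is the state (field 3 in proc(5)); utime and stime are
    # fields 14 and 15, which are indexes 11 and 12 in this list.
    utime, stime = int(fields[11]), int(fields[12])
    return comm, utime + stime


def read_thread_ticks(pid):
    """Map TID -> (comm, cpu_ticks) for every thread of a live process."""
    ticks = {}
    task_dir = f"/proc/{pid}/task"
    for tid in os.listdir(task_dir):
        try:
            with open(f"{task_dir}/{tid}/stat") as f:
                ticks[int(tid)] = parse_thread_stat(f.read())
        except (FileNotFoundError, ProcessLookupError):
            pass  # thread exited between listdir() and open()
    return ticks


def busiest_threads(before, after):
    """Rank threads by CPU ticks consumed between two snapshots."""
    deltas = []
    for tid, (comm, ticks) in after.items():
        if tid in before:
            deltas.append((ticks - before[tid][1], tid, comm))
    return sorted(deltas, reverse=True)


def sample(pid, interval=1.0):
    """Take two snapshots `interval` seconds apart and rank the threads."""
    before = read_thread_ticks(pid)
    time.sleep(interval)
    return busiest_threads(before, read_thread_ticks(pid))
```

Calling sample(PID) on a Linux host returns (ticks, TID, name) tuples with the busiest thread first. The TIDs are the same thread IDs that top -H lists in its PID column.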
The top output shows that the thread with ID 8863 has the highest CPU usage. So far we only know who stole the CPU. The suspect is tight-lipped, but we have a thorough interrogation procedure, so we are not worried that he won't talk. The first step is the strace -T -r -c -p PID command.
strace shows the system calls the process makes and the time spent in each. Although epoll_wait accounts for a large share of the call time, it is a normal blocking call and not the culprit. Next we bring in pstack PID.
pstack prints the call stack of every thread. Find the thread already identified as using the most CPU and examine its call stack; it is then easy to see which step of the logic leads to the busy loop. Finally, use strace -p TID to watch that thread's calls, trace them back to the code, fix the bug, and recover the stolen CPU.
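When a process has many threads, picking the suspect's stack out of the pstack output by eye gets tedious. The sketch below filters the output by thread ID, assuming the gdb-style "Thread N (Thread 0x... (LWP TID)):" header lines that pstack typically prints. The function name stack_for_tid is hypothetical, and the sample text, including its frame names, is made up for illustration; it is not real output from Imdevserver.

```python
import re

# Matches header lines such as: Thread 3 (Thread 0x7f1c2a7fc700 (LWP 8863)):
THREAD_HEADER = re.compile(r"^Thread \d+ \(.*\(LWP (\d+)\)\):")

# Illustrative, made-up pstack output; the frame names are hypothetical.
SAMPLE = """\
Thread 2 (Thread 0x7f1c2a7fc700 (LWP 8863)):
#0  0x0000000000401a10 in process_queue ()
#1  0x0000000000401b42 in worker_main ()
Thread 1 (Thread 0x7f1c2b9fe740 (LWP 8860)):
#0  0x00007f1c2b1a2e33 in epoll_wait () from /lib64/libc.so.6
#1  0x0000000000401c77 in main ()
"""


def stack_for_tid(pstack_output, tid):
    """Return the stack frame lines of the thread whose LWP id equals tid."""
    frames, in_target = [], False
    for line in pstack_output.splitlines():
        header = THREAD_HEADER.match(line)
        if header:
            # A new thread section starts; check whether it is the suspect.
            in_target = int(header.group(1)) == tid
        elif in_target and line.startswith("#"):
            frames.append(line)
    return frames


for frame in stack_for_tid(SAMPLE, 8863):
    print(frame)
```

The LWP number in each header is the same thread ID that top -H reports, so 8863 can be passed in directly. Running pstack a few times and comparing the filtered stacks is one way to see where a spinning thread keeps landing.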
Locating the cause of high CPU utilization on a production server