-Cause
Recently, in one group of services, the CPU usage reported by the system (top) on some node servers was very low, only about a quarter of that of the other servers. After ruling out load imbalance between the services, we suspected an error in top's statistics. However, by sampling erlang:statistics(scheduler_wall_time) (Erlang's scheduler utilization measurement, enabled via erlang:system_flag(scheduler_wall_time, true)), we found that on the machines with low reported CPU the schedulers' actual utilization was close to 100%, while on the other machines it was under 30%.
By comparing the different services, we found that this anomaly appears only on nodes with a small number of Erlang processes.
-WhatsApp case
There are few published Erlang case studies of this kind. Fortunately, WhatsApp has shared a detailed analysis of a similar case:
http://highscalability.com/blog/2014/2/26/the-whatsapp-architecture-facebook-bought-for-19-billion.html
"First bottleneck showed up at 425K. System ran into a lot of contention. Work stopped. Instrumented the scheduler to measure how much useful work is being done, or sleeping, or spinning. Under load it started to hit sleeping locks, so 35-45% CPU was being used across the system but the schedulers were at 95% utilization."
WhatsApp hit a bottleneck at 425k connections on a single host: the VM was using only 35-45% of the machine's CPU, yet the schedulers reported 95% utilization. The article does not go into detail, but the scheduler-related tuning it lists is:
1. +swt low — set the scheduler wakeup threshold to low, because schedulers would go to sleep and would never wake up.
2. Set the process priority to real-time — run beam at real-time priority so that other things like cron jobs don't interrupt the schedulers; this prevents glitches that would cause backlogs of important user traffic.
3. Disable spin (patch beam) — a patch to dial down spin counts so the scheduler wouldn't spin, +ssct 1 (via patch; scheduler spin count).
-Tool Analysis
Via a Weibo private message I consulted Zheng Siyao, who recommended profiling with VTune, the working assumption being that the scheduler itself was consuming too much CPU.
Fill in the registration form on the Intel website and a download link with a 30-day trial license is emailed immediately.
The download is very slow (a VPN helps). The command-line mode of the Linux version of VTune is easy to use:
tar -zxf vtune_amplifier_xe_2015.tar.gz
cd vtune_amplifier_xe_2015
./install.sh
cd /opt/intel/vtune_amplifier_xe_2015.1.0.367959/
source amplxe-vars.sh
amplxe-cl -collect lightweight-hotspots -run-pass-thru=--no-altstack -target-pid=1575
amplxe-cl -report hotspots
The collection can be run online against a live node without noticeably affecting the service. It produces output like the following:
Summary
-------
Elapsed Time:      19.345
CPU Time:          182.023
Average CPU Usage: 9.155
CPI Rate:          1.501

Function                                 Module              CPU Time:Self
---------------------------------------  ------------------  -------------
sched_spin_wait                          beam.smp            72.754
raw_local_irq_enable                     vmlinux             19.282
process_main                             beam.smp            10.476
ethr_native_atomic32_read                beam.smp             8.337
[email protected]0xffffffff8100af60         vmlinux              3.007
__pthread_mutex_lock                     libpthread-2.12.so   2.342
raw_local_irq_restore                    vmlinux              1.973
__sched_yield                            libc-2.12.so         1.913
pthread_mutex_unlock                     libpthread-2.12.so   1.553
__audit_syscall_exit                     vmlinux              1.192
system_call                              vmlinux              1.156
erts_thr_yield                           beam.smp             1.114
handle_delayed_dealloc                   beam.smp             0.977
update                                   beam.smp             0.828
raw_local_irq_enable                     vmlinux              0.780
As the report shows, sched_spin_wait alone accounts for 72.754 of the 182.023 seconds of CPU time, i.e. about 40%. The corresponding beam source:
#define ERTS_SCHED_SPIN_UNTIL_YIELD 100

static erts_aint32_t
sched_spin_wait(ErtsSchedulerSleepInfo *ssi, int spincount)
{
    int until_yield = ERTS_SCHED_SPIN_UNTIL_YIELD;
    int sc = spincount;
    erts_aint32_t flgs;

    do {
        flgs = erts_smp_atomic32_read_acqb(&ssi->flags);
        if ((flgs & (ERTS_SSI_FLG_SLEEPING|ERTS_SSI_FLG_WAITING))
            != (ERTS_SSI_FLG_SLEEPING|ERTS_SSI_FLG_WAITING)) {
            break;
        }
        ERTS_SPIN_BODY;
        if (--until_yield == 0) {
            until_yield = ERTS_SCHED_SPIN_UNTIL_YIELD;
            erts_thr_yield();
        }
    } while (--sc > 0);
    return flgs;
}
The default spincount is 10,000, and every iteration of the loop performs an atomic read. An atomic operation typically costs tens to hundreds of CPU cycles, so each pass through sched_spin_wait can burn on the order of a million cycles before the scheduler actually goes to sleep, delaying real work.
We also found the corresponding configuration option:
+sbwt none|very_short|short|medium|long|very_long
    Set scheduler busy wait threshold. Default is medium. The threshold determines how long schedulers should busy wait when running out of work before going to sleep.
Starting the node with +sbwt none disables spinning entirely, with no need to patch beam the way WhatsApp did.
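For example (flag spelling as in the erl documentation; the vm.args placement assumes an OTP release, which is an assumption about your deployment, not something from this investigation):

```
# directly on the command line
erl +sbwt none

# or as a line in an OTP release's vm.args
+sbwt none
```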
Troubleshoot low Erlang scheduler CPU utilization