-Cause
Recently, in one group of services, the CPU usage reported by the system (top) on some node servers was very low, only about a quarter of that of the other servers. After ruling out load imbalance between the services, we suspected an error in top's statistics. However, by sampling erlang:statistics(scheduler_wall_time) (Erlang's scheduler utilization measurement, enabled via erlang:system_flag(scheduler_wall_time, true)), we found that on the machines with low reported CPU the schedulers' actual utilization was close to 100%, while on the other machines it was under 30%.
By comparing the different services, we found that this anomaly appears only on nodes with a small number of Erlang processes.
-WhatsApp case
There are few published Erlang case studies of this kind. Fortunately, WhatsApp has shared a detailed analysis of a similar case:
http://highscalability.com/blog/2014/2/26/the-whatsapp-architecture-facebook-bought-for-19-billion.html
"First bottleneck showed up at 425K. System ran into a lot of contention. Work stopped. Instrumented the scheduler to measure how much useful work is being done, or sleeping, or spinning. Under load it started to hit sleeping locks, so 35-45% CPU was being used across the system but the schedulers were at 95% utilization."
WhatsApp hit a bottleneck at 425k connections on a single host: the VM was using only 35-45% of the machine's CPU, yet the schedulers reported 95% utilization. The article does not go into detail, but the scheduler-related tuning it lists is:
1. +swt low — set the scheduler wakeup threshold to low, because schedulers would go to sleep and would never wake up.
2. Set the process priority to real-time — run beam at real-time priority so that other things like cron jobs don't interrupt the schedulers; this prevents glitches that would cause backlogs of important user traffic.
3. Disable spin (patch beam) — a patch to dial down spin counts so the scheduler wouldn't spin, +ssct 1 (via patch; scheduler spin count).
-Tool Analysis
Via a Weibo private message I consulted Zheng Siyao, who recommended profiling with VTune, the working assumption being that the scheduler itself was consuming too much CPU.
Fill in the registration form on the Intel website and a download link with a 30-day trial license is emailed immediately.
The download is very slow (a VPN helps). The command-line mode of the Linux version of VTune is easy to use:
tar -zxf vtune_amplifier_xe_2015.tar.gz
cd vtune_amplifier_xe_2015
./install.sh
cd /opt/intel/vtune_amplifier_xe_2015.1.0.367959/
source amplxe-vars.sh
amplxe-cl -collect lightweight-hotspots -run-pass-thru=--no-altstack -target-pid=1575
amplxe-cl -report hotspots
The collection can be run online against a live node without noticeably affecting the service. It produces output like the following:
Summary
-------
Elapsed Time:      19.345
CPU Time:          182.023
Average CPU Usage: 9.155
CPI Rate:          1.501

Function                                 Module              CPU Time:Self
---------------------------------------  ------------------  -------------
sched_spin_wait                          beam.smp            72.754
raw_local_irq_enable                     vmlinux             19.282
process_main                             beam.smp            10.476
ethr_native_atomic32_read                beam.smp             8.337
[email protected]0xffffffff8100af60         vmlinux              3.007
__pthread_mutex_lock                     libpthread-2.12.so   2.342
raw_local_irq_restore                    vmlinux              1.973
__sched_yield                            libc-2.12.so         1.913
pthread_mutex_unlock                     libpthread-2.12.so   1.553
__audit_syscall_exit                     vmlinux              1.192
system_call                              vmlinux              1.156
erts_thr_yield                           beam.smp             1.114
handle_delayed_dealloc                   beam.smp             0.977
update                                   beam.smp             0.828
raw_local_irq_enable                     vmlinux              0.780
As the report shows, sched_spin_wait alone accounts for 72.754 of the 182.023 seconds of CPU time, i.e. about 40%. The corresponding beam source:
#define ERTS_SCHED_SPIN_UNTIL_YIELD 100

static erts_aint32_t
sched_spin_wait(ErtsSchedulerSleepInfo *ssi, int spincount)
{
    int until_yield = ERTS_SCHED_SPIN_UNTIL_YIELD;
    int sc = spincount;
    erts_aint32_t flgs;

    do {
        flgs = erts_smp_atomic32_read_acqb(&ssi->flags);
        if ((flgs & (ERTS_SSI_FLG_SLEEPING|ERTS_SSI_FLG_WAITING))
            != (ERTS_SSI_FLG_SLEEPING|ERTS_SSI_FLG_WAITING)) {
            break;
        }
        ERTS_SPIN_BODY;
        if (--until_yield == 0) {
            until_yield = ERTS_SCHED_SPIN_UNTIL_YIELD;
            erts_thr_yield();
        }
    } while (--sc > 0);
    return flgs;
}
The default spincount is 10,000, and every iteration of the loop performs an atomic read. An atomic operation typically costs tens to hundreds of CPU cycles, so each pass through sched_spin_wait can burn on the order of a million cycles before the scheduler actually goes to sleep, delaying real work.
We also found the corresponding configuration option:
+sbwt none|very_short|short|medium|long|very_long
    Set scheduler busy wait threshold. Default is medium. The threshold determines how long schedulers should busy wait when running out of work before going to sleep.
Starting the node with +sbwt none disables spinning entirely, with no need to patch beam the way WhatsApp did.
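For example (flag spelling as in the erl documentation; the vm.args placement assumes an OTP release, which is an assumption about your deployment, not something from this investigation):

```
# directly on the command line
erl +sbwt none

# or as a line in an OTP release's vm.args
+sbwt none
```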
Troubleshoot low Erlang scheduler CPU utilization