Case study: high disk IO and excessive thread context switching in a load test

Last Update: 2017-11-06 Source: Internet

Author: User



Symptoms:

During the load test, we found that at a load of only 80 TPS the CPU usage was already very high (on this 24-core machine, the utilization of every CPU soared above 80%), yet the checkpoints we had set did not report any errors.

1. The output of the top command was as follows:
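top derives CPU usage from the tick counters in /proc/stat. A minimal Python sketch, assuming a Linux /proc layout, that turns two snapshots of /proc/stat into per-CPU busy percentages, roughly the per-CPU figures top shows after pressing 1. The function names are illustrative, not part of any tool.

```python
def parse_cpu_lines(stat_text):
    """Map 'cpu', 'cpu0', 'cpu1', ... to their lists of tick counters."""
    cpus = {}
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0].startswith("cpu"):
            cpus[fields[0]] = [int(v) for v in fields[1:]]
    return cpus


def busy_percent(before_text, after_text):
    """Per-CPU busy percentage between two /proc/stat snapshots."""
    before = parse_cpu_lines(before_text)
    after = parse_cpu_lines(after_text)
    result = {}
    for name, old in before.items():
        new = after.get(name)
        if new is None:
            continue
        # Use only the first 8 counters (user, nice, system, idle, iowait,
        # irq, softirq, steal); guest time is already counted in user/nice.
        deltas = [n - o for n, o in zip(new[:8], old[:8])]
        total = sum(deltas)
        idle = deltas[3] + (deltas[4] if len(deltas) > 4 else 0)  # idle + iowait
        result[name] = 100.0 * (total - idle) / total if total else 0.0
    return result
```

On a live system, read /proc/stat, wait about a second, read it again, and pass both texts to busy_percent.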

2. Understand the backend logic. Roughly: after receiving a request, the server requests data from another KV server, performs personalized processing on the returned data according to the user's machine code, and finally returns the result to the client, writing some debug logs along the way.

A check showed that the KV server was working normally, which meant the problem lay in the local service server. To find out where exactly the anomaly was, we ran the vmstat command.

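3. The vmstat output shows at a glance that the bi, bo, in, and cs columns are all very high. From experience, bi and bo relate to disk IO (blocks read in from and written out to block devices per second), while in and cs are system-level figures (interrupts and context switches per second). Tackling one problem at a time, we looked at IO first.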
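Spotting the abnormal columns can be automated when vmstat runs for a long time during a test. A minimal Python sketch, assuming the default vmstat column layout (r b swpd free ... bi bo in cs us sy id wa st); the threshold values and function names are hypothetical and only illustrate the idea.

```python
# Hypothetical alert thresholds, chosen only for illustration; a real
# baseline has to come from the machine under test.
THRESHOLDS = {"bi": 1000, "bo": 1000, "in": 50000, "cs": 100000}


def parse_vmstat(output):
    """Turn vmstat output into one {column: value} dict per sample line."""
    header = None
    rows = []
    for line in output.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "r":  # column-name line: r b swpd free ...
            header = fields
        elif header and fields[0].isdigit():
            rows.append(dict(zip(header, map(int, fields))))
    return rows


def hot_columns(row, thresholds=THRESHOLDS):
    """Names of the watched columns whose value exceeds its threshold."""
    return sorted(col for col, limit in thresholds.items() if row.get(col, 0) > limit)
```

Feed it the captured text of, for example, vmstat 1 10. Keep in mind that the first sample line vmstat prints is an average since boot, so it is usually worth skipping.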

4. Using iostat -x to check disk reads and writes confirmed it: the disk was almost completely saturated.
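The %util column of iostat -x is the share of wall-clock time a device spent doing I/O, computed from the "milliseconds spent doing I/Os" counter in /proc/diskstats (the 13th column, counting major, minor, and device name). A minimal Python sketch, assuming Linux, that reproduces that figure from two snapshots; the function names are illustrative.

```python
def io_ticks(diskstats_text):
    """Map device name -> cumulative milliseconds spent doing I/O."""
    ticks = {}
    for line in diskstats_text.splitlines():
        fields = line.split()
        # major minor name + at least 11 statistics fields
        if len(fields) >= 14:
            ticks[fields[2]] = int(fields[12])
    return ticks


def util_percent(before_text, after_text, interval_ms):
    """Busy share of the interval per device, in the spirit of iostat -x %util."""
    before, after = io_ticks(before_text), io_ticks(after_text)
    return {
        dev: min(100.0, 100.0 * (after[dev] - before[dev]) / interval_ms)
        for dev in before
        if dev in after
    }
```

A device that stays close to 100% across samples, as in this case, is saturated: requests queue up behind it no matter how many CPUs are free.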

5. Going through the processing flow again, the only operation that could cause such frequent disk reads and writes was writing the log. We turned logging off and reran the load test.
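Switching logging off entirely is the bluntest fix. A gentler option is to keep per-request debug output behind a level check so it can be disabled for load tests and production. The article does not say what language the service is written in; this minimal sketch assumes Python's standard logging module, and ListHandler, handle_request, and the logger name are hypothetical (the in-memory handler stands in for a file handler).

```python
import logging


class ListHandler(logging.Handler):
    """Hypothetical stand-in for a file handler: keeps messages in memory."""

    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append(self.format(record))


logger = logging.getLogger("service")  # hypothetical logger name
handler = ListHandler()
logger.addHandler(handler)
logger.propagate = False


def handle_request(user_code, kv_data):
    # isEnabledFor skips building and writing the debug message entirely
    # when DEBUG is off, so no per-request disk writes happen.
    if logger.isEnabledFor(logging.DEBUG):
        logger.debug("user=%s kv=%r", user_code, kv_data)
    result = f"{user_code}:{len(kv_data)}"  # placeholder for personalization
    logger.info("handled %s", user_code)
    return result


# Load-test configuration: keep INFO, drop per-request DEBUG output.
logger.setLevel(logging.INFO)
```

Raising the level back to logging.DEBUG re-enables the detailed output when it is needed for troubleshooting.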

6. bi and bo dropped back to normal, showing that the disk problem was solved. But the number of context switches reached 400,000 per second, which is alarming.
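The cs column of vmstat is the per-second rate of the system-wide "ctxt" counter in /proc/stat. A minimal Python sketch, assuming Linux, that computes the same rate from two snapshots taken a known number of seconds apart; the function names are illustrative.

```python
def read_ctxt(stat_text):
    """Return the cumulative context-switch count from the 'ctxt' line of /proc/stat."""
    for line in stat_text.splitlines():
        if line.startswith("ctxt "):
            return int(line.split()[1])
    raise ValueError("no ctxt line found")


def ctxt_per_second(before_text, after_text, seconds):
    """Average context switches per second between two snapshots."""
    return (read_ctxt(after_text) - read_ctxt(before_text)) / seconds
```

For example, a counter that grows by 2,000,000 over 5 seconds corresponds to 400,000 switches per second, the level observed here.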

7. Knowing only that the number of context switches is very large is not enough. How do we find out which processes are switching with each other?

We found a SystemTap script online that counts context switches between pairs of processes over a given interval and prints the top 20 pairs:

```
#! /usr/bin/env stap
# Count context switches per (previous task -> next task) pair and print
# the top 20 pairs every $1 seconds.

global csw_count
global idle_count

probe scheduler.cpu_off {
  csw_count[task_prev, task_next]++
  idle_count += idle
}

function fmt_task(task_prev, task_next) {
  return sprintf("%s(%d)->%s(%d)",
                 task_execname(task_prev), task_pid(task_prev),
                 task_execname(task_next), task_pid(task_next))
}

function print_cswtop() {
  printf("%45s %10s\n", "Context Switch", "COUNT")
  # "csw_count-" sorts by count, descending; keep the top 20 pairs
  foreach ([task_prev, task_next] in csw_count- limit 20) {
    printf("%45s %10d\n", fmt_task(task_prev, task_next),
           csw_count[task_prev, task_next])
  }
  printf("%45s %10d\n", "Idle", idle_count)
  # reset the counters for the next interval
  delete csw_count
  delete idle_count
}

probe timer.s($1) {
  print_cswtop()
  printf("--------------------------------------------------------------\n")
}
```

Save it as cswmon.stp and run it with stap cswmon.stp 5 (usually as root). The argument 5 is passed to the script as $1, so a report is printed every 5 seconds.
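SystemTap needs kernel debug information and root access. A lighter, if coarser, cross-check uses the per-thread counters the kernel exposes in /proc/<pid>/task/<tid>/status (voluntary_ctxt_switches and nonvoluntary_ctxt_switches). A minimal Python sketch, assuming Linux, that sums them per process and ranks processes between two snapshots; unlike the SystemTap script, it shows how often each process switches, not which process it switches to. All function names are illustrative.

```python
import os


def ctxt_switches(status_text):
    """(voluntary, nonvoluntary) switch counts from a status file's text."""
    counts = {}
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        if key in ("voluntary_ctxt_switches", "nonvoluntary_ctxt_switches"):
            counts[key] = int(value.strip())
    return (counts.get("voluntary_ctxt_switches", 0),
            counts.get("nonvoluntary_ctxt_switches", 0))


def parse_name(status_text):
    """Process name from the 'Name:' line of a status file."""
    for line in status_text.splitlines():
        if line.startswith("Name:"):
            return line.split(":", 1)[1].strip()
    return "?"


def snapshot_all(proc_root="/proc"):
    """Sum switch counts over all threads of each process: {pid: (name, vol, nonvol)}."""
    snap = {}
    for pid in filter(str.isdigit, os.listdir(proc_root)):
        task_dir = os.path.join(proc_root, pid, "task")
        try:
            tids = os.listdir(task_dir)
        except OSError:  # process exited or access denied
            continue
        name, vol, nonvol = "?", 0, 0
        for tid in tids:
            try:
                with open(os.path.join(task_dir, tid, "status")) as f:
                    text = f.read()
            except OSError:  # thread exited between listdir and open
                continue
            if tid == pid:
                name = parse_name(text)
            v, n = ctxt_switches(text)
            vol += v
            nonvol += n
        snap[int(pid)] = (name, vol, nonvol)
    return snap


def top_switchers(before, after, n=20):
    """Rank processes by total switches between two snapshots."""
    deltas = []
    for pid, (name, vol, nonvol) in after.items():
        if pid in before:
            _, v0, n0 = before[pid]
            deltas.append((vol - v0 + nonvol - n0, pid, name))
    return sorted(deltas, reverse=True)[:n]
```

Take one snapshot with snapshot_all(), wait a few seconds, take another, and pass both to top_switchers. Threads that exit between snapshots make a process's total drop, so treat the ranking as approximate.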

8. The output showed that the discover process was repeatedly switching with system processes, which consumed a lot of resources.

9. We searched online for some ways to reduce process switching:

The developers then changed the service: the number of threads was doubled, and all of them were kept within a single process.
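The article does not show how the threads were kept within one process. One common way to fix the thread count inside a single process is a fixed-size worker pool created once at startup. A minimal Python sketch under that assumption; WORKER_THREADS, personalize, and serve are hypothetical, and the right pool size has to be found by re-running the load test. (Python threads are limited by the GIL for CPU-bound work, so this only illustrates the structure.)

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical pool size; tune it by re-running the load test.
WORKER_THREADS = 48

# Created once per process, so the number of threads stays fixed no matter
# how many requests arrive.
_POOL = ThreadPoolExecutor(max_workers=WORKER_THREADS)


def personalize(request_id):
    """Placeholder for the per-request work (KV fetch + personalization)."""
    return request_id * 2


def serve(requests):
    # Requests are queued to the existing workers instead of spawning
    # new threads or processes per request.
    return list(_POOL.map(personalize, requests))
```

Keeping a bounded number of threads in one process avoids the extra scheduling work of many competing processes, though the best count depends on how much time each request spends waiting on the KV server.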

We reran the load test and found that the number of context switches had dropped to about 250,000 per second.

Throughput now reached about 260 TPS, far higher than the previous 80 TPS, which met the requirement for going live.
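Putting the before and after figures side by side, throughput improved by a factor of 3.25 while the context-switch rate fell by 37.5%:

```latex
\frac{260\ \text{TPS}}{80\ \text{TPS}} = 3.25,
\qquad
\frac{400{,}000 - 250{,}000}{400{,}000} = 0.375 = 37.5\%
```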

However, the numbers of page faults and context switches are still high, so further optimization will be needed later on.
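To track page faults for a single process during follow-up tests, the kernel reports cumulative minor and major fault counts as fields 10 and 12 of /proc/<pid>/stat. A minimal Python sketch, assuming Linux; the function names are illustrative. The command name (field 2) is wrapped in parentheses and may contain spaces, so the line is split after the last ")".

```python
def page_faults(stat_text):
    """(minflt, majflt) from the text of /proc/<pid>/stat."""
    # Everything after the last ')' starts at field 3 (state).
    rest = stat_text[stat_text.rindex(")") + 1:].split()
    # Field 10 (minflt) is rest[7]; field 12 (majflt) is rest[9].
    return int(rest[7]), int(rest[9])


def faults_per_second(before_text, after_text, seconds):
    """Average (minor, major) page faults per second between two snapshots."""
    min0, maj0 = page_faults(before_text)
    min1, maj1 = page_faults(after_text)
    return (min1 - min0) / seconds, (maj1 - maj0) / seconds
```

Major faults require disk reads and are the expensive kind; a high minor-fault rate usually points to memory allocation or mapping churn instead.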


This article is an English version of an article originally written in Chinese on aliyun.com and is provided for information purposes only.
