Troubleshooting a traffic anomaly between an application and its database

Source: Internet
Author: User

### Scenario
Zabbix monitoring showed irregular, infrequent traffic spikes between an application and its database. Specifically, inbound traffic on the application server and outbound traffic on the database server would surge for a short time (usually about a minute), sometimes reaching the transmission limit of the gigabit NICs.
### Analysis process
From the symptoms, it was almost certain that some SQL statement was returning a large amount of data. Even so, tracking it down did not go smoothly, partly because the methods I tried first were not suited to the problem.

Initial analysis: I had previously built a log analysis system for fault and anomaly analysis (see), and it had twice pinpointed the URI responsible for an anomaly. Out of habit I used it again: I took the requests from the periods of abnormal traffic, sorted them by number of occurrences, response time, or response size, and looked for outliers. After a lot of effort this turned up nothing, because it is not the most direct way to analyze this kind of problem.
During this time I also thought of capturing packets with tcpdump, but two things stood in the way: the anomalies occur at irregular times, and a naive continuous capture would produce a very large file. I had no good way to capture, so I did not pursue the idea.

Re-analysis: Capturing packets still seemed the most direct approach. So I wrote a script that runs on the application server and computes, every 5 s, the inbound bandwidth over the previous 5 s. I set a threshold based on the normal traffic between the application and the database, and exceeding it triggers a packet capture. This did capture some of the abnormal traffic, but the script has an inherent flaw: it starts capturing only after it has detected the surge, so it cannot capture the SQL statement that caused it.
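The sampling step of such a script can be sketched as follows. This is a minimal illustration, not the author's first script: the function names `rx_bytes` and `rate_mbit` are hypothetical, the interface name is passed in by the caller, and the stats file is a parameter (defaulting to `/proc/net/dev`, so Linux only) so that the parsing can be tried on a saved copy.

```shell
#!/bin/sh
# Print the cumulative received-byte counter of one interface.
# $1 = interface name, $2 = stats file in /proc/net/dev format (optional).
rx_bytes() {
    awk -v ifc="$1" '
        {
            line = $0
            sub(/^ +/, "", line)           # drop the leading padding
            split(line, f, "[ :]+")        # f[1] = interface, f[2] = RX bytes
            if (f[1] == ifc) { print f[2]; exit }
        }' "${2:-/proc/net/dev}"
}

# Average rate in Mbit/s between two counter samples (integer division).
# $1 = earlier byte count, $2 = later byte count, $3 = seconds between them.
rate_mbit() {
    echo "$(( ($2 - $1) * 8 / 1024 / 1024 / $3 ))"
}
```

A caller would take one sample, sleep for the chosen interval, take a second sample, and pass both to `rate_mbit` together with the interval length.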

Final analysis: After my brain had run in the background for a night, I finally thought of a "perfect" capture method. The general idea is as follows:

Start capturing and write the result to a file with tcpdump's -w option. Then, in an infinite loop, compute every 5 s the average traffic over the previous 5 s and compare it with the threshold:
If it is below the threshold, kill the running tcpdump and start a new one that writes to the same file. This overwrites the previous content, which keeps the capture file from growing without bound.
If it is above the threshold, keep capturing and check the traffic again in the next cycle. Once a later 5 s cycle falls back below the threshold, kill the tcpdump process and rename the capture file (it now holds valid capture data for analysis), then start a new tcpdump and repeat.
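The decision taken at the end of each cycle can be written as a small pure function. This is only a sketch of the logic described above: the name `next_action` and the three action labels are hypothetical, and the rates in the loop are made-up sample values.

```shell
#!/bin/sh
# $1 = measured rate, $2 = threshold (same unit),
# $3 = number of consecutive cycles already spent above the threshold.
next_action() {
    if [ "$1" -lt "$2" ]; then
        if [ "$3" -gt 0 ]; then
            echo keep        # surge just ended: rename the capture, restart tcpdump
        else
            echo discard     # nothing happened: restart tcpdump, overwrite the file
        fi
    else
        echo continue        # surge in progress: leave tcpdump running
    fi
}

# Walk through four cycles with a threshold of 15.
i=0
for rate in 3 40 900 5; do
    action=$(next_action "$rate" 15 "$i")
    echo "$rate -> $action"
    case $action in
        continue) i=$((i + 1)) ;;
        *)        i=0 ;;
    esac
done
```

Because tcpdump is already running when the surge begins, the file that is eventually kept contains the packets from before the threshold was crossed, including the request that triggered the surge.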

Here is the script, which is the key to the whole process:

    #!/bin/sh
    #by ljk 2017/03/18
    # Capture packets sensibly for network traffic anomalies that occur at irregular times

    file=/proc/net/dev
    i=0    # marks whether a traffic anomaly occurred and how many detection cycles it lasted

    cd /usr/local/src/mysql-sniffer/logs/query/

    capture() {
        tcpdump -nnvv port 3306 -w tcpdump.txt >/dev/null 2>&1
    }
    (capture) &    # run in the background so the main loop is not blocked

    while true;do
        rx_bytes=`cat $file|grep eth0|sed 's/^ *//g'|awk -F'[ :]+' '{print $2}'`
        sleep 10    # 10 s interval
        rx_bytes_later=`cat $file|grep eth0|sed 's/^ *//g'|awk -F'[ :]+' '{print $2}'`
        speed_rx=`echo "scale=0;($rx_bytes_later - $rx_bytes)*8/1024/1024/10"|bc`
        tcpdump_pid=`ps -ef|grep tcpdump|grep -v grep|awk '{print $2}'`

        if [ $speed_rx -lt 15 ];then    # my threshold is 15Mb
            kill -9 $tcpdump_pid
            if [ $i -gt 0 ];then
                # the surge has just ended: keep this capture under a new name
                mv tcpdump.txt "$i"_tcpdump_`date +"%F_%H:%M:%S"`
            fi
            (capture) &
            i=0
        else
            i=$(($i+1))
        fi
    done

I left the script running in the background and soon captured the packets from an anomaly period. Analysis in Wireshark then quickly identified the offending SQL. As expected, user input was not validated rigorously, so an empty string ended up in the WHERE condition, and the query returned nearly the whole table (a real pitfall).
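For a quick look without the GUI, Wireshark's command-line companion tshark can list the statements in a saved capture. The sketch below is an assumption-laden illustration rather than what the author did: it assumes tshark is installed, that its MySQL dissector recognizes the traffic on port 3306, and that the connection is not encrypted. The function name `list_mysql_queries` is hypothetical.

```shell
#!/bin/sh
# Print the relative timestamp and SQL text of every MySQL query packet.
# $1 = capture file written by `tcpdump -w`.
list_mysql_queries() {
    [ -r "$1" ] || { echo "cannot read capture file: $1" >&2; return 1; }
    command -v tshark >/dev/null 2>&1 || { echo "tshark not found" >&2; return 127; }
    # -Y applies a display filter; -T fields with -e prints only the named fields.
    tshark -r "$1" -Y 'mysql.query' -T fields -e frame.time_relative -e mysql.query
}
```

Calling the function with one of the renamed capture files prints one line per query, which makes a statement with a suspicious WHERE condition easy to spot before opening the full capture in Wireshark.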
