Measure the test taker's knowledge about traffic spikes.

Last Update:2013-11-21 Source: Internet

Author: User

From Network Management and O & M

It was a dark, moonless night with a bitter wind. I was half asleep when the phone rang: the caller said that the bandwidth of several data centers had soared and had to be dealt with immediately, otherwise the IDC service provider would unplug our network cables. It was about 2 o'clock in the morning. Miserable!

I logged on to the monitoring system to view the traffic. The server with the highest traffic had reached 1 Gb/s (I was in a hurry to handle the fault and did not save the graph). Normally, its peak bandwidth is stable at 600-700 Mb/s, as the traffic graph shows.

On the other servers, the bandwidth graph was basically a flat line, with the 100 Mb/s link fully saturated (these servers are lower-spec and have only 100 Mb/s of bandwidth).

With the bandwidth of many servers in several data centers far above normal, it was clear that something had gone wrong. Still, I was worried that the cacti monitoring might be inaccurate or faulty, so I logged on to several high-traffic servers separately and watched the traffic in real time with tools such as iptraf. The results were consistent with those given by cacti.
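A monitoring graph can also be cross-checked from the server itself by sampling the interface byte counters twice and converting the difference into a rate. The following is a minimal sketch, assuming a Linux host with /proc/net/dev; the function names rx_bytes and rate_mbps and the interface name eth0 are illustrative and do not come from the original incident.

```shell
#!/bin/bash
# Sketch: estimate the receive throughput of an interface without extra tools.
# Assumptions: Linux /proc/net/dev layout; all names here are illustrative.

# Print the cumulative received-byte counter of interface $1.
rx_bytes() {
    awk -v ifc="$1" '{ gsub(/:/, " ") } $1 == ifc { print $2 }' /proc/net/dev
}

# Convert two byte-counter samples taken $3 seconds apart into Mbit/s.
rate_mbps() {
    awk -v a="$1" -v b="$2" -v s="$3" \
        'BEGIN { printf "%.1f\n", (b - a) * 8 / s / 1000000 }'
}

# Usage (the interface name is an example):
#   a=$(rx_bytes eth0); sleep 5; b=$(rx_bytes eth0); rate_mbps "$a" "$b" 5
```

If the figure printed here roughly matches the graph, the monitoring system can be trusted for the rest of the investigation.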

This is a download service with a total peak bandwidth of about 3 Gb/s. Its architecture is divided into three layers: the origin site, the transit layer, and the edge layer. The business process is as follows:

1. Users edit and upload files to the origin site server through a web interface;

2. The origin site uses rsync to synchronize the files to the transit servers;

3. The edge servers are configured as caches and fetch the required objects from the transit servers on demand, then store them.

To improve availability and balance the load, each edge server fetches files from two transit servers.

Generally, traffic spikes are caused by attacks, marketing campaigns, system or program exceptions, or Trojans. I asked the marketing staff, who replied that there had been no promotion recently. Then I asked the programmers whether the program had been modified or new plug-ins added, and the answer was no. I had the administrators check the back-end statistics, and the statistics had not risen in step with the traffic surge. That left only system exceptions and hacker attacks as possible causes. The probability of an implanted Trojan was very small: programs are uploaded through a VPN, and the site serves only static content.

In an emergency, it is impossible to log on to every server. So check the server with the largest traffic first, then the one with the next largest. Check items include:

(1) System logs: check whether any kernel errors are reported;

(2) Web logs: check whether the source IP addresses are too concentrated;

(3) View the TCP connection states and check the request status;

(4) Use the iptraf tool to find the IP addresses with the most connections.
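Check items (2) and (3) can be scripted so that they take seconds on each server. The following is a sketch under stated assumptions: the web log uses the common or combined format, where the client IP is the first field, and the TCP state listing comes from `ss -tan`, where the state is the first column after a header line. The function names and the log path are illustrative.

```shell
#!/bin/bash
# Sketch of check items (2) and (3). Function names are illustrative.

# Print the N most frequent client IPs (default 10) from an access log on
# stdin. Assumes the client IP is the first whitespace-separated field.
top_client_ips() {
    awk '{ print $1 }' | sort | uniq -c | sort -rn | head -n "${1:-10}"
}

# Count connections per TCP state from `ss -tan` output on stdin.
# The first line is treated as a header and skipped.
tcp_state_counts() {
    awk 'NR > 1 { n[$1]++ } END { for (s in n) print n[s], s }' | sort -rn
}

# Usage (the log path is an example):
#   top_client_ips 10 < /path/to/access.log
#   ss -tan | tcp_state_counts
```

If the top addresses turn out to belong to your own servers rather than to users, the surge is probably internal.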

These checks showed that the IP addresses with the most connections did not belong to users but to our own servers requesting from each other. Following those IP addresses, I logged on to the corresponding server to see what was happening. Checking the processes, system logs, and network conditions revealed no cause. I then ran crontab -l to check for scheduled tasks and found a script that was executed every 10 minutes. I did not remember writing such a script. Its content is as follows:

#!/bin/bash

path=`grep proxy_cache_path /usr/local/nginx/conf/vhosts/apk_cache.sery.com.conf | awk '{print $2}' | sed 1d`

for i in `ls $path`; do

    grep -a -r apk $path/$i/* | strings | grep "KEY:" > /tmp/cache_listpolici.txt

    grep -v apk$ /tmp/cache_listpolici.txt > /tmp/delpolici.txt

    \rm -rf `grep -v apk$ /tmp/cache_listpolici.txt | awk -F: '{print $1}'`

    # echo $path/$i

    sleep 60

done

\rm -rf /tmp/cache_list*

This script can only be fully understood in its specific context, and for various reasons it is not analyzed in detail here. In short, the script checks whether certain files exist in the cache directory and, if so, deletes them.
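To see which files a script like this removes without running it, the selection step can be isolated and run as a dry run. The sketch below reproduces only the final filter of the script (the grep -v and awk -F: steps). It assumes the input lines have the form "cache file, colon, KEY: URL", which is what the script's earlier greps appear to produce. The function name would_delete and the sample paths are hypothetical.

```shell
#!/bin/bash
# Dry-run sketch: print what the cron script would pass to rm, delete nothing.
# Assumption: each input line looks like "<cache file>:KEY: <url>".

# Keep the lines whose key does not end in "apk" and print the cache file
# path, which is the text before the first colon.
would_delete() {
    grep -v 'apk$' | awk -F: '{ print $1 }'
}

# Example with made-up paths:
#   printf '%s\n' '/c/a1:KEY: http://h/x.apk' '/c/b2:KEY: http://h/x.apk?v=2' \
#       | would_delete
```

Under that assumption, only entries whose key does not end in "apk" are selected, so the example above would list /c/b2 and leave /c/a1 alone.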

The result is that a cache file is deleted shortly after it is created. When a user then needs to download the file, the edge server no longer has it cached and has to go back to the source (fetch it from a transit server). Under normal circumstances the file would stay cached for a long time, but because of this script it is soon deleted again. This leads to a large number of back-to-source requests, and traffic surges. To avoid risk, I did not delete the script but commented out its entry in the crontab scheduled tasks. I then went through the edge servers one by one and commented out the task on each.

I watched the traffic graph, and bandwidth consumption gradually decreased. After 10-20 minutes it was back to normal. After making a phone call, I went back to bed.
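Commenting out a crontab entry instead of deleting it can be done non-interactively, which helps when the same change has to be repeated on many edge servers. The following is a minimal sketch: the function name comment_out_cron and the script name clean_cache.sh are hypothetical, since the original does not give the name of the cron script.

```shell
#!/bin/bash
# Sketch: disable a cron job by commenting it out rather than deleting it.
# The function and script names are hypothetical.

# Read crontab text on stdin and prefix "#" to every active line that
# contains the literal string $1. Lines that are already comments are
# passed through unchanged.
comment_out_cron() {
    awk -v pat="$1" \
        '$0 !~ /^[ \t]*#/ && index($0, pat) { print "#" $0; next } { print }'
}

# Usage (run as the user who owns the crontab):
#   crontab -l | comment_out_cron clean_cache.sh | crontab -
```

Because the line is only commented out, the job can be restored later by removing the leading "#".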

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
