Diagnosing networking issues in the Linux kernel
A few weeks ago, we started noticing dramatic changes in the network traffic of our tracking API servers in Washington. On top of a fairly stable daily pattern, we began to see bursts of 300-400 Mbps, even though our legitimate traffic (events and manual updates) remained unchanged.
Suddenly, our network traffic started spiking like crazy.
Finding the source of the spurious traffic was imperative, because these spikes were triggering DDoS mitigation on our upstream routers, which then blocked traffic.
Linux has some good built-in tools to help diagnose network problems.
- ifconfig shows your network interfaces and how many packets pass through them.
- ethtool -S shows more detailed statistics about an interface's packet flow, such as the number of packets dropped at the NIC level.
- iptables -L -v -n shows how many packets each of your firewall rules has processed.
- netstat -s lists the many counters maintained by the kernel network stack, such as the number of ACKs and the number of retransmissions.
- sysctl -a | grep net.ip shows all of your network-related kernel settings.
- tcpdump shows the contents of inbound and outbound packets.
The clue that solved the problem came from the output of netstat -s. Unfortunately, when you look at this output, it is hard to tell what the numbers mean, what they should be, and how they are changing. To see how they were changing, we wrote a small program that runs the command continuously and displays its output, which let us see how fast the various counters were moving. One line of the output looked especially worrying.
The normal rate for this counter is around 30-40 per second on an unaffected server, so we knew something was wrong. The counter indicated that we were rejecting a large number of packets because they carried invalid TCP timestamps. The quick, temporary fix was to disable TCP timestamps with the following command:
sysctl -w net.ipv4.tcp_timestamps=0
This immediately stopped the packet storm. However, it is not a permanent solution, because TCP timestamps are used to measure round-trip time and to place delayed packets correctly within the packet stream. Without them, high-speed connections become a problem, because the TCP sequence number can wrap around within seconds. For example, at 10 Gbit/s the 32-bit sequence space (4 GiB) is used up in about 3.4 seconds. For more information about TCP timestamps and performance, see RFC 1323.
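The small counter-watching program mentioned earlier is not shown in this article. As a rough sketch of the idea only, assuming the usual net-tools layout where `netstat -s` prints unindented section names followed by indented counter lines, something like the following would report how fast each counter changes. The function names `counter_deltas` and `watch_counters` are made up for this example.

```shell
#!/usr/bin/env bash
# Sketch: report how fast `netstat -s` counters change between snapshots.
# counter_deltas and watch_counters are hypothetical names for this example.

# counter_deltas OLD_FILE NEW_FILE SECONDS
# Compares two saved `netstat -s` snapshots and prints one line per counter
# whose value changed: "<change per second>/s <section> <counter text>".
counter_deltas() {
    awk -v secs="$3" '
        # The first purely numeric field on a line is taken as the value.
        # The line with that field replaced by "N", prefixed with the
        # current section name, identifies the counter.
        function parse(    i) {
            for (i = 1; i <= NF; i++)
                if ($i ~ /^[0-9]+$/) {
                    val = $i; $i = "N"; key = section " " $0
                    return 1
                }
            return 0
        }
        /^[^ \t]/ { section = $1 }          # unindented line, e.g. "Tcp:"
        FNR == NR { if (parse()) old[key] = val; next }   # first snapshot
        parse() && (key in old) && val != old[key] {      # second snapshot
            printf "%.1f/s %s\n", (val - old[key]) / secs, key
        }
    ' "$1" "$2"
}

# watch_counters [INTERVAL_SECONDS]
# Takes a snapshot every INTERVAL seconds (default 1) and prints the rates.
# Runs until interrupted; the rates are approximate because the time spent
# running netstat itself is ignored.
watch_counters() {
    interval="${1:-1}"
    old="$(mktemp)"
    new="$(mktemp)"
    netstat -s > "$old"
    while sleep "$interval"; do
        netstat -s > "$new"
        date '+--- %T'
        counter_deltas "$old" "$new" "$interval"
        mv "$new" "$old"
    done
}
```

Running `watch_counters 1` on an affected server and on a healthy one, side by side, makes it easier to spot a counter whose rate is far outside its normal range.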
At Mixpanel, when we see abnormal traffic patterns, we usually run tcpdump so that we can analyze the traffic and try to identify the root cause. We found a huge number of TCP ACK packets going back and forth between our API servers and one specific IP address. Our server was stuck in an infinite loop, exchanging TCP ACK packets with the other host. One host kept sending a TCP timestamp that the other host did not recognize as valid.
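The exact tcpdump analysis is not shown here. One way to spot this kind of ACK ping-pong, sketched under the assumption that tcpdump is run with `-nn` and prints its usual one-line-per-packet text format, is to count bare ACKs (no other flags, zero payload) per source and destination pair. The helper name `top_ack_peers` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: count bare TCP ACKs per host pair from `tcpdump -nn` text output.
# top_ack_peers is a hypothetical helper name.

# Reads tcpdump lines on stdin, for example:
#   12:00:00.000001 IP 10.0.0.5.443 > 203.0.113.9.51515: Flags [.], ack 1, ..., length 0
# and prints "<count> <source> > <destination>", busiest pair first.
top_ack_peers() {
    awk '
        # Keep only IPv4 packets that carry the ACK flag alone and no payload.
        $2 == "IP" && /Flags \[\.\],/ && / length 0$/ {
            src = $3; dst = $5
            sub(/:$/, "", dst)           # drop the colon after the destination
            sub(/\.[0-9]+$/, "", src)    # drop the port number
            sub(/\.[0-9]+$/, "", dst)
            count[src " > " dst]++
        }
        END { for (pair in count) print count[pair], pair }
    ' | sort -rn
}
```

It could be fed from a live capture, for instance `tcpdump -nn -l -c 10000 tcp | top_ack_peers`, or from a saved capture read with `tcpdump -nn -r`. A single remote address with a count far above all the others is the kind of signal described above.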
At this point, we realized we were dealing with a problem in the TCP stack that could only be fixed in the Linux kernel. So our CTO turned to linux-netdev to see if a solution could be found. Fortunately, others had already run into this problem and a fix existed. It turns out that this kind of packet storm can be triggered by certain hardware faults, or by a third party altering the TCP sequence or ACK numbers, so that the hosts in the connection each believe the other is sending stale packets. To keep this from turning into a packet storm, the fix rate-limits the duplicate ACKs that Linux sends in response, to one or two per second. Here is a very good explanation.
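To make the rate-limiting idea concrete, here is a toy model in shell. It is an illustration only and not the kernel code. The 500 ms interval is an assumption chosen to match "one or two per second", and the names `RATELIMIT_MS` and `may_send_dupack` are invented for this sketch. Mainline kernels that carry this change appear to expose the interval as the net.ipv4.tcp_invalid_ratelimit sysctl, but treat that as something to verify on your own kernel.

```shell
#!/usr/bin/env bash
# Toy model of rate-limiting duplicate ACKs. Illustration only, not kernel code.

RATELIMIT_MS=500     # assumed minimum gap between duplicate ACKs
last_dupack_ms=-1    # time of the last duplicate ACK sent; -1 means never

# may_send_dupack NOW_MS
# Succeeds (and records NOW_MS) if a duplicate ACK may be sent at NOW_MS.
# Fails if the previous one was sent less than RATELIMIT_MS earlier, in
# which case the invalid packet is simply not answered.
may_send_dupack() {
    now_ms="$1"
    if [ "$last_dupack_ms" -ge 0 ] &&
       [ $((now_ms - last_dupack_ms)) -lt "$RATELIMIT_MS" ]; then
        return 1
    fi
    last_dupack_ms="$now_ms"
    return 0
}
```

With this rule, a peer that sends an invalid packet every few milliseconds still gets at most two replies per second, so two hosts that disagree about the connection state can no longer drive each other into a storm.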
We took this patch and backported it to the Ubuntu (Trusty) kernel we currently use. Thankfully, Ubuntu makes this easy. To rebuild the kernel with the fix, you only need to run the following commands, then install the generated .deb packages and reboot.
# Get the kernel source and build dependencies
apt-get build-dep linux-image-3.13.0-45-generic
apt-get source linux-image-3.13.0-45-generic
# Apply the patch file
cd linux-lts-trusty-3.13.0/
patch -p1 < Mitigate-TCP-ACK-Loops.patch
# Build the kernel
fakeroot debian/rules clean
fakeroot debian/rules binary-headers binary-generic