Resolving network packet loss problems in Linux

Source: Internet
Author: User

Recently I ran into a problem in production: the overruns value reported for a network interface kept increasing, so I read up on the topic.

After some searching, I found that errors, dropped, and overruns do not mean the same thing.
eth2      Link encap:Ethernet  HWaddr 00:8C:FA:F1:DA:78
          inet addr:10.249.2.112  Bcast:10.249.2.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:26191508237 errors:0 dropped:0 overruns:45732243 frame:0
          TX packets:20141298524 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:4684832167216 (4.2 TiB)  TX bytes:4670328443919 (4.2 TiB)
          Memory:c7200000-c7280000
RX errors: the total number of receive errors, including too-long frames, Ring Buffer overflows, CRC errors, frame synchronization errors, FIFO overruns, and missed packets.
RX dropped: the packet made it into the Ring Buffer but was discarded while being copied into memory, because of insufficient memory or other system reasons.
RX overruns: the number of FIFO overruns. They occur when traffic arrives at the Ring Buffer (aka Driver Queue) faster than the kernel can process it; the Ring Buffer here is the buffer that holds packets before an IRQ is raised. A growing overruns count therefore means packets are being discarded by the NIC at the physical layer, before they reach the Ring Buffer. One reason the Ring Buffer fills up is that the CPU cannot service interrupts in time. On the faulty machine above, the packet loss was caused by unevenly distributed interrupts (all landing on core0) with no IRQ affinity configured.
RX frame: the number of misaligned frames.
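
The uneven interrupt distribution described above can be checked by summing a NIC's interrupt counts per CPU from /proc/interrupts. The following is a minimal sketch, not the author's procedure: the function name irq_per_cpu is hypothetical, and it assumes the usual /proc/interrupts layout (a header row of CPU names, then one row per IRQ whose description contains the interface name).

```shell
#!/bin/sh
# irq_per_cpu DEV: read /proc/interrupts-style text on stdin and print the
# total interrupt count per CPU for rows that mention DEV.
# Hypothetical helper; a plain substring match, so "eth2" also matches "eth20".
irq_per_cpu() {
    awk -v dev="$1" '
        NR == 1 { ncpu = NF; next }        # header row: CPU0 CPU1 ...
        index($0, dev) {                   # IRQ rows belonging to the device
            for (i = 1; i <= ncpu; i++) sum[i] += $(i + 1)
        }
        END {
            for (i = 1; i <= ncpu; i++) printf "CPU%d %.0f\n", i - 1, sum[i]
        }
    '
}

# Usage on a live system (not run here):
#   irq_per_cpu eth2 < /proc/interrupts
```

If almost the whole total sits on CPU0, the picture matches the faulty machine described above.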

1. Check the hardware status first.
One machine frequently raised packet loss alarms. First, check whether anything is wrong at the lower layers:
# ethtool eth2 | egrep 'Speed|Duplex'
Speed: 1000Mb/s
Duplex: Full

# ethtool -S eth2 | grep crc
rx_crc_errors: 0
Speed, duplex, and the CRC counter all look normal, so physical-layer interference can basically be ruled out.

2. ifconfig shows that the overruns field is constantly increasing:
for i in `seq 1 100`; do ifconfig eth2 | grep RX | grep overruns; sleep 1; done
The counter keeps growing:
RX packets:26191785302 errors:0 dropped:0 overruns:45732243 frame:0
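
Reading the raw counter every second makes the growth hard to judge, so a per-second delta can help. The sketch below is one possible way to do that. The function names rx_overruns and watch_overruns are hypothetical, and the parser assumes the RX line carries the counter as "overruns:N" (older net-tools) or "overruns N" (newer net-tools).

```shell
#!/bin/sh
# rx_overruns: read ifconfig output on stdin, print the RX overruns counter.
rx_overruns() {
    sed -n 's/.*RX.*overruns[: ]*\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# watch_overruns IFACE [COUNT]: print the counter and its change once per second.
# Defined only; calling it needs ifconfig and a real interface.
watch_overruns() {
    iface=$1
    n=${2:-100}
    prev=$(ifconfig "$iface" | rx_overruns)
    while [ "$n" -gt 0 ]; do
        sleep 1
        cur=$(ifconfig "$iface" | rx_overruns)
        echo "overruns=$cur (+$((cur - prev)) in the last second)"
        prev=$cur
        n=$((n - 1))
    done
}

# Usage on a live system (not run here):
#   watch_overruns eth2 100
```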

3. View the buffer size.
Some articles I found describe using ethtool to change the NIC buffer size. The NIC has to support this; my server has an Intel NIC. The relevant options from the ethtool description:
-g --show-ring    Queries the specified ethernet device for rx/tx ring parameter information.
-G --set-ring     Changes the rx/tx ring parameters of the specified ethernet device.

View the buffer size of the current NIC with ethtool -g eth0:
Ring parameters for eth0:
Pre-set maximums:
RX:             4096
RX Mini:        0
RX Jumbo:       0
TX:             4096
Current hardware settings:
RX:             256
RX Mini:        0
RX Jumbo:       0
TX:             256

4. Modify the buffer size.
ethtool -G eth2 rx 2048
ethtool -G eth2 tx 2048
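
Before picking a new ring size, it can be useful to compare the current RX value with the pre-set maximum, since the NIC will not accept more than that. A minimal sketch follows. The function name ring_rx is hypothetical, and it assumes ethtool -g prints the "Pre-set maximums:" and "Current hardware settings:" sections shown in step 3.

```shell
#!/bin/sh
# ring_rx max|cur: read `ethtool -g` output on stdin and print the RX ring
# size from the pre-set maximums ("max") or the current settings ("cur").
ring_rx() {
    awk -v want="$1" '
        /^Pre-set maximums:/          { section = "max" }
        /^Current hardware settings:/ { section = "cur" }
        # "RX Mini:" and "RX Jumbo:" have $1 == "RX", so only the plain RX row matches
        $1 == "RX:" && section == want { print $2; exit }
    '
}

# Usage on a live system (not run here; changing the ring needs root):
#   max=$(ethtool -g eth2 | ring_rx max)
#   cur=$(ethtool -g eth2 | ring_rx cur)
#   [ "$cur" -lt "$max" ] && ethtool -G eth2 rx "$max"
```

Whether to go straight to the maximum or to an intermediate value such as the 2048 used above is a judgment call; larger rings absorb bursts but can add latency.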

View packet loss:
[root@appserver1 network-scripts]# cat /proc/net/dev | column -t
Inter-| Receive | Transmit
face | bytes packets errs drop fifo frame compressed multicast | bytes packets errs drop fifo colls carrier compressed
lo: 1697064305645 4937104295 0 0 0 0 0 1697064305645 4937104295 0 0 0 0 0 0
eth0: 72829268758 343814516 0 21338 0 0 0 9764241 74743576507 0 0 0 0 0 0
eth1: 5826509023 48719872 0 0 0 0 0 11358883 127451707 1107964 0 0 0 0 0
eth2: 4684766978372 26191366713 0 0 45732243 0 278436828 4670300836866 20141168183 0 0 0 0 0 0
eth3: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
bond0: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[root@appserver1 network-scripts]# netstat -i | column -t
Kernel Interface table
Iface MTU Met RX-OK RX-ERR RX-DRP RX-OVR TX-OK TX-ERR Flg
eth2 1500 0 26191244868 0 45732243 20141056331 0 0 0 BMRU
lo 16436 0 4937053994 0 0 4937053994 0 0 0 LRU
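
The receive-side drop and fifo columns of /proc/net/dev can also be pulled out for a single interface, which is easier to script than reading the full table. This is a sketch only. The function name rx_drop_fifo is hypothetical, and it assumes the standard layout of two header lines followed by "name:" and eight receive columns (bytes, packets, errs, drop, fifo, ...). The values in the usage comment's input are whatever the live system reports.

```shell
#!/bin/sh
# rx_drop_fifo DEV: read /proc/net/dev-style text on stdin and print
# "<rx drop> <rx fifo>" for interface DEV.
rx_drop_fifo() {
    awk -v dev="$1" '
        NR <= 2 { next }                   # skip the two header lines
        {
            line = $0
            sub(/:/, " ", line)            # the name may be fused with the first counter
            split(line, f, " ")
            # f[1]=name f[2]=bytes f[3]=packets f[4]=errs f[5]=drop f[6]=fifo
            if (f[1] == dev) print f[5], f[6]
        }
    '
}

# Usage on a live system (not run here):
#   rx_drop_fifo eth2 < /proc/net/dev
```

The fifo column is the same counter that ifconfig reports as overruns, as the matching value 45732243 in both outputs above shows.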


Problem: overruns are receive queue overflow errors. When packets arrive faster than the kernel can process them, the overruns counter increases. Once the input queue reaches its upper limit (max_backlog), any further arriving packets are discarded.
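
Drops caused by a full input queue can be counted separately from NIC overruns. The sketch below assumes that the max_backlog limit mentioned above corresponds to the net.core.netdev_max_backlog sysctl and that /proc/net/softnet_stat has one line per CPU with hexadecimal columns, the second of which counts packets dropped because the backlog was full. The function name softnet_dropped is hypothetical.

```shell
#!/bin/sh
# softnet_dropped: read /proc/net/softnet_stat-style text on stdin and print
# the sum of the second column (hex) across all CPUs, in decimal.
softnet_dropped() {
    total=0
    while read -r _processed dropped _rest; do
        [ -n "$dropped" ] || continue      # ignore blank lines
        total=$((total + 0x$dropped))      # columns are hexadecimal
    done
    echo "$total"
}

# Usage on a live system (not run here):
#   softnet_dropped < /proc/net/softnet_stat
#   sysctl net.core.netdev_max_backlog
```

If this total stays at zero while overruns keeps climbing, the loss is happening at the NIC rather than in the kernel input queue.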
