An improper port connection between Ethernet switches may create a network loop. If the switches involved do not have STP enabled, the loop causes frames to be forwarded endlessly, forming a broadcast storm that brings the network down.
One day, the campus network performance monitoring platform showed a problem with one VLAN: the connection between a building's access switch and the campus network had been interrupted. We checked the aggregation switch in the network center. The 100BASE-FX port connected to that access switch showed a very large volume of inbound traffic but very little outbound traffic, which was abnormal.
However, the aggregation switch itself appeared to be performing normally. We therefore mirrored the abnormal port on the aggregation switch and captured packets with the protocol analysis tool Sniffer. At peak, we captured more than 100,000 packets per second. A quick analysis of these packets showed that they shared several common features.
At the time, we were eager to restore the network as quickly as possible and did not examine these features closely. We looked only at the first one and concluded that the network was under a SYN flood attack of unknown origin, probably caused by a new network virus. We immediately disabled the port on the aggregation switch to prevent network performance from degrading.
Troubleshooting
To test connectivity in the field, we went to the network center and connected the multimode fiber pigtail that leads to the building's switch to a PC, using a media converter and a twisted-pair cable, and configured the PC to act as the gateway of the faulty VLAN. Then I went to see the building's network manager and asked him to help us find and isolate the hosts infected with the unknown virus as soon as possible.
According to the building's network manager, the network had still been normal the day before, although one department in the building had been making network adjustments at that time. When he came to work that morning, he found that the network was down, and he did not know whether the two were related. We believed that a network adjustment should have little to do with a virus infection. In the main wiring room of the building, we unplugged the network cables from the access switch and connected a laptop to the switch to reach the test host in the network center.
After confirming that the link was good, we plugged half of the remaining cables back into the switch. If the test still passed, we continued with the cables that were still unplugged; otherwise, we swapped in the other half, gradually narrowing down the set of suspect cables. We finally found one cable that caused the problem: whenever it was plugged in, the building's network lost its connection to the simulated gateway.
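The halving procedure described above can be sketched in a few lines. This is only an illustration of the idea, not the exact steps taken on site: the names `find_trigger_cable` and `link_ok` are hypothetical, `link_ok` stands in for the manual connectivity test against the simulated gateway, and the sketch assumes that exactly one cable triggers the fault once the cables before it are plugged in.

```python
from typing import Callable, List, Sequence


def find_trigger_cable(
    cables: Sequence[str],
    link_ok: Callable[[Sequence[str]], bool],
) -> str:
    """Return the cable whose insertion breaks the link.

    `link_ok(plugged)` is a hypothetical stand-in for the manual test:
    it returns True if the network still works with `plugged` inserted.
    """
    plugged: List[str] = []        # cables confirmed safe and left plugged in
    suspects: List[str] = list(cables)
    while len(suspects) > 1:
        mid = len(suspects) // 2
        half, rest = suspects[:mid], suspects[mid:]
        if link_ok(plugged + half):
            # This half is fine: leave it plugged in and test the rest.
            plugged += half
            suspects = rest
        else:
            # The fault is triggered by something in this half.
            suspects = half
    return suspects[0]


# Example: cables c3 and c7 together form a loop; either alone is harmless.
cables = [f"c{i}" for i in range(8)]
loop_pair = {"c3", "c7"}
result = find_trigger_cable(cables, lambda p: not loop_pair <= set(p))
print(result)
```

With a loop, the cable reported is the second member of the pair to be plugged in, which is consistent with a single cable appearing to be at fault until its partner is found. Each test halves the suspect set, so eight cables need only three tests.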
The building's network manager identified this cable as one running to the department that had made network adjustments the day before. He also said that the department had previously had two cables pulled, one primary and one backup, so there should be a second one, and we found it on the switch. With either one of the two cables plugged in alone, the network was fine; with both plugged in at the same time, the problem appeared.
Could a cable trigger a network virus's SYN flood attack? At this point, the symptoms looked much more like a loop in the network. When we arrived at the department, we found three unmanaged switches connected to one another, two of which were also connected to the access switch through the two cables. This formed a network loop.
Apparently, the construction staff did not understand the network topology. While the building's network manager was out, they connected the cables incorrectly, which caused the outage. Once the cause was found, the fix was simple: unplugging one of the two cables restored network connectivity. After these twists and turns the network was back to normal, but we kept wondering: what had interfered with our judgment?
Fault Analysis
This was a typical network loop fault, yet even after capturing so many packets with Sniffer and doing some analysis, we failed to see the problem. Evidently, the sight of a large number of SYN packets gave us the false impression of a SYN flood attack.
Afterwards, we reviewed the troubleshooting process, carefully re-analyzed the captured packets, and worked out an explanation for the five common features mentioned above, so that similar problems can be handled promptly in the future.
First, consider the first four features. The aggregation switch is a network-layer device, and the network-layer interface of the building's VLAN is configured on it. To enforce the network management policy, static MAC address bindings have been configured for the IP addresses, whether registered or not.
A TCP connection can be established only after a three-way handshake. The SYN segment that initiates the connection here is 28 bytes long. Adding the 14-byte Ethernet frame header and the 20-byte IP header, the frame captured by Sniffer is 62 bytes in total (not including the 4-byte frame check sequence).
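The length arithmetic above can be checked with a short calculation. The header sizes come from the text; the constant names are chosen here for illustration, and the split of the 28-byte SYN segment into a 20-byte TCP header plus 8 bytes of options is an assumption, since the text gives only the total.

```python
# Sizes in bytes, as stated in the text.
ETH_HEADER = 14    # destination MAC + source MAC + EtherType
IP_HEADER = 20     # IPv4 header without options
TCP_SYN = 28       # assumed: 20-byte TCP header + 8 bytes of options
FCS = 4            # frame check sequence, not included in the capture

captured_len = ETH_HEADER + IP_HEADER + TCP_SYN   # length reported by the analyzer
wire_len = captured_len + FCS                     # length of the frame on the wire

print(captured_len, wire_len)
```

The on-wire length with the FCS included matches the 66-byte figure that appears later in the analysis.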
It happened that the unicast frame entering the VLAN was a TCP connection request from the Internet. Following its forwarding procedure, once the CRC check has passed, and because the static ARP configuration is in place, the aggregation switch rewrites the source MAC address of the unicast frame to its own MAC address, sets the destination MAC address according to the binding, and recalculates the CRC to update the FCS field. After this re-encapsulation, the frame is forwarded to the building's access switch.
Now consider the last feature. A bridge is a store-and-forward device used to connect LANs of a similar type. A bridge listens to every frame transmitted on all of its ports and uses its bridge table as the basis for forwarding. The bridge table is a list of MAC addresses together with the port through which each MAC address is reached.
In this loop forwarding process, bridge A keeps receiving the same frame on different ports. Because the receiving port keeps changing, the "source MAC, port number" entry in the bridge table keeps changing as well.
Assume that, initially, the bridge table has no entry for the frame's destination MAC address. After receiving the two copies of the unicast frame, the bridge can only flood each of them again to all ports other than the one it arrived on, so the frame is also forwarded to the uplink port.
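The learning and flooding behavior described here can be illustrated with a toy model. This is a simplified sketch, not the firmware logic of any real switch: the `LearningBridge` class, the port numbers, and the MAC labels `"gw"` and `"host"` are all hypothetical, and aging of table entries is ignored.

```python
from typing import Dict, List


class LearningBridge:
    """Toy transparent bridge: learn source MACs, flood unknown destinations."""

    def __init__(self, ports: List[int]) -> None:
        self.ports = list(ports)
        self.table: Dict[str, int] = {}   # MAC address -> port it was last seen on
        self.moves = 0                    # how often a learned MAC changed port

    def receive(self, src: str, dst: str, in_port: int) -> List[int]:
        """Process one frame and return the ports it is sent out on."""
        known = self.table.get(src)
        if known is not None and known != in_port:
            self.moves += 1               # same source seen on a different port
        self.table[src] = in_port         # learn (or relearn) the source
        out = self.table.get(dst)
        if out is None:
            # Unknown destination: flood to every port except the receiving one.
            return [p for p in self.ports if p != in_port]
        return [] if out == in_port else [out]


# Port 1 is the uplink; ports 2 and 3 lead into the loop.
bridge = LearningBridge([1, 2, 3])
first = bridge.receive("gw", "host", 1)    # original frame from the uplink
second = bridge.receive("gw", "host", 3)   # a copy returns through the loop
third = bridge.receive("gw", "host", 2)    # the other copy returns as well
print(first, second, third, bridge.table, bridge.moves)
```

In this model, every returning copy both moves the source entry to a new port and is flooded again, including back toward the uplink, which is the pattern the text describes.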
For each unicast frame, bridge A repeats the process described above. In theory, 2^1 frames are received after the first flood, 2^2 frames after the second, ..., and 2^n frames after the nth. As a result, a broadcast storm quickly forms at bridge A, and the copies of this one unicast frame eventually consume the bandwidth of the 100BASE-X port.
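To get a feel for how quickly this doubling grows, the following sketch counts how many flooding rounds it takes for the theoretical number of copies to pass a given threshold. The function names are chosen here for illustration, and the model is the idealized 2^n growth from the text, ignoring collisions and link capacity.

```python
def copies_after(rounds: int) -> int:
    """Theoretical number of copies received after `rounds` floods (2^n)."""
    return 2 ** rounds


def rounds_to_exceed(threshold: int) -> int:
    """Smallest number of rounds after which the copies exceed `threshold`."""
    n = 0
    while copies_after(n) <= threshold:
        n += 1
    return n


needed = rounds_to_exceed(100_000)
print(needed, copies_after(needed))
```

Under this idealized model, a single frame needs only 17 rounds to produce more than 100,000 copies, which is why one looped frame is enough to saturate the port.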
During this period, many frames may collide and become corrupted, so Sniffer cannot capture them. Even so, we can expect this unicast frame to be repeated many times. We checked the captured packets again.
Almost all of them were marked as duplicates, which we had not noticed at the time. For 64-byte frames, the 100BASE-FX port of an Ethernet switch can forward up to 144,000 pps. In this network loop state, it is therefore plausible for Sniffer to capture more than 100,000 packets per second with a length of 66 bytes.
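As a rough cross-check of the forwarding rate, the standard line-rate calculation for Ethernet can be applied. This sketch assumes the usual per-frame overhead of an 8-byte preamble and a 12-byte interframe gap, neither of which is mentioned in the text; the function name is hypothetical. The result for 64-byte frames comes out slightly above the 144,000 pps figure quoted above, which may reflect a particular device specification rather than the theoretical limit.

```python
def max_frames_per_second(
    link_bps: float,
    frame_bytes: int,
    preamble_bytes: int = 8,   # assumed: preamble + start frame delimiter
    gap_bytes: int = 12,       # assumed: minimum interframe gap
) -> float:
    """Theoretical maximum frame rate for frames of `frame_bytes` (FCS included)."""
    bits_per_frame = (frame_bytes + preamble_bytes + gap_bytes) * 8
    return link_bps / bits_per_frame


FAST_ETHERNET_BPS = 100_000_000

rate_64 = max_frames_per_second(FAST_ETHERNET_BPS, 64)   # minimum-size frames
rate_66 = max_frames_per_second(FAST_ETHERNET_BPS, 66)   # the looped SYN frames

print(int(rate_64), int(rate_66))
```

Either way, a capture rate above 100,000 packets per second for 66-byte frames is within what a saturated 100 Mbit/s port can carry.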