I. Fault Description
Fault Location:
An electronics Bureau
Fault symptom:
The network is severely congested, and communication between internal and external hosts, or even between internal hosts, is interrupted.
Fault details:
Network communication was suddenly interrupted: some VLANs could not access the Internet, and access to other VLANs was also interrupted. Ping tests run from the IDC showed that pings from the central switch to hosts in the affected VLANs had long response times and intermittent, fairly serious packet loss.
II. Detailed Fault Analysis
1. Preliminary Analysis
Possible causes: problems updating the switch ARP table, a broadcast or routing loop, or an attack launched by a person or a virus.
Information still needed: the network topology and its normal operating conditions, the switch ARP table and switch load, and the original packets transmitted on the network.
2. Detailed Analysis
First, we learned from the network administrator that there are about 450 hosts on the network, and obtained a simple topology of the network, as shown in figure 1.
(Figure 1 original network topology)
As shown in figure 1, the network is divided into six VLANs: 10.230.201.0/24, 10.230.202.0/24, 10.230.203.0/24, 10.230.204.0/24, 10.230.205.0/24, and 10.230.206.0/24. The five VLANs 201 through 205 are each used by one department, while 206 is the dedicated server network segment. Every VLAN connects upstream to the central switch (Passport 8010), and the central switch connects to the firewall, which in turn connects to the Internet and the provincial units.
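The addressing plan above can be written down as a small lookup, which is handy when sorting captured addresses by segment. This is only a sketch: the keys 201 to 206 are the labels the text uses for the segments (taken from the third octet), not confirmed VLAN IDs on the switch, and `vlan_of` is a hypothetical helper name.

```python
import ipaddress

# Segment labels as used in the text; the real VLAN IDs on the switch are an assumption.
VLANS = {n: ipaddress.ip_network(f"10.230.{n}.0/24") for n in range(201, 207)}
SERVER_VLAN = 206  # dedicated server segment according to the text


def vlan_of(ip):
    """Return the segment label that contains `ip`, or None if it is outside all six."""
    addr = ipaddress.ip_address(ip)
    for label, network in VLANS.items():
        if addr in network:
            return label
    return None


gateway_segment = vlan_of("10.230.204.1")   # gateway address mentioned later in the text
outside_segment = vlan_of("192.0.2.10")     # documentation address, not part of this network
```

An address that returns `None` is traffic to or from outside the six segments, for example the Internet or the provincial units.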
After learning about the network topology, we logged on to the central switch through HyperTerminal and found that the switch was under heavy load. We immediately cleared the switch's ARP table and restarted it, but the fault persisted, so we decided to capture and analyze packets on the network.
We configured port mirroring on the central switch (Passport 8010; the specific configuration is omitted here) and connected a laptop running the Kelai Network Analysis System to the mirror port. Figure 2 shows the network topology after installation.
(Figure 2 network topology after the Kelai Network Analysis System is installed)
Because the Kelai Network Analysis System can capture and analyze data across VLANs, connecting the laptop to the central switch did not change the network topology.
We opened the Kelai Network Analysis System on the laptop, captured packets for about 1 minute (the exact duration, shown once the capture was stopped, was 53 seconds), and then analyzed the captured traffic.
In the node browser, we located the worker network segment and found that the MAC address 00:00:E8:40:44:99 had 40 IP addresses under it in total, as shown in figure 3.
(Figure 3 endpoint view of the located network segment)
Under normal circumstances, multiple IP addresses appear under one MAC address only in one of the following cases: the MAC address belongs to a gateway, it belongs to a proxy server, or multiple IP addresses have been bound to it manually. The network administrator confirmed that every machine in this network segment is bound to only one IP address, that there is no proxy server, and that this MAC address is not the gateway's MAC address. We therefore suspected that this host was carrying out a spoofing attack.
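The check described above (one MAC address showing up with many IP addresses, excluding the known legitimate cases) can be sketched in Python. This is an illustration only, not how the analysis system itself works: the `(mac, ip)` observation list, the `allowed` set, the threshold, and all sample IP addresses are assumptions made for the example.

```python
from collections import defaultdict


def macs_with_many_ips(observations, allowed=frozenset(), threshold=1):
    """Return {mac: sorted list of IPs} for MACs seen with more than `threshold` IPs.

    observations: iterable of (mac, ip) pairs taken from captured packets.
    allowed: MACs expected to carry many IPs (gateway, proxy, manually multi-bound host).
    """
    ips_by_mac = defaultdict(set)
    for mac, ip in observations:
        ips_by_mac[mac.upper()].add(ip)
    allowed = {m.upper() for m in allowed}
    return {
        mac: sorted(ips)
        for mac, ips in ips_by_mac.items()
        if len(ips) > threshold and mac not in allowed
    }


# Hypothetical sample data; only the MAC addresses and the gateway IP come from the text.
observations = [
    ("00:00:E8:40:44:99", "10.230.204.11"),
    ("00:00:E8:40:44:99", "10.230.204.12"),
    ("00:00:E8:40:44:99", "10.230.204.13"),
    ("00:11:25:8D:7D:C1", "10.230.204.1"),
    ("00:11:25:8D:7D:C1", "203.0.113.7"),    # a gateway legitimately relays remote IPs
    ("00:02:B0:BC:68:D2", "10.230.204.50"),
]
suspects = macs_with_many_ips(observations, allowed={"00:11:25:8d:7d:c1"})
```

With this sample, only 00:00:E8:40:44:99 is reported, because the gateway is on the allow list and the remaining host has a single IP address.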
Right-click the 00:00:E8:40:44:99 node in figure 3 and select the "locate browser node (L)" command in the pop-up menu to locate 00:00:E8:40:44:99 in the node browser. The Protocol view shows that this node actively sent 22613 ARP reply packets but only 2 ARP request packets, as shown in figure 4.
(Figure 4 protocol distribution of traffic for host 00:00:E8:40:44:99)
As the packets listed below figure 4 show, 00:00:E8:40:44:99 actively sent ARP reply packets to other hosts in the network. Each reply claimed that a given IP address belonged to this MAC address, and the claimed IP address kept changing. We concluded that the machine with MAC address 00:00:E8:40:44:99 was performing ARP spoofing.
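The imbalance between replies and requests is the telltale sign here, and it is easy to quantify. The sketch below assumes the capture has already been reduced to `(sender_mac, opcode)` tuples; the function names and the thresholds in `looks_like_spoofer` are illustrative choices, not values taken from the analysis tool. The opcode values 1 (request) and 2 (reply) are those defined for ARP.

```python
from collections import Counter

ARP_REQUEST, ARP_REPLY = 1, 2  # ARP opcode values


def arp_counts(packets, mac):
    """Return (requests, replies) sent by `mac`.

    packets: iterable of (sender_mac, opcode) tuples extracted from a capture.
    """
    counts = Counter(op for sender, op in packets if sender.upper() == mac.upper())
    return counts[ARP_REQUEST], counts[ARP_REPLY]


def looks_like_spoofer(requests, replies, min_replies=100, min_ratio=10.0):
    """Heuristic: many replies and far more replies than requests (assumed thresholds)."""
    return replies >= min_replies and replies / max(requests, 1) >= min_ratio


# Rebuild the counts reported in figure 4 for the suspect host.
suspect = "00:00:E8:40:44:99"
packets = [(suspect, ARP_REPLY)] * 22613 + [(suspect, ARP_REQUEST)] * 2

requests, replies = arp_counts(packets, suspect)
ratio = replies / max(requests, 1)
replies_per_second = replies / 53  # capture lasted 53 seconds according to the text
flagged = looks_like_spoofer(requests, replies)
```

On the figures from this capture, the host sent over eleven thousand replies per request, or roughly 427 unsolicited replies per second, which no normally behaving workstation would produce.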
At the same time, the ARP diagnostic event area in the diagnostic view also showed the corresponding warning, as shown in figure 5.
(Figure 5 ARP diagnostic information for 00:00:E8:40:44:99)
This analysis confirmed that 00:00:E8:40:44:99 was carrying out an ARP spoofing attack, and the network administrator immediately began looking for the host. Because the administrators had previously compiled a table of IP and MAC addresses, they found the machine easily. Once the host's network cable was unplugged from its Layer 2 switch, the network quickly returned to normal, and internal and external access between VLANs (including access to the Internet and the provincial units) recovered.
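The IP and MAC table that made the host easy to find can also be used to check ARP claims against known bindings. The following is a minimal sketch under stated assumptions: `inventory` stands in for the administrators' table, `arp_claims` is a list of `(sender_ip, sender_mac)` pairs pulled from captured ARP replies, and apart from the gateway binding for 10.230.204.1 the sample entries are invented for illustration.

```python
def find_binding_conflicts(inventory, arp_claims):
    """Return ARP claims that contradict the recorded IP-to-MAC table.

    inventory: dict mapping IP -> MAC as recorded by the administrators.
    arp_claims: iterable of (sender_ip, sender_mac) pairs from ARP replies.
    Returns a list of (ip, claimed_mac, recorded_mac) without duplicates.
    """
    conflicts = []
    seen = set()
    for ip, claimed in arp_claims:
        recorded = inventory.get(ip)
        if recorded is None:
            continue  # IP is not in the table, so the claim cannot be judged
        claimed, recorded = claimed.upper(), recorded.upper()
        if claimed != recorded and (ip, claimed) not in seen:
            seen.add((ip, claimed))
            conflicts.append((ip, claimed, recorded))
    return conflicts


# Hypothetical inventory; only the gateway entry is taken from the text.
inventory = {
    "10.230.204.1": "00:11:25:8D:7D:C1",
    "10.230.204.20": "00:02:B0:BC:68:D2",
}
arp_claims = [
    ("10.230.204.1", "00:11:25:8d:7d:c1"),   # matches the table
    ("10.230.204.20", "00:00:E8:40:44:99"),  # contradicts the table
    ("10.230.204.20", "00:00:E8:40:44:99"),  # duplicate claim, reported once
    ("10.230.204.99", "00:00:E8:40:44:99"),  # unknown IP, skipped
]
conflicts = find_binding_conflicts(inventory, arp_claims)
```

A check like this depends on the table being kept current; a stale entry would show up as a false conflict.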
In addition, figure 3 shows that three machines, 00:02:B0:BC:68:D2, 00:0B:DB:4B:46:81, and 00:11:25:8D:7D:C1, account for a large amount of traffic. After checking their traffic in detail, we found that 00:02:B0:BC:68:D2 and 00:0B:DB:4B:46:81 were copying data to each other, and that the IP address corresponding to 00:11:25:8D:7D:C1 is 10.230.204.1, the gateway of the 10.230.204.0/24 network segment. Since the traffic of these three machines is legitimate, the root cause of the network interruption can essentially be attributed to the 00:00:E8:40:44:99 host found above.
After finding the fault point and restoring the network, we left the site to attend to other matters and did not investigate the 00:00:E8:40:44:99 host any further.
That afternoon, I received a call from the bureau's network administrator. When he found the host with MAC address 00:00:E8:40:44:99, its user was only editing a document in Word and had not launched any attack manually. Antivirus software was then installed on the host, and the scan detected and removed several viruses. After that, the host was reconnected to the network, and network communication remained normal. The cause of the fault was therefore that the host with MAC address 00:00:E8:40:44:99 was infected with a worm, which automatically carried out ARP spoofing attacks and interrupted network access.
III. Summary
In medium and large networks, faults are complex and hard to troubleshoot without professional network analysis tools. In this case, without packet capture it would have been difficult to find the fault point even by checking traffic on the switch, because the traffic from 00:00:E8:40:44:99 was not very large.
Also, because the capture lasted only 53 seconds, other problem hosts may remain undetected in the network (hosts that were not yet powered on sent and received no packets during the capture). For enterprise network operations, network administrators therefore need dedicated network analysis tools for long-term, effective monitoring and analysis, so that potential network faults and security threats can be eliminated.