This article describes in detail a network disconnection problem caused by the dynamic ARP detection function, and how it was traced and resolved. I hope that after reading it you will have a better understanding of the related settings.
LANs are easy to use but not easy to manage. Meeting users' varied Internet access needs is enough to keep a network administrator busy, to say nothing of frequent network faults. Intermittent Internet access failures, for example, are very common, and their causes are complex and varied, so it is easy to take detours while troubleshooting them. To help you build experience in this area, this article walks through a real case: troubleshooting an intermittent access fault caused by the dynamic ARP detection function. I hope it serves as a useful reference.
Networking Environment for ARP Detection
The LAN in this building is of moderate scale. The core switch in the central data center is an H3C S8500. All client systems connect over Category 5e network cables to access switches distributed across six floors. The access switches are two groups of H3C S3050 switches, located in the low-voltage wiring room on each floor, and every access switch is connected to the core switch over Gigabit multimode optical fiber. To prevent broadcast storms, the network administrator divided the LAN into 12 virtual working subnets (VLANs) by department, and the gateway of each subnet is configured on the core switch. To simplify management, a DHCP server was also set up in the LAN, and every client system obtains its address dynamically. Under this design, all systems in the LAN could access the Internet quickly and stably.
Because ARP viruses had been rampant recently, the network administrator enabled the ARP virus protection function on every access switch to keep the network running reliably. Later, to support a video transmission system in the building, the video equipment in the offices on each floor was placed in the same virtual working subnet, and the configuration of each access switch was adjusted accordingly, for example by adding two virtual working subnets.
ARP Detection Fault Symptoms
After the access switch configuration was changed, the LAN became unstable. Many users called to report that a "limited connectivity" message often appeared in the system tray of their client system. This message generally means that the client cannot obtain correct network parameters from the LAN's DHCP server. Even when a client did manage to get online, the connection was intermittent. Testing with the ping command showed severe transmission delay and a persistently high packet loss rate. Because clients on every floor showed the same fault, my first instinct was that the core switch had hit a software error such as a cache overflow, so I restarted the core switch. The fault remained. I then restarted one of the floor access switches. The clients under that switch worked normally for a short while after it came back up, but the same fault soon returned.
ARP Detection Troubleshooting
Since restarting a floor access switch temporarily restored Internet access, the problem seemed to be related to the floor access switches. To find out, I logged on to the management system of one floor's access switch as an administrator and ran the "dis dia" command to check whether the traffic on each switch port was normal. The output showed broadcast packets in the LAN, and their number kept increasing. Was there a network virus or a network loop? To rule this out, I entered the port view of each port with abnormal traffic and ran the "shutdown" command, closing all of those ports. This had no effect, which suggested that the intermittent failure had nothing to do with a network virus or a network loop.
Next, I picked a client system at random, clicked "Start"/"Run", and used the ping command to test the gateway address of that client's virtual working subnet. The packet loss rate reached an astonishing 85%, and the average transmission latency was also abnormally high. However, when I pinged an Internet site from the core switch, the test was normal, with a packet loss rate of only about 1%. The connection between the LAN and the Internet was therefore fine, and the problem had to lie somewhere between the core switch and the faulty client systems.
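To compare loss rates like these across several clients, the figure can be pulled out of the ping summary automatically. The following is only a minimal sketch: it assumes the summary line format printed by the English-language Windows ping utility, and the function name and sample text are illustrative rather than taken from the case above.

```python
import re

# Summary line printed by the English Windows ping utility, for example:
# "Packets: Sent = 20, Received = 3, Lost = 17 (85% loss),"
# (assumed format; other systems and languages print it differently)
_SUMMARY = re.compile(r"Sent = (\d+), Received = (\d+), Lost = (\d+)")


def packet_loss_percent(ping_output: str) -> float:
    """Return the packet loss rate in percent from ping's summary text."""
    match = _SUMMARY.search(ping_output)
    if match is None:
        raise ValueError("no ping summary line found")
    sent = int(match.group(1))
    received = int(match.group(2))
    if sent == 0:
        raise ValueError("no packets were sent")
    return 100.0 * (sent - received) / sent


# Hypothetical sample text, not output captured from the faulty network.
sample = "Packets: Sent = 20, Received = 3, Lost = 17 (85% loss),"
print(packet_loss_percent(sample))  # 85.0
```

Running the same check from a client and from the core switch makes it easier to see on which side of the core switch the loss appears.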
To narrow down the cause, I pinged the management IP address of one of the access switches from the core switch, and the ping failed. Was there a physical connection problem between the core switch and the floor access switch? To rule out cabling, I used a professional optical power meter to test the multimode fiber link between the core switch and the floor access switch. The link tested normal, so the problem appeared to lie in the floor access switch itself.
As a last resort, I connected directly to the floor access switch with a console cable and ran the "display interface" command to check the status of its uplink port to the core switch. The traffic on the uplink port was still very heavy, and a large number of broadcast packets were still present. To keep these broadcasts from affecting the LAN, I enabled the broadcast storm suppression function on the switch, but the situation did not change.
I then ran the "display cpu" command to check the resource consumption of the faulty switch, and the result surprised me: its CPU usage had reached 100%. Under normal circumstances the CPU usage of this switch should be about 25%, so it was no wonder that I could not ping it from the core switch. After I disconnected the physical link between the faulty access switch and the core switch and ran "display cpu" again, CPU usage quickly dropped to about 30%. As soon as the link was reconnected, it returned to 100%. Why?
After careful analysis and comparison, I noticed that the unstable Internet access had appeared after the ARP protection function was enabled on the access switches. Could this feature be causing trouble behind the scenes? To test this guess, I disabled the dynamic ARP detection function on one access switch and then ran "display cpu" on it. CPU usage immediately dropped from 100% to about 30%, and the access speed of the clients under that switch recovered. Meanwhile, the CPU usage of the other access switches, which still had dynamic ARP detection enabled, remained high, and the clients under them still suffered intermittent access and severe packet loss. The intermittent access failure was clearly related to the dynamic ARP detection function.
Uncovering the Cause of the ARP Detection Fault
I searched the Internet for the working principle of the dynamic ARP detection function. It intercepts ARP packets arriving on untrusted ports and verifies whether the address binding each packet carries is legitimate, that is, whether its IP-to-MAC binding matches the entry in the DHCP binding table. If it matches, the ARP packet is forwarded; if not, the packet is discarded. This effectively prevents man-in-the-middle attacks and stops users from changing the MAC or IP address of their network adapter on their own, which avoids address conflicts in the LAN. I also learned that this function is usually used together with DHCP snooping (sometimes rendered as "DHCP sniffing"), and that it has a significant drawback: inspecting ARP packets continuously consumes the switch's CPU resources. If the volume of ARP packets to be processed is very large, CPU usage becomes very high, and in severe cases it reaches 100%.
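The forwarding decision described above can be sketched in a few lines. This is a simplified illustration and not the switch's actual implementation: the `ArpPacket` type, the `inspect_arp` function, the port names, and the addresses are all hypothetical, and a real switch typically also matches on VLAN and ingress port.

```python
from typing import Dict, NamedTuple, Set


class ArpPacket(NamedTuple):
    """Only the fields needed for the check (hypothetical structure)."""
    sender_ip: str
    sender_mac: str
    ingress_port: str


def inspect_arp(packet: ArpPacket,
                bindings: Dict[str, str],
                trusted_ports: Set[str]) -> bool:
    """Return True to forward the ARP packet, False to discard it.

    bindings maps an IP address to the MAC address recorded for it in
    the DHCP binding table.
    """
    # Packets arriving on trusted ports (such as uplinks) are not inspected.
    if packet.ingress_port in trusted_ports:
        return True
    expected_mac = bindings.get(packet.sender_ip)
    # Forward only if the sender's IP/MAC pair matches a recorded binding.
    return (expected_mac is not None
            and expected_mac.lower() == packet.sender_mac.lower())


# Illustrative values only.
bindings = {"192.168.10.20": "00:11:22:33:44:55"}
trusted = {"GigabitEthernet1/1/1"}

legit = ArpPacket("192.168.10.20", "00:11:22:33:44:55", "Ethernet1/0/3")
spoofed = ArpPacket("192.168.10.20", "66:77:88:99:aa:bb", "Ethernet1/0/3")
print(inspect_arp(legit, bindings, trusted))    # True
print(inspect_arp(spoofed, bindings, trusted))  # False
```

Because every ARP packet from an untrusted port goes through a lookup like this, the cost grows with the ARP packet rate, which is consistent with the CPU load described above.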
When DHCP snooping is enabled, the switch automatically records each dynamically allocated IP address together with the physical (MAC) address of the client's network adapter. When a client system connects to the network, the function checks whether the IP address and MAC address in its packets match the records in the address binding table. If they match, the packet is allowed through; otherwise it is dropped. This function also effectively prevents unauthorized DHCP servers from interfering with the LAN.
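As a rough illustration of how such a binding table might be built, the sketch below records a binding when a DHCP ACK arrives on a trusted port and rejects server replies seen on untrusted ports. The class, method, message labels, and port names are assumptions made for this example, not the H3C implementation.

```python
from typing import Dict, Iterable, Optional


class SnoopingTable:
    """Minimal model of a DHCP snooping binding table (hypothetical)."""

    # Message types that only a DHCP server should send.
    SERVER_MESSAGES = {"OFFER", "ACK", "NAK"}

    def __init__(self, trusted_ports: Iterable[str]) -> None:
        self.trusted_ports = set(trusted_ports)
        self.bindings: Dict[str, str] = {}  # IP address -> client MAC

    def on_dhcp_message(self, msg_type: str, ingress_port: str,
                        client_mac: str,
                        assigned_ip: Optional[str] = None) -> bool:
        """Return True to forward the DHCP message, False to drop it."""
        if msg_type in self.SERVER_MESSAGES:
            # Server replies are accepted only from trusted ports, which
            # blocks unauthorized DHCP servers attached to client ports.
            if ingress_port not in self.trusted_ports:
                return False
            if msg_type == "ACK" and assigned_ip is not None:
                self.bindings[assigned_ip] = client_mac.lower()
            return True
        # Client messages (DISCOVER, REQUEST, ...) are forwarded as-is here.
        return True


# Illustrative values only.
table = SnoopingTable(trusted_ports=["GigabitEthernet1/1/1"])
ok = table.on_dhcp_message("ACK", "GigabitEthernet1/1/1",
                           "00:11:22:33:44:55", "192.168.10.20")
rogue = table.on_dhcp_message("OFFER", "Ethernet1/0/7",
                              "00:11:22:33:44:55", "10.0.0.5")
print(ok, rogue)        # True False
print(table.bindings)   # {'192.168.10.20': '00:11:22:33:44:55'}
```

A table built this way is the kind of record that dynamic ARP detection consults when it validates ARP packets.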
When a switch enables both dynamic ARP detection and DHCP snooping, it can block interference from unauthorized DHCP servers and stop users from arbitrarily changing the IP address or MAC address of a client system to get online without permission, which improves both security and stability. What puzzled me was that the floor switches here had both functions enabled, yet they did not deliver this benefit. Why did disabling dynamic ARP detection alone resolve the intermittent access failure? After repeated discussions with the system integrator, I finally found the answer. It turned out that when both features are enabled and all the access switches carry the same virtual working subnet, broadcast packets are continuously sent and forwarded between the access switches. This consumes a large share of each switch's CPU resources, and the network eventually becomes intermittently unavailable.
ARP Detection Fault Resolution
After finding the cause of the failure, I immediately adjusted the configuration of the access switches on each floor: I removed the VLAN serving the video transmission system and added a new switch, so that all clients using the video transmission system now connect through the new switch. This preserves the stability of the original network and also makes the new video transmission system easier to manage.
Looking back on the troubleshooting process, the fault was something of a coincidence. If the same VLAN had not been added to the access switches on every floor, or if dynamic ARP detection and DHCP snooping had not both been enabled on those switches, the network would not have failed. In the past, when dealing with network disconnections, we would first check whether the indicator lights on the switch were normal and, if they were not, restart the switch, on the assumption that most faults clear themselves this way. I did not expect this fault to take so much effort to resolve.