Troubleshooting and Analysis of Common Faults in Core-Layer Switches

Source: Internet
Author: User
Tags: passthrough, domain name server

As China's switching industry has developed, core-layer switch technology has been upgraded and improved along with it. This article analyzes one fault on a core-layer switch and how it was tracked down. In the network in question, the LANs of multiple branches reach the local headquarters enterprise network directly over 10 Mbit/s fiber links from the local carrier. All local branch networks converge on a core-layer switch, which connects directly to the router. Other core-layer switches handle access for the various network services. The resulting topology is relatively simple and clear, and in practice it has run fairly stably.

A few days ago, the network suddenly went down over a wide area. From the network topology and the symptoms, the problem was quickly traced to the core-layer switch. On site, the main control board was reporting an alarm, and the alarm did not clear after the device was reset. The main control board was therefore judged to be damaged. After a new main control board was installed, the device ran normally and all layer-2 passthrough services were restored, but none of the IP services recovered.

Troubleshooting and Analysis

Troubleshooting 1. Physical or logical faults?

The root cause was a faulty main control board on the core-layer switch, so the fault was physical in nature. Could the new main control board also be faulty? The device was running normally with no alarms, and commands such as show card and show cpu reported healthy hardware. Was configuration data lost when the board was replaced? The relevant data was checked and nothing was missing. So why could the IP services not be restored, when the passthrough services were working without any problem?

Troubleshooting 2. Is there a problem with the DNS service?

On inspection, I found that although the services were unavailable, all routing information was normal and every network element responded to ping. Could the DNS service be at fault? DNS here means the domain name server, which translates a domain name into an IP address that a computer can use; a website might, for example, have the IP address 219.218.100.100. If the DNS server fails, domain names cannot be resolved and the Internet is effectively unavailable. Sometimes the router is at fault and cannot reach the ISP's DNS service; in that case, power the router off for a while and turn it back on, or reset it. The NIC may also fail to obtain the DNS server address automatically, in which case you can specify one manually: go to "Control Panel → Network and Dial-up Connections", double-click "Local Area Connection → Properties → TCP/IP Protocol", select "Use the following DNS server addresses" in the dialog box, and enter the DNS server's IP address. After verification, DNS turned out to be working correctly.
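The distinction drawn above, between "the name cannot be resolved" and "the address cannot be reached", can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `resolve_name` and `classify` and the result labels are hypothetical, and the reachability result is passed in by the caller rather than measured here.

```python
import socket
from typing import Callable, Optional


def resolve_name(hostname: str,
                 resolver: Callable[[str], str] = socket.gethostbyname) -> Optional[str]:
    """Return the IPv4 address for hostname, or None if resolution fails."""
    try:
        return resolver(hostname)
    except OSError:  # socket.gaierror and socket.herror are subclasses of OSError
        return None


def classify(hostname: str, ip_reachable: bool,
             resolver: Callable[[str], str] = socket.gethostbyname) -> str:
    """Combine a DNS lookup with an already-known reachability result.

    The labels returned are hypothetical, chosen only for this sketch.
    """
    address = resolve_name(hostname, resolver)
    if address is None and ip_reachable:
        return "dns-suspect"         # IP layer works, name lookup does not
    if address is None:
        return "dns-and-ip-suspect"  # neither works; check the IP layer first
    if not ip_reachable:
        return "ip-suspect"          # name resolves, so look below DNS
    return "ok"
```

In the incident described here, name resolution and ping both succeeded, so a check of this kind would report "ok" and the search had to continue elsewhere.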

Troubleshooting 3. Is there an ARP virus or a traffic attack?

In the period leading up to the fault, some branches had frequently reported packet loss when browsing the Internet. This brought to mind the ARP address-spoofing virus that had recently been common on LANs, so I asked the network technicians at every branch to check whether their local networks were infected, hoping that finding and cleaning the infected machines would restore the IP services. Another possibility was that the routing tables had been lost when the board was replaced, but the fault persisted after the data backups from the previous few days were re-imported. To restore service more quickly, I consulted the equipment manufacturer's technical support, reported all the symptoms to their engineers, and reviewed all alarms and system logs; nothing suspicious was found. The conclusion was that the device was running normally, with no virus attack and no abnormal traffic.
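One common hint of ARP address spoofing is a single MAC address answering for several IP addresses. The following Python sketch flags such cases in a list of (IP, MAC) pairs. It is a heuristic only: routers, proxy ARP, and multi-homed hosts can legitimately produce the same pattern. The function name `find_shared_macs` is hypothetical, and how the pairs are collected from a given device is left open.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def find_shared_macs(arp_entries: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Return {mac: [ips]} for every MAC address that maps to more than one IP.

    arp_entries is an iterable of (ip, mac) pairs. MAC addresses are
    compared case-insensitively.
    """
    ips_by_mac = defaultdict(set)
    for ip, mac in arp_entries:
        ips_by_mac[mac.lower()].add(ip)
    return {mac: sorted(ips) for mac, ips in ips_by_mac.items() if len(ips) > 1}
```

An empty result does not rule out spoofing; it only means this particular pattern is absent from the entries supplied.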

Fault Resolution

Connectivity tests succeeded, indicating that the device itself was healthy. The network elements were reachable and DNS was normal, yet the services still could not be restored. Checking the IP ARP information showed that every MAC address and IP address was present in the address table, including the IP address and MAC address of my own computer. But my computer was not even powered on! Could the problem lie with the ports between the core-layer switch and the router? As a test, I shut the port down and then enabled it again. Afterwards, my IP address no longer had a MAC address listed against it, and all IP services were restored.
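The clue in this step was an ARP entry for a machine that was switched off. A check of that kind can be sketched in Python as follows. The function name `find_stale_entries` and the `is_alive` callback are hypothetical: the callback stands in for whatever evidence is available (a ping, or simply knowing that a machine is powered off), and the addresses in the usage lines are made-up documentation addresses.

```python
from typing import Callable, Dict, List


def find_stale_entries(arp_table: Dict[str, str],
                       is_alive: Callable[[str], bool]) -> List[str]:
    """Return the IPs that still have an ARP entry although the host is not alive.

    arp_table maps IP address -> MAC address, as read from the device.
    is_alive is supplied by the caller and returns True for hosts known to be up.
    """
    return sorted(ip for ip in arp_table if not is_alive(ip))


# Usage with made-up data: the second host is known to be powered off.
table = {"192.0.2.10": "aa:bb:cc:00:00:10", "192.0.2.20": "aa:bb:cc:00:00:20"}
powered_on = {"192.0.2.10"}
stale = find_stale_entries(table, lambda ip: ip in powered_on)
```

A host that is up but ignores ping would be reported as stale if `is_alive` relies on ping alone, so any result needs to be confirmed by hand.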

Fault Summary

Although the fault was resolved, one thing still puzzled me: why did the services return to normal after the port was restarted? The fault was not on the router but on the core-layer switch, and even before the port was restarted, the tests had all come back normal. Afterwards, I consulted the relevant technical staff. Their explanation was that data traffic had been very high before the fault. When the fault occurred, many packets could not be forwarded normally and the port hung, passing very little traffic. Shutting the port down discarded the backlog of data, and once the port was enabled again, normal traffic flowed.

Looking back over the troubleshooting process: when a network fault occurs, start from the part of the network that is working and check the running status of the upstream segment to confirm that it is normal. Then check the next segment, using ping to test and narrow down the fault. Recovering the point of failure does not mean that the business services have recovered. Do not confine yourself to local information; consider the network as a whole. Taking the specific network environment into account, you may find that the cause lies in a detail that was overlooked, so observe the factors affecting the network carefully to avoid detours. As a network administrator, in addition to handling routine network faults, you will occasionally run into problems beyond your current knowledge. If you pay enough attention to them, however, a solution can usually be found.
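The segment-by-segment approach described above can be sketched in Python: walk an ordered list of hops, starting from the part of the network known to be working, and stop at the first one that does not answer. This is a minimal sketch under stated assumptions. The names `ping_once` and `first_failing_segment` are hypothetical, the hop list is supplied by the caller, and the probe assumes a system `ping` command that accepts `-c` (Linux, macOS) or `-n` (Windows) for the packet count.

```python
import platform
import subprocess
from typing import Callable, Iterable, Optional


def ping_once(host: str, timeout_s: float = 5.0) -> bool:
    """Send one echo request with the system ping command; True if it succeeds."""
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    try:
        result = subprocess.run(
            ["ping", count_flag, "1", host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except (subprocess.TimeoutExpired, OSError):
        # No reply within the timeout, or the ping command is not available.
        return False
    return result.returncode == 0


def first_failing_segment(hops: Iterable[str],
                          probe: Callable[[str], bool] = ping_once) -> Optional[str]:
    """Probe hops in order and return the first that fails, or None if all respond."""
    for hop in hops:
        if not probe(hop):
            return hop
    return None
```

A result of `None` only shows that every hop answered the probe. As this incident demonstrates, ping can succeed end to end while IP services remain down, so a clean result is where the investigation starts, not where it ends.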
