Common switch faults and troubleshooting steps

Switches have become far more capable while their prices have dropped sharply, and this has driven their rapid adoption.

Network administrators often run into a variety of switch faults at work. How can they find and eliminate these faults quickly and accurately? This article briefly introduces common fault types and troubleshooting steps. Switches are used throughout corporate networks and appear at every tier, from low-end to mid-range to high-end products. As a result, switch faults are much more likely than router or hardware firewall faults. That is why we begin with the classification of switch faults and the steps for troubleshooting them.

I. Switch fault classification:

Switch faults fall into two categories: hardware faults and software faults. A hardware fault is mainly a failure of the power supply, backplane, modules, ports, or other components of the switch. Hardware faults can be further divided into the following categories.

1) Power supply faults:

Unstable external power, aging power lines, or lightning strikes can damage the power supply or stop the fan. A faulty power supply also often damages other components inside the chassis.

A green POWER indicator on the front panel means the power supply is normal. If the indicator is off, the switch is not receiving normal power. Such problems are easy to discover, fix, and prevent.

To address this type of fault, first secure the external power supply. Typically this means running a dedicated power line to provide an independent supply and adding a voltage regulator to protect against momentary overvoltage or undervoltage. If conditions permit, add a UPS (uninterruptible power supply) to keep the switch powered normally. Some UPS units include a voltage-regulation function and some do not, so choose carefully. Install professional lightning protection in the IDC room to prevent lightning damage to the switch. Many specialized companies now offer lightning-protection engineering, and their services are worth considering when planning network cabling.

2) Port faults:

This is the most common hardware fault. Whether the port is a fiber port or a twisted-pair RJ-45 port, take care when plugging and unplugging connectors. If a fiber connector is accidentally soiled, the fiber port can become contaminated and fail to communicate normally. Many people like to hot-plug connectors. This is acceptable in theory, but it quietly increases the port failure rate and can physically damage the port. If the RJ-45 plugs you buy are oversized, they can easily damage the port when inserted into the switch. In addition, if part of a twisted-pair cable connected to a port runs outdoors and is struck by lightning, the connected switch port may be damaged, or even more serious damage may result.

A port fault usually affects only one or a few ports. Therefore, after ruling out a fault in the computer connected to the port, move the connection to another port to determine whether the original port is damaged. If the port is merely dirty, power off the switch and clean the port with an alcohol-soaked cotton swab. If the port is physically damaged, the only remedy is to replace it.

3) Module faults:

A switch is built from many modules, such as stacking modules, management modules (also called control modules), and expansion modules. These modules rarely fail, but when one does, the economic loss can be large. Such failures can occur when modules are inserted or removed carelessly, when the switch is moved, or when the power supply is unstable.

All three kinds of modules have external interfaces, so they are easy to identify. On some modules, the indicators also reveal faults. For example, a stacking module has a flat trapezoidal port, although some switches use a USB-like interface instead. The management module has a CONSOLE port for connecting to a network management computer. An expansion module that connects to optical fiber has a pair of fiber interfaces.

When troubleshooting such a fault, first make sure the power supply of the switch and of the module is normal. Next, check whether each module is seated in the correct slot. Finally, check whether the cables connected to the module are normal. When connecting to the management module, also check that the terminal uses the specified connection rate, parity setting, and flow-control setting. When connecting an expansion module, check whether the communication modes match, for example whether both ends use full duplex or both use half duplex. If the module itself is faulty, the only solution is to contact the supplier immediately for a replacement.
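The two matching checks described above can be sketched as simple comparisons: console parameters for the management module, and duplex mode for an expansion-module link. This minimal illustration assumes you have already collected each side's settings by hand or from the CLI. The `LinkEnd` structure, the field names, and the console values (9600 baud, 8 data bits, no parity, 1 stop bit, no flow control) are hypothetical placeholders. Take the real values from your switch's manual.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class LinkEnd:
    """Speed/duplex settings of one end of a link (hypothetical structure)."""
    name: str
    speed_mbps: int
    duplex: str  # "full" or "half"


def link_mismatches(a: LinkEnd, b: LinkEnd) -> List[str]:
    """List every setting on which the two ends of a link disagree."""
    problems = []
    if a.speed_mbps != b.speed_mbps:
        problems.append(
            f"speed mismatch: {a.name}={a.speed_mbps} Mb/s, {b.name}={b.speed_mbps} Mb/s"
        )
    if a.duplex != b.duplex:
        problems.append(f"duplex mismatch: {a.name}={a.duplex}, {b.name}={b.duplex}")
    return problems


# Assumed console-port settings; replace them with the values from the switch manual.
EXPECTED_CONSOLE: Dict[str, object] = {
    "baud_rate": 9600,
    "data_bits": 8,
    "parity": "none",
    "stop_bits": 1,
    "flow_control": "none",
}


def console_mismatches(terminal: Dict[str, object]) -> List[str]:
    """Compare terminal-emulator settings with the expected console settings."""
    return [
        f"{key}: expected {expected!r}, got {terminal.get(key)!r}"
        for key, expected in EXPECTED_CONSOLE.items()
        if terminal.get(key) != expected
    ]


if __name__ == "__main__":
    switch_port = LinkEnd("switch port 1", 100, "full")
    server_nic = LinkEnd("server NIC", 100, "half")
    print(link_mismatches(switch_port, server_nic))
```

The sketch compares both ends rather than checking one side in isolation, because each side's settings may look valid on their own.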

4) Backplane faults:

Every module of the switch is connected to the backplane. If moisture in a humid environment short-circuits the circuit board, or if high temperature, lightning, or other factors damage its components, the board will not work properly. For example, poor heat dissipation or a high ambient temperature raises the temperature inside the chassis and can burn out components.

If the external power supply is normal but the switch's internal modules still do not work, the backplane may be damaged. In this case, even an electronics repair engineer may be unable to fix it, and the only remedy is to replace the backplane.

5) Cable faults:

Strictly speaking, these faults are not faults of the switch itself. In practice, however, cable faults often make the switch system or its ports behave abnormally, so they are also classified as switch hardware faults. Common examples include:

- loose connections;
- wires terminated in the wrong order when the cable was made;
- a crossover cable used where a straight-through cable is required;
- the two strands of a fiber pair connected the wrong way round;
- network loops caused by incorrect connections.

As these hardware faults show, a poor equipment-room environment easily leads to all kinds of hardware faults. When building a data center, therefore, we must first put the right environment in place: lightning-protection grounding, power supply, indoor temperature and humidity control, electromagnetic-interference shielding, and anti-static measures. This gives network equipment a good environment in which to operate.

II. Switch software faults:

A switch software fault is a fault in the switch's system software or its configuration. Software faults can be divided into the following categories.

1) System errors:

A switch combines hardware and software. Its flash (rewritable read-only) memory stores the software system the switch needs. Like common Windows and Linux systems, this software may contain vulnerabilities left over from its original design. Under certain conditions, these can cause full load, packet loss, or malformed packets on the switch. For this reason, switches provide ways such as a web interface and TFTP to download and update the system software. Errors can also occur during the upgrade itself.

To deal with such problems, make a habit of checking the device manufacturer's website regularly. When a new system version or patch is released, apply it promptly.

2) Misconfiguration:

Administrators often make configuration errors, either because they are new to switches or because configuration differs from one switch to another. For example, incorrect VLAN assignment can disconnect the network, a port can be disabled by mistake, or the switch port and the NIC can be set to mismatched working modes (such as speed or duplex). Such faults are sometimes hard to find and take experience to diagnose. If you cannot be sure your configuration is correct, first restore the factory default configuration and then reconfigure step by step. It is best to read the manual before configuring, which is also a good habit for any network administrator. Every switch comes with a detailed installation manual and user manual that explain each module. Many switch manuals are written only in English, so users who are not comfortable with English can ask the supplier's engineers for help with specific configurations.

3) Lost password:

Probably every administrator has experienced this. If you forget the password, you can follow a specific procedure to recover or reset it. On some switches this is simple and only requires pressing a button on the switch. On others, a sequence of operations is required.

This situation generally arises only when someone forgets the password or when a switch failure causes the configuration data to be lost.

4) External factors:

Because of a virus or a hacker attack, a host may send a large number of packets that violate encapsulation rules to the port it is connected to. The switch processor then becomes too busy to forward packets, and buffer overflows cause packet loss. Another scenario is a broadcast storm, which consumes both a large amount of network bandwidth and a great deal of CPU processing time. If broadcast packets flood the network for a long time, normal point-to-point communication fails, and the network slows down or becomes paralyzed.

A faulty network card or port can trigger a broadcast storm. A switch can split collision domains, but it cannot split broadcast domains unless VLANs are configured. Once broadcast packets reach 30% of total traffic, network transmission efficiency drops significantly.
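The 30% figure above can be turned into a simple alarm check. This sketch assumes you can read cumulative broadcast and total packet counters for a port. Managed switches commonly expose interface statistics, but counter names differ by vendor. The sketch takes two snapshots a few seconds apart and compares the broadcast share of the traffic between them against that threshold. The snapshot format is a hypothetical placeholder, and counter wraparound is ignored for brevity.

```python
from typing import Dict

# Broadcast share above which, per the text, efficiency drops significantly.
BROADCAST_ALARM_RATIO = 0.30


def broadcast_ratio(broadcast_pkts: int, total_pkts: int) -> float:
    """Fraction of packets in an interval that were broadcasts (0.0 if idle)."""
    if total_pkts <= 0:
        return 0.0
    return broadcast_pkts / total_pkts


def is_broadcast_storm(
    prev: Dict[str, int], curr: Dict[str, int], threshold: float = BROADCAST_ALARM_RATIO
) -> bool:
    """Decide from two cumulative counter snapshots whether broadcasts dominate.

    Each snapshot is a hypothetical dict such as {"broadcast": 1200, "total": 50000}.
    Counter wraparound is not handled in this sketch.
    """
    delta_broadcast = curr["broadcast"] - prev["broadcast"]
    delta_total = curr["total"] - prev["total"]
    return broadcast_ratio(delta_broadcast, delta_total) >= threshold


if __name__ == "__main__":
    before = {"broadcast": 1_000, "total": 10_000}
    after = {"broadcast": 5_000, "total": 20_000}
    # 4,000 broadcasts out of 10,000 packets in the interval: 40% of traffic.
    print(is_broadcast_storm(before, after))
```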

In general, software faults are harder to find than hardware faults. Fixing them may not cost much money, but it can take a lot of time. It is best to keep a log as part of your daily work. When a fault occurs, record the symptoms, analyze how the fault developed, resolve it, and classify it promptly, so that you build up your own experience. For example, a configuration error sometimes has no visible effect at first, or goes unnoticed for various reasons, and the problem only emerges a few days later. With a log, you can recall that the configuration was changed a few days earlier and check whether that change was wrong. Without one, this is easily overlooked: I have assumed a problem lay elsewhere and only found the real cause after many detours. Keeping logs and maintenance records is therefore essential.
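The logging habit described above needs very little tooling. This minimal sketch assumes a simple in-memory record with hypothetical categories such as `config-change` and `fault`. It shows how a log lets you look back at configuration changes made in the days before a fault appeared, which is the situation in the example above. The device name, dates, and descriptions in the usage lines are made up for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List


@dataclass
class LogEntry:
    when: datetime
    device: str
    kind: str  # hypothetical categories: "config-change" or "fault"
    description: str


@dataclass
class MaintenanceLog:
    """Minimal in-memory maintenance log for network devices."""
    entries: List[LogEntry] = field(default_factory=list)

    def record(self, when: datetime, device: str, kind: str, description: str) -> None:
        self.entries.append(LogEntry(when, device, kind, description))

    def changes_before(self, device: str, fault_time: datetime, days: int = 7) -> List[LogEntry]:
        """Configuration changes on `device` in the `days` days before a fault."""
        start = fault_time - timedelta(days=days)
        return [
            e for e in self.entries
            if e.device == device
            and e.kind == "config-change"
            and start <= e.when <= fault_time
        ]


if __name__ == "__main__":
    log = MaintenanceLog()
    log.record(datetime(2024, 5, 1, 10, 0), "sw-access-1", "config-change",
               "moved port 12 to VLAN 20")
    log.record(datetime(2024, 5, 3, 15, 0), "sw-access-1", "fault",
               "users on port 12 lost connectivity")
    for entry in log.changes_before("sw-access-1", datetime(2024, 5, 3, 15, 0)):
        print(entry.when, entry.description)
```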

III. General troubleshooting steps for switch faults:

Switch faults are diverse and show up in different ways. When analyzing a fault, apply troubleshooting methods flexibly according to the symptoms. Useful methods include the exclusion, comparison, and replacement methods.

1) Exclusion method:

When facing a fault, we use the exclusion method to narrow down where the fault lies. Based on the observed symptoms, this method lists every possible cause as completely as possible. It then analyzes and eliminates the causes one by one, going from simple to complex to improve efficiency. The method works for all kinds of faults, but it demands strong logical thinking and a thorough understanding of switches.
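The exclusion method is essentially an ordered search over candidate causes. This minimal sketch assumes each candidate cause has a rough effort score (lower means simpler to check) and a check function that returns True when the cause is confirmed. Both the scores and the checks are hypothetical and would come from your own inspection steps.

```python
from typing import Callable, List, Optional, Tuple

# (description, effort score, check that returns True if this is the cause)
Candidate = Tuple[str, int, Callable[[], bool]]


def exclusion_search(candidates: List[Candidate]) -> Tuple[Optional[str], List[str]]:
    """Check candidate causes from simplest to most complex.

    Returns the first confirmed cause (or None) and the causes ruled out before it.
    """
    ruled_out: List[str] = []
    # sorted() is stable, so candidates with equal effort keep their listed order.
    for description, _effort, is_cause in sorted(candidates, key=lambda c: c[1]):
        if is_cause():
            return description, ruled_out
        ruled_out.append(description)
    return None, ruled_out


if __name__ == "__main__":
    candidates: List[Candidate] = [
        ("backplane fault", 5, lambda: False),
        ("cable not seated", 1, lambda: False),
        ("port assigned to wrong VLAN", 2, lambda: True),
    ]
    print(exclusion_search(candidates))
```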

2) Comparison method:

The comparison method takes a working switch of the same model as a reference and compares it with the faulty switch to find the fault point. This method is simple and effective, especially for system configuration faults, where a simple comparison reveals the configuration differences. However, a switch with the same model and configuration is not always easy to find.
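Once both configurations are exported as text, the comparison for configuration faults can be automated. This sketch uses Python's standard `difflib.unified_diff` to show the lines that differ between the reference switch and the faulty one. The configuration snippets are hypothetical and written in a generic CLI style, not taken from a specific vendor.

```python
import difflib
from typing import List


def config_diff(reference: str, suspect: str) -> List[str]:
    """Unified-diff lines between a known-good config and the suspect config."""
    return list(
        difflib.unified_diff(
            reference.splitlines(),
            suspect.splitlines(),
            fromfile="reference-switch",
            tofile="faulty-switch",
            lineterm="",
        )
    )


if __name__ == "__main__":
    reference = "interface port1\n switchport access vlan 10\n no shutdown"
    faulty = "interface port1\n switchport access vlan 20\n no shutdown"
    print("\n".join(config_diff(reference, faulty)))
```

Lines prefixed with `-` exist only on the reference switch, and lines prefixed with `+` exist only on the faulty one. An unexpected VLAN number or a missing line therefore stands out immediately.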

3) Replacement method:

This is the method we use most often, and it is also common in computer maintenance. The replacement method swaps known-good switch components in for suspect parts to locate the fault point. It is mainly used to diagnose hardware faults. Note that replacement parts must come from switches of the same brand and model.
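The replacement method, including the rule that spares must match brand and model, can be sketched as follows. This is a hypothetical illustration. `Part` is an invented structure, and `fault_clears` stands for the physical test you perform after swapping in a spare, such as checking whether the port links up again.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class Part:
    kind: str   # e.g. "power supply", "management module" (illustrative)
    brand: str
    model: str


def locate_by_replacement(
    suspects: List[Part],
    spares: Dict[str, Part],
    fault_clears: Callable[[Part, Part], bool],
) -> Optional[Part]:
    """Swap one suspect part at a time for a known-good spare of the same brand and model.

    Returns the first part whose replacement makes the fault disappear, or None.
    """
    for part in suspects:
        spare = spares.get(part.kind)
        # Only a spare of the same brand and model is a valid substitute.
        if spare is None or (spare.brand, spare.model) != (part.brand, part.model):
            continue
        if fault_clears(part, spare):
            return part
    return None


if __name__ == "__main__":
    psu = Part("power supply", "VendorA", "PS-100")
    mgmt = Part("management module", "VendorA", "MM-2")
    spares = {
        "power supply": Part("power supply", "VendorA", "PS-100"),
        "management module": Part("management module", "VendorA", "MM-2"),
    }
    # Simulated bench test: only swapping the management module clears the fault.
    print(locate_by_replacement([psu, mgmt], spares, lambda p, s: p.kind == "management module"))
```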

In addition, the following principles help make troubleshooting more systematic.

1. From far to near

Switch faults (such as port faults) are usually discovered through a connected computer, so checks often start from the client. Work through the path one segment at a time in this order: client computer > port module > horizontal cable > patch cord (jumper) > switch.
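The far-to-near order can be captured as a fixed sequence of checks that stops at the first segment that fails. In this minimal sketch, the segment names follow the order above. Each test function is a hypothetical stand-in for the actual check, such as pinging from the client or testing the cable with a cable tester.

```python
from typing import Callable, Dict, Optional

# Order from the text: start at the user's computer and end at the switch.
CHECK_ORDER = ["client computer", "port module", "horizontal cable", "patch cord", "switch"]


def walk_far_to_near(tests: Dict[str, Callable[[], bool]]) -> Optional[str]:
    """Run the check for each segment in order; return the first segment that fails."""
    for segment in CHECK_ORDER:
        if not tests[segment]():
            return segment
    return None  # every segment passed


if __name__ == "__main__":
    results = {segment: True for segment in CHECK_ORDER}
    results["horizontal cable"] = False  # simulated cable-tester failure
    tests = {segment: (lambda ok=ok: ok) for segment, ok in results.items()}
    print(walk_far_to_near(tests))
```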

2. From outside to inside

When a switch fails, first read its external indicators, and then use what they show to check whether internal components are faulty. For example, a green POWER LED means the power supply is normal, and an unlit POWER LED means there is no power. For the LINK LED, yellow means the connection is currently running at 10 Mb/s, unlit means there is no connection, and flashing means the administrator has manually disabled the port. The RDP LED indicates the redundant power supply, and the MGMT LED indicates the management module. Even when the indicators point to the fault, you must still log in to the switch to identify the specific fault and take the corresponding measures.
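The indicator readings above amount to a lookup table. This sketch encodes only the example meanings given in this section. LED names, colors, and meanings vary between vendors and models, so treat the table as a placeholder to fill in from your own switch's manual.

```python
from typing import Dict, Tuple

# Meanings taken from the examples in the text; check your switch's manual.
LED_MEANINGS: Dict[Tuple[str, str], str] = {
    ("POWER", "green"): "power supply normal",
    ("POWER", "off"): "no power",
    ("LINK", "yellow"): "link up at 10 Mb/s",
    ("LINK", "off"): "no connection",
    ("LINK", "flashing"): "port disabled by administrator",
}


def interpret_leds(observed: Dict[str, str]) -> Dict[str, str]:
    """Map each observed (LED, state) pair to its documented meaning."""
    return {
        led: LED_MEANINGS.get((led, state), f"unknown state {state!r}: consult the manual")
        for led, state in observed.items()
    }


if __name__ == "__main__":
    print(interpret_leds({"POWER": "green", "LINK": "flashing"}))
```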

3. From software to hardware

When a fault occurs, nobody wants to reach for a screwdriver and open up the switch first. During inspection, therefore, always start troubleshooting with the system configuration or system software. Conclude that the hardware is faulty only if the problem cannot be solved in software. For example, if a port does not work, first check whether the user's port is in the wrong VLAN, whether another administrator has disabled it, or whether some other configuration issue is responsible. Once the system and configuration possibilities have been ruled out, you can suspect that the real problem is a hardware fault.
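The software-first rule can be written as a check that looks for configuration causes before blaming hardware. This is a hypothetical sketch. `PortConfig` stands in for whatever port status you read from the switch (administrative state and access VLAN), and `expected_vlan` is the VLAN the user's port should belong to.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PortConfig:
    """Port settings as read from the switch (hypothetical fields)."""
    admin_enabled: bool
    access_vlan: int


def diagnose_port(port: PortConfig, expected_vlan: int) -> List[str]:
    """List configuration causes first; suspect hardware only if none are found."""
    causes = []
    if not port.admin_enabled:
        causes.append("port disabled by an administrator")
    if port.access_vlan != expected_vlan:
        causes.append(f"port in VLAN {port.access_vlan}, expected VLAN {expected_vlan}")
    if not causes:
        causes.append("no configuration cause found: suspect a hardware fault")
    return causes


if __name__ == "__main__":
    print(diagnose_port(PortConfig(admin_enabled=True, access_vlan=30), expected_vlan=10))
```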

4. From easy to difficult

When analyzing a complex fault, start with simple operations or configuration checks. This speeds up troubleshooting and improves efficiency.

IV. Summary:

Switch faults vary so widely that there is no fixed set of troubleshooting steps. Some faults point clearly to their cause and can be identified at a glance, while others must be analyzed according to the actual situation. Any kind of fault can be difficult for a new network administrator. To become skilled at switch troubleshooting, we must accumulate experience in our daily work. After every problem, carefully review its root cause and its solution. In this way, we can keep improving and better fulfill the important responsibilities of network management.

