Comprehensive Analysis of Switch Fault Types and Analysis Methods

Source: Internet
Author: User

Switch faults are still quite common, so I have studied switch fault types and analysis methods and share them here in the hope that they will be useful. Faults during switch operation are inevitable. It is the duty of maintenance personnel to handle a fault promptly, locate the fault point as soon as possible, and eliminate the fault. To do this, you must understand the types of switch faults and be able to analyze and handle them. This article briefly introduces the common fault types and analysis methods for switches.

1 Fault Classification

Based on the author's years of experience maintaining program-controlled switches and the faults encountered at work, switch faults can generally be divided into the following types:

1) Damaged circuit board
Components on the circuit board are damaged or the substrate is defective, so the board cannot work normally.

2) Incorrect hardware settings
A hardware setting is a group of switches or jumpers on a circuit board, provided to reduce the number of board types. It defines the operating state of the board or its position in the system. If the hardware setting is wrong, the board may not work properly.

3) Unsuitable circuit board model
After a hardware update, a circuit board with the same name may exist in several models. In general, a new-model board is compatible with the functions of the old-model board, but an old-model board is not necessarily compatible with the functions of the new one.

4) Rack and module problems
Racks and modules carry the circuit boards. According to their position in the system, they are divided into the racks and modules of the processor system, of the switching system, and of the maintenance management system. These racks and modules can also fail.

5) Equipment power supply problems
The -48 V direct current provided by the rectifier is distributed to each rack and related equipment. The power distribution system in the rack supplies power to the modules, and the power circuit board on each module converts it to the voltages required by the circuit boards in that module and delivers it to them. A fault in any link of this chain may cause a power supply fault.

6) Connection cable and distribution frame jumper problems
Connection cables and distribution frame jumpers connect modules, racks, and devices. If a cable or jumper is short-circuited, broken, or makes poor contact, a communication system fault results.

7) Program bugs
The software program has design flaws.

8) System data errors
System data, including software settings, defines the entire system. A system data error can cause any kind of system fault and affect the entire exchange.

9) Office data errors
Office data is defined according to the specific circumstances of the exchange office. Incorrect office data may also affect the entire exchange.

10) User data errors
User data defines the situation of each user. Incorrectly set user data may affect that user.
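
The ten types above can be kept in a small lookup table so that a fault is first placed in a level-1 category (hardware or software) before further analysis. The following is only a sketch: the article does not label each type, so the hardware/software grouping is an assumption made for illustration, and the names `Category`, `FAULT_TYPES`, and `types_in` are hypothetical. The type names are paraphrased from the list above.

```python
from enum import Enum


class Category(Enum):
    HARDWARE = "hardware"
    SOFTWARE = "software"


# Assumption: types 1-6 are treated as hardware faults and types 7-10 as
# software (program or data) faults. The article does not state this split.
FAULT_TYPES = {
    1: ("damaged circuit board", Category.HARDWARE),
    2: ("incorrect hardware settings", Category.HARDWARE),
    3: ("unsuitable circuit board model", Category.HARDWARE),
    4: ("rack or module problem", Category.HARDWARE),
    5: ("power supply problem", Category.HARDWARE),
    6: ("cable or jumper problem", Category.HARDWARE),
    7: ("program bug", Category.SOFTWARE),
    8: ("system data error", Category.SOFTWARE),
    9: ("office data error", Category.SOFTWARE),
    10: ("user data error", Category.SOFTWARE),
}


def types_in(category):
    """Return the fault type names that belong to one level-1 category."""
    return [name for name, cat in FAULT_TYPES.values() if cat is category]


print(types_in(Category.SOFTWARE))
```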

2 Fault Analysis and Handling Methods

Different switch faults show different symptoms. The purpose of fault analysis is to examine the fault symptoms, identify the cause, and locate the fault so that it can be eliminated. To keep the analysis orderly and systematic, work through the levels of the fault classification table step by step. First decide by the level-1 classification whether the fault is a software fault or a hardware fault, then narrow it down through the level-2 and level-3 classifications. There are many testing methods for level-5 classification. Some common ones are:

1) Exclusion method
Based on the fault symptoms, list the possible causes and then eliminate them one by one. The list should be as complete as possible, with no omissions, and the possibilities should be checked from simple to complex to avoid wasted effort. This method is logical and can handle all kinds of faults, but it places high demands on maintenance personnel, who need a comprehensive, in-depth understanding of the switching system.

2) Comparison method
Compare the faulty equipment with normal equipment and use the differences to locate the fault. This method is easy to use and is especially helpful for software faults, but its scope is limited: for some faults, no usable reference for comparison exists.

3) Replacement method
Replace the suspect device with a known-good one. This method is mainly used for hardware faults. Check that the model, type, and hardware settings of the known-good device are exactly the same as those of the device being replaced.
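
The three methods above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a maintenance tool: the function names, the numeric "effort" score used to order checks from simple to complex, and the field names `model`, `board_type`, and `hardware_setting` are all hypothetical.

```python
def exclude(candidates, passes_test):
    """Exclusion: check candidate causes from simple to complex.

    candidates is a list of (name, effort) pairs, where effort is a
    hypothetical score for how hard the check is. passes_test(name)
    returns True when the item checks out fine. Returns the first item
    that fails its check, or None if every candidate is eliminated.
    """
    for name, _effort in sorted(candidates, key=lambda c: c[1]):
        if not passes_test(name):
            return name
    return None


def compare(faulty, normal):
    """Comparison: return the settings on which the two devices differ."""
    keys = sorted(set(faulty) | set(normal))
    return {k: (faulty.get(k), normal.get(k))
            for k in keys if faulty.get(k) != normal.get(k)}


def can_replace(spare, suspect,
                fields=("model", "board_type", "hardware_setting")):
    """Replacement: a spare is a valid swap only if all listed fields match."""
    return all(spare.get(f) == suspect.get(f) for f in fields)


# Hypothetical data for illustration only.
candidates = [("switching system", 3), ("trunk circuit", 1), ("sender/receiver", 2)]
broken = {"sender/receiver"}
print(exclude(candidates, lambda name: name not in broken))  # sender/receiver

print(compare({"board_model": "A", "slot": 3}, {"board_model": "B", "slot": 3}))
# {'board_model': ('A', 'B')}

spare = {"model": "B", "board_type": "control", "hardware_setting": "0101"}
suspect = {"model": "B", "board_type": "control", "hardware_setting": "0110"}
print(can_replace(spare, suspect))  # False
```

In the last call the spare is rejected because its hardware setting differs from that of the suspect board, which is the check the replacement method asks for before swapping.
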

In practice, these methods are often combined to locate the fault point quickly and accurately. The following cases illustrate fault analysis and handling.

Fault 1)

Fault description: After a new office was opened, some users often failed to get through when making outgoing calls. Traffic statistics and monitoring showed that the outgoing call loss was too high, close to 30%, and was not limited to any particular direction.

Fault analysis: From the symptoms it is not clear whether this is a software fault or a hardware fault, and there is no comparable equipment for reference. The comparison and replacement methods therefore cannot be used, and only the exclusion method is available for troubleshooting.

Because the fault is related to outgoing calls, is independent of the user, and does not affect the user's other calls, the user circuit and the switching system can be ruled out. According to the call process, the hardware involved in outgoing calls includes the user circuit, the inter-office trunk circuit, the code senders/receivers, and the switching system; the software involved includes user data, inter-office trunk office data, and sender/receiver office data. Working from simple to complex, we first tested the inter-office trunk circuit and found no problems. We then tested the senders/receivers and found that nearly 25% of them could not be used and that all of these were on the same module. A check, by number, of the office data settings for the unusable senders/receivers found no problems either. At this point we were sure that the fault was caused by a hardware problem in that module. We therefore checked the module's hardware and the equipment connected to it, and a comparison with the other modules showed that the model of the control circuit board was wrong. After a control circuit board of the correct model was installed, the fault was eliminated.
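
The key observation in this case is that the unusable sender/receiver units were concentrated on one module. A minimal sketch of that check follows. The inventory format, the function name `unusable_share_by_module`, and the sample data are hypothetical and only mirror the figures in the case (about a quarter of the units unusable, all on one module).

```python
from collections import Counter


def unusable_share_by_module(units):
    """units is a list of (module_id, usable) pairs, one per sender/receiver.

    Returns {module_id: share of all unusable units located on that module}.
    A share close to 1.0 for one module points at that module's hardware.
    """
    bad = Counter(module for module, usable in units if not usable)
    total_bad = sum(bad.values())
    if total_bad == 0:
        return {}
    return {module: count / total_bad for module, count in bad.items()}


# Hypothetical inventory: 4 modules with 4 units each; every unit on
# module 2 is unusable, which is 25% of all units.
units = [(module, module != 2) for module in range(1, 5) for _ in range(4)]
print(unusable_share_by_module(units))  # {2: 1.0}
```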

Fault 2)

Fault description: The central processor module at an office could not run in duplex (dual-machine) mode. When the duplex command was entered manually, the returned information stated that the standby side of the central processor module could not work normally, and the diagnostic command indicated that some circuit boards on the standby side were abnormal.

Fault analysis: This is clearly a hardware fault. Following the fault information, the replacement method was used to replace the circuit boards on the standby side, but the fault remained. The real fault point was therefore not on the standby-side circuit boards; the cause might lie in the rack, the module, the power supply, or the connecting equipment. The exclusion method was then used to check these items one by one, and in particular to study the operating principle of the central processor module and analyze the switchover from single-machine to duplex operation. The duplex command is entered by the operator. The active side receives it first and then, through its control circuit board, sends a command to the standby side to run a self-check. If the standby side is normal, it replies to the active side and prepares to receive data. When the active side receives the reply, it transfers the current data to the standby side and duplex operation begins. The problem here was that the active side never received the reply from the standby side. Was the duplex command never delivered to the standby side, or was the standby side's reply never delivered to the active side? Both depend on transmission through the control circuit boards of the active and standby sides. After a restart, the active and standby sides of the central processor module were therefore forcibly switched over, and the control circuit boards on both sides were replaced. The fault disappeared. The fault type was a damaged circuit board.
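
The switchover sequence described above can be modeled as a short handshake to show why the symptom alone cannot tell the two suspected causes apart. This is a simplified sketch: the function name `duplex_startup` and its three boolean parameters are hypothetical, and a real central processor module involves far more state.

```python
def duplex_startup(command_delivered, standby_healthy, reply_delivered):
    """Walk through the single-machine to duplex switchover.

    command_delivered: the self-check command reaches the standby side.
    standby_healthy:   the standby side passes its self-check.
    reply_delivered:   the standby side's reply reaches the active side.
    Returns (succeeded, log), where log lists the steps that happened.
    """
    no_reply = "active side receives no reply"
    log = ["active side receives the duplex command",
           "active side sends a self-check command through its control board"]
    if not command_delivered:
        log.append(no_reply)
        return False, log
    log.append("standby side runs its self-check")
    if not standby_healthy:
        log.append(no_reply)
        return False, log
    log.append("standby side replies and prepares to receive data")
    if not reply_delivered:
        log.append(no_reply)
        return False, log
    log.append("active side transfers current data; duplex operation starts")
    return True, log


lost_command = duplex_startup(False, True, True)
lost_reply = duplex_startup(True, True, False)
# Both failures end in the same observation on the active side.
print(lost_command[1][-1] == lost_reply[1][-1])  # True
```

Because a lost command and a lost reply look identical from the active side, the control circuit boards on both sides are suspects, which matches the decision in this case to replace both.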

Fault 3)

Fault description: An office installed trunk circuit facilities to set up an inter-office trunk. After the hardware was installed, an error occurred when the office data was entered: the office data storage status was incorrect.

Fault analysis: Judging by the fault type, this is clearly a software fault, so the replacement method does not apply. Working from simple to complex, the comparison method was tried first: the relevant office data was printed out and compared with that of other offices. After considerable effort, a suspicious point was found in the memory management table. The office data memory management table is organized by start address, end address, storage space, and remaining space. In this office's table, the remaining space was much larger than the storage space, which clearly indicated a problem in the management of the office data. The machine-code modification command was then used to correct the office data memory management table. This cleared the fault, and office data operations returned to normal.
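
The inconsistency found in this case (remaining space larger than storage space) is the kind of condition a simple sanity check can catch. The sketch below assumes a table entry is a dictionary with the four fields named in the text; the key names, the function `check_entry`, and the sample values are hypothetical, and the real table layout is not given in the article.

```python
def check_entry(entry):
    """Return a list of problems found in one memory management table entry."""
    problems = []
    if entry["start"] > entry["end"]:
        problems.append("start address is after end address")
    if entry["remaining"] < 0:
        problems.append("remaining space is negative")
    if entry["remaining"] > entry["size"]:
        problems.append("remaining space exceeds storage space")
    return problems


# Hypothetical entry showing the symptom described in this case.
entry = {"start": 0x1000, "end": 0x1FFF, "size": 4096, "remaining": 60000}
print(check_entry(entry))  # ['remaining space exceeds storage space']
```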

3 Conclusion

Faults are diverse, and so are their causes. The same fault may show many different symptoms, and the same symptom may have different causes. For example, a faulty user circuit board may show up as no sound, noise, wrong numbers, one-way audio, or no ringing. Conversely, a user having no dial tone may be caused by poor contact at the distribution frame, a user module problem, or a switching system problem. When analyzing a fault, you should therefore understand the symptoms as completely and in as much detail as possible and apply the analysis methods flexibly. You should also keep analysis records that document the entire process of fault analysis and handling, so as to accumulate experience and keep improving your troubleshooting skills.
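
One possible way to keep such analysis records is a small structured log that can be searched by symptom later. The sketch below is only a suggestion: the `FaultRecord` fields and the helper `causes_seen_for` are hypothetical, and the sample records simply restate the three cases from this article.

```python
from dataclasses import dataclass, field


@dataclass
class FaultRecord:
    symptom: str
    steps: list = field(default_factory=list)  # checks performed, in order
    root_cause: str = ""
    fault_type: str = ""


def causes_seen_for(records, symptom):
    """Return the distinct root causes previously recorded for a symptom."""
    return sorted({r.root_cause for r in records
                   if r.symptom == symptom and r.root_cause})


records = [
    FaultRecord("high outgoing call loss",
                ["test trunk circuit", "test senders/receivers", "check office data"],
                "wrong control circuit board model",
                "unsuitable circuit board model"),
    FaultRecord("duplex mode cannot start",
                ["replace standby-side boards", "analyze switchover sequence"],
                "faulty control circuit board",
                "damaged circuit board"),
    FaultRecord("office data storage status incorrect",
                ["compare office data with other offices"],
                "corrupt memory management table",
                "office data error"),
]

print(causes_seen_for(records, "high outgoing call loss"))
# ['wrong control circuit board model']
```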
 
