A classic case of network fault troubleshooting with a network tester

Source: Internet
Author: User

Mr. Zhang, the network administrator of an agency's information center, asked our company to help troubleshoot faults on the agency's internal office network. The fault had been affecting their network for some time. When we arrived on site, Mr. Zhang described the organization's network (see the network topology diagram) and the symptoms. According to him, almost every user on the network was accessing Server 2 very slowly, even though ping tests showed good connectivity with round-trip times under 2 ms: copying a 30 MB file from the server took about five minutes. They had already made many adjustments and were even considering upgrading the servers and the network.


Network Structure Diagram
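The symptom is easier to judge as a throughput figure. The small Python sketch below converts a file size and a transfer time into an average rate; it assumes 1 MB = 10^6 bytes, and `effective_mbps` is a hypothetical helper, not part of any tool used in this case.

```python
def effective_mbps(size_bytes: float, seconds: float) -> float:
    """Average application-level throughput in megabits per second."""
    if seconds <= 0:
        raise ValueError("transfer time must be positive")
    return size_bytes * 8 / seconds / 1_000_000


# 30 MB (assumed to be 30 * 10**6 bytes) copied in about five minutes
rate = effective_mbps(30 * 10**6, 5 * 60)
print(f"{rate:.2f} Mb/s")  # prints "0.80 Mb/s"
```

Roughly 0.8 Mb/s is a small fraction of even a 10 Mb/s Ethernet link, while the sub-2 ms ping times show that small packets get through quickly. That gap suggests the problem only appears under sustained load, which is why a simple ping test did not reveal it.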


It is generally difficult to identify the cause of a seemingly simple fault like this directly. The problem may lie in the network itself or in a network application, or it may be related to the configuration of a network or service device, so deciding where to start testing is the key question. To get a better picture of the user's network as a whole, we decided to first install a set of network monitoring and management software from Fluke. The software has a distributed architecture: we installed a monitoring station on each of the user's network segments and used our laptop as the monitoring console that communicates with every station (see figure).

From the figure above, we can see a summary of the devices on each network segment (servers, switches, routers, and RMON devices), check whether any of these devices has serious problems, and check whether the bar chart of per-station traffic shows anything abnormal. After confirming that there were no major problems, we drilled down into the segment details and found the server the users had complained about (see figure).


When this yielded no useful information, we used the "switch route tracing" function to test the transmission path between a client and the server (see figure). It showed that this client and the problem server, Server 2, were connected to ports 3 and 5 of the same Cisco switch. Port 3, which connects the client, has the history function enabled, so statistics on its traffic, broadcasts, collisions, and errors were easy to obtain; the figure shows no problems on that port. Port 5, which connects the server, does not have the history function enabled, so no statistics were available for the server.


Switch route tracing

Could the root cause lie on the server side? We immediately connected an OptiView portable network protocol analyzer to the server's network segment at location 3. Using its network discovery function, we quickly found this Cisco switch and viewed the traffic on port 5, which connects the server (see figure). The traffic on this port was low and should not have made client access slow. We then checked the port for errors and found no errored packets, but there were collisions.


Port 5 traffic



This anomaly caught our attention: the port connects only one device, the server, so how could it see collisions? We switched to the switch list and examined the port information further (see figure). The connection was running at 10 Mb/s, half duplex, although the switch supports full-duplex connections. Could the duplex settings of the server and the switch port be mismatched? With the user's consent, we disconnected the server from the switch and connected it to a network multimeter for testing. The results confirmed our suspicion (see figure): the server's NIC was working in full-duplex mode, while auto-negotiation on the switch port had failed, leaving the port in half-duplex mode. The two ends were therefore running in different duplex modes, which produced the collision errors. Once the cause was found, the user temporarily set the server's network connection to half duplex; copying the 30 MB file from the server then took only a little more than 10 seconds, and OptiView monitoring detected no further collisions. With the problem confirmed, the fix is simple: manually set the working mode of the switch port to full duplex.


Two-way arrows show the full-/half-duplex working status
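One common way such a mismatch arises is worth modelling. Under IEEE 802.3 auto-negotiation, if one end is hard-set (auto-negotiation disabled) and the other is left on auto, the auto end can still sense the speed through parallel detection but cannot learn the duplex setting, so it falls back to half duplex. The Python sketch below is a simplified model of that rule, assuming both ends support full duplex; `PortConfig` and `resolve_link` are hypothetical names, and the forced NIC setting in the example is an assumption (the case only says the NIC ran in full duplex).

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class PortConfig:
    autoneg: bool
    speed_mbps: int                # forced speed, or highest advertised speed with autoneg
    duplex: Optional[str] = None   # "full" or "half" when forced; ignored with autoneg

    def __post_init__(self) -> None:
        if not self.autoneg and self.duplex not in ("full", "half"):
            raise ValueError("a forced port needs duplex='full' or 'half'")


def resolve_link(a: PortConfig, b: PortConfig) -> Tuple[Tuple[int, str], Tuple[int, str]]:
    """(speed, duplex) each end ends up using. Simplified: ignores the no-link
    case of two forced ports with different speeds."""
    if a.autoneg and b.autoneg:
        # Both negotiate: highest common speed, full duplex (assumed supported).
        speed = min(a.speed_mbps, b.speed_mbps)
        return (speed, "full"), (speed, "full")
    if a.autoneg:
        # Parallel detection: a senses b's speed but must assume half duplex.
        return (b.speed_mbps, "half"), (b.speed_mbps, b.duplex)
    if b.autoneg:
        return (a.speed_mbps, a.duplex), (a.speed_mbps, "half")
    # Both forced: each end keeps its own setting, matching or not.
    return (a.speed_mbps, a.duplex), (b.speed_mbps, b.duplex)


def duplex_mismatch(a: PortConfig, b: PortConfig) -> bool:
    (_, duplex_a), (_, duplex_b) = resolve_link(a, b)
    return duplex_a != duplex_b


server_nic = PortConfig(autoneg=False, speed_mbps=10, duplex="full")  # assumed forced
switch_port = PortConfig(autoneg=True, speed_mbps=100)
print(resolve_link(server_nic, switch_port))     # ((10, 'full'), (10, 'half'))
print(duplex_mismatch(server_nic, switch_port))  # True
```

In the mismatched state, the half-duplex end is the one that counts collisions (often late collisions), while the full-duplex end tends to log FCS/CRC errors and runt frames. That matches the collisions OptiView reported on the switch port. Setting both ends the same way, either both on auto or both forced to the same mode, avoids the fallback.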

Background:
So why does a duplex mismatch cause collisions and affect data transmission across the network? Because full-duplex devices do not follow the carrier sense multiple access with collision detection (CSMA/CD) procedure. When a full-duplex device has a frame to send, it sends it immediately, regardless of whether it is currently receiving data. If the half-duplex device connected to it happens to be transmitting at that moment, a collision occurs. The half-duplex device, which follows CSMA/CD, immediately sends a jam signal, waits for a backoff delay, and then retransmits the frame, and these repeated retransmissions degrade network performance.
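The backoff step is what turns each collision into lost time. For reference, the sketch below implements the truncated binary exponential backoff used by IEEE 802.3 half-duplex Ethernet: after the n-th collision on a frame, the station waits a random number of slot times between 0 and 2^min(n, 10) - 1, and it discards the frame after 16 collisions. The 512-bit-time slot applies to 10 and 100 Mb/s Ethernet; the function names are illustrative only.

```python
import random

SLOT_TIME_BITS = 512   # slot time at 10 and 100 Mb/s, in bit times
BACKOFF_LIMIT = 10     # the random range stops doubling after 10 collisions
ATTEMPT_LIMIT = 16     # the frame is discarded after 16 collisions


def backoff_slots(collisions: int, rng: random.Random) -> int:
    """Number of slot times to wait after `collisions` collisions on one frame."""
    if collisions < 1:
        raise ValueError("backoff only applies after a collision")
    if collisions >= ATTEMPT_LIMIT:
        raise RuntimeError("excessive collisions: frame discarded")
    k = min(collisions, BACKOFF_LIMIT)
    return rng.randint(0, 2**k - 1)  # randint is inclusive at both ends


def backoff_seconds(collisions: int, link_bps: float, rng: random.Random) -> float:
    """Backoff delay in seconds on a link of the given bit rate."""
    return backoff_slots(collisions, rng) * SLOT_TIME_BITS / link_bps


rng = random.Random(1)
for n in (1, 2, 3, 10, 15):
    delay_us = backoff_seconds(n, 10_000_000, rng) * 1e6  # one slot = 51.2 us at 10 Mb/s
    print(f"collision {n}: wait {delay_us:.1f} us")
```

Each individual wait is short, but on a mismatched link the half-duplex end keeps colliding with a partner that never defers, so the waits and retransmissions add up under sustained traffic such as a large file copy.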

If SNMP or RMON is enabled on the switch, collisions are counted and recorded in the switch's MIB. When we connect the OptiView integrated network protocol analyzer, we can use it to read the MIB data in the switch and find the history of collisions. This statistical information helped us find a network fault that had plagued the user for a long time.
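Counters read this way are cumulative, so what matters is how fast they grow between two polls. The sketch below assumes two SNMP readings of a port's collision and packet counters (for example the RMON objects `etherStatsCollisions` and `etherStatsPkts`) and turns them into a collision percentage, allowing for the wrap-around of 32-bit SNMP counters. The polling itself is left out, the sample readings are made up for illustration, and the helper names are hypothetical.

```python
COUNTER32_MODULUS = 2**32  # Counter32 values wrap to 0 after 4294967295


def counter_delta(previous: int, current: int, modulus: int = COUNTER32_MODULUS) -> int:
    """Increase of a wrapping counter between two polls (assumes at most one wrap)."""
    return (current - previous) % modulus


def collision_percent(prev_collisions: int, curr_collisions: int,
                      prev_packets: int, curr_packets: int) -> float:
    """Collisions per 100 packets seen during the polling interval."""
    packets = counter_delta(prev_packets, curr_packets)
    if packets == 0:
        return 0.0
    return 100 * counter_delta(prev_collisions, curr_collisions) / packets


# Hypothetical readings from two polls; the packet counter wrapped in between.
print(collision_percent(1_200, 1_450, 4_294_960_296, 3_000))  # prints 2.5
```

On a full-duplex port this figure should stay at zero. On a half-duplex port with a single attached device, a sustained collision rate is a good reason to check the duplex settings at both ends, as in this case.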

Http://anheng.com.cn/news/html/network_troubleshooting/209.html

Experience and summary
When troubleshooting complex network faults, we often need to test and analyze the symptoms from several angles to pinpoint the fault. We use our self-built network management system and on-site tests, and we combine theory with experience.

When analyzing and solving access performance problems in interconnected networks, we usually rely on several analysis models and methods:


1. Seven-layer network model analysis: check the definition and function of each of the seven (OSI) layers one by one. This is the traditional and most basic analysis and testing method.
There are two approaches: bottom-up and top-down. Bottom-up works from the physical layer up to the application layer. Top-down starts from the application protocols: capture packets, then analyze packet statistics and traffic statistics to obtain useful data.

2. Network connection structure analysis: in terms of connection structure, the network can be roughly divided into three modules: the client, the network link, and the server.

A) Client: faults can come from many different client conditions. The client also implements the full seven-layer model, so faults can occur anywhere from hardware to software, from drivers to applications, and from configuration errors to viruses. Analyzing and testing the client therefore requires a lot of background knowledge, and sometimes a PC enthusiast's experience helps. During on-site testing, ask the users whether the problem affects only them or everyone; the answer is very helpful in deciding what to test next on the client.
B) Network link: these problems usually need network management tools, field testers, and sometimes protocol analyzers to identify their nature and cause. Solid networking knowledge and practical experience are required here, and sometimes practical experience determines how long troubleshooting takes.
C) Server: analyzing the server requires broad knowledge of network applications. I once traced such a fault all the way to the database parameter settings on a server! You need to understand the server's hardware performance and configuration, its system performance and configuration, the network applications it runs, and their impact on the server.

3. Tool-based analysis: many powerful test tools and software packages are available. Their automatic analysis and expert systems can quickly provide network parameters and even fault analysis results, which is effective for about 60% of common network faults.

4. Comprehensive, experience-based analysis: built up over time through both mistakes and successes, this is the method most network test engineers use; combined with network management and test tools, it lets you locate network faults quickly.
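The bottom-up and top-down approaches of method 1 can be sketched as two search orders over per-layer checks. In the Python sketch below, the checks are placeholders (in practice a cable test, a link-status read, a ping, a TCP connect, a file copy, and so on), and it assumes that a failure at one layer also makes every layer above it fail; under that assumption both orders converge on the same lowest failing layer. The layer names and functions are illustrative, not taken from any tool.

```python
from typing import Callable, List, Optional, Tuple

# (layer name, check) pairs ordered from the physical layer upward.
Checks = List[Tuple[str, Callable[[], bool]]]


def bottom_up(checks: Checks) -> Optional[str]:
    """Walk up from the physical layer; report the first layer whose check fails."""
    for name, check in checks:
        if not check():
            return name
    return None


def top_down(checks: Checks) -> Optional[str]:
    """Walk down from the application; report the lowest layer that still fails."""
    lowest_failing = None
    for name, check in reversed(checks):
        if check():
            break
        lowest_failing = name
    return lowest_failing


# Placeholder results: everything from the data link layer upward fails.
results = {"physical": True, "data link": False, "network": False,
           "transport": False, "session": False, "presentation": False,
           "application": False}
checks = [(name, (lambda ok=ok: ok)) for name, ok in results.items()]
print(bottom_up(checks), "|", top_down(checks))  # prints: data link | data link
```

Bottom-up never skips a layer but may spend time confirming layers that are fine; top-down starts from the symptom the user actually reports.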

In this case, we first needed to determine whether the problem was common to all users, and then test whether it also showed up on the network link. Fortunately, the testing in this case went smoothly; otherwise, it would have taken much more thought and equipment to analyze further.
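Deciding whether a problem is individual or common can be turned into a rough first guess about which of the three modules (client, network link, server) to examine. The Python heuristic below is a hypothetical sketch, not a rule from the original method: if only some clients are affected, start with those clients; if every client is affected but other servers are reached normally, start with the server and its link; otherwise suspect the shared network links. The client names in the example are made up, and `other_servers_ok=True` assumes, as the report of slow access to Server 2 suggests, that other servers behaved normally.

```python
from typing import Iterable, Optional


def first_suspect(affected: Iterable[str], all_clients: Iterable[str],
                  other_servers_ok: bool) -> Optional[str]:
    """Suggest which module to examine first (rough heuristic)."""
    affected_set, everyone = set(affected), set(all_clients)
    if not affected_set:
        return None
    if affected_set < everyone:  # a proper subset: only some clients see the problem
        return "client"
    if other_servers_ok:
        return "server or server-side link"
    return "shared network link"


# Roughly the situation in this case: essentially all users are slow to Server 2.
print(first_suspect(["pc1", "pc2", "pc3"], ["pc1", "pc2", "pc3"], other_servers_ok=True))
# prints: server or server-side link
```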

This typical case shows that it is hard to troubleshoot network faults quickly and effectively by relying on a single method or a single piece of test equipment. Only by combining several kinds of test equipment with experience, and gradually narrowing down the scope of the fault, can the cause be found and eliminated. This case has been added to the classic network maintenance case studies of the Anheng Network Maintenance Institute.

