Poor switch configuration degrades network performance

Source: Internet
Author: User

Symptom

Mr. Gu, the IT manager of a website, is an old friend of ours. We met three years ago at a Cisco conference, where we and a few online acquaintances shared our experiences. At the time he was director of the information center at a large state-owned enterprise, responsible for network planning, design, construction, management, and maintenance. After that we heard nothing from him for a long time. His free mailbox stopped working, and I changed jobs as well, so we lost contact. I had been wondering how to reach him again and did not expect him to turn up on his own. Yesterday he came to the online hospital for a consultation about a network problem, and only then did we learn that he had resigned and moved to his current website.

Skipping the pleasantries, he went straight to the point. The website Mr. Gu is responsible for had recently developed problems. During the day there were frequent short periods of congestion. Internet users reported that when they visited the online mall in the shopping channel, their clicks often had no effect, and repeated attempts still got no response. This had been going on for two weeks. The website's management had ordered him to find the cause within two days and fix the problem of users being unable to click through to shop. Otherwise ......

When does the fault occur? Generally during the day; it does not appear at night. How do the symptoms begin? There is no warning. The fault appears suddenly and disappears suddenly, and it is unstable and irregular.

How long has it been since the fault first appeared? Only two weeks.

What was done to the network two weeks ago? For example, was the network structure adjusted, were network devices or servers added or removed, or were network users added, removed, or changed? No. The website content changes almost every day, but that should not have any impact. We have a network management system installed, so we can view the traffic on the network links at any time. Threshold alarms are also set on link traffic, so the staff on duty learn of any traffic anomaly immediately. Besides, our intranet uses Mbps NICs, and the core switches are connected by Gigabit Ethernet. The website's egress, however, is only 8 Mbps. When the problem occurs, outbound traffic has never exceeded 2 Mbps, which is lower than the access traffic when there is no fault. So the access difficulty is clearly not caused by heavy traffic hitting an egress bottleneck. I checked the online mall servers and tried swapping in the backup servers, but that did not help. We have tried every method we know, and the problem still cannot be found.
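Mr. Gu's egress argument amounts to a simple utilization check. The sketch below shows that arithmetic with the figures he quotes (8 Mbps egress, at most 2 Mbps during a fault). The function names and the 80% alarm threshold are assumptions made for illustration; the article does not say what threshold his network management system uses.

```python
def utilization(traffic_mbps: float, capacity_mbps: float) -> float:
    """Return link utilization as a fraction of link capacity."""
    if capacity_mbps <= 0:
        raise ValueError("capacity must be positive")
    return traffic_mbps / capacity_mbps


def is_congested(traffic_mbps: float, capacity_mbps: float,
                 threshold: float = 0.8) -> bool:
    """True when utilization reaches the alarm threshold.

    The 0.8 default is an assumed value, not one taken from the article.
    """
    return utilization(traffic_mbps, capacity_mbps) >= threshold


# Figures from Mr. Gu's account: 8 Mbps egress, at most 2 Mbps during a fault.
peak = utilization(2, 8)
print(f"egress utilization during fault: {peak:.0%}")
print("egress congested:", is_congested(2, 8))
```

With these numbers the egress sits at a quarter of its capacity during a fault, which is why the bottleneck has to be sought elsewhere.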

Have you performed packet capture analysis or latency analysis? Yes. We first monitored the relevant service links through network management and found that link traffic was generally only about 5%. Packet capture showed a large delay during the fault, yet ping tests were normal. While the fault was occurring, we picked a workstation inside the website and copied a M file from the online mall server, and the copy was fast. We used the expert diagnostic system of the protocol analyzer to analyze the captured packets. Apart from 3000 HSRP protocol frames, no anomalies were found.

Diagnosis process

Three minutes later, we arrived with Mr. Gu at the building that houses the website and prepared to begin the inspection.

The fault analysis shows that the main problem is slow access to one specific server. The common causes are: insufficient server resources, such as a slow interface, a slow CPU, insufficient memory, or too many open application windows; a bottleneck on the access channel that limits access speed; and processing delay in devices along the channel. Copies made inside the internal network show very little delay and run at normal speed, which basically indicates that the website's internal network is normal.

To confirm whether there was a traffic bottleneck or a long delay on the access channel, we connected test equipment at the egress of the router and connected the OptiView integrated network protocol analyzer to the link serving the online mall server. We sent 50 Mbps (50%) of high-volume ping traffic from the router side to OptiView to check the capacity of the channel. The maximum channel capacity turned out to be 95 Mbps (found by raising the sent traffic to 95 Mbps). We then changed the traffic frames to ordinary IP frames that require no server response, still at 50%. The OptiView on the server link received 50 Mbps, showing that all the traffic sent from the router side had "safely reached" the server. The network status was "normal". The response time to a ping from OptiView to the router was 12 microseconds (0.012 ms). Conclusion: the network was working normally at that moment.
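The baseline test above compares what was sent with what arrived and looks at the ping response time. A minimal sketch of that comparison follows. The function names and both acceptance limits (99% delivery, 10 ms response time) are assumptions chosen for illustration, not values from the article or from OptiView.

```python
def delivered_fraction(sent_mbps: float, received_mbps: float) -> float:
    """Return the share of the offered traffic that reached the far end."""
    if sent_mbps <= 0:
        raise ValueError("sent traffic must be positive")
    return received_mbps / sent_mbps


def path_looks_healthy(sent_mbps: float, received_mbps: float, rtt_ms: float,
                       min_fraction: float = 0.99,
                       max_rtt_ms: float = 10.0) -> bool:
    """Judge a path from one throughput test and one ping.

    Both limits are assumed defaults; tune them to the network in question.
    """
    fraction = delivered_fraction(sent_mbps, received_mbps)
    return fraction >= min_fraction and rtt_ms <= max_rtt_ms


# Baseline from the test above: 50 Mbps sent, 50 Mbps received, 0.012 ms ping.
print("delivered:", delivered_fraction(50, 50))
print("healthy:", path_looks_healthy(50, 50, 0.012))
```

Recording a baseline like this while the network is healthy gives a reference point for the same test repeated during a fault.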

Because this is an unstable "soft fault", we needed to test while the fault was occurring. Fortunately, the fault occurs every day during the daytime, so we were not worried about missing it.

Fifty minutes later, a call from an outside line reported the "fault". We immediately used OptiView's mobile network management to view the traffic on the channel, and every reading was below 10%. A ping from OptiView to the website router took 1200 ms. We then sent 50 Mbps of traffic from OptiView into the network, and only 5 Mbps was reported as received. The channel was not only dropping 45 Mbps of traffic but also introducing a very large delay.

We checked the website's topology diagram. It shows that the access channel is a Mbps Ethernet link that passes through five switches on the way to the server. Running the "TraceSwitch" check toward the router in OptiView showed that the path had changed. Three more switches now sat in the path: packets that used to reach the server through five switches now had to cross eight. We traced and inspected the three switches and found that the ports on the corresponding links were running at 100 Mbps. Checking latency and response time hop by hop, we found that a latency of ipvms appeared at the first of the newly added switches on the path. Since a backup switch was available, we tried replacing this switch to shorten the diagnosis time. Ten minutes later the switch had been replaced, and a test after startup showed that the fault had disappeared.
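The two steps that located the faulty switch, comparing the traced path with the documented one and then checking latency hop by hop, can be sketched as follows. The switch names, the position of the extra switches in the path, the per-hop latency values, and the 100 ms jump limit are all hypothetical; only the counts (five switches documented, eight traced) come from the article.

```python
from typing import List, Optional, Tuple


def added_hops(documented: List[str], traced: List[str]) -> List[str]:
    """Return hops in the traced path that are absent from the documented one."""
    known = set(documented)
    return [hop for hop in traced if hop not in known]


def first_slow_hop(cumulative_ms: List[Tuple[str, float]],
                   jump_ms: float = 100.0) -> Optional[str]:
    """Return the first hop whose latency rises by more than jump_ms
    over the previous hop, or None if no hop does.

    cumulative_ms lists (hop, latency measured up to that hop) in path order.
    """
    previous = 0.0
    for hop, latency in cumulative_ms:
        if latency - previous > jump_ms:
            return hop
        previous = latency
    return None


# Hypothetical names: five documented switches, eight in the traced path.
documented = ["sw1", "sw2", "sw3", "sw4", "sw5"]
traced = ["sw1", "sw2", "swA", "swB", "swC", "sw3", "sw4", "sw5"]
print("new switches in path:", added_hops(documented, traced))

# Hypothetical cumulative latencies in milliseconds, in path order.
measured = [("sw1", 0.2), ("sw2", 0.4), ("swA", 1190.0), ("swB", 1190.3),
            ("swC", 1190.6), ("sw3", 1190.9), ("sw4", 1191.2), ("sw5", 1191.5)]
print("latency first jumps at:", first_slow_hop(measured))
```

Looking for the first jump rather than the largest absolute value matters because every hop behind a slow device inherits its delay.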

We continued monitoring until the end of the afternoon, and the fault did not recur.

Diagnostic comments

This fault was caused by a switch. During the working day, the switch would enter a state of very high latency and change the transmission path. Judging from the fault's behavior and from the STP/HSRP protocol frames that OptiView observed, a poorly configured switch can behave in this way. For example, a switch may rely on protocols such as STP or HSRP to monitor the status of port connections, and may allocate port connections according to the bandwidth being carried or the protocols that are allowed or restricted. This is a normal function of a high-end switch. But if the configuration is poor, or the network encounters an exception and no fixed traffic path has been set, the switch will still check, recompute, and reconnect port paths, or reallocate traffic bandwidth, according to its configured conditions.

The network configuration document is an important reference for fault detection. Accurate documentation is a powerful aid to finding faults quickly. Conversely, the absence of configuration documents causes a great deal of trouble: maintenance staff often cannot tell whether the parameters they measure are normal or abnormal. An inaccurate record is sometimes worse than no record at all. It can lead fault detection down a dead end, and at that point no amount of head-scratching will help. The nerves, patience, and stamina of the maintenance staff will be severely tested.
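One way to put a configuration document to work is to compare a device's saved baseline with its running configuration and list only the lines that differ. The sketch below does this with Python's standard `difflib` module. The configuration lines are made-up placeholders, not the syntax of any real switch, and `config_drift` is a hypothetical helper name.

```python
import difflib
from typing import List


def config_drift(baseline: str, running: str) -> List[str]:
    """Return lines removed from ('- ') or added to ('+ ') the baseline."""
    diff = difflib.ndiff(baseline.splitlines(), running.splitlines())
    # ndiff also emits unchanged lines ('  ') and hint lines ('? '); drop them.
    return [line for line in diff if line.startswith(("- ", "+ "))]


# Placeholder configuration text, invented for illustration only.
baseline = """port 7 speed 100
port 7 mode access
"""
running = """port 7 speed 100
port 7 mode access
port 7 traffic-forwarding enabled
"""

for change in config_drift(baseline, running):
    print(change)
```

An empty result means the device still matches its documented state, which is exactly the check that is impossible when no baseline was ever filed.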

Diagnostic recommendations

Because of time constraints, we did not have a chance to examine the switch that was replaced. Based on past experience, our preliminary conclusion is that the switch is probably poorly configured rather than defective. We hope Mr. Gu will set aside time to check this switch's settings carefully. If the original initial configuration document can be found, checking against it will be much easier.

Postscript

One week later, Mr. Gu told us the result: some ports of the switch had been set to a traffic-forwarding state. A switch in this state reallocates port paths when link traffic reaches a certain value, in order to balance the load across the whole link. It can also be configured to shift port paths per protocol, so that it moves a switch port or reallocates port traffic as required when certain protocols or traffic behave abnormally.

The fault appeared because Oracle application software had recently been installed. When the database traffic was active, the original port allowed only part of its bandwidth for the user access traffic of the online mall. The remaining bandwidth was reserved to guarantee the needs of the newly added application traffic.

Because Mr. Gu was unfamiliar with Oracle relational databases, the application project was contracted out to a system integrator. While installing the system, the integrator changed the switch configuration so that the system would pass acceptance. One of Mr. Gu's staff who worked with the integrator knew about this, but he knew nothing about the website's topology, so he assumed the change would have no impact on the network and did not inform Mr. Gu.

Because the website had no regular inspection and documentation system, this employee naturally did not record the change. That lowered the efficiency of our inspection. Fortunately, the fault was still found within the deadline.
