Network Administrator Common Errors

Last Update:2013-11-19 Source: Internet

Author: User

A protocol analyzer is one of the most powerful tools in the network administrator's toolkit. It can turn a stubborn, time-consuming problem, the kind that annoys the CEO and may even force every machine to be restarted, into a short-lived issue that is easy to summarize in the weekly status report, saving the company a lot of time and money.
However, like any other complex tool, it must be used properly to deliver its full benefit. When using a protocol analyzer to diagnose network faults, try to avoid the following errors.

Error 1: Analyzer placed incorrectly

Placing the analyzer correctly is decisive for diagnosing a fault quickly. Think of the analyzer as a window onto the network, much like a window in a building: what you can see depends on which window you look through. From a south-facing window you cannot see the congestion on the expressway north of the building. A trace taken with the analyzer in the wrong part of the network takes a long time to analyze and may not show the problem at all. So how do we place the analyzer correctly? Consider a few examples.

Possible problems and causes are as follows:

Scenario A: A host, server A, cannot communicate with any other host. Possible causes:

1) server A is incorrectly configured;

2) server A's NIC is faulty or incorrectly configured;

3) the LAN where server A is located has a problem;

4) server A's LAN segment is faulty.

Scenario B: A host, server B, cannot communicate with any host in remote network X, but it communicates normally with hosts in its own LAN and in other remote networks (which means the problem does not lie in server B itself or in its LAN segment).

Possible causes:

1) server B's configuration for network X is wrong;

2) the connection to the network segment of the router that server B uses to reach network X is faulty;

3) one or more of the links between server B's LAN and network X is faulty;

4) the network segment of the router that network X uses to reach server B is faulty;

5) network X itself is faulty.

Scenario C: A server, server C, cannot communicate with one particular host in its LAN, host C, but communicates normally with the other hosts in the network (which means the problem does not lie in server C itself or in its LAN segment).

Possible causes:

1) host C is incorrectly configured;

2) host C's NIC has failed;

3) host C's LAN segment has a problem.

Scenario D: A server, server D, cannot communicate with one particular remote host, host D, but communicates normally with the other hosts in its own LAN segment. The connection to the remote network and the remote network itself are also working normally.

Possible causes:

1) host D is incorrectly configured;

2) host D's NIC is faulty;

3) host D's LAN segment has a problem.

Some of these problems can be diagnosed or ruled out without an analyzer. For example, cause 3 of A can be confirmed or ruled out by checking the other hosts in the LAN where server A is located. Causes 2 and 3 of D can be ruled out the same way, assuming host D can communicate with other hosts in its own LAN.

Misconfiguration of a single server or host is easily found by inspection. Other problems, such as faults in a network or network segment, require an analyzer.

In all of the scenarios above, start by placing the analyzer as close as possible to the host most likely to be at fault, or to the network or network segment you suspect. If nothing meaningful turns up, be prepared to move the analyzer. Until the fault has been located, everything you do is based on conjecture. For cause 3 of B above, there should be an analyzer in both server B's LAN and network X, or at least it should be possible to move the analyzer from one end to the other.

For example, at one site a server suddenly stopped working. At first people suspected that the site staff had made a mistake on the server. The trace appeared to confirm that the server was not responding: many hosts were sending connection requests to it and getting no reply, which pointed to a hung server.

After several days had been spent trying to work out what was wrong with the server, I was asked to look at the trace. I asked the site operator to move the analyzer from the LAN where the client hosts were located (network X in cause 3 of B) to the LAN where the server was located. The result showed that an access control list had been incorrectly added to the router on the server's LAN. This access control list filtered out all traffic from the network where the client hosts were located. With a little more skepticism at the start, someone would have noticed that the connection requests were never seen on the server's LAN. Because the two ends of the network were not examined at the same time, the site was out of service for many days.

How can you tell from a trace which network the analyzer was attached to? In a trace taken on the client's LAN, frames from the client host carry the real client's MAC address as the source, and the destination MAC address is that of the router.
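The MAC-address check described above can be sketched in a few lines of Python. This is only an illustration: the `Frame` type, the `capture_side` function, and the MAC addresses (taken from the documentation range) are all made up for the example, and it assumes a single router between the client and the server.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    """Only the two Ethernet header fields this check needs (hypothetical type)."""
    src_mac: str
    dst_mac: str


def capture_side(frame, client_mac, server_mac):
    """Guess where a client-to-server frame was captured.

    Assumes exactly one router sits between the client and the server.
    A router rewrites the MAC header at every hop, so only the near end
    of the path still shows the real host's MAC address.
    """
    src, dst = frame.src_mac.lower(), frame.dst_mac.lower()
    client, server = client_mac.lower(), server_mac.lower()
    if src == client and dst == server:
        return "same LAN (no router in the path)"
    if src == client:
        return "client LAN"   # destination is the router's client-side interface
    if dst == server:
        return "server LAN"   # source is the router's server-side interface
    return "unknown"


# Made-up addresses for illustration only.
CLIENT = "00:00:5e:00:53:01"
SERVER = "00:00:5e:00:53:02"
ROUTER_CLIENT_SIDE = "00:00:5e:00:53:fe"
ROUTER_SERVER_SIDE = "00:00:5e:00:53:ff"

seen_near_client = Frame(CLIENT, ROUTER_CLIENT_SIDE)
seen_near_server = Frame(ROUTER_SERVER_SIDE, SERVER)
print(capture_side(seen_near_client, CLIENT, SERVER))
print(capture_side(seen_near_server, CLIENT, SERVER))
```

In the access-control-list case above, a trace containing only frames of the first kind would have been a hint that the server's side of the router had never been looked at.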

Unfortunately, the problem is more complicated than that. Knowing which network to attach the analyzer to is not enough. On a shared LAN made up of several segments, all you have to do is find a free hub port or coaxial cable connector. In a switched network, however, it is not simply a matter of plugging the analyzer into a free port on the switch.

Most switches can designate a particular port as a monitor or mirror port, although the terminology varies from one switch manufacturer to another. If all traffic to or from a given port can also be sent to the mirror port, the setup is complete as soon as the analyzer is connected to the mirror port.

However, some switches cannot reliably send the traffic between two ports to the mirror port. For example, in a full-duplex environment the two hosts on the monitored connection can transmit at the same time, and the switch can receive each frame and forward it to the other port of the link. For the mirror port, though, one of the two frames has to be buffered. If too many frames have to be handled this way, the buffer overflows, frames are lost, and the trace becomes unreliable. Worse, we do not even know that the trace we are following is unreliable.

Some switches support a built-in analyzer function. These switches can capture the frames to be traced themselves. The reliability of this feature depends on the buffer capacity of the switch. In some cases we have no choice but to use the mirror port or the built-in analyzer. Where possible, however, it is best to connect one of the hosts and the analyzer to a hub and attach the hub to the switch.
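The buffering problem comes down to simple arithmetic: a full-duplex link can carry traffic in both directions at once, but the mirror port can only transmit at its own line rate. The following Python sketch estimates how long a mirror buffer lasts. The function name and all the numbers are assumptions chosen for illustration, not figures from any particular switch, and the model ignores frame overhead and bursts.

```python
def mirror_overflow_seconds(link_mbps, util_a_to_b, util_b_to_a,
                            mirror_mbps, buffer_bytes):
    """Seconds until the mirror-port buffer overflows, or None if it never does.

    util_a_to_b and util_b_to_a are the utilization (0.0 to 1.0) of each
    direction of the monitored full-duplex link. Both directions have to
    be copied onto the single transmit side of the mirror port.
    """
    offered_mbps = link_mbps * (util_a_to_b + util_b_to_a)
    excess_mbps = offered_mbps - mirror_mbps
    if excess_mbps <= 0:
        return None  # the mirror port keeps up; nothing accumulates
    return buffer_bytes * 8 / (excess_mbps * 1_000_000)


# Hypothetical case: 100 Mb/s full-duplex link, 80% busy one way and 70%
# the other, mirrored to a 100 Mb/s port with 1 MB of buffer.
seconds = mirror_overflow_seconds(100, 0.8, 0.7, 100, 1_000_000)
print(f"buffer overflows after about {seconds:.2f} s")
```

Under these assumed numbers the buffer fills in a fraction of a second, which is why a sustained burst in both directions can silently remove frames from the trace.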

Why? Because even if you are sure the switch has enough capacity to buffer every frame, so that the mirror port or built-in analyzer loses nothing, the trace can still be misleading. For example, on standard Ethernet, a faulty RJ45 connector on a switch port produces crosstalk whenever the switch transmits a frame to the server. The switch interprets this as a collision and backs off; after 16 attempts the frame is discarded, yet the frame is still sent to the mirror port. The trace therefore shows the frame and makes it look as though the server failed to respond. Another scenario is out-of-specification wiring that damages 1% of the frames. The receiving switch port discards the damaged frames before they are sent to the mirror port, so the trace shows no sign of any error. If instead the analyzer is attached to a hub together with the host, it sees what is actually on the wire: in the first case, that the frame never gets through, and in the second case, the damaged frames themselves. Of course, every change you make carries the risk of altering the problem in unexpected ways. If the RJ45 connector failed only because it was not seated properly in the switch port, the fault may disappear when the connector is plugged into the hub instead. At least the problem is solved.

Note also that on a switch each port is a network segment in its own right. So when no problem is found on the switch port connected to the server, move the hub (or the analyzer) to the switch port of the host or of the router.

Note, too, that a hub cannot be used in a full-duplex environment. Some analyzers work in full-duplex mode. These analyzers have two Ethernet ports and a splitter module. The splitter module separates the two directions of the link and sends each to one of the Ethernet ports; the software then merges the data received on the two ports into a single trace. This kind of analyzer is required if the network is a full-duplex environment.
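The "16 attempts" above is the attempt limit of half-duplex Ethernet, whose standard retry scheme is truncated binary exponential backoff. The sketch below shows why a port that sees a collision on every attempt ends up silently dropping the frame. The function names and the `collides` callback are hypothetical, and no real timing is performed; waits are only counted in slot times.

```python
import random

ATTEMPT_LIMIT = 16   # the frame is dropped after this many failed attempts
BACKOFF_LIMIT = 10   # the backoff range stops growing after 10 collisions


def backoff_slots(collisions, rng=random):
    """Slot times to wait after the given number of collisions on one frame."""
    exponent = min(collisions, BACKOFF_LIMIT)
    return rng.randrange(2 ** exponent)   # uniform in 0 .. 2**exponent - 1


def try_to_send(collides, rng=random):
    """Hypothetical sender loop.

    collides(attempt) returns True if that attempt (1-based) is seen as a
    collision. Returns (status, attempts used, total slot times waited).
    """
    waited = 0
    for attempt in range(1, ATTEMPT_LIMIT + 1):
        if not collides(attempt):
            return "sent", attempt, waited
        if attempt < ATTEMPT_LIMIT:
            waited += backoff_slots(attempt, rng)
    return "discarded", ATTEMPT_LIMIT, waited


# A port whose faulty connector makes every attempt look like a collision.
status, attempts, waited = try_to_send(lambda attempt: True)
print(status, attempts, waited)
```

A mirror port that copies the frame before this loop gives up will show a frame that never actually reached the server.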

Error 2: Excessive filtering

The filter function lets the protocol analyzer ignore certain frames, freeing capture buffer space for the frames of interest. If you can filter on higher protocol layers, such as IP addresses, port numbers, and higher-level data, the analyzer rarely needs to filter on source or destination MAC addresses. In practice, however, the common problem is too much filtering.

One site experienced the following fault: the connection between the server and one particular client would inexplicably drop, while other clients had no problems. The client and the server were in the same subnet, and the only way to restore the connection between them was to restart the server.

An analyzer was installed at the site. Because of the large traffic volume, a filter was configured to capture only the frames between the two hosts (based on their MAC addresses). Nothing was found during the first two days, but the problem occurred on the third day: the trace indicated that the server suddenly stopped sending in the middle of several sessions. When the client was pinged from the server, the trace showed that the server sent no frames at all. The site operator concluded that there was a problem with the server's TCP stack or operating system.

Another trace was therefore requested, this time without a filter. A day and a half later another event was captured: the trace clearly showed that the server kept sending data but received no response. Digging deeper, we found that the destination MAC address of the server's frames had suddenly changed.

Because the destination MAC address no longer matched the client, the first, filtered trace had stopped capturing the server's frames, which made it look as though the server had stopped sending. It also turned out that, just before the address changed, the server had received a gratuitous ARP packet carrying a new MAC address for the client's IP address. As a result, the server updated its ARP cache and sent its data to the wrong host.

The source MAC address of the ARP frame was traced back to the host that sent the gratuitous ARP. Somehow, that host had been configured with both a static IP address, the client's, and DHCP. When the host started, it first took the static address, which conflicted with the client's address. It then invoked DHCP and was configured with the correct address.

The lesson is that using a filter may seem reasonable, but in many cases the root cause of a problem lies outside what the filter lets through. If the trace does not reveal the cause of the problem, the filter should be disabled, or at least widened, until the trace does reveal it. Only when a trace taken with all filters disabled still fails to reveal the cause can you conclude that tracing the network will not help.
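The failure in this example can be reproduced with a small simulation. This is a minimal sketch, not the site's actual setup: the `Frame` type, the filter functions, and all addresses are invented for illustration (the MAC and IP addresses come from documentation ranges).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    src_mac: str
    dst_mac: str
    src_ip: str
    dst_ip: str


# Made-up addresses for illustration only.
SERVER_MAC, CLIENT_MAC, ROGUE_MAC = (
    "00:00:5e:00:53:0a", "00:00:5e:00:53:14", "00:00:5e:00:53:63")
SERVER_IP, CLIENT_IP = "192.0.2.10", "192.0.2.20"


def mac_filter(frame):
    """Keep only frames exchanged between the two known MAC addresses."""
    return {frame.src_mac, frame.dst_mac} == {SERVER_MAC, CLIENT_MAC}


def ip_filter(frame):
    """Keep only frames exchanged between the two IP addresses."""
    return {frame.src_ip, frame.dst_ip} == {SERVER_IP, CLIENT_IP}


def server_sends(arp_cache, count):
    """Frames the server emits toward the client's IP using its ARP cache."""
    return [Frame(SERVER_MAC, arp_cache[CLIENT_IP], SERVER_IP, CLIENT_IP)
            for _ in range(count)]


arp_cache = {CLIENT_IP: CLIENT_MAC}
on_the_wire = server_sends(arp_cache, 3)

# A gratuitous ARP from the misconfigured host overwrites the cache entry.
arp_cache[CLIENT_IP] = ROGUE_MAC
on_the_wire += server_sends(arp_cache, 3)

mac_trace = [f for f in on_the_wire if mac_filter(f)]
ip_trace = [f for f in on_the_wire if ip_filter(f)]
print(len(on_the_wire), len(mac_trace), len(ip_trace))
```

With the MAC-based filter the trace ends exactly where the ARP cache changes, so the server appears to fall silent. The IP-based filter keeps all six frames and exposes the changed destination MAC address.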

Error 3: Capture time too short

In the preceding example, the site operator used a filter because the traffic volume on the network was too large. Without a filter, the analyzer could hold only about three minutes of traffic, which made it almost impossible for the site operator to notice the problem and stop the capture in time to find its cause. How long an analyzer can capture frames before its capture buffer fills up depends on the speed of the network and the number of frames on the network.
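A rough estimate of the capture window can be worked out in advance. The Python sketch below is a simplification under stated assumptions: the function name, buffer size, link speed, and utilization are all hypothetical, and the per-frame storage overhead that real analyzers add is ignored.

```python
def capture_window_seconds(buffer_bytes, link_mbps, utilization):
    """Approximate time until an unfiltered capture buffer is full.

    utilization is the average fraction of the link in use (0 < u <= 1).
    Per-frame storage overhead in the analyzer is ignored.
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in the range (0, 1]")
    bytes_per_second = link_mbps * 1_000_000 / 8 * utilization
    return buffer_bytes / bytes_per_second


# Hypothetical case: 64 MB buffer on a 10 Mb/s network that is 30% busy.
window = capture_window_seconds(64_000_000, 10, 0.3)
print(f"buffer fills in about {window / 60:.1f} minutes")
```

Under these assumed numbers the buffer lasts just under three minutes. The same buffer on a faster or busier network fills proportionally sooner, which is the pressure that pushes operators toward filters in the first place.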

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
