As the system administrator of an IP network, you will often run into network connection failures. When troubleshooting them, packet analysis software can often get twice the result with half the effort.
Many packet analysis tools are in common use, such as tcpdump, sniffer, windump, and ettercap. Today we will walk through several examples of troubleshooting network faults with the packet analysis tool tcpdump. For more information about tcpdump, its installation, and its basic usage, see this article.
Example 1: ARP fault
Fault description: A-SERVER, a server on the LAN running Solaris, has an abnormal network connection: it cannot be pinged from any host.
Troubleshooting: First, check the system itself. It works normally: no unusual processes are running, CPU and memory usage are normal, no firewall is installed, and the network cable is fine.
At this point we use tcpdump to locate the fault. First, we run the ping command on the B-CLIENT host to send ICMP packets to A-SERVER, as shown below:
[root@redhat log]# ping A-SERVER
PING A-SERVER from B-CLIENT : 56(84) bytes of data.
Meanwhile, we start tcpdump on A-SERVER to capture the packets coming from the host B-CLIENT:
A-SERVER# tcpdump host B-CLIENT
tcpdump: listening on hme0
16:32:32.611251 arp who-has A-SERVER tell B-CLIENT
16:32:33.611425 arp who-has A-SERVER tell B-CLIENT
16:32:34.611623 arp who-has A-SERVER tell B-CLIENT
We do not see the expected ICMP packets. Instead, we capture ARP broadcast packets sent by B-CLIENT, and no ARP reply from A-SERVER appears in the capture. Because B-CLIENT cannot obtain A-SERVER's MAC address through ARP, it keeps asking for it. This suggests that the problem is unlikely to be in the higher layers and most likely lies at the link layer. First, look at the ARP table on the host A-SERVER:
A-SERVER# arp -a
Net to Media Table
Device IP Address             Mask            Flags Phys Addr
------ ---------------------- --------------- ----- -----------------
hme0   netgate                255.255.255.255       00:90:6d:f2:24:00
hme0   A-SERVER               255.255.255.255 S     00:03:ba:08:b2:83
hme0   BASE-ADDRESS.MCAST.NET 240.0.0.0       SM    01:00:5e:00:00:00
Note the Flags column of the A-SERVER entry: only the S (static) flag is set. In the Solaris ARP implementation, an entry must also carry the P (publish) flag before the host will respond to ARP requests for that address.
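The flag check described above can also be done mechanically. The following is a minimal Python sketch, not part of the original procedure. It assumes the table layout printed in this example (device, host, mask, optional flags, MAC address); the function names and the sample mask value are illustrative assumptions.

```python
def parse_arp_table(text):
    """Parse `arp -a` output laid out like the table in this example.

    Returns a dict mapping each host name to its flags string
    (an empty string when the Flags column is blank).
    """
    entries = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows have 4 fields (no flags) or 5 fields (with flags),
        # and always end with a colon-separated MAC address.
        if len(fields) not in (4, 5) or ":" not in fields[-1]:
            continue
        host = fields[1]
        flags = fields[3] if len(fields) == 5 else ""
        entries[host] = flags
    return entries


def is_published(entries, host):
    """True if the entry for `host` carries the P (publish) flag."""
    return "P" in entries.get(host, "")


# Sample data modeled on this example; the mask for A-SERVER is assumed.
SAMPLE = """\
Net to Media Table
Device IP Address Mask Flags Phys Addr
------ ---------- ---- ----- ---------
hme0 netgate 255.255.255.255 00:90:6d:f2:24:00
hme0 A-SERVER 255.255.255.255 S 00:03:ba:08:b2:83
hme0 BASE-ADDRESS.MCAST.NET 240.0.0.0 SM 01:00:5e:00:00:00
"""

entries = parse_arp_table(SAMPLE)
print(is_published(entries, "A-SERVER"))  # False: only S is set
```

A result of False for the server's own entry matches the symptom seen in the capture: requests arrive, but nothing answers them.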
Manually add the P flag:
A-SERVER# arp -s A-SERVER 00:03:ba:08:b2:83 pub
Now run arp -a again:
A-SERVER# arp -a
Net to Media Table
Device IP Address             Mask            Flags Phys Addr
------ ---------------------- --------------- ----- -----------------
hme0   netgate                255.255.255.255       00:90:6d:f2:24:00
hme0   A-SERVER               255.255.255.255 SP    00:03:ba:08:b2:83
hme0   BASE-ADDRESS.MCAST.NET 240.0.0.0       SM    01:00:5e:00:00:00
We can see that the A-SERVER entry now carries both the S and P flags. Testing the network connection again shows that it is back to normal. The problem is solved!
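The symptom that gave this fault away, repeated ARP requests with no reply, can be spotted automatically in a saved capture. Below is a minimal Python sketch under stated assumptions: the capture is tcpdump text output containing "who-has X tell Y" and "reply X is-at MAC" lines, as in this example, and the function name `unanswered_arp_targets` is hypothetical.

```python
import re
from collections import Counter

# Patterns for tcpdump's textual ARP lines (case-insensitive so that
# both "arp who-has ..." and "ARP, Request who-has ..." match).
WHO_HAS = re.compile(r"who-has (\S+) tell (\S+)", re.IGNORECASE)
REPLY = re.compile(r"reply (\S+) is-at", re.IGNORECASE)


def unanswered_arp_targets(capture_text):
    """Return {target: request_count} for targets that never replied."""
    requests = Counter()
    answered = set()
    for line in capture_text.splitlines():
        match = WHO_HAS.search(line)
        if match:
            requests[match.group(1)] += 1
            continue
        match = REPLY.search(line)
        if match:
            answered.add(match.group(1))
    return {t: n for t, n in requests.items() if t not in answered}


CAPTURE = """\
16:32:32.611251 arp who-has A-SERVER tell B-CLIENT
16:32:33.611425 arp who-has A-SERVER tell B-CLIENT
16:32:34.611623 arp who-has A-SERVER tell B-CLIENT
"""

print(unanswered_arp_targets(CAPTURE))  # {'A-SERVER': 3}
```

Any target that shows several requests and no reply is a candidate for the kind of link-layer problem described in this example.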
Example 2: NetFlow software problem
Fault description: Cisco NetFlow software is installed on a newly built network management workstation to analyze the traffic of the routing equipment. After the router is configured as required, the software installs normally on the workstation and reports no errors. However, once started, the NetFlow collector does not receive traffic information from any router, so the software is useless.
Troubleshooting: The router and the software were checked repeatedly, and their configuration is correct. Following a step-by-step approach, we first need to identify the faulty device: is the router failing to send traffic information, or is the local system failing to receive it?
It then occurred to us that, on the router, we had defined the client to receive data on UDP port 9995. We can monitor this port to see whether the router is actually sending UDP data. If the workstation is receiving packets from the router, the router is unlikely to be at fault; if not, the router is the likely culprit.
Use tcpdump on the network management workstation:
nms# tcpdump port 9995
tcpdump: listening on hme0
18:15:34.373435 routea.50111 > nms.9995: udp 1464
18:15:34.373829 routea.50111 > nms.9995: udp 1464
18:15:34.374100 routea.50111 > nms.9995: udp 1464
Now we can see that the router is indeed sending the data packets, so the router can essentially be ruled out as the cause. Re-checking the workstation reveals that a firewall is installed on it and is blocking UDP port 9995. After the firewall configuration on the workstation is adjusted, NetFlow works normally and the problem is solved!
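This example hinges on a distinction worth making explicit: tcpdump showed the packets arriving at the interface, yet the application never received them. On most systems the capture point sits below the host firewall, so a second check at the socket level helps confirm where packets are being dropped. The following Python sketch is an illustration, not the NetFlow collector itself; the helper names are hypothetical, and the demonstration sends itself a datagram over the loopback interface.

```python
import socket


def open_udp_listener(host, port):
    """Bind a UDP socket, as a collector would (port 0 = any free port)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def wait_for_datagram(sock, timeout):
    """Return (sender_address, byte_count), or None if nothing arrives."""
    sock.settimeout(timeout)
    try:
        data, sender = sock.recvfrom(65535)
    except socket.timeout:
        return None
    return sender, len(data)


# Loopback demonstration: send one datagram to ourselves.
listener = open_udp_listener("127.0.0.1", 0)
port = listener.getsockname()[1]
probe = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
probe.sendto(b"test", ("127.0.0.1", port))
probe.close()

result = wait_for_datagram(listener, 2.0)
print(result)
```

To use the idea on the workstation, one would stop the collector, bind the listener to the collector's UDP port on all interfaces, and wait. If tcpdump shows datagrams arriving while `wait_for_datagram` returns None, something on the host, such as a firewall, is discarding them before they reach the application.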
Example 3: email server troubleshooting
Fault description: A new mail server running qmail has been set up on the local area network. Basic functions such as sending and receiving mail work normally, but one strange symptom shows up consistently: when a PC sends an email, it takes a long time after connecting to the mail server before the message actually starts to be sent.
Troubleshooting: The network connection is normal, and both the mail server and the client PCs perform normally. So where is the problem? To locate it precisely, we send an email from the mail client on a PC while running tcpdump on the server to capture and analyze the client's packets, as shown below:
server# tcpdump host client
tcpdump: listening on hme0
19:04:30.040578 client.1065 > server.smtp: S 1087965815:1087965815(0) win 64240 <mss 1460,nop,wscale 0,nop,nop,timestamp[|tcp]> (DF)
19:04:30.040613 server.smtp > client.1065: S 99285900:99285900(0) ack 1087965816 win 10136 <nop,nop,timestamp 20468779 0,nop,[|tcp]> (DF)
19:04:30.040960 client.1065 > server.smtp: . ack 1 win 64240 (DF)
The three-way handshake completes successfully, so everything is normal so far. Let's look further down.
19:04:30.048862 server.33152 > client.113: S 99370916:99370916(0) win 8760 <mss 1460> (DF)
19:04:33.411006 server.33152 > client.113: S 99370916:99370916(0) win 8760 <mss 1460> (DF)
19:04:40.161052 server.33152 > client.113: S 99370916:99370916(0) win 8760 <mss 1460> (DF)
19:04:56.061130 server.33152 > client.113: R 99370917:99370917(0) win 8760 (DF)
19:04:56.070108 server.smtp > client.1065: P 1:109(108) ack 1 win 10136 <nop,nop,timestamp 20471382 167656> (DF)
Here is the problem. After the handshake, the server tries to connect to port 113 (identd) on the client to authenticate the connecting user, but it never receives a response from the client. The server sends its SYN three times, and only about 26 seconds after the first attempt does it give up, send a packet with the reset flag, and begin pushing the SMTP data. Those 26 seconds are exactly the long wait that users experience when sending mail.
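The 26-second figure can be verified directly from the capture timestamps. The short Python sketch below recomputes the gaps between the server's packets to port 113. It assumes all timestamps fall on the same day, and the helper name `parse_time` is illustrative.

```python
from datetime import datetime


def parse_time(line):
    """Parse the leading HH:MM:SS.ffffff timestamp of a tcpdump line."""
    return datetime.strptime(line.split()[0], "%H:%M:%S.%f")


# The three SYNs and the final RST sent to client port 113.
IDENT_LINES = [
    "19:04:30.048862 server.33152 > client.113: S",
    "19:04:33.411006 server.33152 > client.113: S",
    "19:04:40.161052 server.33152 > client.113: S",
    "19:04:56.061130 server.33152 > client.113: R",
]

times = [parse_time(line) for line in IDENT_LINES]
gaps = [(b - a).total_seconds() for a, b in zip(times, times[1:])]
total = (times[-1] - times[0]).total_seconds()

print([round(g, 2) for g in gaps])  # [3.36, 6.75, 15.9]
print(round(total, 2))              # 26.01
```

The gaps roughly double each time, which is the usual pattern of TCP retransmission backoff when a SYN goes unanswered.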
With the cause identified, we can apply the right fix. After the qmail configuration on the server is changed so that it no longer performs the port 113 authentication, a new capture shows that the mail server no longer attempts to connect to port 113 and instead pushes data immediately after the three-way handshake. The problem is solved!
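The delay in this example arose because the client stayed silent on port 113 rather than rejecting the connection. A client that answers a SYN with a reset lets the server give up at once, while one that drops the SYN silently forces the server to wait out its retransmissions. The Python sketch below shows one way to tell these cases apart from another machine. It is an illustrative aid rather than part of the original fix, and the function name `probe_tcp_port` is hypothetical. The demonstration runs against the loopback interface.

```python
import socket


def probe_tcp_port(host, port, timeout=3.0):
    """Classify a TCP port as 'open', 'refused', or 'no response'."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect((host, port))
        return "open"
    except ConnectionRefusedError:
        # The peer answered with a reset: fast failure, no delay.
        return "refused"
    except socket.timeout:
        # Nothing came back: the situation seen in this example.
        return "no response"
    finally:
        sock.close()


# Loopback demonstration with a temporary listener on a free port.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)
port = listener.getsockname()[1]

while_listening = probe_tcp_port("127.0.0.1", port)
listener.close()
after_close = probe_tcp_port("127.0.0.1", port)

print(while_listening, after_close)
```

Probing port 113 on a mail client and getting "no response" would indicate that the client, or a filter in front of it, is discarding the server's connection attempts.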
Summary:
The examples above demonstrate the role of packet analysis software in troubleshooting. They show that, when packet analysis software is used properly, a system administrator can locate network faults and analyze network problems quickly and accurately.