Troubleshooting Network Failures on a Linux System
Network troubleshooting generally follows a particular line of thinking and a particular order. The idea is to work through, section by section, the places where the problem might occur until it is finally pinned down.
So the first question to ask is: what is the network problem? Is the network unreachable, or is it slow?
1. If the network is unreachable, locating the specific problem generally means ruling out the places where the fault cannot be until the root cause is found. In general, check:
Whether the link is connected
Whether the appropriate network adapter is enabled
Whether the local network (the gateway) is reachable
Whether packets can be routed to the target host
Whether the remote port is open
2. If the network is slow, there are several ways to locate the source of the problem:
Whether DNS is the source of the problem
Which nodes along the route are bottlenecks
How bandwidth is being used
One, the network is unreachable
In general, when a network failure occurs, first collect information from both ends of the connection to determine which host or network segment has the problem. If A cannot access C but B can, the problem is clearly on A or on the network path between A and C. If several machines on the same subnet, such as A and B, can access the network normally but cannot reach C, the problem is probably C itself or the network C sits on.
Once the host with the problem has been identified, the following steps gradually narrow the problem down until it is located:
1. Whether the link is connected
That is, check whether the network card is physically connected to the network: the cable is plugged in and the connection works. Often there is no need to go to the machine room to check the physical connection; you can use the command:
# ethtool ethN
where ethN is the network card attached to the failing network.
Example 1: Using ethtool to view the physical connection of eth0
1  # ethtool eth0
2  Settings for eth0:
3          Supported ports: [ TP ]
4          Supported link modes:   10baseT/Half 10baseT/Full
5                                  100baseT/Half 100baseT/Full
6                                  1000baseT/Full
7          Supported pause frame use: No
8          Supports auto-negotiation: Yes
9          Advertised link modes:  10baseT/Half 10baseT/Full
10                                 100baseT/Half 100baseT/Full
11                                 1000baseT/Full
12         Advertised pause frame use: No
13         Advertised auto-negotiation: Yes
14         Speed: 1000Mb/s
15         Duplex: Full
16         Port: Twisted Pair
17         PHYAD: 1
18         Transceiver: internal
19         Auto-negotiation: on
20         MDI-X: Unknown
21         Supports Wake-on: g
22         Wake-on: g
23         Link detected: yes
Line 14 shows the current speed of the network card (this is a gigabit card), line 15 shows that it runs at full duplex, and line 23 shows that the physical connection between the card and the network is working. Speed and duplex are usually negotiated automatically between the host and the device at the other end of the cable, typically a switch (see the auto-negotiation entries on lines 8 and 19). If line 15 shows Half, you can switch the card to full duplex manually (both ends of the link must then be configured the same way, because forcing one side while the other still auto-negotiates can itself cause a duplex mismatch):
# ethtool -s eth0 autoneg off duplex full
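To check several hosts or interfaces quickly, the fields above can be pulled out of ethtool's output with a small filter. This is a minimal POSIX shell sketch, assuming the `Field: value` layout shown in Example 1; the function name `ethtool_summary` is hypothetical and not part of ethtool. It reads ethtool output on stdin, prints the link, speed and duplex values on one line, and returns a non-zero status when no link is detected.

```shell
# Hypothetical helper: summarize `ethtool IFACE` output read from stdin.
# Exit status is 0 only when ethtool reports "Link detected: yes".
ethtool_summary() {
    awk -F': *' '
        /^[ \t]*Speed:/         { speed = $2 }
        /^[ \t]*Duplex:/        { duplex = $2 }
        /^[ \t]*Link detected:/ { link = $2 }
        END {
            printf "link=%s speed=%s duplex=%s\n", link, speed, duplex
            exit (link == "yes" ? 0 : 1)
        }
    '
}

# Usage on a real host (needs ethtool; some fields need root):
#   ethtool eth0 | ethtool_summary || echo "eth0: no physical link"
```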
2. Whether the NIC is enabled properly
Physical connection failures are relatively uncommon. Once the physical connection has been ruled out, check the working status of the network card.
Example 2: Using ifconfig to check the status of eth1
1 # ifconfig eth1
2 eth1      Link encap:Ethernet  HWaddr e4:1f:13:b5:b0:62
3           inet addr:10.0.0.11  Bcast:10.0.0.255  Mask:255.255.255.0
4           inet6 addr: fe80::e61f:13ff:feb5:b062/64 Scope:Link
5           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
6           RX packets:74282478 errors:0 dropped:0 overruns:0 frame:0
7           TX packets:77425890 errors:0 dropped:0 overruns:0 carrier:0
8           collisions:0 txqueuelen:1000
9           RX bytes:13948947045 (13.9 GB)  TX bytes:51073249506 (51.0 GB)
In Example 2, line 3 shows the card's configuration, including its IP address, broadcast address and subnet mask; check these for mismatches. If this line is missing or wrong, the card has not been configured or brought up properly. Line 5 should also include the UP and RUNNING flags.
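The same check can be scripted. This is a minimal sketch, assuming either the older ifconfig layout of Example 2 or the newer net-tools layout (`flags=...<UP,...,RUNNING,...>` and `inet 10.0.0.11`); the helper name `ifconfig_check` is hypothetical. It reports whether the UP and RUNNING flags are present and which IPv4 address is configured, and returns non-zero if any of the three is missing.

```shell
# Hypothetical helper: check `ifconfig IFACE` output (read from stdin) for the
# UP and RUNNING flags and an IPv4 address. Handles both the older layout
# ("inet addr:10.0.0.11") and newer net-tools output ("inet 10.0.0.11").
ifconfig_check() {
    awk '
        /[ ,<]UP[ ,>]/      { up = 1 }
        /[ ,<]RUNNING[ ,>]/ { running = 1 }
        $1 == "inet" {
            addr = $2
            sub(/^addr:/, "", addr)   # strip the old-style "addr:" prefix
        }
        END {
            printf "up=%s running=%s inet=%s\n", (up ? "yes" : "no"),
                (running ? "yes" : "no"), (addr == "" ? "none" : addr)
            exit ((up && running && addr != "") ? 0 : 1)
        }
    '
}

# Usage: ifconfig eth1 | ifconfig_check
```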
3. Whether the gateway is set up correctly
If the network adapter has started properly, confirm that the interface facing the destination network has a correctly configured gateway and that the host can reach that gateway. The route and ping commands together cover this stage.
Example 3: Using route to view the kernel routing table
# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         184.108.40.206  0.0.0.0         UG    0      0        0 eth0
10.0.0.0        0.0.0.0         255.255.255.0   U     0      0        0 eth1
220.127.116.11  0.0.0.0         255.255.255.0   U     0      0        0 eth0
route -n shows gateways and other addresses as IP addresses instead of host names, which is faster and keeps DNS out of the picture. Use route to check the kernel routing table and confirm that the relevant network card has a route to the destination network, then ping the gateway to rule out problems with the connection to it.
If the gateway does not answer ping, the gateway may be blocking ICMP packets, or the switch may be misconfigured.
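These two steps can be combined. The sketch below, assuming the `route -n` column layout of Example 3 and the iputils version of ping (where `-W` is the reply timeout in seconds), extracts the default gateway and pings it. Both function names are hypothetical. On newer systems, `ip route show default` prints the same information.

```shell
# Hypothetical helper: print the default gateway(s) from `route -n` output on
# stdin (destination 0.0.0.0 with the G flag set).
default_gateway() {
    awk '$1 == "0.0.0.0" && $4 ~ /G/ { print $2 }'
}

# Hypothetical wrapper: find the first default gateway and ping it.
check_gateway() {
    gw=$(route -n | default_gateway | head -n 1)
    if [ -z "$gw" ]; then
        echo "no default route configured"
        return 1
    fi
    # -c 3: three probes; -W 2: wait up to 2 s for each reply (iputils ping)
    if ping -c 3 -W 2 "$gw" >/dev/null 2>&1; then
        echo "gateway $gw answers ping"
    else
        echo "gateway $gw does not answer (down, or ICMP filtered)"
        return 1
    fi
}
```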
4. Whether DNS is working
Many network problems are caused by DNS failures or misconfiguration; the nslookup and dig commands can be used to troubleshoot them.
Example 4: Using nslookup to view DNS resolution
# nslookup baidu.com
Server:         10.21.1.205
Address:        10.21.1.205#53

Non-authoritative answer:
Name:   baidu.com
Address: 18.104.22.168
Name:   baidu.com
Address: 22.214.171.124
Name:   baidu.com
Address: 126.96.36.199
Here the DNS server 10.21.1.205 is on the local LAN, and the nslookup result shows that DNS is working properly. If nslookup cannot resolve the target domain name, DNS is most likely misconfigured; check whether a name server is configured in /etc/resolv.conf:
Example 5: DNS configuration in /etc/resolv.conf (takes effect immediately)
# Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
#     DO NOT EDIT THIS FILE BY HAND -- YOUR CHANGES WILL BE OVERWRITTEN
nameserver 10.21.1.205
/etc/resolv.conf holds the current DNS server configuration and changes to it take effect immediately, but on this system the file is generated by resolvconf (see the header in Example 5), so manual edits will be overwritten. To configure the DNS server permanently on Debian-based systems, set the dns-nameservers field in /etc/network/interfaces:
Example 6: Permanent DNS configuration in /etc/network/interfaces
auto lo
iface lo inet loopback

auto eth0
iface eth0 inet static
    network ...
    netmask 255.255.255.0
    broadcast ...
    gateway ...
    address ...
    dns-nameservers 10.21.1.205
If the DNS server is on the local subnet and cannot be pinged, it is probably down.
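A minimal sketch of these DNS checks, assuming a resolv.conf-format file like Example 5; `list_nameservers` and `check_nameservers` are hypothetical names. Note that ping only shows whether the server host is reachable; a server can answer DNS queries while dropping ICMP, so a direct query such as `dig @10.21.1.205 baidu.com` is the more conclusive test.

```shell
# Hypothetical helper: list the nameserver entries of a resolv.conf-style file
# (default /etc/resolv.conf); comment lines start with "#" and are skipped.
list_nameservers() {
    awk '$1 == "nameserver" { print $2 }' "${1:-/etc/resolv.conf}"
}

# Hypothetical wrapper: ping every configured name server with two probes.
check_nameservers() {
    for ns in $(list_nameservers "$1"); do
        if ping -c 2 -W 2 "$ns" >/dev/null 2>&1; then
            echo "$ns: answers ping"
        else
            echo "$ns: no reply (down, or ICMP filtered)"
        fi
    done
}

# Usage: check_nameservers            # reads /etc/resolv.conf
```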
5. Whether the remote host can be reached through routing
The Internet consists of a large number of interconnected routers, and traffic reaches its destination by hopping between these nodes. The most direct and most common command for checking connectivity is ping: if ping succeeds, routing to the target works. If ping fails, traceroute shows every hop from the current host to the target host. ping sends ICMP echo requests; Linux traceroute sends UDP probes by default (-I switches it to ICMP echo), and both rely on ICMP replies from the routers and the target.
Example 7: Using traceroute to trace the route
1  # traceroute www.baidu.com
2  traceroute to www.baidu.com (188.8.131.52), 30 hops max, 60 byte packets
3   1  184.108.40.206 (220.127.116.11)  1.844 ms  1.847 ms  2.102 ms
4   2  18.104.22.168 (22.214.171.124)  0.389 ms  0.393 ms  0.542 ms
5   3  localhost (10.1.150.1)  2.556 ms  3.730 ms  3.155 ms
6   4  localhost (10.12.16.17)  1.214 ms  1.190 ms  1.196 ms
7   5  localhost (10.12.30.105)  1.533 ms  1.541 ms localhost (10.12.30.101)  1.692 ms
8   6  126.96.36.199 (188.8.131.52)  3.350 ms  2.998 ms  2.977 ms
9   7  184.108.40.206 (220.127.116.11)  4.631 ms 18.104.22.168 (22.214.171.124)  3.846 ms 126.96.36.199 (188.8.131.52)  3.808 ms
10  8  184.108.40.206 (220.127.116.11)  3.120 ms  2.844 ms  2.857 ms
11  9  18.104.22.168 (22.214.171.124)  5.957 ms  5.912 ms  4.741 ms
12 10  126.96.36.199 (188.8.131.52)  2.080 ms  2.070 ms  2.036 ms
13 11  184.108.40.206 (220.127.116.11)  35.257 ms 18.104.22.168 (22.214.171.124)  35.373 ms 126.96.36.199 (188.8.131.52)  35.244 ms
14 12  * * *
15 13  * * *
16 14  * 184.108.40.206 (220.127.116.11)  35.869 ms 18.104.22.168 (22.214.171.124)  38.279 ms
17 15  * * *
18 16  * * *
19 17  * * *
20 18  * * *
21 19  * * *
22 20  * * *
23 21  * * *
24 22  * * *
25 23  * * *
26 24  * * *
27 25  * * *
28 26  * * *
29 27  * * *
30 28  * * *
31 29  * * *
32 30  * * *
Line 3 shows that the first hop reaches the gateway of the current subnet; later hops pass through addresses registered with APNIC (the Asia-Pacific Network Information Centre, based in Australia), and so on. traceroute shows where along the path the connection breaks and how much latency each hop adds. A "*" means no reply arrived for that probe, either because the node is unreachable or because a gateway restricts ICMP.
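When the output is long, as in Example 7, it helps to find the last hop that still answers; the break in the path usually lies just beyond it. This is a minimal sketch, assuming the standard Linux traceroute output format (hop number first, `*` for an unanswered probe, round-trip times printed as `N.NNN ms`); `last_responding_hop` is a hypothetical name. Applied to the raw output behind Example 7, it should point at hop 14, after which every probe times out.

```shell
# Hypothetical helper: read traceroute output on stdin and print the number and
# address of the last hop that answered at least one probe.
last_responding_hop() {
    awk '
        $1 ~ /^[0-9]+$/ && /[0-9] ms/ {
            hop = $1
            # First field after the hop number that is not a "*" timeout marker
            for (i = 2; i <= NF; i++) if ($i != "*") { addr = $i; break }
        }
        END {
            if (hop == "") { print "no hop answered"; exit 1 }
            print hop, addr
        }
    '
}

# Usage: traceroute -n www.baidu.com | last_responding_hop
```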
6. Whether the port on the remote host is open
telnet is a handy tool for checking whether a port is open; nmap is another.
Example 8: Using telnet to check whether a port is open on a remote host
# telnet 126.96.36.199 80
Trying 188.8.131.52...
Connected to 184.108.40.206.
Escape character is '^]'.
telnet IP port shows whether the specified remote host has the target port open. Here, Baidu's front-end server has port 80 open, which its web service requires.
However, telnet is of limited use when a firewall is present, because it cannot report why a connection fails. When telnet cannot connect, there are two possibilities: the port is not open, or a firewall is filtering the connection.
For example, try to telnet to port 22 on the same Baidu front-end server:
# telnet 220.127.116.11 22
Trying 18.104.22.168...
telnet: Unable to connect to remote host: Connection timed out
The connection attempt times out, but from telnet alone we cannot tell whether the port is closed or blocked by a firewall. nmap gives more detail:
Example 9: Using nmap to check whether a port is open
1 # nmap -p 22 22.214.171.124
2
3 Starting Nmap 6.40 ( http://nmap.org ) at 2015-08-10 20:45 CST
4 Nmap scan report for 126.96.36.199
5 Host is up (0.040s latency).
6 PORT   STATE    SERVICE
7 22/tcp filtered ssh
Scanning the same server with nmap, line 7 reports port 22 as filtered: nmap's probes got no response because a firewall is dropping them, so nmap cannot tell whether anything is listening behind it. If the probes reached the host and no service were listening, line 7 would show closed instead of filtered; an open port shows open.
So a port can be unreachable either because nothing is listening on it or because a firewall filters the traffic.
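A minimal sketch that turns nmap's port table into a short explanation, assuming the `PORT STATE SERVICE` layout of Example 9; `nmap_port_state` is a hypothetical name. It covers the three states discussed above and defers anything else (for example nmap's combined `open|filtered` state for UDP) to the nmap documentation.

```shell
# Hypothetical helper: read `nmap -p PORT HOST` output on stdin and explain the
# state nmap reports for PORT (first argument).
nmap_port_state() {
    awk -v p="$1" '
        $1 ~ ("^" p "/(tcp|udp)$") {
            if ($2 == "open")          msg = "a service is listening"
            else if ($2 == "closed")   msg = "host answered, nothing is listening"
            else if ($2 == "filtered") msg = "no answer, probes probably dropped by a firewall"
            else                       msg = "ambiguous, see the nmap port-state documentation"
            print $2 ": " msg
            found = 1
        }
        END { exit (found ? 0 : 1) }
    '
}

# Usage: nmap -p 22 HOST | nmap_port_state 22
```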
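7. Viewing listening ports on the local host
To check whether a port is open locally, use:
# netstat -lnp | grep PORT
The options: -l lists only listening sockets, -n shows addresses and ports numerically, and -p shows the PID and name of the process that owns each socket (root is needed to see other users' processes).
Example 10: Viewing the listeners on a specific local port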
# netstat -lnp | grep :11211
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 10.0.0.11:11211         0.0.0.0:*               LISTEN      28911/memcached
udp        0      0 10.0.0.11:11211         0.0.0.0:*                           28911/memcached
Example 10 uses the memcached service to show which sockets are listening on a port. If netstat finds nothing for the specified port, no process is listening on it.
The first column is the socket's protocol, the second and third columns show the receive and send queues, the fourth column is the local address and port the socket is bound to (which shows the network it listens on), the sixth column shows the socket's state (empty for UDP), and the last column shows the process that opened the port.
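To script this check without grep accidentally matching another column (a foreign address, or a longer port number that merely contains the digits), the local-address column can be compared exactly. This is a minimal sketch, assuming the `netstat -lnp` column layout of Example 10; `listening_on` is a hypothetical name. It prints the protocol, local address and owning process for each socket bound to the port and returns non-zero when nothing is listening. On systems without net-tools, `ss -lntup` gives similar information, but in a different column layout that this sketch does not parse.

```shell
# Hypothetical helper: read `netstat -lnp` output on stdin and print the
# sockets whose local port (4th column, after the last ":") equals $1.
listening_on() {
    awk -v p="$1" '
        ($1 ~ /^tcp/ || $1 ~ /^udp/) {
            n = split($4, parts, ":")      # works for IPv4 and ":::22" style IPv6
            if (parts[n] == p) { print $1, $4, $NF; found = 1 }
        }
        END { exit (found ? 0 : 1) }
    '
}

# Usage: netstat -lnp | listening_on 11211 || echo "nothing listens on 11211"
```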
8. Viewing firewall rules
# iptables -L
This command lists the current host's firewall rules. iptables itself is not covered here; a follow-up post will describe it in detail.
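Long rule sets are easier to review when only the blocking entries are shown. This is a minimal sketch, assuming the plain `iptables -L -n` output layout (a `Chain NAME (policy ...)` header followed by rules whose first column is the target); `blocking_rules` is a hypothetical name. It lists chains whose default policy is DROP and every DROP or REJECT rule, tagged with its chain.

```shell
# Hypothetical helper: read `iptables -L -n` output on stdin and print the
# rules (and chain policies) that drop or reject traffic, tagged by chain.
blocking_rules() {
    awk '
        $1 == "Chain" {
            chain = $2
            if ($0 ~ /policy DROP/) print chain ": default policy DROP"
            next
        }
        $1 == "DROP" || $1 == "REJECT" { print chain ": " $0 }
    '
}

# Usage (root): iptables -L -n | blocking_rules
```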
Two, troubleshooting a slow network
Troubleshooting a slow network is actually harder than troubleshooting an unreachable one, because the cause often lies with the ISP, DNS and so on. Such failures are usually outside our control; all we can do is collect evidence and report it or file a complaint.
To keep DNS out of the picture, add the -n option to the commands above; -n stops them from resolving IP addresses into host names, bypassing DNS.
traceroute, covered above, shows not only whether the route is correct but also the latency of each hop, which lets you locate the network segment with the highest latency.
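A minimal sketch of that idea, assuming standard Linux traceroute output (ideally produced with `-n`); `worst_latency_jump` is a hypothetical name. For each hop it takes the smallest reported round-trip time and prints the hop where that value rises most compared with the previous hop that answered. Treat the result as a hint only: many routers deprioritize the ICMP replies that traceroute depends on, so a jump that does not persist at later hops is usually not a real bottleneck.

```shell
# Hypothetical helper: read traceroute output on stdin and report the hop with
# the largest increase in minimum RTT over the previous answering hop.
worst_latency_jump() {
    awk '
        $1 ~ /^[0-9]+$/ {
            best = ""
            # Smallest RTT on this line: numeric fields followed by "ms"
            for (i = 2; i < NF; i++)
                if ($(i + 1) == "ms" && (best == "" || $i + 0 < best))
                    best = $i + 0
            if (best == "") next                  # hop did not answer
            if (prev != "" && best - prev > jump) { jump = best - prev; hop = $1 }
            prev = best
        }
        END {
            if (hop == "") exit 1
            printf "hop %s: +%.3f ms\n", hop, jump
        }
    '
}

# Usage: traceroute -n www.baidu.com | worst_latency_jump
```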
iftop works like top and shows which network connections are using the most bandwidth.
Example 11: Using iftop to see the bandwidth consumed by each connection
iftop sorts connections by bandwidth consumption, so the most bandwidth-intensive connections are easy to pick out.
The scale along the top shows bandwidth for the whole interface. Below it, the first column is the source IP, the second column is the destination IP, and the arrows show the direction of transfer. The last three columns are the transfer rates between the two hosts averaged over the last 2, 10 and 40 seconds.
At the bottom, TX and RX show statistics for sent and received data, and TOTAL shows the combined amount transferred.
When none of the methods above explains a slow network or heavy packet loss, it is time for the ultimate weapon: packet capture. Ideally, capture on both ends of the communication at the same time, so you can check both what was sent and what was received. tcpdump is the usual tool for capturing packets.
Example 12: Capturing packets with tcpdump
# tcpdump
23:47:43.326284 IP ISeR-Server1.ntp > 188.8.131.52.9579: NTPv2, Reserved, length 440
23:47:43.326288 IP 184.108.40.206.27777 > ISeR-Server1.ntp: NTPv2, Reserved, length 8
Example 12 shows only two lines of captured output as an illustration. tcpdump shows when each packet was seen, the addresses and ports of both sides (add -n to keep them numeric), the protocol in use (NTP here), the packet length and so on.
To stop capturing, press Ctrl-C; tcpdump then prints how many packets it handled:
14422 packets captured
1127345 packets received by filter
1109698 packets dropped by kernel
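In this summary the kernel dropped far more packets than tcpdump captured, so the capture is incomplete. Kernel drops usually mean tcpdump could not read packets out of its buffer fast enough; common remedies are `-n` (no name lookups), writing to a file with `-w` instead of printing, a narrower filter, or a larger buffer with `-B`. This is a minimal sketch that computes the drop rate from the summary lines, assuming the three-line format shown above; `capture_drop_ratio` is a hypothetical name.

```shell
# Hypothetical helper: read tcpdump's exit summary on stdin and print the share
# of packets (seen by the filter) that the kernel dropped.
capture_drop_ratio() {
    awk '
        /packets received by filter/ { received = $1 }
        /packets dropped by kernel/  { dropped = $1 }
        END {
            if (received + 0 == 0) exit 1
            printf "%.1f%% of filtered packets dropped by kernel\n", 100 * dropped / received
        }
    '
}

# Usage: tcpdump ... 2> summary.txt; capture_drop_ratio < summary.txt
```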
tcpdump has a handful of common options that are easy to remember; its detailed usage is beyond the scope of this article. GUI users can also use Wireshark, a more full-featured analysis tool.
# tcpdump -n port N                    # capture only traffic on one port
# tcpdump -n port N1 or port N2        # capture traffic on several ports
# tcpdump -w output.pcap               # dump raw packets to output.pcap
# tcpdump -C 10 -w output.pcap         # start a new file every 10 MB
# tcpdump -C 10 -W 5 -w output.pcap    # also cap the number of files at 5, overwriting the oldest
# tcpdump -r output.pcap               # read back a saved capture
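The multi-port filter and the rotating dump above are often combined. This is a minimal sketch, assuming tcpdump's standard `-i`, `-n`, `-C`, `-W` and `-w` options; `port_filter` and `capture_ports` are hypothetical names. `port_filter` only builds the filter string, so it can be tested without root; `capture_ports` runs the actual capture.

```shell
# Hypothetical helper: join port numbers into a tcpdump (BPF) filter such as
# "port 80 or port 443". The subshell body keeps $filter from leaking.
port_filter() (
    filter=""
    for p in "$@"; do
        if [ -z "$filter" ]; then
            filter="port $p"
        else
            filter="$filter or port $p"
        fi
    done
    printf '%s\n' "$filter"
)

# Hypothetical wrapper: rotating capture of the given ports on one interface,
# at most 5 files of about 10 MB each; needs root.
capture_ports() {
    iface=$1
    shift
    tcpdump -n -i "$iface" -C 10 -W 5 -w output.pcap "$(port_filter "$@")"
}

# Usage: capture_ports eth0 80 443
```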
VBird's Linux book ("Brother Bird's Linux Private Kitchen") gives a similar troubleshooting sequence:
1. Does the NIC work, including hardware and driver: lspci, dmesg
2. Are the IP parameters set correctly: ifconfig
3. Does communication within the LAN work: ping
4. Is the routing information correct: route -n
5. DNS status: dig, nslookup
6. Status and latency of routing nodes: traceroute
7. Service listening ports: netstat -lnp
8. Firewall: iptables, SELinux
In short, its approach closely matches the one in this article.
(Repost) Linux System Troubleshooting 4: Network