Measure the test taker's knowledge about ARP network troubleshooting techniques.

Source: Internet
Author: User
Tags: network troubleshooting

I ran into this problem myself and traced it to ARP, which is often the reason a NIC cannot reach the network.
Recently my company ran into a very strange problem. A brand-name P4 computer with an onboard Intel NIC had been working well, communicating normally with both the Internet and the intranet. Then one day the connection started dropping intermittently while browsing the Internet. Pinging an Internet IP address showed the same intermittent loss, but pinging intranet addresses worked without any problem, and all other intranet communication was normal. In other words, the fault appeared only when communicating with the Internet, which was very confusing. The IP address of this computer is 192.168.24.55, and the IP address of the firewall is 192.168.24.7.

Fault Analysis: check physical links

All the computers in my organization access the Internet through a Netscreen NS25 firewall. If the firewall itself were at fault, the other computers would be affected too, but they accessed the Internet normally with no intermittent drops. Judging from the ping results on the affected computer, the problem seemed to lie at a lower layer, and intermittent disconnection looks like a typical physical layer problem. So the first step was to check the link.

The computer is connected to a port on a Cisco layer-3 switch, and the firewall is connected to the same switch. Routing is enabled on the layer-3 switch, and its configuration had already been verified as correct. I started with the network cable from the computer to the switch. If the cable were bad, communication between the computer and the intranet would be faulty as well, and a cable test confirmed that it was fine. The patch cable from the firewall to the switch had to be fine too, because the other computers had no problems. So the link could be ruled out. Could the NIC be the problem? No, because communication with the intranet was normal, so the network card had to be working. That eliminated the physical layer.

Fault Analysis: Simulate Data Communication

Looking at the network layer, this computer could reach the Internet, but with packet loss. The network layer itself did not appear to be faulty, so everything pointed to the data link layer. But what could be wrong at the data link layer? After thinking about it for a few days, I still had no clue. In the end I had to walk through the network communication process step by step to see whether I could find the problem.

Assume that this computer has a packet to send to the Internet. First, it checks whether the destination address is in the same network as its own address. If not, the packet is sent to the default gateway. Here the destination is an Internet address, which is definitely not in the same network, so the packet goes to the default gateway. The default gateway is the Cisco layer-3 switch, with the IP address 192.168.24.10. The computer (192.168.24.55) now looks in its local ARP table for the MAC address corresponding to 192.168.24.10. If there is no matching entry, it sends an ARP request to obtain the MAC address of 192.168.24.10. Because ARP requests are sent as broadcasts, every device in the network receives the request and examines it to see whether it is the target.
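The same-network check described above can be sketched in a few lines of Python. This is only an illustration: the article does not give the subnet mask, so the /24 prefix is an assumption, and 203.0.113.10 is a documentation address standing in for "some Internet host". The function name `next_hop` is hypothetical.

```python
import ipaddress


def next_hop(src_ip: str, prefix_len: int, dst_ip: str, default_gateway: str) -> str:
    """Return the IP address whose MAC address the sender must resolve via ARP."""
    # Network the sender belongs to, derived from its own address and prefix.
    local_net = ipaddress.ip_interface(f"{src_ip}/{prefix_len}").network
    if ipaddress.ip_address(dst_ip) in local_net:
        # Same network: deliver directly, so ARP for the destination itself.
        return dst_ip
    # Different network: hand the packet to the default gateway.
    return default_gateway


if __name__ == "__main__":
    # Assumed /24 mask; 203.0.113.10 is a placeholder Internet address.
    print(next_hop("192.168.24.55", 24, "203.0.113.10", "192.168.24.10"))
    print(next_hop("192.168.24.55", 24, "192.168.24.7", "192.168.24.10"))
```

With these assumed values, an Internet destination resolves to the gateway 192.168.24.10, while the firewall at 192.168.24.7 is on the local network and would be reached directly.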

When the Cisco layer-3 switch receives this ARP request, it checks whether the target IP address in the request matches its own IP address. If it does, the switch sends an ARP reply containing its MAC address to the source computer, 192.168.24.55. On receiving the ARP reply, the computer writes the switch's IP address (192.168.24.10) and MAC address into its ARP table, uses the switch's MAC address as the destination MAC address of the frame, and sends the packet to the switch. The switch then checks whether the destination IP address is in a directly connected segment. Since it is not, the switch looks in its routing table for a route to the destination, and if there is none, it forwards the packet along the default route. Here, the switch's default route points to the firewall at 192.168.24.7. The switch broadcasts an ARP request for the firewall's MAC address, and once the firewall replies, the switch puts the firewall's MAC address in the frame as the destination MAC address and sends the packet to the firewall. The firewall then repeats the same process to send the packet on toward its destination on the Internet. All of these steps worked normally: the corresponding ARP entries were present in the ARP tables of both the computer and the switch, and tracing the route with the tracert command also looked normal. So where was the problem? The analysis had to continue.
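The lookup-then-broadcast behaviour walked through above can be modelled with a toy simulation. This is a sketch, not real packet handling: the `Host` class and its methods are hypothetical, and the MAC addresses are placeholders from the documentation range, since the article does not list the real ones.

```python
class Host:
    """A device with one IP address, one MAC address, and an ARP cache."""

    def __init__(self, ip: str, mac: str):
        self.ip = ip
        self.mac = mac
        self.arp_cache = {}  # IP address -> MAC address

    def handle_arp_request(self, target_ip: str):
        # Every host sees the broadcast; only the owner of target_ip answers.
        return self.mac if target_ip == self.ip else None

    def resolve(self, target_ip: str, segment: list):
        # 1. Check the local ARP cache first.
        if target_ip in self.arp_cache:
            return self.arp_cache[target_ip]
        # 2. Cache miss: "broadcast" the request to every other host.
        for host in segment:
            if host is self:
                continue
            mac = host.handle_arp_request(target_ip)
            if mac is not None:
                # 3. Record the reply so later packets skip the broadcast.
                self.arp_cache[target_ip] = mac
                return mac
        return None  # nobody answered


if __name__ == "__main__":
    # Placeholder MAC addresses; the IP addresses are the ones in the article.
    pc = Host("192.168.24.55", "00:00:5e:00:53:55")
    switch = Host("192.168.24.10", "00:00:5e:00:53:10")
    firewall = Host("192.168.24.7", "00:00:5e:00:53:07")
    segment = [pc, switch, firewall]

    print(pc.resolve("192.168.24.10", segment))     # PC learns the switch
    print(switch.resolve("192.168.24.7", segment))  # switch learns the firewall
    print(pc.arp_cache)
```

The model leaves out entry timeouts and table size limits.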

Fault Analysis: filtering ARP tables

After the packet reaches its destination on the Internet, the reply must travel back to this computer, repeating the same process in reverse. The returning packet first arrives at the firewall, which looks in its ARP table for the MAC address corresponding to the destination IP address. If there is no entry, the firewall sends an ARP request to obtain the computer's MAC address, writes the computer's IP address and MAC address into its ARP table, encapsulates the packet, and sends it to the computer. All of this seemed normal, so why the intermittent interruption? Since the computer worked normally on the intranet, the layer-3 switch should not be the problem, and the fault appeared only when accessing the Internet. So I finally decided to start checking on the firewall.

I telnetted to the firewall and checked its configuration, the ports, and the routing table, wondering where to start. Then I suddenly remembered that, to prevent intranet users from stealing IP addresses to access the Internet, IP addresses had been bound to MAC addresses on the firewall. Time to check the ARP table. I entered the get arp command, which displayed a long list of ARP entries, all of them static IP-to-MAC bindings. There was only one dynamic entry, holding the IP address and MAC address of the firewall's next hop. In other words, there was no ARP entry for 192.168.24.55. Could the ARP table be the problem? There seemed to be a glimmer of hope!

So I decided to clear several static ARP entries. I ran the unset arp command to remove six static entries and then pinged an Internet address from that computer. No packet loss at all! Was the problem that had plagued me for days really solved just like that? I could hardly believe it. I asked my colleagues to test the computer: log on to QQ, browse web pages, send and receive email. Everything was normal, with no more intermittent disconnections. I telnetted to the firewall again and ran the get arp command, and this time an ARP entry for the computer 192.168.24.55 was displayed. The problem appeared to be solved. Now it was time to sit down and think about the cause.

Fault Tracing

The Netscreen NS25 firewall supports a maximum of 128 ARP table entries. Without static bindings, ARP entries are continually updated and are deleted automatically when they time out, so the table never fills up. A static binding, however, is never cleared and permanently occupies one entry, leaving less and less room for dynamic entries until the table is full. That is the situation I ran into. Some may ask why the other computers did not also disconnect once the table was full. So I counted the ARP entries: the static bindings had reached 127, and the one remaining slot was occupied by the firewall's next-hop address.

That last entry is dynamic. When it timed out, it was deleted, and the computer could take the slot, so the network was connected. But because other computers were constantly accessing the Internet, the next-hop address reclaimed the slot as soon as the entry for 192.168.24.55 timed out, and the computer was disconnected again. In fact, every machine in my organization was briefly interrupted when accessing the Internet, but because the next-hop address held the ARP entry most of the time, those interruptions stayed within a tolerable range and went unnoticed. For the same reason, the entry for 192.168.24.55 was kept out of the ARP table for long periods, and its traffic timed out. That is why the computer stalled for long stretches and its connection came and went.
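The contention for the last slot can be reproduced with a small model of a fixed-size ARP table. This is a sketch under stated assumptions, not a description of the firewall's internals: the `ArpTable` class is hypothetical, the 60-second timeout is an arbitrary illustrative value, and the model assumes a full table simply refuses new dynamic entries. The static IP addresses, the next-hop address 198.51.100.1, and all MAC strings are placeholders.

```python
class ArpTable:
    """Fixed-capacity ARP table with permanent static and expiring dynamic entries."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.static = {}   # IP -> MAC, never expires
        self.dynamic = {}  # IP -> (MAC, expiry time)

    def __len__(self):
        return len(self.static) + len(self.dynamic)

    def add_static(self, ip: str, mac: str) -> bool:
        # A new static binding needs a free slot; it is never removed afterwards.
        if ip not in self.static and len(self) >= self.capacity:
            return False
        self.static[ip] = mac
        return True

    def expire(self, now: float) -> None:
        # Drop dynamic entries whose timeout has been reached.
        self.dynamic = {ip: e for ip, e in self.dynamic.items() if e[1] > now}

    def learn(self, ip: str, mac: str, now: float, ttl: float) -> bool:
        """Try to add or refresh a dynamic entry; return False if the table is full."""
        self.expire(now)
        if ip in self.static:
            return True  # already covered by a static binding
        if ip not in self.dynamic and len(self) >= self.capacity:
            return False  # no free slot: this host cannot be resolved
        self.dynamic[ip] = (mac, now + ttl)
        return True


if __name__ == "__main__":
    TTL = 60  # assumed timeout in seconds, for illustration only
    table = ArpTable(capacity=128)
    for i in range(1, 128):  # 127 static bindings (placeholder addresses)
        table.add_static(f"10.0.0.{i}", f"static-mac-{i}")

    print(table.learn("198.51.100.1", "mac-next-hop", now=0, ttl=TTL))  # True
    print(table.learn("192.168.24.55", "mac-pc", now=10, ttl=TTL))      # False
    print(table.learn("192.168.24.55", "mac-pc", now=60, ttl=TTL))      # True
    print(table.learn("198.51.100.1", "mac-next-hop", now=70, ttl=TTL)) # False
```

In this model, whichever of the two dynamic hosts asks first after a timeout holds the single free slot and locks the other out until the next timeout. Removing a few static bindings, as was done with unset arp, leaves room for both.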
