Recently my workplace ran into a very strange problem. A branded P4 computer with an onboard Intel network card had always worked well: web browsing and network communication were normal. Then one day we found that its Internet access kept cutting in and out. Pinging an Internet address would succeed for a while, then fail, then succeed again. Pinging intranet hosts, however, showed no problem, and intranet communication was completely normal. Only communication with the Internet showed this behavior, which was very puzzling. The computer's IP address is 192.168.24.55, and the firewall's IP address is 192.168.24.7.
Checking the physical link
Every computer in my workplace reaches the Internet through a NetScreen NS25 firewall. A firewall fault seemed unlikely, because all the other computers accessed the Internet normally, with no intermittent drops. Judging by this computer's ping results, the problem ought to lie in the lower three layers, and intermittent connectivity looked like a classic physical-layer symptom, so I started by checking the link.
This computer is connected to one port of a Cisco Layer 3 switch, and the firewall is also connected to that switch. Routing is enabled on the switch, and its configuration was certainly fine. First I checked the cable from the computer to the switch: if that cable were faulty, the computer's intranet communication would also suffer, and a cable test confirmed it was fine. The patch cable from the firewall to the switch should also be fine, since no other computer had trouble. That ruled out the link. Could the network card be faulty? Almost certainly not, because the computer communicated normally with the intranet. So the physical layer could be eliminated.
Walking through the data flow
At the network layer, the computer could reach the Internet, just not reliably: packets were being lost. The network layer itself did not seem to be at fault, so suspicion fell on the data link layer. But where could a data link layer problem be? After several days of thinking without a clue, I finally decided to walk carefully through the whole process of network communication to see whether I could spot the problem.
Suppose this computer needs to send a packet to the Internet. First it checks whether the destination address is on the same network as its own address; if it is not, the packet is handed to the default gateway. Here the destination is an Internet address, so it is certainly not on the local network, and the packet goes to the default gateway, which in this case is the Cisco Layer 3 switch at 192.168.24.10. The computer at 192.168.24.55 then checks its local ARP table for the MAC address that corresponds to 192.168.24.10. If no matching entry is found, it sends an ARP request to all devices on the network to obtain the MAC address of 192.168.24.10. Because the ARP request is sent as a broadcast, every device on the network receives it and examines it.
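The host-side decision described above can be sketched in Python. This is a minimal illustration, not the real operating-system network stack: the /24 subnet mask is an assumption (the article does not state it), 203.0.113.5 is a hypothetical documentation address standing in for "an Internet address", and `next_hop` and `resolve` are hypothetical helper names.

```python
import ipaddress

# Addresses from the article; the /24 mask is an ASSUMPTION.
HOST_IF = ipaddress.ip_interface("192.168.24.55/24")
DEFAULT_GATEWAY = ipaddress.ip_address("192.168.24.10")


def next_hop(dst: str):
    """Return the IP whose MAC the host must resolve to reach dst."""
    dst_ip = ipaddress.ip_address(dst)
    if dst_ip in HOST_IF.network:
        return dst_ip          # same subnet: deliver directly
    return DEFAULT_GATEWAY     # other subnet: hand the packet to the gateway


def resolve(ip, arp_cache: dict) -> str:
    """Look up ip in the local ARP cache; on a miss a real host would broadcast an ARP request."""
    mac = arp_cache.get(str(ip))
    if mac is None:
        return f"ARP request (broadcast): who has {ip}? tell {HOST_IF.ip}"
    return mac


if __name__ == "__main__":
    hop = next_hop("203.0.113.5")   # hypothetical Internet destination
    print(hop)                      # 192.168.24.10
    print(resolve(hop, {}))         # empty cache, so an ARP broadcast is needed
```

With an empty cache the second call reports the broadcast for 192.168.24.10; once the reply is cached, the same call returns the gateway's MAC instead.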
When the Cisco Layer 3 switch receives this ARP request, it checks whether the target IP address in the request matches its own IP address. If it does, the switch sends an ARP reply carrying its MAC address back to the source, the computer at 192.168.24.55. On receiving the reply, the computer writes the switch's IP address (192.168.24.10) and MAC address into its ARP table, uses the switch's MAC address as the destination MAC address when encapsulating the packet, and sends it to the switch. The switch then checks whether the destination IP is on its local segment. It is not, so the switch looks in its routing table for a route to the destination; if there is no specific route, the packet follows the default route, which on this switch points to the firewall at 192.168.24.7. The switch therefore sends an ARP broadcast to obtain the firewall's MAC address. Once the firewall replies, the switch uses the firewall's MAC address as the destination MAC address and sends the packet to the firewall, which repeats the same process to deliver the packet to its destination on the Internet. All of these steps checked out normally. The corresponding ARP records were present in both the computer's and the switch's ARP tables, and tracing the route with tracert also looked normal. So where was the problem? The analysis had to continue.
Checking the ARP table
Once the packet reaches its destination on the Internet, the reply travels back to the computer and goes through the same procedure in reverse. The reply first arrives at the firewall, which looks in its ARP table for the MAC address that corresponds to the destination IP address. If there is no entry, it sends an ARP request to obtain the computer's MAC address, writes the computer's IP and MAC addresses into its ARP table, then encapsulates the packet and sends it to the computer. All of this seemed normal, so why the intermittent drops? Since the computer worked normally on the internal network, the Layer 3 switch should be fine; the problem appeared only when accessing the Internet, so I finally decided to start checking the firewall.
I telnetted into the firewall and checked its configuration: all normal. I checked the ports: everything OK. I checked the routing table: all fine. Puzzled, I had no idea where to go next. Then it suddenly occurred to me that, to stop intranet users from stealing IP addresses to get online, we had bound IP addresses to MAC addresses on the firewall. Right, check the ARP table! I entered the command get arp, and a long list of ARP entries appeared. Unexpectedly, every one of them was a static IP-to-MAC binding except for a single dynamic entry: the firewall's next-hop IP address and its MAC address. There was no entry for 192.168.24.55 at all. Could the ARP table be the problem? I seemed to see a glimmer of hope!
I decided to clear a few static bindings first. Using the unset arp command, I removed 6 static ARP bindings, then pinged an Internet address from that computer. Incredibly, no packets were lost! Had the problem that had bothered me for days really been solved? I could hardly believe it, so I had colleagues test that computer: logging in to QQ, browsing the web, sending and receiving mail, and so on. Everything was normal, with no more intermittent drops! I telnetted back into the firewall and ran get arp again; the ARP entry for 192.168.24.55 was plainly there. The problem really did seem to be solved. Time to sit down and think carefully about the cause.
Tracing the cause
This NetScreen NS25 firewall supports at most 128 ARP entries. Without static bindings, entries are continually refreshed and removed automatically when they time out, so the table does not fill up. A static binding, however, is never purged and occupies an entry permanently, leaving fewer entries for dynamic use until, eventually, all of them are taken. That is the situation I ran into.
A reader might ask: if the table is full, other computers should be cut off completely, so why were the drops intermittent? I counted the ARP entries: there were exactly 127 static bindings, and the one remaining entry was occupied by the firewall's next-hop address. Note that this entry is dynamic. When it times out and is deleted, the computer can take that slot, and its connection works. Because other computers are constantly accessing the Internet, as soon as the entry for 192.168.24.55 times out, the firewall's next-hop address takes the slot back, and the connection fails. In fact, during this period every computer in my workplace that accessed the Internet was experiencing intermittent drops, but because the next-hop address held the entry most of the time, those interruptions were short enough for everyone to tolerate and nobody noticed them. For 192.168.24.55 the situation was reversed: with the next-hop address occupying the slot for long stretches, its own entry could not get into the ARP table, its connections timed out, and its outages lasted long enough to show up as the on-and-off behavior.
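The mechanism can be reproduced with a toy model of a fixed-size ARP table. Only the capacity of 128, the 127 static bindings and the single contested dynamic slot come from the article; all IP and MAC addresses below are hypothetical (198.51.100.1 is a documentation address standing in for the unknown next hop), and the "refuse new dynamic entries when full" policy is an assumption used to mimic the symptom, not NetScreen's documented behavior.

```python
from dataclasses import dataclass, field


@dataclass
class ArpTable:
    """Toy fixed-size ARP table: static entries never expire, dynamic ones can.

    Capacity 128 is from the article; refusing new entries when full is an ASSUMPTION.
    """
    capacity: int = 128
    static: dict = field(default_factory=dict)
    dynamic: dict = field(default_factory=dict)

    def used(self) -> int:
        return len(self.static) + len(self.dynamic)

    def learn(self, ip: str, mac: str) -> bool:
        """Add a dynamic entry; return False if no slot is free."""
        if ip in self.static or ip in self.dynamic:
            return True
        if self.used() >= self.capacity:
            return False  # table full: this address cannot be resolved
        self.dynamic[ip] = mac
        return True

    def expire(self, ip: str) -> None:
        """Simulate a dynamic entry reaching its timeout."""
        self.dynamic.pop(ip, None)


# Hypothetical addresses for the firewall's next hop and the problem PC.
NEXT_HOP = "198.51.100.1"
PC = "192.168.24.55"


def table_with_bindings(n_static: int = 127) -> ArpTable:
    """Build a table holding n_static hypothetical static IP-to-MAC bindings."""
    t = ArpTable()
    for host in range(100, 100 + n_static):
        t.static[f"192.168.24.{host}"] = f"02:00:00:00:00:{host:02x}"
    return t


if __name__ == "__main__":
    t = table_with_bindings()
    print(t.learn(NEXT_HOP, "02:00:00:00:01:01"))  # True: takes the last free slot
    print(t.learn(PC, "02:00:00:00:01:37"))        # False: table full, PC unreachable
    t.expire(NEXT_HOP)
    print(t.learn(PC, "02:00:00:00:01:37"))        # True: PC works until its entry expires
    # Removing 6 static bindings, as the author did, leaves room for both.
    t2 = table_with_bindings(127 - 6)
    print(t2.learn(NEXT_HOP, "02:00:00:00:01:01"), t2.learn(PC, "02:00:00:00:01:37"))  # True True
```

Running it shows the single free slot alternating between the next hop and 192.168.24.55, and that freeing six static entries leaves room for both dynamic entries at once.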