An analysis of how ARP cache aging works in Linux

Source: Internet
Author: User
I. Problems

As we all know, ARP is the link-layer Address Resolution Protocol: given an IP address as the key, it looks up the MAC address of the host that owns that address. I will not go over the protocol details here; see the RFC or a textbook. This article is mainly a record of what I found, and it may give others some ideas. Specifically, I ran into two problems:
1. A system that uses keepalived for hot standby needs a virtual IP address. Which machine holds the virtual IP address depends on which member of the hot-standby group is currently the master, so a host that acquires the virtual IP address must broadcast a gratuitous ARP. At first people thought this was unnecessary, because the hot-standby group seemed to work well without it. It turns out that it is necessary.
2. ARP cache entries have an aging time, but Linux does not make it obvious how to set it. So how can we set this aging time?


II. Background before answering the questions

The ARP specification only describes the details of address resolution; it does not say how a protocol stack implementation should maintain its ARP cache. The cache needs an expiration time because ARP neither tracks the state of a mapping nor authenticates it. The protocol itself therefore cannot guarantee that a mapping stays correct; it can only assume that the mapping is valid for some period after the ARP reply was received. This is also what makes ARP spoofing possible, but that is outside the scope of this article.
Network operating systems such as Cisco's or Huawei's VRP have an explicit setting for the ARP cache expiration time. Linux has no such setting, or at least no direct one. Linux users know that system behavior is usually tuned with the sysctl tool, through the sys interface under procfs. After a long time on Google we finally found that the ARP settings live under /proc/sys/net/ipv4/neigh/ethX, only to be confused by the many files in that directory; even the kernel documentation does not make the exact meaning of each file clear. A system as mature as Linux must have some way to configure the ARP cache expiration time. But how? The answer starts with the ARP state machine that Linux implements.
If you have read Understanding Linux Network Internals and really understood it, this article will tell you nothing new. But many people have not read that book, so the content here is still useful.
The Linux protocol stack maintains a state machine for every ARP cache entry. Before looking at specific behavior, look at the following state diagram (adapted from Figure 26-13 in Chapter 26 of Understanding Linux Network Internals):

In the figure, we can see that only an entry in the reachable state can be used directly for outgoing packets. An entry in the stale state is effectively unusable as it stands: if someone wants to send a packet through it, the entry has to be resolved again. One would normally take "re-resolve" to mean "re-send an ARP request", but that is not necessarily the case. Linux adds an "event point" that lets it avoid sending an ARP request, which reduces the cache maintenance traffic that ARP generates, and in practice this measure is very effective. This is ARP's "confirmation" mechanism: if a packet arrives at the local machine from a neighbor, the neighbor that was the packet's last hop can be confirmed as valid. Why can the last-hop neighbor be confirmed only when the packet is destined for the local machine? Because Linux does not want to add work to IP-layer processing, that is, it does not want to change the original semantics of the IP layer.
Linux keeps the stale state so that the neighbor structure can be retained: when the state changes, only individual fields are modified or filled in. A simpler implementation would keep only the reachable state and delete the entry when it expires. Linux just adds a lot of optimizations on top, and if you rack your brains over those optimizations you will have a hard time...
III. How Linux maintains the stale state

In the ARP state machine that Linux implements, the stale state is the most complicated. An entry in this state faces a life-or-death decision, and what decides it is locally sent packets. If a locally sent packet uses an entry in the stale state, the state machine moves to the delay state. If nobody has used the neighbor by the time the "garbage collection" timer expires, the entry may be deleted. Will it definitely be deleted? That depends on whether anything else still references it, and the key place to look is the routing cache. Although the routing cache is a layer-3 concept, each route cache entry holds a reference to the ARP cache entry of its next hop; in that sense the Linux route cache is really a forwarding table rather than a routing table.
If an outgoing packet uses the entry, its state machine enters the delay state. In the delay state, as long as a "local" confirmation arrives (the last hop of a packet received locally is this neighbor), Linux still does not send an ARP request. If no local confirmation arrives, Linux sends a real ARP request and enters the probe state. So every state from stale onward exists purely as an optimization. An entry in the stale state is a cache of a cache; if Linux simply deleted entries whose reachable state had expired, the semantics would be the same, and the implementation would be much easier to read and understand!
To stress the point again: when the reachable state expires, the entry enters the stale state instead of being deleted. This retains the neighbor structure and saves memory and CPU. An entry in the stale state is effectively unusable. To become usable again, either a local confirmation must arrive before the delay timer expires (for example, TCP receives a packet), or the delay timer expires, the entry enters the probe state, and the ARP request is answered. Otherwise the entry is deleted.
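
The transitions described in this section can be written down as a small table-like function. This is only a simplified model of the article's description, not kernel code: the enum names, the event names, and the function nud_next are all hypothetical, and real kernel behavior has more states and more edge cases.

```c
#include <assert.h>

/* Simplified neighbour states (hypothetical names, modeled on the text). */
enum nud_model_state { ST_REACHABLE, ST_STALE, ST_DELAY, ST_PROBE, ST_FAILED };

/* Events that drive the model (hypothetical names). */
enum nud_model_event {
    EV_REACHABLE_TIMEOUT, /* reachable timer expired */
    EV_PACKET_SENT,       /* a locally sent packet used the entry */
    EV_LOCAL_CONFIRM,     /* upper layer (e.g. TCP) confirmed the neighbour */
    EV_DELAY_TIMEOUT,     /* delay timer expired without confirmation */
    EV_ARP_REPLY,         /* an ARP reply answered our request */
    EV_PROBE_EXHAUSTED    /* every probe went unanswered */
};

/* Return the state that follows `s` when event `e` occurs.
 * Events that are not meaningful in a state leave it unchanged. */
enum nud_model_state nud_next(enum nud_model_state s, enum nud_model_event e)
{
    assert(s <= ST_FAILED);

    switch (s) {
    case ST_REACHABLE:
        return e == EV_REACHABLE_TIMEOUT ? ST_STALE : s;
    case ST_STALE:
        /* No ARP request yet: first wait for a local confirmation. */
        return e == EV_PACKET_SENT ? ST_DELAY : s;
    case ST_DELAY:
        if (e == EV_LOCAL_CONFIRM)
            return ST_REACHABLE;
        if (e == EV_DELAY_TIMEOUT)
            return ST_PROBE;   /* now a real ARP request is sent */
        return s;
    case ST_PROBE:
        if (e == EV_ARP_REPLY)
            return ST_REACHABLE;
        if (e == EV_PROBE_EXHAUSTED)
            return ST_FAILED;
        return s;
    default:
        return s;
    }
}
```

Walking an entry through EV_REACHABLE_TIMEOUT, EV_PACKET_SENT, and EV_LOCAL_CONFIRM returns it to ST_REACHABLE without any ARP request, which is the optimization this section describes.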
IV. Key points of the Linux ARP cache implementation

Walking through source code in a blog post is something I used to do years ago, and I will not spend the space on it here. It is enough to know the key points of the timers that Linux maintains for ARP.
1. Reachable state timer
This timer is started whenever an ARP reply arrives, or whenever something else proves that the neighbor in the entry is really reachable. When it expires, the entry moves to the next state; the expiry time is based on the configured value.
2. Garbage collection timer
This timer runs periodically. When it next expires depends on the configured base_reachable_time. See the following code:

static void neigh_periodic_timer(unsigned long arg)
{
     ...
     if (time_after(now, tbl->last_rand + 300 * HZ)) { // every 5 minutes the kernel re-randomizes reachable_time
         struct neigh_parms *p;
         tbl->last_rand = now;
         for (p = &tbl->parms; p; p = p->next)
             p->reachable_time =
                 neigh_rand_reach_time(p->base_reachable_time);
     }
     ...

     /* Cycle through all hash buckets every base_reachable_time/2 ticks.
      * ARP entry timeouts range from 1/2 base_reachable_time to 3/2
      * base_reachable_time.
      */
     expire = tbl->parms.base_reachable_time >> 1;
     expire /= (tbl->hash_mask + 1);
     if (!expire)
         expire = 1;
     // the next expiry of this timer depends entirely on base_reachable_time
     mod_timer(&tbl->gc_timer, now + expire);
     ...
}
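
The two calculations in this function can be tried out in user space. The sketch below is an approximation under stated assumptions: rand_reach_time and gc_period are hypothetical names, rand() stands in for the kernel's random source, and the range follows the kernel comment above (1/2 to 3/2 of base_reachable_time).

```c
#include <assert.h>
#include <stdlib.h>

/* Draw a reachable time from [base/2, base/2 + base), i.e. roughly
 * 1/2 to 3/2 of the base value. A base of 0 yields 0. */
unsigned long rand_reach_time(unsigned long base)
{
    if (base == 0)
        return 0;
    return ((unsigned long)rand() % base) + (base >> 1);
}

/* Period of the garbage collection timer in ticks: half of
 * base_reachable_time divided across all hash buckets, at least 1. */
unsigned long gc_period(unsigned long base_reachable_time,
                        unsigned long hash_mask)
{
    unsigned long expire = base_reachable_time >> 1;

    assert(hash_mask + 1 != 0);   /* guard against division by zero */
    expire /= (hash_mask + 1);
    return expire ? expire : 1;
}
```

With a base of 30000 ticks and 8 hash buckets (hash_mask 7), gc_period gives 1875 ticks per bucket, so a full pass over the table takes about half of base_reachable_time.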

When the garbage collection timer expires, the neigh_periodic_timer callback runs. The part omitted from the neigh_periodic_timer listing above contains the following logic:

if (atomic_read(&n->refcnt) == 1 && // n->used may be moved forward by the "local confirmation" mechanism
    (state == NUD_FAILED || time_after(now, n->used + n->parms->gc_staletime))) {
     *np = n->next;
     n->dead = 1;
     write_unlock(&n->lock);
     neigh_release(n);
     continue;
}
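
The deletion condition can be isolated as a small predicate. This is a user-space sketch, not the kernel's code: struct entry_view, ticks_after, and gc_can_drop are hypothetical names, and the struct keeps only the three fields that the condition reads.

```c
#include <assert.h>
#include <stdbool.h>

/* Wraparound-safe "a is later than b" for tick counters, in the
 * spirit of the kernel's time_after() macro. */
static bool ticks_after(unsigned long a, unsigned long b)
{
    return (long)(b - a) < 0;
}

/* Hypothetical, stripped-down view of a neighbour entry. */
struct entry_view {
    int refcnt;          /* 1 means only the table itself holds the entry */
    bool failed;         /* entry is in the failed state */
    unsigned long used;  /* tick at which the entry was last used */
};

/* True when the garbage collector may delete the entry: nothing else
 * references it, and it either failed or sat unused past gc_staletime. */
bool gc_can_drop(const struct entry_view *e, unsigned long now,
                 unsigned long gc_staletime)
{
    assert(e);
    return e->refcnt == 1 &&
           (e->failed || ticks_after(now, e->used + gc_staletime));
}
```

The refcnt test is why an entry that a route cache item still points to survives long after gc_staletime has passed.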

If a stale entry is not deleted promptly in your experiment, try running the following command:

ip route flush cache

Then look at the output of ip neigh ls all. Do not expect the entry to disappear immediately, because the garbage collection timer may not have expired yet. After a short while, though, the cached entry will be deleted.
V. Solution to the first problem

When keepalived runs a VRRP-based hot-standby group, many people think that a host does not need to re-announce the binding between its MAC address and the virtual IP address when it enters the master state. This is a fundamental mistake. If nothing goes wrong, that is luck: the ARP timeout configured by default on most routers is fairly short, but we cannot rely on that configuration. See the following figure:

Suppose a switchover occurs and the ARP cache timeout on the router is one hour. Then traffic in one direction fails for nearly an hour (assuming the hosts in the group do not send any data through the router; after all, I do not know whether the router runs Linux). The router keeps forwarding data to the original master, but the original master no longer holds the virtual IP address.
Therefore, so that traffic no longer depends on the router's configuration, a host must announce the binding between the virtual IP address and its own MAC address when it becomes the master under VRRP. On Linux, the convenient way is arping:

arping -i ethX -S 1.1.1.1 -B -c 1

In this way, the master host with the IP address 1.1.1.1 broadcasts an ARP request for the address 255.255.255.255 to the whole network. Assuming the router runs Linux, when it receives the ARP request it updates its local ARP cache entry for the source IP address (if it has one). The catch is that the updated entry ends up in the stale state. This is simply how ARP input is handled, as the code at the end of the arp_process function shows:

if (arp->ar_op != htons(ARPOP_REPLY) || skb->pkt_type != PACKET_HOST)
    state = NUD_STALE;
neigh_update(n, sha, state, override ? NEIGH_UPDATE_F_OVERRIDE : 0);

So the entry becomes reachable only later: when an outgoing packet whose next hop is 1.1.1.1 uses it, the mapping is promoted to the reachable state either by the "local confirmation" mechanism or by a real ARP request.
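
The rule applied at the end of arp_process can be restated as a pure function, which makes it easy to see why a broadcast gratuitous ARP leaves the entry stale. This is an illustrative model only: state_after_arp and both enums are hypothetical, and unicast_to_us stands for the kernel's skb->pkt_type == PACKET_HOST test.

```c
#include <assert.h>
#include <stdbool.h>

/* ARP opcodes as defined by RFC 826. */
enum arp_op { ARP_OP_REQUEST = 1, ARP_OP_REPLY = 2 };

/* Hypothetical stand-ins for the kernel's NUD_REACHABLE / NUD_STALE. */
enum learned_state { LEARNED_REACHABLE, LEARNED_STALE };

/* State given to an entry learned or updated from a received ARP packet.
 * Only a reply addressed directly to this host counts as proof of
 * reachability; everything else is recorded as stale. */
enum learned_state state_after_arp(enum arp_op op, bool unicast_to_us)
{
    assert(op == ARP_OP_REQUEST || op == ARP_OP_REPLY);

    if (op != ARP_OP_REPLY || !unicast_to_us)
        return LEARNED_STALE;
    return LEARNED_REACHABLE;
}
```

A broadcast request, which is what the arping command above sends, maps to LEARNED_STALE: the router learns the new MAC address but does not yet treat it as confirmed.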

Correction: after reading the keepalived source code, I found that this worry is unfounded. keepalived is mature software and does not make such a basic mistake: it automatically sends gratuitous ARPs after a host switches to the master state. The relevant code in keepalived is as follows:

vrrp_send_update(vrrp_rt * vrrp, ip_address * ipaddress, int idx)
{
	char *msg;
	char addr_str[41];

	if (!IP_IS6(ipaddress)) {
		msg = "gratuitous ARPs";
		inet_ntop(AF_INET, &ipaddress->u.sin.sin_addr, addr_str, 41);
		send_gratuitous_arp(ipaddress);
	} else {
		msg = "Unsolicited Neighbour Adverts";
		inet_ntop(AF_INET6, &ipaddress->u.sin6_addr, addr_str, 41);
		ndisc_send_unsolicited_na(ipaddress);
	}

	if (0 == idx && debug & 32) {
		log_message(LOG_INFO, "VRRP_Instance(%s) Sending %s on %s for %s",
			    vrrp->iname, msg, IF_NAME(ipaddress->ifp), addr_str);
	}
}
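
For readers who have not looked at a gratuitous ARP on the wire, the sketch below builds one as a raw 42-byte Ethernet frame. It is not keepalived's send_gratuitous_arp; build_gratuitous_arp and GARP_FRAME_LEN are hypothetical names. It assumes the common layout in which sender and target protocol addresses are both the announced IP, and it leaves the target hardware address zeroed; implementations differ on that last field. Actually transmitting the frame needs a raw packet socket and root privileges, which are not shown.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define GARP_FRAME_LEN 42  /* 14-byte Ethernet header + 28-byte ARP payload */

/* Fill `frame` with a broadcast ARP request announcing that `ip`
 * (4 bytes, network order) is at `mac`. Returns the bytes written. */
size_t build_gratuitous_arp(uint8_t frame[GARP_FRAME_LEN],
                            const uint8_t mac[6], const uint8_t ip[4])
{
    static const uint8_t arp_fixed[8] = {
        0x00, 0x01,  /* hardware type: Ethernet */
        0x08, 0x00,  /* protocol type: IPv4 */
        6, 4,        /* hardware and protocol address lengths */
        0x00, 0x01   /* opcode: request */
    };
    uint8_t *p = frame;

    assert(frame && mac && ip);

    memset(p, 0xff, 6);      p += 6;  /* Ethernet destination: broadcast */
    memcpy(p, mac, 6);       p += 6;  /* Ethernet source */
    *p++ = 0x08; *p++ = 0x06;         /* EtherType: ARP */
    memcpy(p, arp_fixed, 8); p += 8;
    memcpy(p, mac, 6);       p += 6;  /* sender hardware address */
    memcpy(p, ip, 4);        p += 4;  /* sender protocol address */
    memset(p, 0x00, 6);      p += 6;  /* target hardware address: unused */
    memcpy(p, ip, 4);        p += 4;  /* target protocol address = sender */

    return (size_t)(p - frame);
}
```

Every host on the segment that already has an entry for the IP address sees its own address pair contradicted by the sender fields and overwrites the cached MAC address.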
 
 
VI. Solution to the second problem

  
How can I set the aging time of ARP cache on Linux?

The /proc/sys/net/ipv4/neigh/ethX directory contains many files. Which one is the ARP cache aging time? It is base_reachable_time; the others only tune optimizations. For example, gc_stale_time is the lifetime of the "cache of the ARP cache entry". It is only a cache lifetime: within this time, if the neighbor is needed again, the data recorded in the entry can be used directly to fill the ARP request, or the entry can be set straight back to reachable once a "local confirmation" is received, without going through route lookup, ARP lookup, neighbor creation, and neighbor resolution again.
By default the reachable state times out after about 30 seconds (the actual value is randomized around base_reachable_time). After that the entry moves to the stale state, and at that point you can consider it expired; Linux merely postpones deleting it until gc_stale_time has passed. Once an entry is no longer reachable, the garbage collector is responsible for "deleting the entry after gc_stale_time". The next expiry of the garbage collection timer is computed from base_reachable_time, in neigh_periodic_timer:

if (time_after(now, tbl->last_rand + 300 * HZ)) {
     struct neigh_parms *p;
     tbl->last_rand = now;
     for (p = &tbl->parms; p; p = p->next)
         // Randomization matters here: it prevents an ARP resolution storm caused by "resonance"
         p->reachable_time = neigh_rand_reach_time(p->base_reachable_time);
}
...
expire = tbl->parms.base_reachable_time >> 1;
expire /= (tbl->hash_mask + 1);
if (!expire)
     expire = 1;
mod_timer(&tbl->gc_timer, now + expire);

The code makes this clear, and the comments in the kernel source help too; good programmers write comments. To make the experiment unambiguous, we design the following two scenarios:
1. Use iptables to block local reception, which suppresses ARP local confirmation. Use sysctl to set base_reachable_time to 5 seconds and gc_stale_time to 5 seconds.
2. Remove the iptables blocking rule. Use TCP to download a large file from an external network, or keep opening short-lived connections. Use sysctl to set base_reachable_time to 5 seconds and gc_stale_time to 5 seconds.
In both scenarios, ping the default gateway of the local LAN, quickly press Ctrl-C to stop the ping, and then use ip neigh show all to watch the ARP entry for the default gateway. In scenario 1, the entry changes to stale after about five seconds; after another ping it goes through delay, probe, and reachable, and within five seconds it becomes stale again. In scenario 2, the entry keeps alternating between reachable and delay. This shows the Linux ARP state machine at work. Why, in scenario 1, is the entry not deleted for a long time after it becomes stale? Because a route cache entry is still using it. After you flush the route cache, the ARP entry is deleted quickly.
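
The two sysctl values used in the scenarios are ordinary files under procfs, so they can also be inspected from a program. The helper below is a minimal sketch: neigh_param_path and neigh_param_read are hypothetical names, the interface name is supplied by the caller, and the path layout is the /proc/sys/net/ipv4/neigh/ethX directory discussed in this section.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build "/proc/sys/net/ipv4/neigh/<ifname>/<param>" into buf.
 * Returns 0 on success, -1 if the buffer is too small. */
int neigh_param_path(char *buf, size_t len,
                     const char *ifname, const char *param)
{
    int n;

    assert(buf && ifname && param);
    n = snprintf(buf, len, "/proc/sys/net/ipv4/neigh/%s/%s", ifname, param);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Read an integer parameter such as base_reachable_time or gc_stale_time.
 * Returns 0 on success, -1 if the file is missing or unreadable. */
int neigh_param_read(const char *ifname, const char *param, long *value)
{
    char path[256];
    FILE *f;
    int ok;

    if (neigh_param_path(path, sizeof(path), ifname, param) != 0)
        return -1;
    f = fopen(path, "r");
    if (!f)
        return -1;
    ok = (fscanf(f, "%ld", value) == 1);
    fclose(f);
    return ok ? 0 : -1;
}
```

Reading both base_reachable_time and gc_stale_time before and after the sysctl calls is a quick way to confirm that the experiment is running with the intended 5-second values.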
VII. Summary

1. In Linux, to set the aging time of the ARP cache, run sysctl -w net.ipv4.neigh.ethX.base_reachable_time=Y. Any other setting only affects performance. In Linux, an ARP cache entry has aged out once it changes to the stale state, not when the entry is deleted; the stale state is just a cache of the cache.
2. Always remember to broadcast a gratuitous ARP as soon as possible when an IP address moves to a different device on the local network segment. On Linux you can use arping to do this.
