OVS ARP Responder–theory and practice


Prefix

In the GRE tunnels post I explained how overlay networks are used for connectivity and tenant isolation. In the L2pop post, or Layer 2 population, I explained how OVS forwarding tables are pre-populated when instances are brought up. Today I'll talk about another form of table pre-population: the ARP table. This feature was introduced with this patch by Edouard Thuleau, merged during the Juno development cycle.

ARP–why do we need it?

In any environment, be it the physical data center, your home, or a virtualization cloud, machines need to know the MAC, or physical network address, of the next hop. For example, let there be two machines connected directly via a switch:

The first machine has an IP address of 10.0.0.1, and a MAC address of 0000:dead:beef,

while the second machine has an IP address of 10.0.0.2, and a MAC address of 2222:face:b00c.

I merrily log into the first machine and hit 'ping 10.0.0.2'. My computer places 10.0.0.2 in the destination IP field of the IP packet, then attempts to place a destination MAC address in the Ethernet header, and politely bonks itself on its digital forehead. Messages must be forwarded out of a computer's NIC with the destination MAC address of the next hop (in this case 10.0.0.2, as they're directly connected). This is so switches know where to forward the frame to, for example.

Well, at this point, the first computer has never talked to the second one, so of course it doesn't know its MAC address. How do you discover something you don't know? You ask! In this case, you shout. 10.0.0.1 will flood, or broadcast, an ARP request saying: What is the MAC address of 10.0.0.2? This message is received by the entire broadcast domain. 10.0.0.2 will receive this message (amongst others) and happily reply, via unicast: I am 10.0.0.2 and my MAC address is 2222:face:b00c. The first computer will receive the ARP reply and will then be able to fill in the destination MAC address field, and finally send the ping.

Would this entire process be repeated every time the two computers wish to talk to each other? No. Sane devices keep a local cache of ARP responses. In Linux you may view the current cache with the 'arp' command.
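To make the caching behaviour concrete, here is a minimal sketch in Python. It is a toy model rather than a real network stack: the `Host` class, the `HOSTS` dictionary and the `broadcasts_sent` counter are names invented for illustration, and the "broadcast" is just a dictionary lookup.

```python
# Toy model of one broadcast domain: IP address -> MAC address of each host.
# The addresses are the ones used in the example above.
HOSTS = {
    "10.0.0.1": "0000:dead:beef",
    "10.0.0.2": "2222:face:b00c",
}


class Host:
    """A machine that resolves IP addresses to MAC addresses via ARP."""

    def __init__(self, ip):
        self.ip = ip
        self.arp_cache = {}       # IP -> MAC, filled from ARP replies
        self.broadcasts_sent = 0  # number of ARP requests flooded so far

    def resolve(self, target_ip):
        """Return the MAC for target_ip, broadcasting only on a cache miss."""
        if target_ip in self.arp_cache:
            return self.arp_cache[target_ip]
        # Cache miss: "shout" an ARP request to the whole broadcast domain.
        self.broadcasts_sent += 1
        mac = HOSTS.get(target_ip)  # only the owner of target_ip replies
        if mac is not None:
            self.arp_cache[target_ip] = mac
        return mac


first = Host("10.0.0.1")
first.resolve("10.0.0.2")  # miss: one broadcast, the reply is cached
first.resolve("10.0.0.2")  # hit: answered from the local cache
```

After the two calls, only one broadcast has been sent. Real caches also expire entries after a while, which this sketch leaves out.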

A slightly more complex case would be two computers separated by a layer 3 hop, or a router. In this case the computers are in different subnets, for example 10.0.0.0/8 and 20.0.0.0/8. When the first computer pings the second one, the OS will notice that the destination is in a different subnet, and thus forward the message to the default gateway. In this case the ARP request is sent for the MAC address of the pre-configured default gateway IP address. A device only cares about the MAC address of the next hop and not of the final destination.
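The next-hop decision can be sketched with Python's standard `ipaddress` module. The function name `next_hop` and the gateway address 10.0.0.254 are assumptions made up for this example; a real OS consults its full routing table rather than a single interface.

```python
import ipaddress


def next_hop(local_interface, dst_ip, default_gateway):
    """Return the IP address whose MAC we need to ARP for.

    local_interface is the sender's address with prefix, e.g. "10.0.0.1/8".
    """
    iface = ipaddress.ip_interface(local_interface)
    dst = ipaddress.ip_address(dst_ip)
    if dst in iface.network:
        # Same subnet: the destination itself is the next hop.
        return str(dst)
    # Different subnet: hand the packet to the default gateway.
    return default_gateway


same_subnet = next_hop("10.0.0.1/8", "10.0.0.2", "10.0.0.254")
other_subnet = next_hop("10.0.0.1/8", "20.0.0.2", "10.0.0.254")
```

In the first call the ARP request is for 10.0.0.2 itself. In the second it is for the (hypothetical) gateway 10.0.0.254, even though the IP packet is still addressed to 20.0.0.2.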

The absurdity of L2pop without an ARP responder

Let there be VM 1 hosted on compute node A, and VM 2 hosted on compute node B.

With L2pop disabled, when VM 1 sends an initial message to VM 2, compute node A won't know the MAC address of VM 2 and will be forced to flood the message out all tunnels, to all compute nodes. When the reply is received, node A will learn the MAC address of VM 2 along with the remote node and tunnel ID. This is how future floods are prevented. L2pop prevents even the initial flood by pre-populating the tables, as the Neutron service is aware of VM MAC addresses, scheduling, and tunnel IDs. More information can be found in the dedicated L2pop post.
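The difference between learning and pre-population can be illustrated with a small model. This is only a sketch: `TunnelBridge`, the tunnel names and the MAC placeholder are invented for the example and do not correspond to Neutron or OVS code.

```python
class TunnelBridge:
    """Toy forwarding table of a compute node: MAC -> tunnel port."""

    def __init__(self, tunnels):
        self.tunnels = list(tunnels)  # tunnel ports towards the other nodes
        self.fdb = {}                 # forwarding entries, MAC -> tunnel

    def add_entry(self, mac, tunnel):
        """Record where a MAC lives (learned from a reply or pushed by L2pop)."""
        self.fdb[mac] = tunnel

    def output_ports(self, dst_mac):
        """Unknown destination: flood all tunnels. Known: one tunnel only."""
        tunnel = self.fdb.get(dst_mac)
        return [tunnel] if tunnel is not None else list(self.tunnels)


# Without L2pop: the first message floods, the reply teaches node A.
node_a = TunnelBridge(["to-B", "to-C", "to-D"])
first_message = node_a.output_ports("mac-of-vm2")
node_a.add_entry("mac-of-vm2", "to-B")  # learned from VM 2's reply
later_message = node_a.output_ports("mac-of-vm2")

# With L2pop: the entry exists before VM 1 ever sends anything.
node_a_l2pop = TunnelBridge(["to-B", "to-C", "to-D"])
node_a_l2pop.add_entry("mac-of-vm2", "to-B")  # pushed ahead of time
first_message_l2pop = node_a_l2pop.output_ports("mac-of-vm2")
```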

So, we optimized one broadcast, but what about ARPs? Compute node A is aware of the MAC address (and whereabouts) of VM 2, but VM 1 isn't. Thus, when sending an initial message from VM 1 to VM 2, an ARP request will be sent out. Compute node A knows the MAC address of VM 2 but chooses to put a blindfold over its eyes and send a broadcast anyway. Well, with the ARP responder feature this is no longer the case.

The OVS ARP responder–how does it work?

A new table is inserted into the br-tun OVS bridge, to be used as an ARP table. Whenever a message is received by br-tun from a local VM, it's classified into unicast, broadcast/multicast and now ARP requests. ARP requests go into the ARP table, where pre-learned MAC addresses (via L2pop, more in a minute) reside. Rows in this table are then matched against the (ARP protocol, network, IP of the requested VM) tuple. The resulting action is to construct an ARP reply that will contain the IP and MAC addresses of the remote VM, and will be sent back from the port it came in on to the VM making the original request. If a match is not found (for example, if the VM is trying to access a physical device not managed by Neutron, and thus never learned via L2pop), the ARP table contains a final default flow, to resubmit the message to the broadcast/multicast table, and the message will be treated as any old broadcast.

The table is filled whenever new L2pop address changes come in. For example, when VM 3 is hosted on compute node C, both compute nodes A and B get a message that a VM 3 with IP address 'X' and MAC address 'Y' is now on host C, in network 'Z'. Thus, compute nodes A and B can now fill their respective ARP tables with VM 3's IP and MAC addresses.
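The table logic described above can be modelled in a few lines of Python. This is a sketch under stated assumptions, not the agent's code: `ArpResponderTable` and its method names are hypothetical, the network is represented by a local VLAN ID, and VM 3's addresses are made up.

```python
class ArpResponderTable:
    """Toy ARP responder table keyed by (local VLAN, requested IP)."""

    def __init__(self):
        self.entries = {}  # (vlan, ip) -> MAC of the remote VM

    def add_entry(self, vlan, ip, mac):
        """Called when L2pop announces a new remote VM."""
        self.entries[(vlan, ip)] = mac

    def remove_entry(self, vlan, ip):
        """Called when L2pop announces that a remote VM is gone."""
        self.entries.pop((vlan, ip), None)

    def handle_arp_request(self, vlan, target_ip):
        """Answer locally on a match, otherwise fall through to the flood."""
        mac = self.entries.get((vlan, target_ip))
        if mac is None:
            # Default flow: resubmit to the broadcast/multicast table.
            return ("flood", None)
        return ("reply", mac)


# Compute node A hears that VM 3 (hypothetical addresses) now lives on host C.
table_on_a = ArpResponderTable()
table_on_a.add_entry(vlan=1, ip="10.0.0.3", mac="fa:16:3e:00:00:03")

known = table_on_a.handle_arp_request(1, "10.0.0.3")
unknown = table_on_a.handle_arp_request(1, "10.0.0.200")  # e.g. a physical device
```

Keying on the VLAN as well as the IP matters because two tenant networks may reuse the same IP range.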

The interesting code is currently at:

https://github.com/openstack/neutron/blob/master/neutron/plugins/openvswitch/agent/ovs_neutron_agent.py#L484

For help on reading OVS tables, and an explanation of OVS flows and how they're comprised of match and action parts, please see a previous post.

Blow by blow:

Here's the action part:

actions = ('move:NXM_OF_ETH_SRC[]->NXM_OF_ETH_DST[],' – place the source MAC address of the request (the requesting VM) as the new reply's destination MAC address

'mod_dl_src:%(mac)s,' – put the requested MAC address of the remote VM as this message's source MAC address

'load:0x2->NXM_OF_ARP_OP[],' – put a 0x2 code as the type of the ARP message. 0x2 is an ARP response.

'move:NXM_NX_ARP_SHA[]->NXM_NX_ARP_THA[],' – place the ARP request's source hardware address (MAC) as this new message's ARP target/destination hardware address

'move:NXM_OF_ARP_SPA[]->NXM_OF_ARP_TPA[],' – place the ARP request's source protocol/IP address as the new message's ARP destination IP address

'load:%(mac)#x->NXM_NX_ARP_SHA[],' – place the requested VM's MAC address as the source MAC address of the ARP reply

'load:%(ip)#x->NXM_OF_ARP_SPA[],' – place the requested VM's IP address as the source IP address of the ARP reply

'in_port' % {'mac': mac, 'ip': ip}) – forward the message back to the port it came in on

Here's the match part:

self.…(table=constants.…, – add this new flow to the ARP_RESPONDER table

priority=1, – with a priority of 1 (another, default flow with the lower priority of 0 is added elsewhere in the code)

proto='arp', – match only on ARP messages

dl_vlan=lvid, – match only if the destination VLAN (the message has been locally VLAN tagged by now) matches the VLAN ID/network of the remote VM

nw_dst='%s' % ip, – match on the IP address of the remote VM in question

actions=actions)
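To see what the finished action string looks like, here is a standalone sketch that fills in the template using only the standard library. It is not the agent's code: the function name `arp_responder_actions` and the MAC address fa:16:3e:00:00:02 are assumptions for illustration, and the MAC and IP are converted to integers by hand so that the `#x` format can render them as hex.

```python
import ipaddress


def arp_responder_actions(mac, ip):
    """Build the OVS action string that turns an ARP request into a reply."""
    mac_int = int(mac.replace(":", ""), 16)   # "fa:16:..." -> integer
    ip_int = int(ipaddress.IPv4Address(ip))   # "10.0.0.2" -> integer
    return (
        "move:NXM_OF_ETH_SRC[]->NXM_OF_ETH_DST[],"
        "mod_dl_src:%(mac)s,"
        "load:0x2->NXM_OF_ARP_OP[],"
        "move:NXM_NX_ARP_SHA[]->NXM_NX_ARP_THA[],"
        "move:NXM_OF_ARP_SPA[]->NXM_OF_ARP_TPA[],"
        "load:%(mac_int)#x->NXM_NX_ARP_SHA[],"
        "load:%(ip_int)#x->NXM_OF_ARP_SPA[],"
        "in_port" % {"mac": mac, "mac_int": mac_int, "ip_int": ip_int}
    )


# Hypothetical remote VM with IP 10.0.0.2.
actions = arp_responder_actions("fa:16:3e:00:00:02", "10.0.0.2")
```

Note that the two `move` actions come before the two `load` actions on the same ARP fields. The requester's addresses have to be copied into the target fields before the sender fields are overwritten.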

Example:

An ARP request comes in.

In the Ethernet frame, the source MAC address is A and the destination MAC address is FFFF:FFFF:FFFF.

In the ARP header, the source IP address is 10.0.0.1, the destination IP is 10.0.0.2, the source MAC is A, and the destination MAC is FFFF:FFFF:FFFF.

Please make sure this entire part makes sense before moving on.

Assuming L2pop has already learned about VM B, the hypervisor's ARP table will already contain an ARP entry for VM B, with IP 10.0.0.2 and MAC B.

Would this message be matched? Sure: the proto is 'arp', they're in the same network so dl_vlan will be correct, and nw_dst (this is slightly confusing) will correctly match on the destination IP address of the ARP header, seeing as ARP replaces IP in the third layer during ARP messages.

What would be the action? Well, we'd expect an ARP reply. Remember that ARP replies reverse the source and destination, so the source MAC and IP inside the ARP header are the MAC and IP addresses of the machine we asked about originally, and the destination MAC address in the ARP header is the MAC address of the machine originating the ARP request. Similarly, we'd expect that the source MAC of the Ethernet frame would be the MAC of the VM we're querying about, and the destination MAC of the Ethernet frame would be the MAC of the VM originating the ARP request. If you carefully observe the explanation of the action part above, you will see that this is indeed the case.

Thus, the source MAC of the Ethernet frame will be B and the destination MAC A. In the ARP header, the source IP will be 10.0.0.2 and the source MAC B, while the destination IP will be 10.0.0.1 and the destination MAC A. This ARP reply is forwarded back through the port it came in on and will be received by VM A. VM A will unpack the ARP reply and find the MAC address it queried about in the source MAC address field of the ARP header.
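The example can be replayed in Python by applying the same field rewrites to a dictionary that stands in for the packet. The field names (`eth_src`, `arp_sha` and so on) and the function `arp_reply_for` are made up for this sketch. 'A' and 'B' stand for the two MAC addresses, as in the text.

```python
def arp_reply_for(request, mac, ip):
    """Rewrite an ARP request into the reply the responder flow would emit."""
    reply = dict(request)
    reply["eth_dst"] = request["eth_src"]   # move ETH_SRC -> ETH_DST
    reply["eth_src"] = mac                  # mod_dl_src
    reply["arp_op"] = 2                     # load 0x2 -> ARP_OP (reply)
    reply["arp_tha"] = request["arp_sha"]   # move ARP_SHA -> ARP_THA
    reply["arp_tpa"] = request["arp_spa"]   # move ARP_SPA -> ARP_TPA
    reply["arp_sha"] = mac                  # load MAC -> ARP_SHA
    reply["arp_spa"] = ip                   # load IP -> ARP_SPA
    return reply


request = {
    "eth_src": "A", "eth_dst": "FFFF:FFFF:FFFF",
    "arp_op": 1,  # 1 is an ARP request
    "arp_sha": "A", "arp_spa": "10.0.0.1",
    "arp_tha": "FFFF:FFFF:FFFF", "arp_tpa": "10.0.0.2",
}

# The table entry for VM B: IP 10.0.0.2, MAC B.
reply = arp_reply_for(request, mac="B", ip="10.0.0.2")
```

The resulting `reply` has Ethernet source B and destination A. Inside the ARP header the sender is (B, 10.0.0.2) and the target is (A, 10.0.0.1), which matches the walkthrough.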

Turning it on

Assuming ML2 + OVS >= 2.1:

    • Turn on GRE or VXLAN tenant networks as you normally would
    • Enable L2pop
      • On the Neutron API node, in the conf file you pass to the Neutron service (plugin.ini/ml2_conf.ini):
        [ml2]
        mechanism_drivers = openvswitch,l2population
      • On all compute nodes, in the conf file you pass to the OVS agent (plugin.ini/ml2_conf.ini):
        [agent]
        l2_population = True
    • Enable the ARP responder: on each compute node, in the conf file you pass to the OVS agent (plugin.ini/ml2_conf.ini):
        [agent]
        arp_responder = True

To summarize, you must use VXLAN or GRE tenant networks, you must enable L2pop, and finally you need to enable the arp_responder flag in the [agent] section of the conf file you pass to the OVS agent on each compute node.
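As a quick sanity check on a compute node, the two agent flags can be verified with a few lines of Python. This is only a sketch using the standard `configparser` module, which approximates the INI format but is not how Neutron itself loads its configuration. The function name `arp_responder_enabled` and the sample text are assumptions, and the check does not look at the tunnel type settings.

```python
import configparser


def arp_responder_enabled(agent_conf_text):
    """Return True if both l2_population and arp_responder are on in [agent]."""
    parser = configparser.ConfigParser()
    parser.read_string(agent_conf_text)
    # A missing section or option counts as "off" thanks to the fallback.
    l2pop = parser.getboolean("agent", "l2_population", fallback=False)
    responder = parser.getboolean("agent", "arp_responder", fallback=False)
    return l2pop and responder


sample_conf = """
[agent]
l2_population = True
arp_responder = True
"""

enabled = arp_responder_enabled(sample_conf)
only_l2pop = arp_responder_enabled("[agent]\nl2_population = True\n")
```

To check a real file, read its contents and pass the text to the function.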

This article is reproduced from http://assafmuller.com/2014/05/21/ovs-arp-responder-theory-and-practice/
