OpenStack Neutron DVR: Distributed Virtual Routing


1. Background

The scenario without DVR:


The diagram shows that both east-west and north-south traffic is funneled through the network node, which makes the network node a bottleneck.

With DVR enabled, the picture becomes:


For east-west traffic, traffic is passed directly between compute nodes.

For north-south traffic, a VM with a floating IP sends and receives traffic directly through its compute node. A VM without a floating IP still goes through the network node.


2. Deployment and Traffic Flow





2.1 East-West Traffic


VM1 (10.0.1.5, Net1) pings VM2 (10.0.2.5, Net2)

1) VM1 (10.0.1.5) -> qr (10.0.1.1)

Following its default route, VM1 sends a broadcast ARP request for the address of the qr gateway. Once it has the gateway's MAC address, it sends the ICMP packet to the qr port.

(On the packet format: while VM1 pings VM2, the source and destination IP addresses stay the same end to end, but the source and destination MAC addresses change from hop to hop.)

Meanwhile, the br-tun bridge drops ARP broadcasts whose target address belongs to an interface_distributed interface, so that this traffic does not leave the node unnecessarily:

# ovs-ofctl dump-flows br-tun
NXST_FLOW reply (xid=0x4):
...
cookie=0x0, duration=64720.432s, table=1, n_packets=4, n_bytes=168, idle_age=64607, priority=3,arp,dl_vlan=1,arp_tpa=10.0.1.1 actions=drop
...

2) qr (10.0.1.1) -> qr (10.0.2.1)

Inside the qrouter namespace, forwarding relies on Linux policy routing. Start with the routing rules:


# ip netns exec qrouter-0fbb351e-a65b-4790-a409-8fb219ce16aa ip rule
0: from all lookup local
32766: from all lookup main
32767: from all lookup default
32768: from 10.0.1.5 lookup 16
32769: from 10.0.2.3 lookup 16
167772417: from 10.0.1.1/24 lookup 167772417
167772417: from 10.0.1.1/24 lookup 167772417
167772673: from 10.0.2.1/24 lookup 167772673



Look at the main table first:

# ip netns exec qrouter-0fbb351e-a65b-4790-a409-8fb219ce16aa ip route list table main
10.0.1.0/24 dev qr-ddbdc784-d7 proto kernel scope link src 10.0.1.1
10.0.2.0/24 dev qr-001d0ed9-01 proto kernel scope link src 10.0.2.1
169.254.31.28/31 dev rfp-0fbb351e-a proto kernel scope link src 169.254.31.28

The destination 10.0.2.5 matches the 10.0.2.0/24 route in the main table, so the packet leaves through the other qr port, qr-001d0ed9-01. (Q1: Does the qr port of a given subnet have the same IP on every compute node?)

3) qr -> br-int

Next the router needs the MAC address of 10.0.2.5. Neutron knows the addresses of every VM, so it installs static (permanent) ARP entries in the namespace in advance:


# ip netns exec qrouter-0fbb351e-a65b-4790-a409-8fb219ce16aa ip nei
10.0.1.5 dev qr-ddbdc784-d7 lladdr fa:16:3e:da:75:6d PERMANENT
10.0.2.3 dev qr-001d0ed9-01 lladdr fa:16:3e:a4:fc:98 PERMANENT
10.0.1.6 dev qr-ddbdc784-d7 lladdr fa:16:3e:9f:55:67 PERMANENT
10.0.2.2 dev qr-001d0ed9-01 lladdr fa:16:3e:13:55:66 PERMANENT
10.0.2.5 dev qr-001d0ed9-01 lladdr fa:16:3e:51:99:b8 PERMANENT
10.0.1.4 dev qr-ddbdc784-d7 lladdr fa:16:3e:da:e3:6e PERMANENT
10.0.1.7 dev qr-ddbdc784-d7 lladdr fa:16:3e:14:b8:ec PERMANENT
169.254.31.29 dev rfp-0fbb351e-a lladdr 42:0d:9f:49:63:c6 STALE



The packet now enters br-int, where the table 0 flow forwards it with the NORMAL action:

cookie=0x0, duration=16440.644s, table=0, n_packets=1074, n_bytes=104318, idle_age=8917, priority=1 actions=NORMAL

The NORMAL action makes OVS behave like a conventional learning switch: it looks up the destination MAC address in the FDB (forwarding database) to decide which port the packet goes to. If the FDB has no entry for that MAC, the packet is flooded to every port in the same VLAN except the port it arrived on. For example:

# ovs-appctl fdb/show br-int
port VLAN MAC Age
LOCAL 0 da:91:42:cd:fb:44 18
0 52:54:00:a9:b8:b0 0
0 52:54:00:a9:b8:b1 0

So if VM2 runs on the same compute node (and the FDB has an entry for VM2's MAC), VM2 receives the packet directly and br-tun is not involved. Otherwise the packet continues to br-tun.
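The lookup-or-flood behaviour of NORMAL can be illustrated with a toy learning switch. This is only a sketch: it treats every port as an access port in a single VLAN, and the port names (`qr-net2`, `tap-vm2`, `patch-tun`, `tap-vm1`) and their VLAN assignments are made up for the example; only the two MAC addresses come from the listings in this article.

```python
from dataclasses import dataclass, field


@dataclass
class LearningSwitch:
    """Toy model of NORMAL forwarding; each port is an access port in one VLAN."""
    port_vlan: dict                          # port name -> VLAN ID
    fdb: dict = field(default_factory=dict)  # (vlan, mac) -> port name

    def forward(self, in_port, src_mac, dst_mac):
        vlan = self.port_vlan[in_port]
        # Learn (or refresh) where the source MAC lives.
        self.fdb[(vlan, src_mac)] = in_port
        out_port = self.fdb.get((vlan, dst_mac))
        if out_port == in_port:
            return []          # destination is behind the ingress port: nothing to send
        if out_port is not None:
            return [out_port]  # known destination: exactly one port
        # Unknown destination: flood to every other port in the same VLAN.
        return sorted(p for p, v in self.port_vlan.items()
                      if v == vlan and p != in_port)


# Hypothetical ports; MACs are the Net2 qr port and VM2 from the listings.
switch = LearningSwitch({"qr-net2": 2, "tap-vm2": 2, "patch-tun": 2, "tap-vm1": 1})
QR_MAC, VM2_MAC = "fa:16:3e:69:b4:05", "fa:16:3e:51:99:b8"
print(switch.forward("qr-net2", QR_MAC, VM2_MAC))  # flooded within VLAN 2
print(switch.forward("tap-vm2", VM2_MAC, QR_MAC))  # VM2's reply is learned
print(switch.forward("qr-net2", QR_MAC, VM2_MAC))  # now a single known port
```

The first call floods because the FDB is empty; once VM2's MAC has been learned, the same packet is delivered to one port only, which is the "VM2 on the same node" case described above.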

4) br-int -> br-tun -> out of compute node 1

The packet then passes from br-int into br-tun and is matched against the br-tun flow tables:


cookie=0x0, duration=66172.51s, table=0, n_packets=58, n_bytes=5731, idle_age=20810, hard_age=65534, priority=1,in_port=3 actions=resubmit(,4)
cookie=0x0, duration=67599.526s, table=0, n_packets=273, n_bytes=24999, idle_age=1741, hard_age=65534, priority=1,in_port=1 actions=resubmit(,1)
cookie=0x0, duration=64437.052s, table=0, n_packets=28, n_bytes=2980, idle_age=20799, priority=1,in_port=4 actions=resubmit(,4)
cookie=0x0, duration=67601.704s, table=0, n_packets=5, n_bytes=390, idle_age=65534, hard_age=65534, priority=0 actions=drop
cookie=0x0, duration=66135.811s, table=1, n_packets=140, n_bytes=13720, idle_age=65534, hard_age=65534, priority=1,dl_vlan=1,dl_src=fa:16:3e:66:13:af actions=mod_dl_src:fa:16:3f:fe:49:e9,resubmit(,2)
cookie=0x0, duration=64082.141s, table=1, n_packets=2, n_bytes=200, idle_age=64081, priority=1,dl_vlan=2,dl_src=fa:16:3e:69:b4:05 actions=mod_dl_src:fa:16:3f:fe:49:e9,resubmit(,2)
cookie=0x0, duration=66135.962s, table=1, n_packets=1, n_bytes=98, idle_age=65301, hard_age=65534, priority=2,dl_vlan=1,dl_dst=fa:16:3e:66:13:af actions=drop
cookie=0x0, duration=64082.297s, table=1, n_packets=0, n_bytes=0, idle_age=64082, priority=2,dl_vlan=2,dl_dst=fa:16:3e:69:b4:05 actions=drop
cookie=0x0, duration=66136.115s, table=1, n_packets=4, n_bytes=168, idle_age=65534, hard_age=65534, priority=3,arp,dl_vlan=1,arp_tpa=10.0.1.1 actions=drop
cookie=0x0, duration=64082.449s, table=1, n_packets=2, n_bytes=84, idle_age=63991, priority=3,arp,dl_vlan=2,arp_tpa=10.0.2.1 actions=drop
cookie=0x0, duration=67599.22s, table=1, n_packets=123, n_bytes=10687, idle_age=1741, hard_age=65534, priority=0 actions=resubmit(,2)


The packet matches table 0 and is resubmitted to table 1, which rewrites the source MAC address (that of the other qr port) to the globally unique MAC bound to this compute node.

Table 1 also contains two kinds of drop flows: ARP requests whose target IP belongs to an interface_distributed interface, and packets whose destination MAC is an interface_distributed port. They keep packets that VMs send to the local gateway from being forwarded out onto the network.
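The priority ordering of the table 1 flows above can be read as a small decision function. The sketch below is only a model of the listed flows, not Neutron or OVS code; the function name `br_tun_table1` and the `QR_PORTS` dictionary are made up for illustration, while the MAC and IP values are copied from the dump.

```python
# Globally unique MAC bound to compute node 1 (from the mod_dl_src actions).
DVR_HOST_MAC = "fa:16:3f:fe:49:e9"

# Local VLAN -> (qr port MAC, qr port IP), taken from the table 1 flows.
QR_PORTS = {
    1: ("fa:16:3e:66:13:af", "10.0.1.1"),
    2: ("fa:16:3e:69:b4:05", "10.0.2.1"),
}


def br_tun_table1(vlan, dl_src, dl_dst, arp_tpa=None):
    """Return (action, resulting source MAC) for a packet in br-tun table 1."""
    qr = QR_PORTS.get(vlan)
    if qr is not None:
        qr_mac, qr_ip = qr
        if arp_tpa == qr_ip:      # priority 3: ARP for the distributed gateway
            return ("drop", dl_src)
        if dl_dst == qr_mac:      # priority 2: packet addressed to the qr port
            return ("drop", dl_src)
        if dl_src == qr_mac:      # priority 1: routed packet, hide the qr MAC
            return ("resubmit(,2)", DVR_HOST_MAC)
    return ("resubmit(,2)", dl_src)   # priority 0: everything else


# The routed ICMP packet from the Net2 qr port towards VM2:
print(br_tun_table1(2, "fa:16:3e:69:b4:05", "fa:16:3e:51:99:b8"))
# An ARP broadcast asking for the Net1 gateway:
print(br_tun_table1(1, "fa:16:3e:da:75:6d", "ff:ff:ff:ff:ff:ff", arp_tpa="10.0.1.1"))
```

Checking the higher-priority drop cases first mirrors how OVS picks the highest-priority matching flow in a table.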

Processing continues in table 2, the VXLAN table. Broadcast and multicast packets are resubmitted to table 22, unicast packets to table 20:

cookie=0x0, duration=67601.554s, table=2, n_packets=176, n_bytes=16981, idle_age=20810, hard_age=65534, priority=0,dl_dst=00:00:00:00:00:00/01:00:00:00:00:00 actions=resubmit(,20)
cookie=0x0, duration=67601.406s, table=2, n_packets=92, n_bytes=7876, idle_age=1741, hard_age=65534, priority=0,dl_dst=01:00:00:00:00:00/01:00:00:00:00:00 actions=resubmit(,22)

The broadcast MAC address is ff:ff:ff:ff:ff:ff, and IPv4 multicast MAC addresses begin with 01:00:5e (see http://book.51cto.com/art/200904/120471.htm for details). The match is written as value/mask, much like CIDR notation: the mask 01:00:00:00:00:00 tests only the lowest bit of the first octet, which is 0 for unicast and 1 for broadcast and multicast.
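The value/mask match in table 2 comes down to a single bit test on the destination MAC. A minimal sketch follows; the helper names `mac_to_int` and `next_table` are invented for this example.

```python
def mac_to_int(mac: str) -> int:
    """Convert 'aa:bb:cc:dd:ee:ff' (or with '-') to a 48-bit integer."""
    return int(mac.replace(":", "").replace("-", ""), 16)


# Same mask as in the table 2 flows: only the group bit of the first octet.
GROUP_BIT_MASK = mac_to_int("01:00:00:00:00:00")


def next_table(dl_dst: str) -> int:
    """Return 22 for broadcast/multicast destinations, 20 for unicast."""
    if mac_to_int(dl_dst) & GROUP_BIT_MASK:
        return 22
    return 20


print(next_table("fa:16:3e:51:99:b8"))  # VM2's MAC: unicast
print(next_table("ff:ff:ff:ff:ff:ff"))  # broadcast
print(next_table("01:00:5e:00:00:01"))  # IPv4 multicast
```

Because broadcast is just the all-ones address, it has the group bit set as well, so one mask covers both broadcast and multicast.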

The ICMP packet is unicast, so it goes to table 20. Because L2 population (l2pop) is enabled, table 20 already knows which VTEP each destination MAC should be forwarded to:

cookie=0x0, duration=64015.308s, table=20, n_packets=0, n_bytes=0, idle_age=64015, priority=2,dl_vlan=2,dl_dst=fa:16:3e:51:99:b8 actions=strip_vlan,set_tunnel:0x3eb,output:4

(Q2: How are the tunnel ports on br-tun tied to the physical network?)

5) into compute node 2 -> br-tun

In br-tun, packets arriving from outside are first matched against table 0:


cookie=0x0, duration=66293.658s, table=0, n_packets=31, n_bytes=3936, idle_age=22651, hard_age=65534, priority=1,in_port=3 actions=resubmit(,4)
cookie=0x0, duration=69453.368s, table=0, n_packets=103, n_bytes=9360, idle_age=22651, hard_age=65534, priority=1,in_port=1 actions=resubmit(,1)
cookie=0x0, duration=66292.808s, table=0, n_packets=20, n_bytes=1742, idle_age=3598, hard_age=65534, priority=1,in_port=4 actions=resubmit(,4)
cookie=0x0, duration=69455.675s, table=0, n_packets=5, n_bytes=390, idle_age=65534, hard_age=65534, priority=0 actions=drop



Table 4 maps the VNI (tunnel ID) to the local VLAN ID and then resubmits to table 9:

cookie=0x0, duration=65937.871s, table=4, n_packets=32, n_bytes=3653, idle_age=22651, hard_age=65534, priority=1,tun_id=0x3eb actions=mod_vlan_vid:3,resubmit(,9)
cookie=0x0, duration=66294.732s, table=4, n_packets=19, n_bytes=2025, idle_age=3598, hard_age=65534, priority=1,tun_id=0x3e9 actions=mod_vlan_vid:2,resubmit(,9)
cookie=0x0, duration=69455.115s, table=4, n_packets=0, n_bytes=0, idle_age=65534, hard_age=65534, priority=0 actions=drop

In table 9, if the packet's source MAC is one of the globally unique MAC addresses bound to a compute node, the packet is output to br-int; otherwise it is resubmitted to table 10:

cookie=0x0, duration=69453.507s, table=9, n_packets=0, n_bytes=0, idle_age=65534, hard_age=65534, priority=1,dl_src=fa:16:3f:fe:49:e9 actions=output:1
cookie=0x0, duration=69453.782s, table=9, n_packets=0, n_bytes=0, idle_age=65534, hard_age=65534, priority=1,dl_src=fa:16:3f:72:3f:a7 actions=output:1
cookie=0x0, duration=69453.23s, table=9, n_packets=56, n_bytes=6028, idle_age=3598, hard_age=65534, priority=0 actions=resubmit(,10)
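Read together, tables 0, 4 and 9 on the receiving node form a short pipeline. The sketch below models only the flows listed above; the function name `br_tun_ingress` and the constant names are invented for the example, and table 10 and later are not modelled.

```python
# Values copied from the compute node 2 flow dumps above.
TUNNEL_PORTS = {3, 4}                    # table 0: in_port -> resubmit(,4)
VNI_TO_LOCAL_VLAN = {0x3EB: 3, 0x3E9: 2}  # table 4: tun_id -> mod_vlan_vid
DVR_HOST_MACS = {"fa:16:3f:fe:49:e9", "fa:16:3f:72:3f:a7"}  # table 9


def br_tun_ingress(in_port, tun_id, dl_src):
    """Return (final action, local VLAN) for a packet entering br-tun."""
    # Table 0: classify by ingress port.
    if in_port == 1:
        return ("resubmit(,1)", None)    # traffic coming from br-int
    if in_port not in TUNNEL_PORTS:
        return ("drop", None)
    # Table 4: translate the VNI into the node-local VLAN ID.
    vlan = VNI_TO_LOCAL_VLAN.get(tun_id)
    if vlan is None:
        return ("drop", None)
    # Table 9: DVR-routed packets carry a per-host unique source MAC.
    if dl_src in DVR_HOST_MACS:
        return ("output:1", vlan)
    return ("resubmit(,10)", vlan)


# The routed ICMP packet sent by compute node 1 (VNI 0x3eb = 1003):
print(br_tun_ingress(4, 0x3EB, "fa:16:3f:fe:49:e9"))
```

Note that the local VLAN ID is only meaningful on one node: the same network used VLAN 2 when the packet left compute node 1 and is mapped to VLAN 3 here.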

6) br-tun -> br-int

In br-int table 0, a packet whose source MAC is one of the globally unique MAC addresses bound to a compute node is resubmitted to table 1; everything else is forwarded with NORMAL.

Table 1 holds flows installed in advance: if the destination MAC is VM2's, the source MAC is rewritten to the MAC of the Net2 gateway (the qr port) before the packet is output. (Q3: Why is the source MAC rewritten?)


cookie=0x0, duration=70039.903s, table=0, n_packets=0, n_bytes=0, idle_age=65534, hard_age=65534, priority=2,in_port=6,dl_src=fa:16:3f:72:3f:a7 actions=resubmit(,1)
cookie=0x0, duration=70039.627s, table=0, n_packets=0, n_bytes=0, idle_age=65534, hard_age=65534, priority=2,in_port=6,dl_src=fa:16:3f:fe:49:e9 actions=resubmit(,1)
cookie=0x0, duration=70040.053s, table=0, n_packets=166, n_bytes=15954, idle_age=4184, hard_age=65534, priority=1 actions=NORMAL
cookie=0x0, duration=66458.695s, table=1, n_packets=0, n_bytes=0, idle_age=65534, hard_age=65534, priority=4,dl_vlan=3,dl_dst=fa:16:3e:51:99:b8 actions=strip_vlan,mod_dl_src:fa:16:3e:69:b4:05,output:12
cookie=0x0, duration=66877.515s, table=1, n_packets=0, n_bytes=0, idle_age=65534, hard_age=65534, priority=4,dl_vlan=2,dl_dst=fa:16:3e:14:b8:ec actions=strip_vlan,mod_dl_src:fa:16:3e:66:13:af,output:9
cookie=0x0, duration=66877.369s, table=1, n_packets=0, n_bytes=0, idle_age=65534, hard_age=65534, priority=2,ip,dl_vlan=2,nw_dst=10.0.1.0/24 actions=strip_vlan,mod_dl_src:fa:16:3e:66:13:af,output:9
cookie=0x0, duration=66458.559s, table=1, n_packets=0, n_bytes=0, idle_age=65534, hard_age=65534, priority=2,ip,dl_vlan=3,nw_dst=10.0.2.0/24 actions=strip_vlan,mod_dl_src:fa:16:3e:69:b4:05,output:12



7) br-int -> VM2

VM2 now receives VM1's packet. As the walkthrough shows, east-west traffic between subnets never passes through the network node.


2.2 North-South Traffic (VM has a floating IP)

VM1 (local IP: 10.0.1.5, floating IP: 172.24.4.5) pings 8.8.8.8

1) VM1 (10.0.1.5) -> qr (10.0.1.1)

Same as in section 2.1.

2) qr (10.0.1.1) -> rfp (169.254.31.28) -> fpr (169.254.31.29)

Inside the qrouter namespace, the routing rules are:


# ip netns exec qrouter-0fbb351e-a65b-4790-a409-8fb219ce16aa ip rule
0: from all lookup local
32766: from all lookup main
32767: from all lookup default
32768: from 10.0.1.5 lookup 16
32769: from 10.0.2.3 lookup 16
167772417: from 10.0.1.1/24 lookup 167772417
167772417: from 10.0.1.1/24 lookup 167772417
167772673: from 10.0.2.1/24 lookup 167772673



The main table has no route that matches 8.8.8.8:

# ip netns exec qrouter-0fbb351e-a65b-4790-a409-8fb219ce16aa ip route list table main
10.0.1.0/24 dev qr-ddbdc784-d7 proto kernel scope link src 10.0.1.1
10.0.2.0/24 dev qr-001d0ed9-01 proto kernel scope link src 10.0.2.1
169.254.31.28/31 dev rfp-0fbb351e-a proto kernel scope link src 169.254.31.28

Because the packet's source address is 10.0.1.5, rule 32768 sends the lookup to table 16, whose default route matches:

# ip netns exec qrouter-0fbb351e-a65b-4790-a409-8fb219ce16aa ip route list table 16
default via 169.254.31.29 dev rfp-0fbb351e-a
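The way the kernel walks the rules and tables here can be sketched in a few lines of Python. This is a simplified model under stated assumptions: the `local` and `default` tables are left empty, table 167772673 is omitted because its contents are not shown in this article, and the source 10.0.1.4 in the last call is merely assumed to be a VM without a floating IP. The names `RULES`, `TABLES` and `lookup` are invented for the example; the rule and route entries themselves are copied from the listings (including table 167772417 from section 2.3).

```python
import ipaddress

# (priority, source selector, table); "0.0.0.0/0" stands for "from all".
RULES = [
    (0, "0.0.0.0/0", "local"),
    (32766, "0.0.0.0/0", "main"),
    (32767, "0.0.0.0/0", "default"),
    (32768, "10.0.1.5/32", "16"),
    (32769, "10.0.2.3/32", "16"),
    (167772417, "10.0.1.1/24", "167772417"),
    (167772673, "10.0.2.1/24", "167772673"),
]

# table -> list of (prefix, device, next hop)
TABLES = {
    "main": [
        ("10.0.1.0/24", "qr-ddbdc784-d7", None),
        ("10.0.2.0/24", "qr-001d0ed9-01", None),
        ("169.254.31.28/31", "rfp-0fbb351e-a", None),
    ],
    "16": [("0.0.0.0/0", "rfp-0fbb351e-a", "169.254.31.29")],
    "167772417": [("0.0.0.0/0", "qr-ddbdc784-d7", "10.0.1.6")],
}


def lookup(src, dst):
    """Return (table, device, next hop) of the first table with a matching route."""
    src_ip, dst_ip = ipaddress.ip_address(src), ipaddress.ip_address(dst)
    for _prio, selector, table in sorted(RULES):
        if src_ip not in ipaddress.ip_network(selector, strict=False):
            continue
        matches = [(ipaddress.ip_network(p), dev, via)
                   for p, dev, via in TABLES.get(table, [])
                   if dst_ip in ipaddress.ip_network(p)]
        if matches:
            # Longest-prefix match within the selected table.
            _net, dev, via = max(matches, key=lambda m: m[0].prefixlen)
            return (table, dev, via)
    return None


print(lookup("10.0.1.5", "10.0.2.5"))  # east-west: resolved in main
print(lookup("10.0.1.5", "8.8.8.8"))   # floating IP: falls through to table 16
print(lookup("10.0.1.4", "8.8.8.8"))   # no floating IP: table 167772417
```

Rules are tried in ascending priority, and a table that has no route for the destination simply lets the search continue with the next rule, which is why the main table handles east-west traffic while the per-source rules only matter for destinations outside the tenant subnets.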

After routing, the packet is source-NATed in the netfilter POSTROUTING chain:


# ip netns exec qrouter-0fbb351e-a65b-4790-a409-8fb219ce16aa iptables -nvL -t nat
...
Chain neutron-l3-agent-float-snat (1 references)
pkts bytes target prot opt in out source destination
0 0 SNAT all -- * * 10.0.2.3 0.0.0.0/0 to:172.24.4.7
0 0 SNAT all -- * * 10.0.1.5 0.0.0.0/0 to:172.24.4.5
...



The packet is then sent out through rfp-0fbb351e-a to 169.254.31.29.

rfp-0fbb351e-a and fpr-0fbb351e-a are the two ends of a veth pair; the fpr end sits in the fip namespace.

3) fpr (169.254.31.29) -> fg (172.24.4.6)

Inside the fip namespace a route lookup is performed; the main table has a default route to the public network:

# ip netns exec fip-fbd46644-c70f-4227-a414-862a00cbd1d2 ip route
default via 172.24.4.1 dev fg-081d537b-06
169.254.31.28/31 dev fpr-0fbb351e-a proto kernel scope link src 169.254.31.29
172.24.4.0/24 dev fg-081d537b-06 proto kernel scope link src 172.24.4.6
172.24.4.5 via 169.254.31.28 dev fpr-0fbb351e-a
172.24.4.7 via 169.254.31.28 dev fpr-0fbb351e-a

The packet leaves through fg-081d537b-06 to br-ex. That completes the path from the VM to the public network. (Q4: What do the flow tables on br-ex look like? If there is no br-ex and traffic goes straight through br-int, how do the flow tables change?)


An external host pings VM1 (floating IP: 172.24.4.5)

1) fip namespace

The fip namespace acts as an ARP proxy here (Q5: What is the ARP proxy for?):

# ip netns exec fip-fbd46644-c70f-4227-a414-862a00cbd1d2 sysctl net.ipv4.conf.fg-081d537b-06.proxy_arp
net.ipv4.conf.fg-081d537b-06.proxy_arp = 1

Proxy ARP is enabled on the interface, and there is a route for each floating IP:

# ip netns exec fip-fbd46644-c70f-4227-a414-862a00cbd1d2 ip route
...
172.24.4.5 via 169.254.31.28 dev fpr-0fbb351e-a
172.24.4.7 via 169.254.31.28 dev fpr-0fbb351e-a
...

The ARP lookup follows the veth pair to the IR (Inter Router) namespace, that is, the qrouter namespace, where the floating IPs are configured on interface rfp-0fbb351e-a:


# ip netns exec qrouter-0fbb351e-a65b-4790-a409-8fb219ce16aa ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: rfp-0fbb351e-a: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether ea:5c:56:9a:36:9c brd ff:ff:ff:ff:ff:ff
    inet 169.254.31.28/31 scope global rfp-0fbb351e-a
       valid_lft forever preferred_lft forever
    inet 172.24.4.5/32 brd 172.24.4.5 scope global rfp-0fbb351e-a
       valid_lft forever preferred_lft forever
    inet 172.24.4.7/32 brd 172.24.4.7 scope global rfp-0fbb351e-a
       valid_lft forever preferred_lft forever
    inet6 fe80::e85c:56ff:fe9a:369c/64 scope link
       valid_lft forever preferred_lft forever
17: qr-ddbdc784-d7: <BROADCAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default
    link/ether fa:16:3e:66:13:af brd ff:ff:ff:ff:ff:ff
    inet 10.0.1.1/24 brd 10.0.1.255 scope global qr-ddbdc784-d7
       valid_lft forever preferred_lft forever
    inet6 fe80::f816:3eff:fe66:13af/64 scope link
       valid_lft forever preferred_lft forever
19: qr-001d0ed9-01: <BROADCAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default
    link/ether fa:16:3e:69:b4:05 brd ff:ff:ff:ff:ff:ff
    inet 10.0.2.1/24 brd 10.0.2.255 scope global qr-001d0ed9-01
       valid_lft forever preferred_lft forever
    inet6 fe80::f816:3eff:fe69:b405/64 scope link
       valid_lft forever preferred_lft forever



So the fip namespace answers ARP requests for the floating IP.

When an external request destined for a floating IP arrives, the fip namespace forwards it to the IR namespace, whose PREROUTING chain contains the following rules:


# ip netns exec qrouter-0fbb351e-a65b-4790-a409-8fb219ce16aa iptables -nvL -t nat
...
Chain neutron-l3-agent-PREROUTING (1 references)
pkts bytes target prot opt in out source destination
0 0 REDIRECT tcp -- * * 0.0.0.0/0 169.254.169.254 tcp dpt:80 redir ports 9697
0 0 DNAT all -- * * 0.0.0.0/0 172.24.4.7 to:10.0.2.3
0 0 DNAT all -- * * 0.0.0.0/0 172.24.4.5 to:10.0.1.5
...


The DNAT rule rewrites the floating IP to the VM's internal address, after which a route lookup is performed:

# ip netns exec qrouter-0fbb351e-a65b-4790-a409-8fb219ce16aa ip route
10.0.1.0/24 dev qr-ddbdc784-d7 proto kernel scope link src 10.0.1.1
10.0.2.0/24 dev qr-001d0ed9-01 proto kernel scope link src 10.0.2.1
169.254.31.28/31 dev rfp-0fbb351e-a proto kernel scope link src 169.254.31.28

The destination is now in 10.0.1.0/24, so the packet leaves through qr-ddbdc784-d7, reaches br-int, and is delivered to the virtual machine.
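The two NAT directions for a floating IP are mirror images of each other. The sketch below models just the address rewriting from the SNAT and DNAT rules shown in this section; the function names are invented for the example, and connection tracking, ports and the metadata REDIRECT rule are ignored.

```python
# Fixed IP -> floating IP, copied from the neutron-l3-agent-float-snat chain.
FLOATING_BY_FIXED = {"10.0.1.5": "172.24.4.5", "10.0.2.3": "172.24.4.7"}
# Reverse mapping, as used by the DNAT rules in the PREROUTING chain.
FIXED_BY_FLOATING = {fip: fixed for fixed, fip in FLOATING_BY_FIXED.items()}


def snat_outbound(src, dst):
    """VM -> outside: replace the fixed source IP with its floating IP."""
    return (FLOATING_BY_FIXED.get(src, src), dst)


def dnat_inbound(src, dst):
    """Outside -> VM: replace the floating destination IP with the fixed IP."""
    return (src, FIXED_BY_FLOATING.get(dst, dst))


print(snat_outbound("10.0.1.5", "8.8.8.8"))   # request leaving the compute node
print(dnat_inbound("8.8.8.8", "172.24.4.5"))  # reply or external ping arriving
```

Addresses with no floating IP pass through unchanged in this model; on a real DVR router they take the centralized SNAT path instead.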


2.3 North-South Traffic (VM has no floating IP)

When the virtual machine has no floating IP, a packet it sends is first routed in the IR namespace. The rules in IR are:


# ip netns exec qrouter-0fbb351e-a65b-4790-a409-8fb219ce16aa ip rule
0: from all lookup local
32766: from all lookup main
32767: from all lookup default
32768: from 10.0.1.5 lookup 16
32769: from 10.0.2.3 lookup 16
167772417: from 10.0.1.1/24 lookup 167772417
167772673: from 10.0.2.1/24 lookup 167772673

The main table is consulted first, then table 167772417. (Q7: Why is table 16 not matched?)

# ip netns exec qrouter-0fbb351e-a65b-4790-a409-8fb219ce16aa ip route list table 167772417
default via 10.0.1.6 dev qr-ddbdc784-d7

This table forwards the packet to 10.0.1.6, which is the router_centralized_snat interface on the network node.
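The rule priorities and table numbers 167772417 and 167772673 look arbitrary, but in these listings they coincide with the subnet gateway addresses read as 32-bit integers. The snippet below only demonstrates that arithmetic; it is an observation about the output above rather than a description of Neutron's code, and `ip_to_int` is a helper name chosen for the example.

```python
import ipaddress


def ip_to_int(addr: str) -> int:
    """Interpret a dotted-quad IPv4 address as an unsigned 32-bit integer."""
    return int(ipaddress.IPv4Address(addr))


for gateway in ("10.0.1.1", "10.0.2.1"):
    # 10.0.1.1 -> 10*2**24 + 0*2**16 + 1*2**8 + 1
    print(gateway, ip_to_int(gateway))
```

Reading the number back as an address is a quick way to tell which subnet a given per-subnet table belongs to.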

The router_centralized_snat interface lives in the snat namespace on the network node, which also holds the SNAT rules:


$ sudo ip netns exec snat-0fbb351e-a65b-4790-a409-8fb219ce16aa iptables -nvL -t nat
...
Chain neutron-l3-agent-snat (1 references)
pkts bytes target prot opt in out source destination
0 0 SNAT all -- * * 10.0.1.0/24 0.0.0.0/0 to:172.24.4.4
0 0 SNAT all -- * * 10.0.2.0/24 0.0.0.0/0 to:172.24.4.4
...

As with legacy (non-DVR) L3, packets from VMs without a floating IP are SNATed to 172.24.4.4 (the gateway address of the DVR router). The rest of the path is the same as in legacy L3 and is not described again here.
