OpenStack Neutron L3 High Availability

Source: http://assafmuller.com/2014/08/16/layer-3-high-availability/

The low availability of L3 agents today

Currently you can achieve load sharing by deploying multiple network nodes, but this is not a highly available, redundant solution. Suppose you have three network nodes and create new routers: the routers are automatically scheduled and distributed across those three nodes. However, if a node fails, every router on it stops providing service and its traffic is no longer forwarded. Neutron offers no built-in solution for this in the Icehouse release.


DHCP agent high availability

The DHCP agent is a different story: the DHCP protocol itself allows multiple DHCP servers to serve the same pool simultaneously.

In neutron.conf, you only need to set:

dhcp_agents_per_network = X


This makes the DHCP scheduler assign X DHCP agents to each network. With three network nodes and dhcp_agents_per_network = 2, every Neutron network is served by DHCP agents on two of the three nodes. How does this work?
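For example, with the standard neutron CLI (the commands below come from python-neutronclient, not from the original article, and the network name is a placeholder), you can check which DHCP agents were scheduled for a given network:

neutron agent-list
neutron dhcp-agent-list-hosting-net private

The second command should list as many DHCP agents as dhcp_agents_per_network specifies.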

First, let's look at how this works in the physical world. When a host connects to the subnet 10.0.0.0/24, it sends a DHCP discover broadcast. Both DHCP server processes, dnsmasq1 and dnsmasq2 (or any other DHCP server), receive the broadcast and reply with an offer of 10.0.0.2. Assuming the host accepts the first server's offer, it broadcasts a request for 10.0.0.2 and specifies that the address was offered by dnsmasq1 at 10.0.0.253. Both servers receive the broadcast, but only dnsmasq1 replies with an ACK. Because all DHCP communication is broadcast based, the second server also sees the ACK and marks 10.0.0.2 as taken by AA:BB:CC:11:22:33, so it will not offer that address to other hosts. To sum up: all communication between clients and servers is broadcast, so the state (which address was assigned, and to whom) is known correctly by all of the distributed servers.
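Schematically, and assuming dnsmasq2 lives at 10.0.0.252 (a placeholder; only dnsmasq1's address is given above), the exchange looks like this:

client AA:BB:CC:11:22:33   -> broadcast DHCPDISCOVER
dnsmasq1 (10.0.0.253)      -> broadcast DHCPOFFER 10.0.0.2
dnsmasq2 (10.0.0.252)      -> broadcast DHCPOFFER 10.0.0.2
client                     -> broadcast DHCPREQUEST for 10.0.0.2, server 10.0.0.253
dnsmasq1                   -> broadcast DHCPACK 10.0.0.2
dnsmasq2 sees the ACK and marks 10.0.0.2 as taken by AA:BB:CC:11:22:33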

In Neutron, the MAC-to-IP mapping is decided before it ever reaches a dnsmasq instance, namely when Neutron creates the port. So before the DHCP request is even broadcast, both dnsmasq instances have already been told, via their hosts/leases files, that AA:BB:CC:11:22:33 should be given 10.0.0.2.
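For illustration, each dnsmasq instance is started with a hosts file maintained by its DHCP agent (typically /var/lib/neutron/dhcp/<network-id>/host; the exact path and hostname format here are assumptions, not from the original article) containing a line like:

AA:BB:CC:11:22:33,host-10-0-0-2.openstacklocal,10.0.0.2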


Back to the L3 agent and its lack of high availability: there is no built-in solution of the kind DHCP enjoys (at least not yet), but users do need high availability. What can be done?
  • Pacemaker/Corosync: use external cluster management to designate a standby network node for an active one. The standby node sits idle until the active node fails, at which point the L3 agent is started on the standby. The two nodes are configured with the same hostname, so when the standby agent comes up and synchronizes with the server, its ID does not change and it keeps managing the same routers.
  • Another option is a periodic job (cron job): write a script with the Python SDK that uses the API to find dead agents, collect the routers scheduled to them, and reschedule those routers onto live agents (see the sketch after this list).
  • During the Juno cycle, watch the patch by Kevin Benton: https://review.openstack.org/#/c/110893/, which lets Neutron itself reschedule routers away from dead agents.
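Here is a minimal sketch of the cron-job idea, written against the python-neutronclient 2.x API (the client methods and authentication parameters used here are assumptions of this sketch and may differ between client versions):

# Reschedule routers away from dead L3 agents onto live ones.
# Assumption: python-neutronclient 2.x and the usual OS_* environment variables.
import os

from neutronclient.v2_0 import client

neutron = client.Client(
    username=os.environ['OS_USERNAME'],
    password=os.environ['OS_PASSWORD'],
    tenant_name=os.environ['OS_TENANT_NAME'],
    auth_url=os.environ['OS_AUTH_URL'])

# Split the L3 agents into dead and live ones.
agents = [a for a in neutron.list_agents()['agents']
          if a['agent_type'] == 'L3 agent']
dead = [a for a in agents if not a['alive']]
alive = [a for a in agents if a['alive']]

if alive:
    for dead_agent in dead:
        # Find the routers still scheduled to the dead agent...
        routers = neutron.list_routers_on_l3_agent(dead_agent['id'])['routers']
        for i, router in enumerate(routers):
            # ...and move each one to a live agent, round robin.
            target = alive[i % len(alive)]
            neutron.remove_router_from_l3_agent(dead_agent['id'], router['id'])
            neutron.add_router_to_l3_agent(target['id'],
                                           {'router_id': router['id']})

Run from cron every minute or so, this roughly approximates the rescheduling that the patch mentioned above adds to Neutron itself.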

Router rescheduling: a long road

All of the solutions listed above take real time from failure to recovery. In simple deployments, rescheduling a handful of routers to a new node is not slow. But imagine thousands of routers: the rescheduling and reconfiguration process can take hours. People need fast failover!


Distributed Virtual Router (DVR)

Here are some documents describing how a DVR works:

  • http://specs.openstack.org/openstack/neutron-specs/specs/juno/neutron-ovs-dvr.html
  • https://docs.google.com/document/d/1jCmraZGirmXq5V1MtRqhjdZCbUfiwBhRkUjDXGt5QUQ/
  • https://docs.google.com/document/d/1depasJSnGZPOnRLxEC_PYsVLcGVFXZLqP52RFTe21BE/

The gist is to move routing onto the compute nodes, making the L3 agent on the network nodes redundant. Is that really the case?

  • DVR handles floating IP traffic but leaves SNAT to the L3 agents on the network nodes.
  • It does not work with VLANs; it only works with tunnels and l2pop enabled.
  • Every compute node needs a connection to the external network.
  • L3 HA, by comparison, is a simpler deployment, though it is not available for clouds based on Havana or Icehouse.
Ideally you would use DVR together with L3 HA: floating IP traffic would be routed directly from your compute nodes, while SNAT traffic would still go through an HA cluster of L3 agents on your network nodes.
L3 high availability

The L3 HA solution in Juno uses the popular Linux tool keepalived, which internally uses VRRP. So first, a word about VRRP.
Virtual Router Redundancy Protocol (VRRP) first solved a problem in the physical world: providing high availability for a default gateway, i.e. for the next hop of a route. What problem, exactly? Suppose a network has two routers providing connectivity; you could configure half of the hosts with the first router's address as their default gateway and the other half with the second.

That provides load sharing, but what happens if one of the routers loses connectivity? The idea is to use a virtual IP address, or floating address, as the hosts' default gateway address. When a failure occurs, the standby router stops receiving VRRP hello messages from the master, which triggers an election. The winner becomes the active gateway and the others remain standbys. The active router configures the virtual IP (VIP) on its internal LAN interface and answers ARP requests for it with a virtual MAC address. The hosts on the network already hold the VIP-to-virtual-MAC mapping in their ARP caches, so no new ARP requests are needed. After the election, the newly active router sends a gratuitous ARP to announce to the network that the VIP and virtual MAC now belong to it; the switchover thereby moves the virtual MAC address from its old location in the network to the new one.


The result is that traffic to the default gateway always flows through the currently active router. Note that this scheme by itself does not load-balance: all traffic goes through the single active router. (In Neutron's use of VRRP, load sharing is not done within a single router either, but at the node level, assuming a sufficient number of routers.) So how do you get load sharing at the router level? VRRP groups: the VRRP header carries a virtual router identifier (VRID). Two VIPs are configured, each with its own VRID; half of the hosts on the network use the first VIP as their gateway and the other half use the second. On failure, a VIP moves from the failed router to the other one, as sketched below.
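To make the VRRP group idea concrete, here is a minimal hand-written keepalived sketch for the first of the two routers (this is not configuration generated by Neutron; the interface name, addresses and priorities are made up). The second router would use the mirror image: MASTER for VRID 2 and BACKUP for VRID 1.

vrrp_instance VI_1 {
    state MASTER              # this router normally owns the first VIP
    interface eth0
    virtual_router_id 1
    priority 150
    virtual_ipaddress {
        192.168.1.1/24
    }
}
vrrp_instance VI_2 {
    state BACKUP              # it takes over the second VIP only on failure
    interface eth0
    virtual_router_id 2
    priority 100
    virtual_ipaddress {
        192.168.1.2/24
    }
}

Half of the hosts use 192.168.1.1 as their default gateway, the other half 192.168.1.2, so both routers carry traffic until one of them fails.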
Attentive readers will have spotted a big problem: what if the active router loses its connection to the internet? It would remain the active router even though it can no longer forward traffic. VRRP adds the ability to track the external connection and give up the active role when such a failure occurs.
Note that the address configuration can work in one of two modes:
1. Each router has its own IP address regardless of VRRP state; the master router configures the VIP as an additional, secondary address.
2. Only the VIP is configured: the master holds the VIP, and the standbys have no IP address on the interface.

Some more VRRP facts:
  • VRRP is encapsulated directly in IP. The active instance sends hello messages to the standbys using the multicast address 224.0.0.18 and MAC 01-00-5E-00-00-12.
  • The virtual MAC address has the format 00-00-5E-00-01-{VRID}; for example, VRID 5 maps to 00-00-5E-00-01-05. Since the VRID is a single byte, only 255 different VRIDs (1 to 255) can exist in one broadcast domain.
  • The election uses a user-configured priority from 1 to 255; the higher the priority, the better. Elections can be preemptive, meaning, as in other network protocols, that a standby configured with a higher priority, or one that recovers after an outage (having previously been active), always takes back the active role. A non-preemptive election policy is also possible.
  • The hello interval is configurable (say, every T seconds). If a standby misses three consecutive hello messages from the master, the election mechanism is triggered.
Back to Neutron land. L3 HA starts a keepalived instance in every router namespace. The different router instances talk to each other over a dedicated HA network, one per tenant. This network is created with a blank tenant ID, so it cannot be seen or manipulated through the CLI or GUI. The HA network is a Neutron tenant network like any other and uses the default segmentation technology. The keepalived traffic goes over a dedicated HA device (specified in keepalived.conf and used by the keepalived instance inside the router namespace). Here is the output of 'ip address' in the router namespace:
$ sudo ip netns exec qrouter-b30064f9-414e-4c98-ab42-646197c74020 ip address
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default
    ...
2794: ha-45249562-ec: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default
    link/ether 12:34:56:78:2b:5d brd ff:ff:ff:ff:ff:ff
    inet 169.254.0.2/24 brd 169.254.0.255 scope global ha-54b92d86-4f
       valid_lft forever preferred_lft forever
    inet6 fe80::1034:56ff:fe78:2b5d/64 scope link
       valid_lft forever preferred_lft forever
2795: qr-dc9d93c6-e2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default
    link/ether ca:fe:de:ad:be:ef brd ff:ff:ff:ff:ff:ff
    inet 10.0.0.1/24 scope global qr-0d51eced-0f
       valid_lft forever preferred_lft forever
    inet6 fe80::c8fe:deff:fead:beef/64 scope link
       valid_lft forever preferred_lft forever
2796: qg-843de7e6-8f: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default
    link/ether ca:fe:de:ad:be:ef brd ff:ff:ff:ff:ff:ff
    inet 19.4.4.4/24 scope global qg-75688938-8d
       valid_lft forever preferred_lft forever
    inet6 fe80::c8fe:deff:fead:beef/64 scope link
       valid_lft forever preferred_lft forever
This is the output on the master instance. On the other node, the same router has no IP addresses on its ha, qr, or qg devices, and no floating IPs or routes either. Those are written into keepalived.conf instead. When keepalived detects a master failure, it reconfigures these addresses (the VIPs) on the appropriate devices. Here is keepalived.conf for the same router:
vrrp_sync_group VG_1 {
    group {
        VR_1
    }
    notify_backup "/path/to/notify_backup.sh"
    notify_master "/path/to/notify_master.sh"
    notify_fault "/path/to/notify_fault.sh"
}
vrrp_instance VR_1 {
    state BACKUP
    interface ha-45249562-ec
    virtual_router_id 1
    priority 50
    nopreempt
    advert_int 2
    track_interface {
        ha-45249562-ec
    }
    virtual_ipaddress {
        19.4.4.4/24 dev qg-843de7e6-8f
    }
    virtual_ipaddress_excluded {
        10.0.0.1/24 dev qr-dc9d93c6-e2
    }
    virtual_routes {
        0.0.0.0/0 via 19.4.4.1 dev qg-843de7e6-8f
    }
}
What about those notify scripts? They are scripts that keepalived executes on a transition to master, backup, or fault. The master script contains:
#!/usr/bin/env bash
neutron-ns-metadata-proxy --pid_file=/tmp/tmpp_6Lcx/tmpllLzNs/external/pids/b30064f9-414e-4c98-ab42-646197c74020/pid --metadata_proxy_socket=/tmp/tmpp_6Lcx/tmpllLzNs/metadata_proxy --router_id=b30064f9-414e-4c98-ab42-646197c74020 --state_path=/opt/openstack/neutron --metadata_port=9697 --debug --verbose
echo -n master > /tmp/tmpp_6Lcx/tmpllLzNs/ha_confs/b30064f9-414e-4c98-ab42-646197c74020/state
The master script simply starts the metadata proxy and writes the state to a file; that state file is read by the L3 agent. The backup and fault scripts kill the proxy and write their state to the same file. This means the metadata proxy runs only on the master router node.
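For example, using the state path that appears in the script above (the path is deployment-specific), you can check a router's current role on a node with something like:

$ cat /tmp/tmpp_6Lcx/tmpllLzNs/ha_confs/b30064f9-414e-4c98-ab42-646197c74020/state
master

The same file would read 'backup' on the standby node.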
* Aren't we forgetting the metadata agent? Simply run the metadata agent on every network node.
Future work and limitations
  • TCP connection tracking: with the current implementation, TCP sessions are broken on failover. One proposed solution is to use conntrackd to replicate connection state between the HA routers, so that after a failover TCP sessions continue where they left off.
  • Where is the master instance hosted? Currently there is no way for an administrator to know which network node hosts the master instance of an HA router. The plan is to have the agents report this so it can be queried through the API.
  • Evacuating an agent: ideally, putting a node into maintenance mode should trigger all HA routers on that node to relinquish their master state, speeding up failover.
  • Notifying l2pop of VIP changes: consider the router's IP/MAC on a tenant network. Only the master actually has the IP configured, but the same Neutron port and MAC appear on every node hosting the router. This can be harmful to the l2pop driver, which expects a MAC to live in only one place in the network. The proposed solution is to send an RPC message from the agent whenever a router transitions to master, so the controller is notified and can update the l2pop state.
  • Firewall, VPN, and load balancing as a service: there are still problems integrating these services with DVR and with L3 HA; they may be addressed in Kilo.
  • There is one HA network per tenant. This means a tenant can have at most 255 HA routers, because each router needs a VRID and a single VRRP broadcast domain allows only 255 distinct VRID values.

Usage and configuration

In neutron.conf:
l3_ha = True
max_l3_agents_per_router = 2
min_l3_agents_per_router = 2

  • l3_ha means every router is created as an HA router by default (as opposed to a legacy router). It is disabled by default.
  • You can set the maximum and minimum according to how many network nodes you have. If you deploy four network nodes but set the maximum to 2, only two L3 agents will host each HA router (one acting as master, the other as standby).
  • The minimum is used as a sanity check: if you have two network nodes and one of them goes down, any HA router created during that time will fail, because at least min_l3_agents_per_router L3 agents are required to create an HA router.
l3_ha only controls the default. An administrator (and only an administrator) can override it per router from the CLI:
neutron router-create --ha=<True | False> router1
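You can then check which L3 agents were chosen to host the HA router with the standard CLI:

neutron l3-agent-list-hosting-router router1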

References:
  • Blueprint
  • Spec
  • How to test
  • Code
  • Dedicated wiki page
  • Section in the Neutron L3 Subteam wiki (including an overview of patch dependencies and future work)
