https://www.ustack.com/blog/openstack-dragonflow/
This article is based on a talk given at the OpenStack Meetup in Beijing on May 30, 2015; the speaker was UnitedStack network engineer King Absalom.
At the Vancouver OpenStack Summit, Dragonflow, Neutron's newest subproject, did not get much dedicated attention. Even so, the project, presented by Huawei's Israel technical team, is of great interest to developers working on networking. Dragonflow was proposed in 2014, its code was submitted in 2015, and it has since become an OpenStack incubation project.
Dragonflow can be understood as a Layer-3 controller extension for Neutron: it implements the Neutron L3 Service API and provides distributed virtual router functionality. In terms of design, Dragonflow uses a pluggable, stateless, lightweight SDN controller to fully distribute tenant inter-subnet (east-west) traffic, bypassing the network nodes, shrinking the failure domain, and avoiding a single point of failure.
According to its design goals, Dragonflow improves the scalability, resiliency, performance, and reliability of OpenStack Neutron L3. It aims to support thousands of compute nodes, keeps the controller stateless so the deployment can grow dynamically without a central bottleneck, and reduces compute-node overhead by avoiding iptables and namespaces.
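To make the "L3 Service API implemented as a plugin" idea concrete, here is a minimal sketch of what such a service plugin can look like. This is not Dragonflow's actual code: the class, the controller client, and its methods are assumptions, used only to show where a lightweight controller could hook into the standard Neutron L3 API calls.

```python
# Minimal sketch (not Dragonflow's actual code) of how a Neutron L3
# service plugin hooks in: it implements the standard L3 API calls and,
# instead of scheduling router namespaces, notifies a lightweight SDN
# controller. All names below are illustrative assumptions.

class LightweightL3ServicePlugin(object):
    """Illustrative L3 service plugin skeleton."""

    supported_extension_aliases = ["router"]   # the L3 ("router") extension

    def __init__(self, controller_client):
        # hypothetical client that talks to the stateless SDN controller
        self.controller = controller_client

    def create_router(self, context, router):
        # persist the router in the Neutron DB (omitted), then tell the
        # controller so it can answer future packet-ins for this router
        self.controller.router_created(router)
        return router

    def add_router_interface(self, context, router_id, interface_info):
        # attaching a subnet to the router is what enables distributed
        # east-west forwarding for that subnet
        self.controller.interface_added(router_id, interface_info)
        return interface_info
```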
DVR (Distributed Virtual Routing) mode
What improvements does Dragonflow make? To understand its design, let us first look at the DVR (Distributed Virtual Routing) model currently used in OpenStack networking.
As the figure shows, in the DVR service architecture L3 is integrated into the Neutron server as a service plugin; compute nodes run an L3 agent, and network nodes run an L3 NAT agent.
The figure shows the network I/O path in DVR mode. For east-west traffic, the L3 agent on the compute node sets up a namespace that answers ARP requests and rewrites MAC addresses, and the traffic then goes directly to the peer compute node. Return traffic takes the same path in the other direction.
What is the purpose of this design? First, the node intercepts ARP requests for its own routers and answers them locally; second, OVS rewrites the MAC address to a virtual (DVR) MAC before sending the packet on.
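The following sketch shows, very roughly, the two tricks just described, expressed as OVS flows. These are not the exact rules the Neutron agents install; the bridge name, table numbers, port numbers, MAC addresses, and IPs are placeholders chosen only to illustrate the mechanism.

```python
# Rough illustration (not the exact rules Neutron installs) of local ARP
# handling and DVR-style MAC rewriting, added as OVS flows via ovs-ofctl.
# Bridge name, table numbers, MACs, and IPs are placeholders.
import subprocess

BRIDGE = "br-int"
FLOWS = [
    # 1. Answer ARP requests for the router's gateway IP locally
    #    (here simply by steering them to the local router port 10).
    "table=0,priority=100,arp,arp_tpa=10.0.0.1,actions=output:10",
    # 2. Rewrite the source MAC of routed east-west traffic to a
    #    per-host "DVR MAC" before it leaves for the peer compute node.
    "table=0,priority=90,ip,dl_src=fa:16:3e:11:22:33,"
    "actions=mod_dl_src:fa:16:3f:aa:bb:cc,NORMAL",
]

for flow in FLOWS:
    subprocess.run(["ovs-ofctl", "add-flow", BRIDGE, flow], check=True)
```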
The problem with this design is that traffic not only has to pass through a namespace (yet another set of network devices), but the logic also becomes quite complex. The ARP entries the namespace relies on have to be pushed to the tenant's compute nodes across the whole cluster, regardless of whether a given compute node actually hosts a VM on the relevant network, as the sketch below illustrates.
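A sketch of what that pre-provisioning amounts to: permanent ARP entries for every port on the subnet are written into the router namespace on each host. The namespace name, device name, and address pairs below are invented for illustration; this is not the L3 agent's actual code.

```python
# Sketch of the pre-provisioning described above: permanent ARP entries
# for every port on the subnet are pushed into the router namespace on
# each compute node, whether or not that node hosts a VM on the network.
import subprocess

NAMESPACE = "qrouter-1111-2222"        # per-router namespace on the host (placeholder)
DEVICE = "qr-abcdef12-34"              # router port inside the namespace (placeholder)

# (ip, mac) pairs for every port on the attached subnet (made up)
SUBNET_PORTS = [
    ("10.0.0.5", "fa:16:3e:00:00:05"),
    ("10.0.0.6", "fa:16:3e:00:00:06"),
]

for ip, mac in SUBNET_PORTS:
    subprocess.run(
        ["ip", "netns", "exec", NAMESPACE,
         "ip", "neigh", "replace", ip,
         "lladdr", mac, "dev", DEVICE, "nud", "permanent"],
        check=True,
    )
```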
Now look at north-south traffic. Floating-IP traffic is handled in the same namespace-based way: the namespace holds the public IP and forwards the traffic to the external network. SNAT in particular takes a longer path: traffic leaves the compute node, is sent to the network node, and is then processed by the SNAT namespace there.
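As a simplified picture of that last SNAT step, the rule below rewrites the source address of tenant traffic inside the SNAT namespace on the network node. The namespace name, subnet, and external IP are illustrative placeholders, not values from a real deployment.

```python
# Simplified view of the SNAT step on the network node: inside the SNAT
# namespace, an iptables rule rewrites the source address of tenant
# traffic to the router's external IP. All names/addresses are placeholders.
import subprocess

SNAT_NS = "snat-1111-2222"             # per-router SNAT namespace (placeholder)
TENANT_CIDR = "10.0.0.0/24"            # tenant subnet behind the router
EXTERNAL_IP = "203.0.113.10"           # router gateway IP on the external network

subprocess.run(
    ["ip", "netns", "exec", SNAT_NS,
     "iptables", "-t", "nat", "-A", "POSTROUTING",
     "-s", TENANT_CIDR, "-j", "SNAT", "--to-source", EXTERNAL_IP],
    check=True,
)
```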
Advantages of DVR mode
1. East-west traffic is genuinely distributed.
Traffic is actually offloaded to the local switching layer; cross-subnet traffic no longer has to detour through the network node's router and back.
2. Single points of failure are reduced and the failure domain shrinks.
If a network node goes down, only its SNAT traffic is affected; east-west traffic and north-south traffic using floating IPs keep working. This advantage matters because the community implementation of network-node HA is not very robust, so do not pin all your hopes on upstream HA. In addition, the current HA implementation does not synchronize conntrack state, so the failover experience is poor.
Obvious disadvantages of DVR mode
1. Resource consumption and performance overhead.
Every compute node has to run namespaces, which consumes resources, and all routed traffic has to traverse each namespace's TCP/IP stack in the kernel, which wastes performance.
2. Operational complexity.
DVR does distribute east-west traffic, and right after deployment everything feels fine. But once a problem appears and you try to debug it with tcpdump, all you see are spoofed ARP replies and fake MAC addresses, and the traffic paths look very strange, which makes operations and troubleshooting difficult. In other words, things are fine until something breaks, and then the problem is hard to solve. This is especially sensitive in production: cloud service providers cannot guarantee they will never fail, and the recent, widely discussed incidents at Toutiao and Ctrip are examples. Since 100% availability cannot be guaranteed, failures have to be diagnosed and resolved quickly so production can resume.
3. Code complexity.
(1) The compute node has to manage many devices and needs drivers to manage them, all of which requires code;
(2) L3 is tightly coupled with L2: deploying DVR L3 also requires upgrading the DVR-aware L2 agent. As we have seen, this initially meant that only overlay networks were supported, not VLAN networks.
What improvements does Dragonflow make?
Dragonflow was proposed by Huawei's Israel technical team in 2014; the code was submitted in 2015, and the project has become an OpenStack incubation project.
Dragonflow's design follows the standard Neutron extension pattern: the API is implemented by a plugin, and the functionality is provided by two agents.
In Dragonflow mode, the L3 service is still implemented as a service plugin. The network node changes from a simple node that only carries SNAT traffic to one running a Neutron controller agent. This controller agent is the equivalent of the network controller in a traditional SDN design, or more precisely a Layer-3 SDN controller. The L2 agent still runs on the compute nodes, but the L3 agent that the original DVR mode required on the compute nodes is no longer needed. In other words, Dragonflow switches to controlling Layer-3 traffic purely through OpenFlow.
How does Dragonflow accomplish this change? First, traffic is dispatched based on metadata, identifying, for example, which traffic is DHCP and which VLAN it belongs to; the VLAN portion is tagged and handed to a packet classifier. The classifier itself is simple: it mainly separates ARP, broadcast, and Layer-3 traffic. The key point is that Layer-3 traffic is sent, by default, to a separate table, the L3 forwarding table.
The forwarding table contains the flow rules installed by the controller (controller rules); if traffic does not match any flow, it is punted to the controller, which responds reactively and installs the corresponding flows into this table. This is how OpenFlow control of Layer-3 traffic is achieved.
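The sketch below illustrates this pipeline in miniature. It is not Dragonflow's real table layout: the bridge name, table numbers, ports, and addresses are invented, and the "controller" is reduced to a single function that installs an exact forwarding flow after a table miss.

```python
# Illustrative sketch of the pipeline described above (not Dragonflow's
# real tables): a classifier table separates ARP/broadcast from routed
# IP traffic, the L3 table's lowest-priority rule punts misses to the
# controller, and the "controller" reactively installs an exact flow.
import subprocess

BRIDGE = "br-int"

def add_flow(flow):
    subprocess.run(["ovs-ofctl", "add-flow", BRIDGE, flow], check=True)

# --- proactive part: classifier and table defaults ---------------------
add_flow("table=0,priority=100,arp,actions=resubmit(,10)")             # ARP handling
add_flow("table=0,priority=90,dl_dst=ff:ff:ff:ff:ff:ff,actions=resubmit(,10)")  # broadcast
add_flow("table=0,priority=50,ip,actions=resubmit(,20)")               # L3 traffic
add_flow("table=20,priority=0,actions=CONTROLLER:65535")               # L3 miss -> packet-in

# --- reactive part: what the controller installs after a packet-in -----
def on_packet_in(dst_ip, out_port, router_mac, dst_mac):
    """Install an exact L3 forwarding flow once the controller has
    resolved the destination (values would come from Neutron's DB)."""
    add_flow(
        "table=20,priority=100,ip,nw_dst=%s,"
        "actions=mod_dl_src:%s,mod_dl_dst:%s,output:%d"
        % (dst_ip, router_mac, dst_mac, out_port)
    )

# example reactive installation with made-up values
on_packet_in("10.0.1.7", 12, "fa:16:3e:11:22:33", "fa:16:3e:44:55:66")
```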
In my personal view, Dragonflow's approach is not particularly novel within the SDN field: essentially, unmatched traffic triggers a packet-in to the controller, and the controller then installs the corresponding flow entries. The benefit is that flow tables no longer have to be configured in advance, which significantly reduces management complexity.
DVR and Dragonflow lead me to a question: what kind of SDN do we actually need?
Dragonflow does seem to accomplish something: the L3 agent no longer has to be deployed on the compute nodes, and traffic no longer has to pass through piles of namespaces and their extra trips through the TCP/IP stack, which looks like a performance improvement. In practice, however, the gain is quite limited, because in our experience the keys to network performance lie in the kernel and the NIC. Expecting a major performance boost from the controller therefore seems somewhat short-sighted.
SDN solutions come in two flavors, proactive and reactive: flow tables are either pushed down in advance or installed on demand based on traffic. The reactive approach looks attractive, but it has its own problems. For example, at large scale the controller's performance becomes a bottleneck; how do you then scale the controller? One idea is to distribute the controller across the compute nodes, at which point you have almost circled back to a DVR-style network.
A hybrid (converged) SDN is another line of thinking, but it too has many problems. First, the code complexity is very high. Second, handling SNAT reactively is also problematic: SNAT carries connection state, and controlling it purely through Layer-2 rules is troublesome. For this reason, Dragonflow handles SNAT in the legacy L3 way: the traffic goes to the network node and is sent out from there.
UnitedStack's current practice in its public cloud service is to use plain Neutron and improve performance in our own ways; the results are good, and the scalability is acceptable. Our current idea is to use only the east-west part of DVR and keep it from interfering with north-south traffic, which also achieves very good performance and results.
Research on Dragonflow, a New OpenStack Networking Project