What is the significance of Intel's launch of the DPDK development kit?
http://www.zhihu.com/question/27413080?sort=created
What are the advantages and value of a packet processor based on Intel DPDK compared with one based on the kernel network protocol stack?
Will DPDK-based packet processing outperform the kernel protocol stack? If so, by how much, and where are the main bottlenecks in the kernel network protocol stack?
Market
Bottom line: Intel launched DPDK to sell more of its hardware.
More features, more flexibility, and better performance: who wouldn't want to buy that?
DPDK is only suitable for the x86 platform. It achieves a fairly high level of performance, and that performance relies entirely on mechanisms unique to Intel's hardware (see the technical anatomy above for details). This considerably raises the value of Intel's hardware products.
This is probably Intel's main purpose.
Demand
In the area I currently work in (we do IP and non-IP alike, from technical validation for the next 35 years to concept prototypes for the next 350 years), DPDK is used mainly to develop new features that the kernel does not yet have. In terms of update speed, the kernel is updated slowly, while DPDK-based network functions are updated quickly.
Getting a new network function into the kernel and merged into a Linux release requires a fairly involved debugging and refinement process. To run in the kernel, a feature generally has to be quite mature, reliable, and not overly complex.
DPDK gives vendors much more room to maneuver. It can be said to promote the testing and improvement of new mechanisms and technologies.
First of all, DPDK is a layer 2 thing: it moves work that used to be done by the driver into user space and provides a range of architecture-specific optimizations. It is generally used only for I/O, although it also ships a number of layer 3 libraries, such as forwarding libraries and an LPM library. DPDK does not provide an open-source, high-performance TCP/IP protocol stack.
It is not clear what the original poster means by a packet processor based on the kernel protocol stack. If it means the Linux kernel's own protocol stack, that stack is designed mainly for compatibility and generality. There are also hardware implementations such as TCP offload engines, but because of the limited memory on the card, their TCP concurrency and performance are no higher than those of DPDK-based solutions.
As for concrete performance, it can actually be quantified: at 10 Gbps with 64-byte packets, if processing one packet takes longer than 67 ns, packets will inevitably be dropped. In other words, essentially all processing has to stay within the cache, and running stably for long periods without dropping packets is hard to achieve.
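The 67 ns figure can be reproduced with a little arithmetic. The sketch below assumes standard Ethernet framing, where each frame on the wire also carries an 8-byte preamble/start delimiter and a 12-byte minimum inter-frame gap. The function and constant names are illustrative only and do not come from DPDK.

```python
# On-wire overhead per Ethernet frame that is not part of the 64-byte frame
# itself (assumption: standard Ethernet framing).
PREAMBLE_SFD_BYTES = 8      # 7-byte preamble + 1-byte start-of-frame delimiter
INTERFRAME_GAP_BYTES = 12   # minimum idle gap between frames, in byte times


def wire_bits(frame_bytes):
    """Bits one frame occupies on the wire, including framing overhead."""
    return (frame_bytes + PREAMBLE_SFD_BYTES + INTERFRAME_GAP_BYTES) * 8


def per_packet_budget_ns(link_bps, frame_bytes):
    """Time available to process one packet at line rate, in nanoseconds."""
    return wire_bits(frame_bytes) / link_bps * 1e9


def max_packets_per_second(link_bps, frame_bytes):
    """Maximum packet rate the link can deliver for this frame size."""
    return link_bps / wire_bits(frame_bytes)


budget = per_packet_budget_ns(10e9, 64)   # roughly 67.2 ns per packet
rate = max_packets_per_second(10e9, 64)   # roughly 14.88 million packets/s
print(f"{budget:.1f} ns per packet, {rate / 1e6:.2f} Mpps")
```

At roughly 67 ns per packet, a single trip to main memory would already use up the whole budget, which is why the processing has to stay in cache.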
As for DPDK's polling mechanism, the CPU runs at 100% whether or not there are packets, and performance degrades sharply once the core that the packet-receiving thread is bound to is preempted by another thread.
DPDK's high performance comes with many restrictions, and configurations are largely not portable: you have to take NUMA+nuio and similar architectural details fully into account, and once the CPU configuration is wrong, performance is terrible.
Before DPDK appeared there were many similar solutions built on the same fundamentals: ioengine, netmap, and the ntop 10G series.
DPDK has no great performance advantage over them, it is more complex to configure and operate, and it is also less stable. But DPDK has one big advantage they cannot match: it supports almost all Intel network cards, including the latest ones. If you do not want to be adding support for new Intel NICs to your own driver by hand a few years from now, DPDK is the right choice.
Another advantage is that it can be debugged with GDB.
First, DPDK and the kernel network protocol stack are not equivalent concepts.
DPDK simply takes data from the driver and organizes it into blocks of data for users to consume, and it runs in user space. Functionally it is equivalent to Linux's device-independent interface layer, which sits below the socket layer and above the driver. The only difference is that this part of the Linux protocol stack runs in kernel space.
The packet processor you mention is often not the Linux kernel protocol stack at all but a dedicated packet handler, similar to DPDK plus an upper-layer application. There are usually also some hardware accelerators that process packets more efficiently. The disadvantage is that if you do not use a given function, those accelerators are wasted. Pure software processing is very flexible, but at a cost in power and performance.
Pure DPDK performance is very high. Intel's own figure is 80 clock cycles to process a packet. On a 3.6 GHz single-core, dual-threaded Xeon with 64-byte packets, pure forwarding capacity exceeds 90 Mpps, that is, 90 million packets per second.
Do you see why 80 cycles is such a startling number? Normally the processor needs about 200 cycles to access DDR3 memory, and the data the packet handler has to work on is sent from the PCIe device into DDR memory and then read back by the processor, so it should take at least 200 cycles. How can all the processing now fit into 80 cycles? I looked through the documentation and found that the reason is stashing, or direct cache access: packets sent by the PCIe NIC carry a special field, and when the x86 PCIe controller sees this field it automatically places the packet header into the processor's cache, without the processor having to intervene. Since the header is certain to be read, this amounts to prefetching it ahead of time, and the access time is greatly shortened.
If you add the Linux socket stack, for example by simply bouncing HTTP packets back, then according to my measurements it drops to 3,000-4,000 cycles per packet, or 2.4 Mpps (2.4 million packets per second) on a single core with two threads. That is about 40 times worse.
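The relationship between cycles per packet and packet rate in these figures can be checked with a short calculation. This is a sketch under two assumptions that the text does not state: that both measurements ran at the same 3.6 GHz clock, and that the two hardware threads scale linearly. The names are illustrative only.

```python
def packets_per_second(clock_hz, cycles_per_packet, threads=1):
    """Packet rate implied by a per-packet cycle cost.

    Assumes every hardware thread processes packets independently at the
    full clock rate, which is optimistic for two threads sharing one core.
    """
    return clock_hz / cycles_per_packet * threads


CLOCK_HZ = 3.6e9   # 3.6 GHz, taken from the figures quoted above
THREADS = 2        # single core, two hardware threads

dpdk_pps = packets_per_second(CLOCK_HZ, 80, THREADS)            # 90 Mpps
socket_best_pps = packets_per_second(CLOCK_HZ, 3000, THREADS)   # 2.4 Mpps
socket_worst_pps = packets_per_second(CLOCK_HZ, 4000, THREADS)  # 1.8 Mpps

slowdown = dpdk_pps / socket_best_pps   # 37.5, i.e. roughly 40x
print(f"DPDK: {dpdk_pps / 1e6:.0f} Mpps, "
      f"socket stack: {socket_worst_pps / 1e6:.1f}-{socket_best_pps / 1e6:.1f} Mpps, "
      f"slowdown: about {slowdown:.1f}x")
```

Under these assumptions the 80-cycle figure gives 90 Mpps and the 3,000-cycle figure gives 2.4 Mpps, which matches the numbers quoted above.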
Where does the high performance come from? The key point is that DPDK does no socket-layer protocol processing, so of course it is fast. Beyond that, it mostly comes from using polling instead of interrupts, avoiding copies from kernel space to user space, binding threads to cores to avoid thread-switching overhead, avoiding the overhead of system calls, using huge pages, and so on.
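Of the techniques listed, binding to a core is the easiest to show in isolation. The sketch below is not DPDK code (DPDK does its own core binding through its EAL); it only illustrates the idea using Python's standard `os.sched_setaffinity`, which is available on Linux. The helper name `pin_to_core` is hypothetical.

```python
import os


def pin_to_core(core_id=None):
    """Pin the calling process to a single CPU core (Linux only).

    Returns the core that was used, or None on platforms that do not
    provide sched_setaffinity (for example macOS or Windows).
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    allowed = sorted(os.sched_getaffinity(0))
    if core_id is None or core_id not in allowed:
        core_id = allowed[0]   # fall back to the first core we may run on
    os.sched_setaffinity(0, {core_id})   # pid 0 means "the caller"
    return core_id


core = pin_to_core()
if core is not None:
    print("pinned to core", core, "affinity is now", os.sched_getaffinity(0))
```

Pinning only stops the scheduler from migrating this thread. It does not stop other threads from being scheduled onto the same core, which is why a polling thread still suffers when its core is shared.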
Another critical point is that once the number of threads exceeds 12, the Linux stack runs into a lock-contention bottleneck, and with a performance profiling tool you will find that most of the time is spent in spin_lock. One solution is to rewrite the kernel stack, as fastsocket on GitHub does, so that the packets of a given flow are always processed on the same core and contention is avoided. The disadvantage is that you often have to modify the protocol stack yourself, and applications are not compatible.
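The idea of always handling a flow on the same core can be sketched as a hash over the flow's 5-tuple. This is only an illustration: CRC32 stands in for whatever hash the NIC or stack really uses, and `flow_to_core` is a hypothetical helper, not the fastsocket or DPDK API.

```python
import socket
import struct
import zlib


def flow_to_core(src_ip, dst_ip, src_port, dst_port, proto, n_cores):
    """Map an IPv4 flow (its 5-tuple) to a core index in [0, n_cores).

    Every packet of the same flow produces the same key and therefore
    lands on the same core, so per-flow state needs no cross-core locking.
    CRC32 is an assumption made for this sketch, not a real NIC hash.
    """
    key = (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
           + struct.pack("!HHB", src_port, dst_port, proto))
    return zlib.crc32(key) % n_cores


TCP = 6
core_a = flow_to_core("10.0.0.1", "10.0.0.2", 40000, 80, TCP, 12)
core_b = flow_to_core("10.0.0.1", "10.0.0.2", 40000, 80, TCP, 12)
print("both packets of the flow go to core", core_a, core_b)
```

Note that this simple key is direction-sensitive: the reply direction of a connection may hash to a different core unless the two endpoints are put into a canonical order first.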
Another approach is to use virtual machines: each flow is processed on only one core, the virtual machines isolate the contention, DPDK does the forwarding underneath, and the virtual machines on top do the packet processing. That way the original Linux protocol stack is still the one being called, and applications remain fully compatible. However, nobody seems to have open-sourced this approach yet. The closest thing is the DPDK + virtual switch (OvS) project.
If all you want is DPDK's high performance plus TCP/IP/UDP processing, and compatibility does not matter, you can also buy commercial code. I looked at one supplier's website: pure forwarding performance is roughly 500-1,000 cycles per packet.