Multi-function PCIe switch, part six: read/write optimization across NTB nodes
1. Characteristics of applications that read and write across nodes over NTB
NTB (non-transparent bridge) is typically used where data must move between nodes with high performance and reliability, for example as a virtual network card or as a cross-node data synchronization channel. Such applications expect to take full advantage of NTB's PCIe-based high-speed transfers and to maximize system performance.
2. Two ways to implement cross-node reads and writes over NTB
Once address translation is configured and the NTB channel is established, there are two ways to move data across nodes:
Data transmission based on CPU
Data transmission based on NTB DMA
The former relies on the CPU to move the data. It consumes CPU cycles, but it is well suited to multithreaded applications. The latter relies on separate DMA hardware to move the data and uses very little CPU, but in a multithreaded environment concurrent access to the DMA hardware needs extra care. In terms of speed, when the CPU path is not parallelized across threads, DMA is generally much faster. For example, on the author's system the CPU moves data at only about 100 MB/s, while DMA reaches close to 1000 MB/s, and both figures were measured without any DMA or PCIe tuning.
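The concurrency difference between the two modes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a bytearray stands in for the mapped remote window, and FakeDmaEngine, cpu_write and run_writers are hypothetical names, not part of any real NTB driver. The point is that CPU writers can each store into their own region independently, while threads sharing one DMA channel have to serialize their submissions.

```python
import threading


class FakeDmaEngine:
    """Hypothetical stand-in for a single NTB DMA channel (not a real driver API)."""

    def __init__(self, remote_window: bytearray) -> None:
        self.remote_window = remote_window
        self._lock = threading.Lock()  # only one thread may program the channel at a time
        self.transfers = 0

    def write(self, offset: int, data: bytes) -> None:
        # On real hardware, descriptor setup and the doorbell write must not
        # interleave between threads, so the whole submission is locked.
        with self._lock:
            self.remote_window[offset:offset + len(data)] = data
            self.transfers += 1


def cpu_write(remote_window: bytearray, offset: int, data: bytes) -> None:
    # CPU path: the calling thread stores directly into its own slice of the window.
    remote_window[offset:offset + len(data)] = data


def run_writers(write_fn, n_threads: int, chunk: int) -> None:
    """Start n_threads writers, each filling its own chunk with a distinct byte."""
    threads = [
        threading.Thread(target=write_fn, args=(i * chunk, bytes([i + 1]) * chunk))
        for i in range(n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


N_THREADS, CHUNK = 4, 1024  # illustrative sizes, not measurements

# CPU mode: no shared hardware state, so no lock is needed.
cpu_window = bytearray(N_THREADS * CHUNK)
run_writers(lambda off, data: cpu_write(cpu_window, off, data), N_THREADS, CHUNK)

# DMA mode: all threads go through the one engine and its lock.
dma_window = bytearray(N_THREADS * CHUNK)
engine = FakeDmaEngine(dma_window)
run_writers(engine.write, N_THREADS, CHUNK)
```

A single lock is the simplest choice; a per-thread DMA channel or a submission queue would be alternatives where the hardware provides them.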
3. What the two approaches have in common
Whether the CPU or DMA moves the data across nodes, the transfer is ultimately carried out by PCIe transactions. Writing data from the local node to the remote node relies on PCIe posted write transactions; reading data from the remote node into the local node is implemented with PCIe non-posted reads. Because a posted transaction does not wait for a completion while a non-posted one does, posted operations are generally faster than non-posted ones. The author's test data confirm this: CPU writes to the remote node are faster than CPU reads from it, and DMA writes to the remote node are faster than DMA reads from it.
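The gap between posted writes and non-posted reads can be illustrated with a rough throughput model. This is a simplified sketch, not the author's measurement method: it assumes posted writes can keep the link full, and that reads are limited by how many requests can be in flight while each one waits a round trip for its completion. All function names and parameter values here are hypothetical.

```python
def posted_write_throughput(link_bw: float) -> float:
    """Posted writes need no completion, so the sender can stream at link rate."""
    return link_bw


def nonposted_read_throughput(
    link_bw: float, request_bytes: int, outstanding: int, round_trip_s: float
) -> float:
    """Reads are capped by bytes in flight per round trip, and by the link itself."""
    latency_bound = outstanding * request_bytes / round_trip_s
    return min(link_bw, latency_bound)


# Illustrative numbers only (assumptions, not measured values).
LINK_BW = 1e9          # bytes per second available on the link
ROUND_TRIP = 1e-6      # seconds from read request to completion

write_bw = posted_write_throughput(LINK_BW)

# A CPU load typically waits for each read to return before issuing the next.
cpu_read_bw = nonposted_read_throughput(LINK_BW, 64, 1, ROUND_TRIP)

# A DMA engine can keep several larger read requests outstanding.
dma_read_bw = nonposted_read_throughput(LINK_BW, 512, 32, ROUND_TRIP)

print(f"write {write_bw:.3g} B/s, cpu read {cpu_read_bw:.3g} B/s, "
      f"dma read {dma_read_bw:.3g} B/s")
```

In this model reads never exceed writes, and the penalty is largest when few small requests are outstanding, which is consistent with the direction of the results described above.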
4. Summary
Whichever way cross-node transfer is implemented, it helps to look past the specific PCIe application to the underlying PCIe transactions and how they are transmitted and processed. That is what explains the performance differences between applications and guides the trade-offs and optimizations each one needs.
This article is from the "Store Chef" blog; please keep this source attribution: http://xiamachao.blog.51cto.com/10580956/1882433