This is an original post released under the CC 3.0 license. Please credit the source when reposting: http://blog.csdn.net/lux_veritas/article/details/24766015
Bandwidth is a memory bandwidth benchmark. It mainly targets x86 and x86_64 platforms and measures the system's memory bandwidth through sequential and random reads and writes of data blocks of varying sizes.
Project address
Bandwidth ships a set of support routines written in assembly language that handle architecture-specific operations, such as reading the contents of certain registers.
It uses these assembly routines to identify the CPU model and the features it supports, and then selects the corresponding working mode. For example, the author's machine reports:
CPU family: GenuineIntel
CPU features: MMX SSE SSE2 SSE3 SSSE3 SSE4.1 SSE4.2 XD Intel64
When the main program runs, it selects the working mode based on the CPU features:
    if (mode == SSE2) {
        print (L"(128-bit), size = ");
    } else if (mode == AVX) {
        print (L"(256-bit), size = ");
    } else {
    #ifdef __x86_64__
        print (L"(64-bit), size = ");
    #else
        print (L"(32-bit), size = ");
    #endif
    }
The author's CPU supports SSE2 but not AVX, so the program uses a 128-bit data width for the memory read/write operations.
On the author's machine, the test is divided into the following parts:
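The detection and mode selection described above can be sketched in C roughly as follows. This is only an illustration, not Bandwidth's own code: Bandwidth performs the detection in its assembly routines, whereas this sketch assumes the GCC/Clang helper header `<cpuid.h>`. The names `detect_features`, `select_mode`, `mode_bits`, `FEAT_*` and `MODE_*` are hypothetical, and the priority order (AVX before SSE2) is an assumption inferred from the text.

```c
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>   /* GCC/Clang helper header; an assumption, not part of Bandwidth */
#endif

/* Hypothetical names used only in this sketch. */
enum { FEAT_SSE2 = 1 << 0, FEAT_AVX = 1 << 1 };
typedef enum { MODE_NATIVE, MODE_SSE2, MODE_AVX } xfer_mode;

/* Query CPUID leaf 1 and report the two features that drive the mode choice.
   Returns 0 on non-x86 targets or if the leaf is unavailable. */
unsigned detect_features(void)
{
    unsigned feats = 0;
#if defined(__x86_64__) || defined(__i386__)
    unsigned eax, ebx, ecx, edx;
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
        if (edx & (1u << 26))
            feats |= FEAT_SSE2;          /* EDX bit 26: SSE2 */
        /* ECX bit 28: AVX. A complete check would also confirm that the OS
           saves the wider registers (OSXSAVE + XGETBV); omitted here. */
        if (ecx & (1u << 28))
            feats |= FEAT_AVX;
    }
#endif
    return feats;
}

/* Assumed priority: prefer the widest transfer the CPU offers. */
xfer_mode select_mode(unsigned feats)
{
    if (feats & FEAT_AVX)
        return MODE_AVX;
    if (feats & FEAT_SSE2)
        return MODE_SSE2;
    return MODE_NATIVE;
}

/* Transfer width in bits for a mode, mirroring the labels printed above. */
int mode_bits(xfer_mode m)
{
    switch (m) {
    case MODE_AVX:  return 256;
    case MODE_SSE2: return 128;
    default:        return (int)(sizeof(void *) * 8);  /* 64 or 32 */
    }
}
```

On a CPU like the author's (SSE2 present, AVX absent), `select_mode(detect_features())` would be expected to pick `MODE_SSE2`, that is, 128-bit transfers.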
| Test | 128-bit | 64-bit |
|:-----:|:----:|:----:|
| Sequential read | | |
| Random read | | |
| Sequential write | | |
| Random write | | |
You can choose whether to bypass the caches at all levels. The CPU caches on the author's machine are as follows:
Cache 0: L1 data cache, line size 64, 8-ways, 64 sets, size 32k
Cache 1: L1 instruction cache, line size 64, 8-ways, 64 sets, size 32k
Cache 2: L2 unified cache, line size 64, 16-ways, 4096 sets, size 4096k
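The reported sizes follow directly from the other fields: the capacity of a set-associative cache is line size × ways × sets. A minimal C sketch of that arithmetic (the function name `cache_bytes` is hypothetical, chosen for this example):

```c
#include <stddef.h>

/* Capacity in bytes of a set-associative cache:
   bytes per line * lines per set (ways) * number of sets. */
size_t cache_bytes(size_t line_size, size_t ways, size_t sets)
{
    return line_size * ways * sets;
}
```

For the L1 caches above, 64 × 8 × 64 gives 32 KB; for the L2 cache, 64 × 16 × 4096 gives 4096 KB.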
The size of the data blocks used for reads and writes increases from 128 B up to the megabyte range. Because the cache levels differ in size, small data blocks stay in cache during the read/write operations, while large blocks pass through the cache to main memory. As the block size grows, the bandwidth therefore changes sharply at a few points, mainly where the block size reaches the capacity limit of one cache level and accesses fall through to the next, slower level of storage. Bandwidth generates a log file and a chart from the test results. These bandwidth drops are easiest to see in the chart: bandwidth falls markedly around 32 KB and 4 MB, which match the L1 data cache and L2 cache sizes listed above.
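The idea of sweeping the block size and watching for drops can be illustrated with a small C sketch. It is not Bandwidth's implementation: it uses plain 64-bit loads instead of the assembly SSE2/AVX routines, times with the standard `clock()` function, and the names `seq_read_mbps` and `sweep` as well as the per-measurement byte budget are assumptions made for this example.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Sequentially read a buffer of `block` bytes `passes` times and return
   the rate in MB/s, or 0.0 if nothing could be measured. */
double seq_read_mbps(size_t block, size_t passes)
{
    size_t words = block / sizeof(uint64_t);
    if (words == 0)
        return 0.0;
    uint64_t *buf = malloc(words * sizeof(uint64_t));
    if (buf == NULL)
        return 0.0;
    memset(buf, 1, words * sizeof(uint64_t));   /* touch pages before timing */

    /* Volatile reads stop the compiler from removing or hoisting the loads. */
    const volatile uint64_t *p = buf;
    uint64_t sum = 0;
    clock_t start = clock();
    for (size_t pass = 0; pass < passes; pass++)
        for (size_t i = 0; i < words; i++)
            sum += p[i];
    double secs = (double)(clock() - start) / CLOCKS_PER_SEC;
    (void)sum;
    free(buf);

    if (secs <= 0.0)
        return 0.0;   /* too fast for the clock() resolution */
    double bytes = (double)words * sizeof(uint64_t) * (double)passes;
    return bytes / secs / (1024.0 * 1024.0);
}

/* Double the block size from 128 B up to `max_block`, reading roughly
   `budget` bytes in total at each size so timings stay comparable. */
void sweep(size_t max_block, size_t budget)
{
    for (size_t block = 128; block <= max_block; block *= 2) {
        size_t passes = budget / block;
        if (passes == 0)
            passes = 1;
        printf("%10zu B  %10.1f MB/s\n", block, seq_read_mbps(block, passes));
    }
}
```

Calling something like `sweep((size_t)64 << 20, (size_t)256 << 20)` would print one line per block size. On a machine with caches like the author's, the rate would be expected to step down once the block no longer fits in L1 and again once it no longer fits in L2, though the exact figures depend on the compiler, the timer resolution and the hardware.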