Address: http://www.yankay.com/%E5%86%85%E5%AD%98%E7%A9%B6%E7%AB%9F%E6%9C%89%E5%A4%9A%E5%BF%AB%EF%BC%9F/
In general, the CPU needs 0 cycles to access its registers, 1-30 cycles to access the caches, and 50-200 cycles to access main memory.
For an Intel Core i7, these values can be made quite specific. The Core i7's clock speed is about 2-3 GHz, so cycle counts can be converted into actual times.
| | L1 instruction cache | L1 data cache | L2 cache | L3 cache | Memory |
|---|---|---|---|---|---|
| Access cycles | 4 | 4 | 11 | 30-40 | 50-200 |
| Size | 32 KB | 32 KB | 256 KB | 8 MB | several GB |
| Access time | 2 ns | 2 ns | 5 ns | 14-18 ns | 24-93 ns |
That is to say, memory access times are on the order of nanoseconds.
Let's take a look at the disk.
Disk access time = seek time + rotational delay + data transfer time. For a typical 7200 RPM SATA disk, this comes to 9 ms + 4 ms + 0.02 ms = 13.02 ms.
That is to say, randomly reading one byte from disk takes about 13.02 ms, at least 140,000 times longer than the 24-93 ns it takes from memory: a gap of roughly five orders of magnitude.
Sequential disk reads and writes are much faster. Assume a disk with 1000 sectors per track, 512 bytes per sector, spinning at 7200 RPM. A sequential read can ignore the seek time, so the throughput is sectors per track x sector size x rotations per second = 1000 x 512 / (60/7200) ≈ 58 MB/s. That is not slow at all. With multiple disks behind a SATA II interface, throughput can reach 300 MB/s. To squeeze out the last bit of performance, you can mount a raw disk and operate on several disks directly.
The Memory Mountain
The book "Computer Systems: A Programmer's Perspective" introduces the concept of the memory mountain: the authors cleverly plot memory read throughput as a mountain.
The memory mountain test kernel is as follows:
kernel_loop(elems, stride):
    for (i = 0; i < elems; i += stride)
        result = data[i];
The X axis is the read stride, the Y axis is the throughput, and the Z axis is the total amount of data read.
The smaller the stride and the smaller the total data volume, the better the performance.
The mountain is clearly not smooth: it drops in steps. The red region is the L1 cache, the green region the L2 cache, the light blue region the L3 cache, and the dark blue region main memory. From the figure we can read off some numbers:
| | L1 data cache | L2 cache | L3 cache | Memory | Disk | SSD |
|---|---|---|---|---|---|---|
| Size | 32 KB | 256 KB | 8 MB | dozens of GB | several TB | hundreds of GB |
| Access time | 2 ns | 5 ns | 14-18 ns | 24-93 ns | 13.0 ms | 30-300 us |
| Throughput | 6500 MB/s | 3300 MB/s | 2200 MB/s | 800 MB/s | 60 MB/s | 250 MB/s |
That is to say, memory throughput is only about 800 MB/s. Compared with the disk's 300 MB/s and the network's 100 MB/s, it is only a few times faster. We usually assume that memory is vastly faster than disk, but in throughput terms it is not. If memory is handled carelessly, frequent memory reads and writes can themselves become the system bottleneck.
Summary
The CPU clock speed has stopped increasing, but caches keep growing at the pace of Moore's Law. For a long time to come, caches will keep up with processor performance and keep growing in capacity. Memory is less promising: its capacity has grown a great deal, but its performance has not improved much, and the situation of disks is similar. SSDs have only just become popular, so the trend there is not yet clear.
However, as the table shows, the gap between SSD throughput and memory throughput is not large. That is to say, in the future, when SSDs completely replace disks, we will have to operate on memory as carefully as we operate on disks today in order to write programs that match the performance of the computers of that era. On the other hand, SSDs are easier to deal with than disks: after all, their random read/write speed is one to two orders of magnitude higher.