I haven't updated my blog for a long time. A while ago I changed jobs for the first time, which turned out to be much harder than I expected. After all, it was my first job, and leaving it was quite sad. The new job has just begun, and I hope everything goes well. This post was written over the May 1 holiday and is only being published today. Writing a technical blog is a good habit, and I want to keep it up!
At my previous job I always wanted to try Flash/SSD, but for various reasons it never happened. SSD is becoming more and more popular; many companies in the industry are trying it, some even as their main storage medium. From a research perspective SSD is a big deal, because it overturns some basic assumptions that held for hard disks. In theory, the theories and experiments built on hard disks can all be redone on SSD (think how many papers can be produced here...).
Over the past two days I had some free time, so I looked up several papers to educate myself. These are only basic concepts, and I hope to get a chance to try them out in practice. Below are my reading notes, which are purely theoretical.
PS: Flash/SSD below refers to NAND Flash, which is the most common type of flash today.
The hardware principles of SSD are not the focus here (after all, we are software developers). The only thing worth mentioning is that an SSD is not an electromechanical device like a hard disk but a purely electronic chip. As a result, seek time, the most time-consuming part of a hard disk access, basically does not exist on SSD. This will be discussed in more detail later.
In the traditional computer storage architecture, there are several levels:
CPU Cache
Main Memory (RAM)
Hard Disk
Their I/O performance decreases from top to bottom. Their cost per byte also decreases from top to bottom, while capacity increases.
This article leaves the CPU cache aside and starts from main memory and hard disk. Take latency, the most important performance indicator: main memory latency is usually around 100 ns (in practice, tens of ns), while hard disk read/write latency is about 10 ms. (The 100 ns and 10 ms figures are not precise; they only indicate the order of magnitude. The hard disk figure also ignores the disk's built-in buffer.)
SSD is generally considered to be a storage medium between main memory and Hard Disk:
Main Memory (RAM)
Flash/SSD
Hard Disk
Its latency is about 100 us (a typical value). Compared with main memory and hard disk:
Disk = 100 * Flash
Flash = 1000 * Memory
Disk = 100 * Flash = 100 * 1000 * Memory
So the most intuitive way to think about flash/SSD is that it is 100 times faster than disk.
Of course, it also provides persistent storage.
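The ratios above follow directly from the three order-of-magnitude latencies. Here is a small Python sketch of the arithmetic; the values are the rough figures used in this post (100 ns, 100 us, 10 ms), not measurements, and the names are just for illustration.

```python
# Rough order-of-magnitude latencies in nanoseconds (not measurements).
LATENCY_NS = {
    "memory": 100,         # ~100 ns
    "flash": 100_000,      # ~100 us
    "disk": 10_000_000,    # ~10 ms
}

def ratio(slow, fast):
    """How many times slower `slow` is than `fast`."""
    return LATENCY_NS[slow] // LATENCY_NS[fast]

print("disk / flash   =", ratio("disk", "flash"))
print("flash / memory =", ratio("flash", "memory"))
print("disk / memory  =", ratio("disk", "memory"))
```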
With this intuitive concept, let's take a closer look at the differences between hard disk and SSD:
Read:
Random read:
As mentioned above, hard disk has a very annoying seek time, usually several ms, which drags a random read on hard disk down to about 10 ms. SSD's random read performance is very good, around 100 us. So for random reads, SSD is about 100 times faster than hard disk.
Sequential read:
A sequential read fetches a run of contiguous data blocks in one go, for example 1 MB of consecutive data. Here the comparison is trickier. A sequential read on hard disk pays for only one seek, so its average latency per block drops sharply. For SSD, a sequential read costs the same as a series of random reads: for example, a sequential read of 16 KB is equivalent to eight random reads of 2 KB each. Its advantage over hard disk is therefore much smaller. One data point I have: flash takes 3.98 ms to read a page, while hard disk needs only 12.85 ms for the same page. The gap is far less than 100 times.
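To see why the gap shrinks, here is a minimal Python model of the reasoning above: disk pays one seek and then streams, while flash pays a full page read for every page. All parameters (seek time, disk bandwidth, flash page size and page latency) are illustrative assumptions of mine, not measured values.

```python
# All parameters are illustrative assumptions, not measurements.
DISK_SEEK_MS = 10.0          # one seek (plus rotation) per request
DISK_BANDWIDTH_MB_S = 100.0  # assumed sequential transfer rate
FLASH_PAGE_KB = 2            # assumed flash page size
FLASH_PAGE_READ_MS = 0.1     # ~100 us per page read

def disk_read_ms(size_kb):
    """One seek, then a sequential transfer of size_kb."""
    transfer_ms = size_kb / 1024 / DISK_BANDWIDTH_MB_S * 1000
    return DISK_SEEK_MS + transfer_ms

def flash_read_ms(size_kb):
    """No seek, but every page costs a full page read."""
    pages = -(-size_kb // FLASH_PAGE_KB)  # ceiling division
    return pages * FLASH_PAGE_READ_MS

for size_kb in (2, 16, 64):
    d, f = disk_read_ms(size_kb), flash_read_ms(size_kb)
    print(f"{size_kb:3d} KB  disk {d:6.2f} ms  flash {f:5.2f} ms  ratio {d / f:6.1f}x")
```

Under these assumed numbers the ratio drops from about 100x for a single 2 KB page to roughly 13x at 16 KB. The model ignores the internal parallelism of real SSDs, so it should not be extrapolated to very large reads.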
Write:
Writing also comes in two forms, random write and sequential write. For a hard disk, write behavior is similar to read behavior, so its read latency and write latency are very close, about 10 ms or more.
For SSD, reading and writing are very different. First, the latency of a random write is about twice that of a random read. For example, if random read latency is 100 us, random write latency is about 200 us. This is mainly determined by the hardware characteristics of SSD. Even with latency nearly doubled, it is still much faster than a random write on hard disk.
For sequential writes, SSD faces the same problem as with sequential reads: it gains nothing extra from the access being sequential, so its advantage over hard disk shrinks.
Finally, the most interesting case: over-write (overwrite). For a hard disk there is no difference between an overwrite and a normal write, but SSD has no true over-write. That is, SSD does not allow data to be written directly to a region that already holds data. To get the effect of an over-write, you must first perform an erase operation to clear the original data in the block and then write the new data. One erase operation takes about 1.5 ms. Is that all? No, it gets worse. Suppose the over-write is 2 K of data, but the region an erase operation works on (called an erase unit) is 128 K. To write this 2 K, you have to read out the existing data, erase the full 128 K, and then write the old data together with the new 2 K back to the SSD. The whole process can make the write slower than the same write on hard disk.
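The over-write example can be put into numbers with a naive read-erase-write model. This is only a sketch under my own assumptions: a 2 KB page size, the 100 us / 200 us page read and write latencies used earlier, and no help from the device (real SSDs typically hide much of this behind a flash translation layer, which is deliberately ignored here).

```python
# Figures from the text where available; the page size is an assumption.
PAGE_KB = 2            # assumed page size (matches the 2 K write)
ERASE_UNIT_KB = 128    # erase unit size from the example
READ_MS = 0.1          # ~100 us per page read
WRITE_MS = 0.2         # ~200 us per page write
ERASE_MS = 1.5         # one erase operation
DISK_WRITE_MS = 10.0   # rough hard disk write latency

def naive_overwrite_ms(new_kb=PAGE_KB):
    """Read the untouched pages, erase the unit, write the whole unit back."""
    pages_per_unit = ERASE_UNIT_KB // PAGE_KB
    changed = -(-new_kb // PAGE_KB)  # pages that receive new data
    read_ms = (pages_per_unit - changed) * READ_MS
    write_ms = pages_per_unit * WRITE_MS
    return read_ms + ERASE_MS + write_ms

cost = naive_overwrite_ms()
print(f"naive 2 KB over-write: {cost:.1f} ms vs disk ~{DISK_WRITE_MS:.0f} ms")
```

With these numbers the naive over-write comes to about 20.6 ms (6.3 ms of reads, 1.5 ms of erase, 12.8 ms of writes), roughly twice the 10 ms assumed for hard disk.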
To summarize the analysis: SSD is two orders of magnitude faster than hard disk for random reads, and still better than hard disk for sequential reads and writes. The scariest case is over-write, where SSD may perform worse than hard disk.
Next, I would like to answer an interesting question: How should I use SSD?
One simple approach is to use SSD as a replacement for hard disk: drop the hard disk and use SSD as the persistent storage medium. This works fine, and basically every SSD exposes the same block device interface as a hard disk, so operating an SSD looks no different from operating a hard disk. The one thing to watch out for is SSD's high error rate, which is troublesome.
A more complicated approach is to treat SSD as a new medium between memory and hard disk. In this case, SSD can serve either as an extended buffer pool for memory or as an extension of persistent storage alongside disk. Which one should I choose?
The answer is given in [1]. The full argument is complicated, so I will not go into it here; it is better to read the paper directly. In short, different systems should use flash differently. In a file system, a file is usually a large byte stream, operations on a file usually read the whole file into memory, and a file is generally stored in multiple consecutive blocks on disk. In this case flash is better suited as an extension of memory: when files are read into flash (acting as memory) and worked on there, the files being modified in flash can still be written back to disk if the system crashes. For a database system, however, flash is better suited as an extension of disk. For the specific reasons, see the paper.
[1] The Five-Minute Rule Twenty Years Later, and How Flash Memory Changes the Rules