How is the Android storage system optimized?
The honest answer is: I don't know ...
Then why write this article at all? I wanted to write down what I do know, mainly because one evening a colleague and I discussed how the Android phone storage system might be optimized. Anyone who has used an Android phone for a long time has probably felt the system getting slower, and one of the most important components of an Android phone is its storage. Mainstream phones today mostly use eMMC chips for built-in storage, some flagship Android phones have already moved to UFS, and the iPhone 6s started using NVMe-interface storage. My colleague works on the Linux kernel, and he was wondering whether the Android storage system could be optimized at the kernel level: improving eMMC bad-block management, wear leveling, and so on.
I have spent about a year on SSD firmware development, and have never worked with eMMC directly. But as I understand it, eMMC is essentially a cut-down, "lite" version of an SSD; the core principles and ideas are the same. So the answer I came up with is: for the things my colleague wants to do, the kernel can do nothing. The big picture first: wear leveling, garbage collection, and bad-block management are all already handled inside the SSD or eMMC itself, and to the host these mechanisms are completely transparent. My colleague's idea really belongs to a few years ago, when embedded systems used bare NAND chips and the host had to manage all of those messy details itself.
This is meant as a popular-science post, intended for friends who don't know much about today's flash and SSDs, not for people from the storage community ^_^
First, the types of NAND flash. Today's NAND flash mainly comes in three kinds: SLC, MLC, and TLC. An SLC cell stores one bit, representing 0 or 1. An MLC cell stores two bits, representing values 0 to 3. A TLC cell stores three bits, representing values 0 to 7.
Let's start with the simplest case, SLC, and how it stores a single bit. In the figure, the four black dots are the terminals where voltage can be applied. The floating gate is surrounded by an insulating layer. In the initial state (that is, the erased state), the floating gate holds no electrons, representing logic 1.
Programming a cell means setting it to logic 0: a high voltage is applied to the control gate while the substrate is grounded, forcing electrons to tunnel through the insulating layer into the floating gate, where they stay trapped.
Erasing is the reverse operation: a high voltage is applied to the substrate while the control gate is grounded, so the electrons trapped in the floating gate are released and the cell returns to logic 1.
To read, a voltage is applied across the source and drain. If there are electrons in the floating gate, the channel does not conduct, representing logic 0.
If there are no electrons in the floating gate, the channel conducts, representing logic 1.
In this way, we can tell whether a cell stores 1 or 0 by whether the source-drain channel conducts. For example, apply 2V to the control gate: if the channel conducts, the cell reads logic 1; if it does not, logic 0.
MLC makes the electron-injection process more precise and delicate, dividing the voltage range into four finer segments to represent 11, 10, 01, and 00. When reading, up to three reference voltages (say 1V, 2V, and 3V) are tried, and the value stored in the cell is determined by which of them make it conduct. TLC works the same way, only the range is divided into 8 segments.
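The threshold-based read described above can be sketched in a few lines of Python. This is only an illustration: the 1V/2V/3V reference voltages are the round numbers from the text, not real datasheet values, and the level-to-bits mapping is my own assumption.

```python
# Illustrative sketch: reading an MLC cell by trying reference voltages
# and seeing which ones the cell's threshold voltage exceeds. Trapped
# electrons raise the threshold voltage, so a more-programmed cell
# stops conducting at higher reference voltages.

REF_VOLTAGES = [1.0, 2.0, 3.0]   # three read steps for 4 levels

# Assumed mapping from level (0 = erased) to the two stored bits.
LEVEL_TO_BITS = {0: "11", 1: "10", 2: "01", 3: "00"}

def read_mlc_cell(threshold_voltage: float) -> str:
    """Determine the cell's level by counting which reference
    voltages the threshold voltage meets or exceeds."""
    level = sum(1 for v in REF_VOLTAGES if threshold_voltage >= v)
    return LEVEL_TO_BITS[level]

print(read_mlc_cell(0.5))  # erased cell -> "11"
print(read_mlc_cell(2.5))  # -> "01"
print(read_mlc_cell(3.5))  # fully programmed -> "00"
```

A TLC read would work identically, just with 7 reference voltages splitting the range into 8 levels.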
So far we have only looked at a single cell. Cells on the same word line together form a page, and a page is the minimum unit of a write operation. The total number of word lines determines the number of pages in a block, and a block is the minimum unit of an erase operation.
Having covered the NAND flash basics, let's look at an actual product. A few concepts here: package, target, and LUN (sometimes called a die; same thing). A package is the packaged flash chip you can see on a PCB every day. A package contains one or more targets, each with its own physical pins such as CE and data[7:0], so targets are fully independent and parallel. A target in turn contains one or more LUNs. A LUN is also a unit that can execute commands independently; the difference from a target is that the LUNs inside a target share physical pins.
Looking inside a LUN, there can be one or more planes, and planes can also execute commands independently. To sum up: for a flash chip, generally the larger the capacity, the more targets, LUNs, and planes it has, and the better the performance, because these physical units can work independently in parallel to maximize efficiency. This is also why, when buying a phone, the larger-storage model may perform better. The more expensive one really is better, haha!
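A toy calculation makes the parallelism point concrete. All the numbers below (page size, program time, unit counts) are made up for illustration, not taken from any real chip.

```python
# Toy model: why more independent flash units mean more bandwidth.
# Independent targets/LUNs/planes can program pages simultaneously.

PAGE_SIZE_KB = 16          # assumed page size
PROGRAM_TIME_MS = 1.0      # assumed time to program one page

def sequential_write_mb_s(targets: int, luns_per_target: int,
                          planes_per_lun: int) -> float:
    """Ideal sequential write bandwidth if every unit works in parallel."""
    parallel_units = targets * luns_per_target * planes_per_lun
    kb_per_ms = parallel_units * PAGE_SIZE_KB / PROGRAM_TIME_MS
    return kb_per_ms * 1000 / 1024  # KB/ms -> MB/s

# A small-capacity part vs. a larger one with 4x the parallel units:
print(sequential_write_mb_s(1, 1, 2))   # -> 31.25
print(sequential_write_mb_s(2, 2, 2))   # -> 125.0
```

The larger-capacity configuration is four times faster purely because it has four times as many units that can program pages at once.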
A few points to add here. Some of this has changed a lot over the years, and some of it is not in any documentation and relies on empirical, word-of-mouth conclusions:
- We all know NAND flash must be erased before it can be written, but did you know the erase-cycle counts of today's MLC are quite different from what we learned in books? The old figures of hundreds of thousands of cycles for SLC and tens of thousands for MLC are gone. As the flash manufacturing process shrinks, transistors get smaller. The upside is that flash capacity keeps growing, but one problem is that the insulating layer around the floating gate mentioned earlier keeps getting thinner. Every time electrons are forced through it by high voltage, the insulating layer takes some damage. When the layer was thick, a cell could survive tens of thousands of erases; now MLC generally manages only a miserable 3K cycles or so, and some 2D TLC is an even more miserable few hundred. Speaking of 2D, you can probably guess there is a 3D. Yes: since process shrinks had pushed flash endurance to its limit, manufacturers, with Samsung as the representative, came up with a workaround: stack the circuits vertically, so that total capacity can stay the same while the process node is relaxed. That way you get the capacity and the durability. So 3D TLC erase counts are back up to two or three thousand.
- A 3,000-cycle rating does not mean a block dies at exactly 3,000 erases. People have run experiments: a block can often be erased tens of thousands of times and still work normally. So what does 3,000 mean? It again comes down to the insulating layer. Although it traps most of the electrons, a small number still leak out over time. After, say, half a year, a cell may have lost enough electrons that the data reads back with errors. This is flash's retention ability. Each manufacturer sets a retention time, and a 3,000-cycle rating means that within 3,000 erases, retention is guaranteed not to fall outside the acceptable range. Beyond 3,000 cycles the cell is not necessarily bad, but its retention degrades. If you really do erase a block tens of thousands of times and it still "works", its retention is presumably dismal.
So for the feeling that an Android system slows down over time, the storage-related causes I can think of are:
- Flash does age. Cells don't necessarily become bad blocks, but at minimum their stability gets worse and the error probability rises. The errors can still be corrected by BCH, LDPC, and other ECC algorithms, but the correction takes longer.
- As the storage gets fuller, later writes suffer huge write amplification, which seriously hurts performance. Before explaining write amplification, let me first introduce a term: over-provisioning (OP), which means reserving a portion of the flash that never stores user data. By default, thanks to the 1024-vs-1000 discrepancy (binary GiB inside the chip vs. decimal GB exposed to the host), there is at least a little more than 7% OP. Remember: without OP an SSD cannot work, or even if it could barely work, its efficiency would be terrible. Why? Mainly because of NAND's page-write, block-erase property. Suppose we had 0% OP, and eventually every page of a block holds useful data. When we want to update just one of those pages, since we cannot overwrite in place, we would have to move the entire block's data elsewhere, erase the block, and write the updated data back. As the figure above shows, a block these days is around 2MB (or larger). An SSD with DRAM might cache the block's data in DRAM, but low-end SSDs and eMMC have no DRAM and not much internal RAM to buffer it, so it's a dead end. That is why we must reserve OP for garbage collection.
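The "a little more than 7%" figure comes straight from the 1024-vs-1000 arithmetic, which can be checked in two lines:

```python
# Flash capacity is built in powers of two (GiB), but the host-visible
# capacity is sold in decimal GB; the leftover becomes default OP.

def default_op_percent() -> float:
    gib = 1024 ** 3   # bytes the NAND actually holds per "gigabyte"
    gb = 1000 ** 3    # bytes the host is shown per gigabyte
    return (gib - gb) / gb * 100

print(f"{default_op_percent():.2f}%")  # -> 7.37%
```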
OP can be changed. Some vendors adjust it themselves, and some expose it to the user (for example, my Samsung SSD at home), but I would guess the chance of eMMC letting the user change OP is small.
How does OP improve performance? Now I can introduce write amplification. Write amplification is defined like this: when the host writes one unit of data, the page-write, block-erase property of flash forces n units of old data to be moved to a new physical location, so the SSD controller actually performs n+1 writes to serve that single host write; the write amplification is then n+1. Why does this happen? Take 7% OP as an example. After long-term use, we can assume valid and stale data are distributed very evenly across the flash. With the disk full, if a block has 256 pages, then roughly 238 hold valid data and about 18 (7% of 256) hold stale data that has not yet been reclaimed. To write new data, the firmware must guarantee that at least one block is empty no matter how full the disk is, so it moves the 238 valid pages out of a block, writes the 18 new pages after them, and then erases the old block. To write those 18 pages, the firmware actually wrote 256 pages: a write amplification of more than 14x. Performance becomes terrible, and flash lifetime also drops faster.
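The worked example above can be put into a small function (the 256-page block and 7% OP are the article's numbers; the uniform-mixing assumption is the worst case described in the text):

```python
# Worst-case write amplification when valid and stale pages are
# uniformly mixed: to reclaim one block's stale pages, the firmware
# must rewrite all of that block's valid pages too.

def write_amplification(pages_per_block: int, op_fraction: float) -> float:
    stale = round(pages_per_block * op_fraction)   # reclaimable pages
    valid = pages_per_block - stale                # pages that must move
    # Accepting `stale` pages of new host data costs a whole block of
    # program operations: the moved valid pages plus the new pages.
    total_written = valid + stale
    return total_written / stale

wa = write_amplification(256, 0.07)   # 18 stale, 238 valid
print(f"write amplification = {wa:.1f}x")  # -> 14.2x
```

With a larger OP, each block carries more stale pages, fewer valid pages need moving per reclaimed block, and the amplification falls, which is exactly why OP improves performance and endurance.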
- After prolonged use, the contents of a logically sequential file end up scattered across several different physical locations, so reads take longer.
But from all of this, the kernel is completely shut out. When the kernel reads or writes eMMC, it only specifies logical block addresses (LBAs); the eMMC's FTL (Flash Translation Layer) is responsible for translating those logical addresses into physical addresses on the actual NAND. The kernel never even learns the real addresses; it is all completely transparent. And today's NAND flash is full of pitfalls; the advantage of the eMMC form is precisely that it leaves those details to eMMC firmware developers. As the saying goes, every trade has its specialists. Leaving the complex parts to professionals is actually a good choice; otherwise every phone manufacturer would have to worry about these details, which would be a huge burden, and they would not necessarily end up with better performance.
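To show why the kernel cannot do wear leveling itself, here is a minimal toy page-level FTL. The class name, structure, and all details are my own illustration, not real eMMC firmware; the point is only that the LBA-to-physical mapping lives entirely on the device side.

```python
# Toy FTL: the host only ever addresses LBAs. The FTL privately remaps
# each LBA to a fresh physical page on every write (no overwrite in
# place), and the old page silently becomes garbage to collect later.

class ToyFTL:
    def __init__(self, num_physical_pages: int):
        self.l2p = {}                       # LBA -> physical page number
        self.free = list(range(num_physical_pages))
        self.stale = set()                  # pages awaiting erase/GC

    def write(self, lba: int, data: bytes) -> None:
        if lba in self.l2p:
            self.stale.add(self.l2p[lba])   # old copy becomes garbage
        page = self.free.pop(0)             # always program a fresh page
        self.l2p[lba] = page
        # (real firmware would program `data` into NAND here)

ftl = ToyFTL(num_physical_pages=8)
ftl.write(lba=5, data=b"A")
ftl.write(lba=5, data=b"B")    # same LBA, but a new physical page
print(ftl.l2p[5], ftl.stale)   # -> 1 {0}
```

The host wrote "LBA 5" twice and has no idea the data physically moved from page 0 to page 1; that decision, and therefore wear leveling, belongs entirely to the firmware.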
Finally, a very authoritative and classic figure: the Linux storage stack diagram. To optimize the Linux storage system at the kernel level, I think the feasible places are, from top to bottom, the file system, the block layer, and the MMC driver. But I don't understand those details myself; all I can do is sigh that there is no end to learning!