Log-Structured File Systems

Last Update:2014-12-27 Source: Internet

Author: User

Tags imap


Because of layout problems on Blog Park, the original is hosted here: http://xubenbenhit.github.io/LogStructureFileSystem.html

2014-12-26 #system

First of all, this post is about LFS. My earlier post on hard disks and RAID was really the first in this series, and this one should be counted as the third; the post in between, an introduction to file systems, is not written yet. I had planned to go home today and write the two together, but I cannot guarantee my efficiency once I am home. So I finished this article in the lab, and the file system post will be written at home.

1. What is LFS?

First, think about how we write a log file: don't we simply append a line with the latest state to the end of the file? (This analogy is my own invention.) A log-structured file system works the same way: updated data is written directly at the end of the log. For example:

At the start, we write data block foo to disk, followed immediately by its inode:

Then we write data block bar, again followed immediately by its inode:

Finally, we update data block foo to get foo'. Instead of overwriting the old block, we append foo' and a new inode, and the disk ends up looking like this:
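
The three steps above can be played out in a few lines of Python. This is only a toy model I made up for illustration: the "disk" is a list, an address is a list index, and the names `append` and `write_file` are hypothetical, not taken from any real LFS code.

```python
# Toy append-only "disk": a list of blocks, where the index is the address.
# This is an illustrative model, not real LFS code.
log = []

def append(block):
    """Append one block to the end of the log and return its address."""
    log.append(block)
    return len(log) - 1

def write_file(name, data):
    """Write a data block, then an inode pointing at it; return the inode address."""
    data_addr = append(("data", name, data))
    return append(("inode", name, data_addr))

inode_foo = write_file("foo", "v1")    # data at address 0, inode at address 1
inode_bar = write_file("bar", "v1")    # data at address 2, inode at address 3
inode_foo2 = write_file("foo", "v2")   # foo' goes to addresses 4 and 5
```

Note that addresses 0 and 1 still hold the old foo and its old inode. Nothing was overwritten; those blocks are simply stale now.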

2. Where did LFS come from?

LFS was developed around 1990, and the main considerations at the time were the following (as I have heard it):

Memory keeps growing, so larger caches can serve more and more reads without touching the disk. That makes disk writes more and more of a performance bottleneck.

The gap between sequential and random disk access speed keeps widening. If random disk operations can be turned into sequential ones, the gain is considerable.

Earlier file systems were inefficient with small files, yet small files are very common. The storage hardware does not help either, because RAID-4 and RAID-5 perform poorly on small writes.

Hence LFS. LFS first buffers updates in memory, and once they reach a certain size (called a segment) it writes them to free disk space in a single sequential write, which gives high performance. Of course, this is easy to say and hard to do. There are two main problems: one is how to locate files under this scheme, and the other is disk space management.
I do not want to follow the paper's order of presentation here, because that is dull. It is better to work through the design step by step following my own understanding, which is presumably how the paper's authors arrived at it too.

3. Working out the design

LFS first accumulates file updates in memory. The unit of accumulation is called a segment; when the segment is full, it is written to free disk space in one go. The first task is to make sure this scheme is correct and efficient.
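
Here is a minimal sketch of that buffering behaviour. The class name `SegmentWriter` and the segment size of 4 blocks are my own assumptions chosen to keep the example small; real segments are far larger.

```python
SEGMENT_BLOCKS = 4  # assumed segment size in blocks, tiny for illustration


class SegmentWriter:
    """Buffers blocks in memory and writes them out one full segment at a time."""

    def __init__(self):
        self.disk = []      # blocks that have reached the "disk"
        self.segment = []   # in-memory buffer for the current segment
        self.flushes = 0    # number of segment-sized writes issued

    def write(self, block):
        self.segment.append(block)
        if len(self.segment) == SEGMENT_BLOCKS:
            self.flush()

    def flush(self):
        self.disk.extend(self.segment)  # stands in for one large sequential write
        self.segment = []
        self.flushes += 1


w = SegmentWriter()
for i in range(10):
    w.write(f"block{i}")
```

Ten small writes turn into two segment writes, with two blocks still waiting in memory for the segment to fill up.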

First, borrowing from the design of FFS (Fast File System), each file needs an inode that stores its metadata: size, the physical addresses of its blocks (a file may need several disk blocks), permissions, and so on. The inode is of course also part of the segment, and for convenience it is stored right after the file's data, as in the figures in Part 1. Note that an inode is small, only about 128 bytes.
In FFS, inodes are stored in a contiguous region of the disk, so their physical addresses can be computed in advance. But writing whole segments to disk scatters the inodes over the entire disk, and since LFS never modifies a file in place, the address of the latest version of a file's inode keeps changing! So, following the usual convention, a layer of indirection is added: the imap (inode map), which stores the physical address of each inode.
Now that we have the imap, the question is where to store it. The first option is a fixed location on disk. But the imap is updated every time a segment is written, and each update would then move the disk head to that fixed location, which hurts efficiency. It is better to put the imap right after the inode.

One more point: the imap itself may span several disk blocks, so only the updated piece of the imap is written after the inode. That raises a new problem: the imap's address is no longer fixed, so how do we find it? We still need a region at a known address that records where the imap is. This region is called the checkpoint region (CR), and it stores the physical address of every imap block. A file can then be located by following CR -> imap -> inode -> file.
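
The CR -> imap -> inode -> file chain can be sketched as follows. The dictionary layout, the addresses, and the function name `read_file` are all made up for illustration; the only thing taken from the text is the order of the lookups.

```python
# Toy disk: address -> block contents (layout is an assumption for illustration).
disk = {
    10: {"type": "data", "bytes": b"hello"},
    11: {"type": "inode", "size": 5, "blocks": [10]},
    12: {"type": "imap", "entries": {7: 11}},  # inode number 7 -> inode at address 11
}

# The checkpoint region sits at a fixed, known location and lists
# the address of every imap piece.
checkpoint_region = {"imap_blocks": [12]}


def read_file(inum):
    """Follow CR -> imap -> inode -> data blocks for the given inode number."""
    for imap_addr in checkpoint_region["imap_blocks"]:
        entries = disk[imap_addr]["entries"]
        if inum in entries:
            inode = disk[entries[inum]]
            data = b"".join(disk[addr]["bytes"] for addr in inode["blocks"])
            return data[: inode["size"]]
    raise FileNotFoundError(inum)


content = read_file(7)
```

In practice the imap would be cached in memory, so the CR is normally read only at startup rather than on every lookup.
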
But how are directories handled?
In FFS, a directory is nothing more than a mapping from file names to inode numbers, and it is stored as ordinary file data. LFS does the same. So /dir/foo is stored like this:

Specifically: first look up the inode number of dir in the imap to get the physical address of dir's inode, then read that inode to get the physical address of dir's data, then read from the directory data the inode number of foo, and finally follow CR -> imap -> inode -> foo. This design also has an unexpected benefit: updating a file inside a directory does not require updating the directory itself, because the directory stores only the file's inode number and only the imap changes. This avoids the "recursive update" problem, in which a change would propagate to the directory, then to its parent directory, and so on up the tree.
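
To convince myself of the "no recursive update" claim, here is a small sketch. Everything in it (the list used as a disk, the helper names `write_file`, `read_inum`, and `lookup`, and the inode numbers 0, 1, and 2) is a hypothetical model, not real LFS code.

```python
disk = []   # append-only log; the list index is the address
imap = {}   # inode number -> address of the latest inode (kept in memory here)


def append(block):
    disk.append(block)
    return len(disk) - 1


def write_file(inum, content):
    """Append the data, then a new inode, then point the imap at the new inode."""
    data_addr = append(content)
    imap[inum] = append({"data": data_addr})


def read_inum(inum):
    """imap -> inode -> data."""
    return disk[disk[imap[inum]]["data"]]


def lookup(path, root_inum=0):
    """Resolve a path; directory data is a dict of name -> inode number."""
    inum = root_inum
    for name in path.strip("/").split("/"):
        inum = read_inum(inum)[name]
    return read_inum(inum)


write_file(2, "foo v1")       # the file /dir/foo
write_file(1, {"foo": 2})     # the directory /dir
write_file(0, {"dir": 1})     # the root directory
dir_inode_addr = imap[1]

write_file(2, "foo v2")       # update foo: only foo's blocks and the imap change
result = lookup("/dir/foo")
```

After the update, `imap[1]` is still the same address as before, so the directory was not rewritten, yet the lookup returns the new contents.
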
OK, so far file location is no longer a problem. But if every segment is written to free disk space, and this goes on and on, where does all that disk come from? This brings in disk space reclamation (segment cleaning).
In general, reclamation takes three steps: first read some segments into memory; then decide which blocks in those segments are "live", that is, still part of the latest version of a file; finally write all the live blocks into new segments. The space of the old segments is then reclaimed.
How can we tell whether a block is the latest version?
For that we need some identifying information about every block in the segment: which file (inode) it belongs to and its position within the file. This information is stored inside the segment in a region called the segment summary block. With it we can look up the current physical address of that block of the file and compare it with the block's own address: if they match, the block is live. Naturally, it helps to load the entire imap into memory so the lookup is cheap.
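
The liveness test can be sketched like this. The two dictionaries and the name `is_live` are my own illustrative stand-ins: `current_blocks` plays the role of the imap plus inode lookup, and `segment_summary` plays the role of the segment summary block.

```python
# inode number -> current block addresses, indexed by position in the file.
# Stands in for "look it up through the imap and the inode".
current_blocks = {7: [40, 25]}  # file 7: block 0 lives at 40, block 1 lives at 25

# Segment summary for one segment: block address -> (inode number, position in file).
segment_summary = {24: (7, 0), 25: (7, 1), 26: (9, 0)}


def is_live(addr):
    """A block is live if the file still points at this exact address."""
    inum, offset = segment_summary[addr]
    blocks = current_blocks.get(inum, [])
    return offset < len(blocks) and blocks[offset] == addr


live = [addr for addr in sorted(segment_summary) if is_live(addr)]
```

Block 24 is dead because file 7 now keeps its block 0 at address 40, block 26 is dead because file 9 no longer exists, and only block 25 has to be copied out by the cleaner.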
This can be improved by giving each file a version number that is incremented when the file is replaced (in the paper, whenever the file is deleted or truncated to length zero). The version number is stored both in the segment summary block and in the imap, so comparing version numbers alone is often enough to tell that a block belongs to an outdated version of the file.
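
A rough sketch of the version-number shortcut, with made-up data and a hypothetical helper `maybe_live`. A version mismatch proves that a block is dead without reading any inode; a match only means the block still has to go through the full address check.

```python
imap_version = {7: 3, 9: 1}  # inode number -> current version, as kept with the imap

# Segment summary entries: (inode number, version when written, block address).
summary = [(7, 2, 100), (7, 3, 101), (9, 1, 102), (4, 1, 103)]


def maybe_live(inum, version):
    """False means definitely dead; True means the full check is still needed."""
    return imap_version.get(inum) == version


candidates = [addr for inum, version, addr in summary if maybe_live(inum, version)]
```

Here blocks 100 and 103 are discarded immediately, and only blocks 101 and 102 remain as candidates.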
About segment cleaning: when should it run? What goal must be reached before it stops? Which segments should be chosen for cleaning? And should live data be reorganized for better locality? Experiments show that the first two questions matter less. In practice two thresholds are set: cleaning starts when the number of free segments drops below one threshold, and stops when reclamation has raised the number of free segments above the other.
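
The two-threshold rule might look like the sketch below. The threshold values 3 and 6 and the callback `clean_one` are assumptions of mine, purely for illustration.

```python
START_BELOW = 3  # assumed: start cleaning when free segments drop below this
STOP_ABOVE = 6   # assumed: stop cleaning once free segments exceed this


def clean_if_needed(free_segments, clean_one):
    """Run the cleaner with hysteresis and return the new free-segment count.

    clean_one() is a hypothetical callback that cleans some segments and
    returns the net number of free segments gained.
    """
    if free_segments >= START_BELOW:
        return free_segments          # enough free space, do nothing
    while free_segments <= STOP_ABOVE:
        gained = clean_one()
        if gained <= 0:
            break                     # nothing more can be reclaimed
        free_segments += gained
    return free_segments


after_cleaning = clean_if_needed(2, lambda: 1)
untouched = clean_if_needed(4, lambda: 1)
```

Using two different thresholds keeps the cleaner from switching on and off every time a single segment is written.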
The third question is a subtle one. The algorithm used in the paper computes a cleaning cost-benefit value for each segment; see the paper for the exact formula. To support this, a segment usage table is saved in the checkpoint region.
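
For orientation, the cost-benefit ratio as I remember it from the LFS paper is (1 - u) * age / (1 + u), where u is the fraction of the segment that is still live and age is an estimate of how long its data has been stable. Please check the paper before relying on it. The segment names and numbers below are invented.

```python
def cost_benefit(u, age):
    """Benefit-to-cost ratio of cleaning a segment.

    u   -- fraction of the segment that is still live (0 <= u <= 1)
    age -- estimate of how long the segment's data has been stable

    Cleaning reads the whole segment (cost 1) and writes back the live part
    (cost u); it frees (1 - u) of a segment, weighted by age.
    """
    return (1.0 - u) * age / (1.0 + u)


# Invented example segments: name -> (utilization, age).
segments = {"A": (0.9, 100.0), "B": (0.5, 10.0), "C": (0.75, 200.0)}
best = max(segments, key=lambda name: cost_benefit(*segments[name]))
```

A purely greedy policy would pick B because it has the lowest utilization, while the cost-benefit policy prefers C: it is fuller, but its data is old and likely to stay put, so the space freed will stay free longer.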
The fourth question also calls for some finesse: the live blocks are written into the new segments sorted by the time they were last modified.
At this point the description is nearly complete; two details remain.
Checkpoint: the imap addresses and the segment usage information held in the checkpoint region keep changing, so the CR has to be updated every so often. Normally the segment in the cache is written to disk first and then the CR is updated. Because a failure may occur while the CR is being updated, the actual CR contents are wrapped between two timestamp blocks, and the CR is considered valid only when the two timestamps agree. And so that a failure during a CR update does not leave the system without a usable CR, LFS keeps two CRs and alternates between them; even if one update fails, recovery can use the other CR.
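
Here is a small simulation of the two-timestamp, two-region idea. The class `CheckpointRegions` and the `crash_midway` flag are hypothetical names I use to model a torn write: the leading timestamp is new, but the trailing one is still whatever was on disk before.

```python
class CheckpointRegions:
    """Two checkpoint regions written alternately, each wrapped in two timestamps."""

    def __init__(self):
        self.regions = [None, None]  # the two fixed CR locations
        self.next = 0                # which region the next checkpoint goes to
        self.clock = 0               # logical timestamp

    def write(self, payload, crash_midway=False):
        self.clock += 1
        old = self.regions[self.next]
        if crash_midway:
            # Torn write: the trailing timestamp never reached the disk.
            tail = old["tail"] if old else 0
        else:
            tail = self.clock
        self.regions[self.next] = {"head": self.clock, "payload": payload, "tail": tail}
        self.next = 1 - self.next

    def latest_valid(self):
        """Return the newest region whose two timestamps agree, or None."""
        valid = [r for r in self.regions if r and r["head"] == r["tail"]]
        return max(valid, key=lambda r: r["head"], default=None)


cr = CheckpointRegions()
cr.write("checkpoint 1")
cr.write("checkpoint 2")
cr.write("checkpoint 3", crash_midway=True)
recovered = cr.latest_valid()
```

The interrupted third checkpoint overwrote the region that held checkpoint 1, but the other region still holds a consistent checkpoint 2, and that is what recovery picks.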
Crash recovery: how does LFS recover after a failure? First, find the latest usable CR, where "usable" means that its leading and trailing timestamps agree. Then recover using the roll-forward algorithm described in the paper. There is one seemingly "strange" point: the latest versions of the imap blocks are already inside the segments, so why not simply use them directly? It is not that simple, because disk writes can themselves fail partway, so those imap blocks may not have been written correctly.
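
A heavily simplified sketch of the roll-forward idea, under my own assumptions: the checkpoint remembers a copy of the imap and the log position it covers (`log_end` is a hypothetical field), and a block that never reached the disk is modelled as `None`. The real algorithm in the paper works from segment summary blocks and handles far more cases.

```python
def roll_forward(checkpoint, log):
    """Rebuild the imap by replaying inodes written after the checkpoint."""
    imap = dict(checkpoint["imap"])
    for addr in range(checkpoint["log_end"], len(log)):
        block = log[addr]
        if block is None:
            break  # hit a block that was never written correctly: stop replaying
        if block["type"] == "inode":
            imap[block["inum"]] = addr  # a newer inode supersedes the old address
    return imap


log = [
    {"type": "inode", "inum": 1},  # address 0, covered by the checkpoint
    {"type": "data"},              # address 1, covered by the checkpoint
    {"type": "data"},              # address 2, written after the checkpoint
    {"type": "inode", "inum": 1},  # address 3, newer inode for file 1
    {"type": "inode", "inum": 2},  # address 4, a new file
    None,                          # address 5, torn write
    {"type": "inode", "inum": 3},  # address 6, not trusted because it follows the tear
]
checkpoint = {"imap": {1: 0}, "log_end": 2}
recovered_imap = roll_forward(checkpoint, log)
```

Everything written after the checkpoint and before the first damaged block is recovered; whatever follows the damage is discarded.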
That is about it. For a more detailed treatment, go straight to the references below.

4. References

http://pages.cs.wisc.edu/~remzi/OSTEP/file-lfs.pdf

http://www.eecs.berkeley.edu/Pubs/TechRpts/1992/CSD-92-696.pdf

http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.54.502

http://blog.notdot.net/2009/12/Damn-Cool-Algorithms-Log-structured-storage
