Soft Updates: A Technique for Eliminating Most Synchronous Writes (Part 1)


Marshall Kirk McKusick, Author and Consultant
Gregory R. Ganger, Carnegie Mellon University
Chinese translation: Li Xin <delphij@cnfug.org>, Computer College, Beijing University of Technology

--------------------------------------------------------------------------

This paper was originally published in the proceedings of the FREENIX track of the USENIX Annual Technical Conference, held June 6-11, 1999, pages 1-17. Copyright is held by Marshall Kirk McKusick and Greg Ganger; the authors retain all rights. The paper was translated and republished with the authors' permission. It may be redistributed for non-commercial purposes, provided this copyright notice is kept intact.

--------------------------------------------------------------------------

Abstract

Traditionally, file system consistency has been maintained across power failures and system crashes in one of two ways: by using synchronous writes to sequence dependent metadata updates, or by using write-ahead logging to group them atomically. Soft updates, an alternative to both, is an implementation mechanism that tracks and enforces metadata update dependencies so that the file system image on disk is always consistent. Using soft updates removes the need for a separate log or for most synchronous writes. Because it can aggregate many operations that were previously performed individually and synchronously, it reduces the number of disk writes by 40 to 70 percent in file-intensive environments such as program development and mail servers. Besides improving performance, soft updates can also maintain better file system consistency. By guaranteeing that the only possible inconsistencies are unclaimed blocks or inodes, soft updates eliminates the need to run the file system checker after every system crash, so the file system is available immediately after reboot. Lost blocks and inodes can later be reclaimed from the running file system by a background task.

This paper describes an implementation of soft updates and its integration into the 4.4BSD fast file system (FFS). It details the changes that were needed, both to the research prototype and to the BSD system, to build a production-quality system. It also discusses the experiences, difficulties, and lessons learned in moving soft updates from research to reality. As is often the case, operations outside the main focus (for example, fsck and "fsync") required rethinking and additional code. Experience with the resulting system confirms the earlier research: soft updates integrates well with an existing file system and enforces metadata dependencies while delivering performance close to optimal.

Section 1: Background and Introduction

Metadata (such as directories, inodes, and free block maps) gives structure to raw storage. It provides the pointers and descriptions that link disk sectors into files and identify those files. To be useful for long-term storage, a file system must preserve the integrity of its metadata across unpredictable system crashes, such as power outages and operating system failures. Because such crashes usually destroy all information held in volatile main memory, the information in nonvolatile storage (for example, disk) must always be consistent enough to reconstruct a coherent file system state. Specifically, the on-disk image of the file system must contain no dangling pointers to uninitialized space, no ambiguous resource ownership caused by multiple pointers, and no live resources to which no pointer refers. Maintaining these invariants generally requires that updates to small on-disk metadata objects be sequenced (or grouped atomically).
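The three invariants above can be made concrete with a small checker over a toy model of the on-disk image. This is only an illustrative sketch: the function name find_violations and the dictionary-based layout are assumptions made for the example and do not reflect the actual FFS on-disk format or fsck.

```python
from collections import Counter

def find_violations(dir_entries, inodes, allocated_inodes):
    """Check a toy on-disk image against the three invariants.

    All structures here are hypothetical simplifications:
      dir_entries:      dict mapping file name -> inode number
      inodes:           dict mapping inode number -> list of block numbers
                        (only initialized inodes appear in this dict)
      allocated_inodes: set of inode numbers marked as in use
    """
    violations = []

    # Invariant 1: no dangling pointers to uninitialized space.
    for name, ino in dir_entries.items():
        if ino not in inodes:
            violations.append(f"dangling: {name} -> inode {ino}")

    # Invariant 2: no ambiguous ownership (one block claimed by two inodes).
    owners = Counter(blk for blocks in inodes.values() for blk in blocks)
    for blk, count in owners.items():
        if count > 1:
            violations.append(f"ambiguous: block {blk} has {count} owners")

    # Invariant 3: no live resource without a pointer to it.
    referenced = set(dir_entries.values())
    for ino in allocated_inodes:
        if ino not in referenced:
            violations.append(f"unreferenced: inode {ino}")

    return sorted(violations)
```

In this model, an image left behind by soft updates could at worst report entries of the third kind (resources marked in use that nothing points to), which is why they can be reclaimed later rather than repaired before the file system is used.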

Traditionally, the BSD fast file system (FFS) and its derivatives have used synchronous writes to sequence changes to stable storage. For example, creating a file in a BSD system involves first allocating and initializing a new inode and then filling in a new directory entry that points to it. With synchronous writes, the file system forces the application that creates the file to wait for the disk write that initializes the on-disk inode. As a result, operations such as file creation and deletion proceed at disk speed rather than at processor and memory speed. Because disk accesses are slow compared with the rest of the system, synchronous writes reduce system performance.

The metadata update problem can also be addressed by other mechanisms. One is to use NVRAM technologies, such as an uninterruptible power supply (UPS) or flash memory. In that case only the NVRAM contents need to be kept consistent, and updates can be propagated to disk in any order and at any convenient time. Another is to group each set of dependent updates into an atomic operation, using some form of write-ahead logging or shadow paging. Generally speaking, these methods augment the on-disk state with additional information that can be used to reconstruct the committed metadata after any system failure other than media corruption. Many modern file systems have successfully used write-ahead logging to achieve better performance than synchronous writes.

An alternative approach, soft updates, was proposed in [Ganger & Patt, 1994] and evaluated in a research prototype. With soft updates, the file system uses delayed writes (that is, write-back caching) for metadata changes, tracks the dependencies between updates, and enforces those dependencies at write-back time. Because most metadata blocks contain many pointers, cyclic dependencies occur frequently when dependencies are recorded only at the block level. Soft updates therefore tracks dependencies per pointer, which allows blocks to be written in any order. Any updates in a metadata block that still have unsatisfied dependencies are rolled back before the block is written and rolled forward again after the write completes, so dependency cycles cease to be a problem. With soft updates, applications always see the most recent copy of a metadata block, and the disk always sees a copy that is consistent with its other contents.

In this paper, we describe the integration of soft updates into the 4.4BSD FFS used in the NetBSD, OpenBSD, FreeBSD, and BSDI operating systems. Along the way, we discuss the experience gained and lessons learned, and describe the aspects that proved more complex than the original research suggested. As is often the case, operations outside the main focus required rethinking and account for much of the code complexity: bounding the kernel memory used to track dependencies, fully implementing the semantics of the "fsync" system call, correctly detecting and handling lost resources in fsck, and completing an unmount system call cleanly and promptly. Despite these difficulties, our performance measurements confirm the conclusions of the earlier research. In particular, using soft updates in BSD FFS eliminates the vast majority of synchronous writes and comes within 5 percent of the ideal case (FFS with fully asynchronous updates). At the same time, soft updates gives BSD FFS cleaner semantics, stronger integrity and security guarantees, and immediate recovery after a crash (fsck is no longer required before the file system can be used safely).

The remainder of this paper is organized as follows. Section 2 describes the update dependencies that arise in BSD FFS operations. Section 3 describes how the BSD soft updates implementation handles them, including the key data structures, how they are used, and how they are integrated into the 4.4BSD operating system. Section 4 discusses the experience and lessons learned in turning a prototype into an implementation suitable for production environments. Section 5 briefly summarizes the performance improvements that soft updates brings to 4.4BSD systems. Section 6 discusses new support for file system snapshots and how it can be used to run a partial fsck in the background on an active file system. Section 7 gives an overview of the status and availability of the BSD soft updates code.
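The per-pointer rollback introduced in this section can be illustrated with a toy model. The sketch below is not the BSD implementation: the class MetadataBlock, its method names, and the use of string labels for dependencies are all assumptions made for the example. For simplicity it rolls back in a copy of the block, whereas the text describes rolling back before the write and rolling forward afterwards.

```python
class MetadataBlock:
    """Toy in-memory copy of a metadata block with several pointer slots."""

    def __init__(self, nslots):
        self.slots = [0] * nslots   # values that applications see
        self.pending = {}           # slot index -> (safe old value, dependency)

    def set_pointer(self, index, value, depends_on):
        """Update one slot. The new value must not reach disk until the
        update named by `depends_on` (a hypothetical label) is on disk."""
        if index in self.pending:
            old = self.pending[index][0]   # keep the oldest safe value
        else:
            old = self.slots[index]
        self.pending[index] = (old, depends_on)
        self.slots[index] = value

    def disk_image(self, written):
        """Return the contents to write to disk.

        Slots whose dependency is not yet in `written` are rolled back to
        their safe old value in the returned copy; the in-memory copy
        keeps the new values, so no explicit roll-forward is needed here.
        """
        image = list(self.slots)
        for index, (old, dep) in list(self.pending.items()):
            if dep in written:
                del self.pending[index]   # dependency satisfied
            else:
                image[index] = old        # roll back this pointer only
        return image
```

Because only the individual unsafe pointers are held back, the block as a whole can be written at any time, which is how tracking at the pointer level avoids the cycles that arise when whole blocks depend on each other.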

Section 2: Update Dependencies in the BSD Fast File System

Several important file system operations consist of a series of related modifications to separate metadata structures. To ensure recoverability after unpredictable failures, these modifications often must be propagated to stable storage in a specific order. For example, when a new file is created, the file system allocates an inode, initializes it, and constructs a directory entry that points to it. If the system crashes after the new directory entry has been written to disk but before the initialized inode has been written, consistency may be compromised because the contents of the on-disk inode are unknown. To ensure metadata consistency, the initialized inode must reach stable storage before the new directory entry. We call this requirement an update dependency, because safely writing the directory entry depends on first writing the inode. The ordering constraints can be expressed as three simple rules:

1. Never point to a structure before it has been initialized (for example, an inode must be initialized before a directory entry references it).
2. Never reuse a resource before all previous pointers to it have been nullified (for example, an inode's pointer to a data block must be cleared before that block is reallocated to another inode).
3. Never reset the old pointer to a live resource before the new pointer has been set (for example, when renaming a file, do not remove the old name of an inode until the new name has been written).
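The rules above amount to "must reach disk before" constraints between individual updates, so a safe write order is any topological order of the dependency graph. The following sketch uses Python's standard graphlib module; the function name safe_write_order and the update labels are assumptions chosen for illustration. It shows only the ordering constraint itself, not how soft updates defers and tracks the writes.

```python
from graphlib import TopologicalSorter

def safe_write_order(depends_on):
    """Return one write order that respects every update dependency.

    depends_on maps each update (a hypothetical label) to the set of
    updates that must reach stable storage before it.
    """
    return list(TopologicalSorter(depends_on).static_order())

# Rule 1: the initialized inode must be written before the directory entry.
# Rule 3: the new name must be written before the old name is removed.
deps = {
    "write dir entry": {"write initialized inode"},
    "remove old name": {"write new name"},
}
order = safe_write_order(deps)
```

If the dependencies form a cycle, graphlib raises CycleError. Recording dependencies only between whole blocks makes such cycles likely, since one block holds many pointers, which is the situation that per-pointer tracking is meant to avoid.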

This section describes the update dependencies that arise in BSD FFS. Because space is limited, we assume that the reader has a basic understanding of BSD FFS as described in [McKusick et al., 1996].

Eight BSD FFS operations require update ordering to ensure recoverability after a crash: file creation, file removal, directory creation, directory removal, file/directory rename, block allocation, indirect block manipulation, and free map management.

The two main resources managed by BSD FFS are inodes and data blocks. Two bitmaps record their allocation state. For each inode in the file system, the inode bitmap holds one bit, where 1 means that the inode is in use and 0 means that it is free. Similarly, for each data block, the block bitmap holds one bit that indicates whether the block is free or in use. An FFS file system is divided into fixed-size units called cylinder groups. Each cylinder group has a cylinder group block that contains the bitmaps for the inodes and data blocks residing in that cylinder group. For a large file system, this organization allows only the pieces of the bitmaps that are actively being used to be held in kernel memory. Each active cylinder group block is stored in a separate I/O buffer and can be written to disk independently of the other cylinder group blocks.
