JFS overview: how a journaling file system shortens system restart time

In the event of a system crash, JFS provides fast file system restart. By using database journaling techniques, JFS can restore a file system to a consistent state within seconds or minutes, whereas a non-journaling file system can take hours or even days. This white paper gives an overview of the JFS architecture and describes the design features, potential limitations, and administrative utilities of the JFS technology available on the developerWorks website.

The Journaled File System (JFS) is a log-based, byte-level file system developed for transaction-oriented, high-performance systems. It is scalable and robust. Its main advantage over non-journaling file systems is fast restart: JFS can restore a file system to a consistent state in seconds or minutes.

Although JFS is designed primarily to meet the high throughput and reliability requirements of servers (from single-processor systems to advanced multiprocessor and clustered systems), it can also be used in client configurations that require performance and reliability.

Architecture and design

The JFS architecture can be described from the perspective of disk layout characteristics.

Logical volume
Every file system discussion starts from a logical volume of some sort. This can be a physical disk or a subset of the physical disk space, for example an FDISK partition. A logical volume is also called a disk partition.

Aggregates and filesets
The file system creation utility, mkfs, creates an aggregate that is wholly contained within a partition. An aggregate is an array of disk blocks with a specific format that includes a superblock and an allocation map. The superblock identifies the partition as a JFS aggregate, and the allocation map describes the allocation state of each data block in the aggregate. The format also includes the initial fileset and the control structures required to describe it. The fileset is the mountable entity.

Files, directories, inodes, and addressing structures
A fileset contains files and directories. Files and directories are represented persistently by inodes. Each inode describes the attributes of a file or directory and serves as the starting point for finding the file or directory data on disk. JFS also uses inodes to represent other file system objects, such as the map that describes the allocation state and on-disk location of each inode in the fileset.

Directories map user-specified names to the inodes allocated for files and directories, and form the traditional naming hierarchy. Files contain user data, with no restrictions or formats implied. That is, JFS treats user data as an uninterpreted byte stream. Extent-based addressing structures rooted in the inode map file data to disk. Together, the aggregate superblock and disk allocation map, the file descriptor and inode map, inodes, directories, and addressing structures make up the JFS control structures, or metadata.

A JFS log is maintained in each aggregate and records information about operations on metadata. The log has a format that is also set by the file system creation utility. A single log can be used simultaneously by multiple mounted filesets within the aggregate.

Design Features

JFS was designed with journaling fully integrated from the start, rather than adding journaling to an existing file system. A number of features distinguish JFS from other file systems.

Journaling
JFS provides better structural consistency and recoverability, and much faster restart times, than non-journaling file systems (such as HPFS, ext2, and traditional UNIX file systems). Those file systems are subject to corruption when the system fails, because a logical file write operation usually takes multiple media I/Os to complete and may not be fully reflected on the media at any given time. They rely on restart-time utilities (that is, fsck) that examine all of the file system's metadata (such as directories and disk addressing structures) to detect and repair structural integrity problems. This is a time-consuming and error-prone process that, in the worst case, can lose or misplace data.

In contrast, JFS uses techniques originally developed for databases to log information about operations performed on file system metadata as atomic transactions. If the system fails, the file system is restored to a consistent state by replaying the log and applying the log records of the appropriate transactions. Because the replay utility only needs to examine the log records produced by recent file system activity, rather than all of the file system's metadata, recovery with this log-based approach is much faster.
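
The replay idea can be illustrated with a minimal Python sketch. It is not the JFS on-disk log format: the `LogRecord` fields, the metadata keys, and the `replay` function are hypothetical names chosen for illustration. The sketch assumes a redo-only log in which each update record carries the new value (after-image) of one metadata item, and in which only transactions with a commit record are applied.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogRecord:
    txid: int        # transaction the record belongs to
    kind: str        # "update" or "commit"
    key: str = ""    # hypothetical metadata item, e.g. "inode:7"
    value: str = ""  # after-image: the value the item should have


def replay(metadata, log):
    """Apply the after-images of committed transactions, in log order."""
    committed = {r.txid for r in log if r.kind == "commit"}
    recovered = dict(metadata)
    for record in log:
        if record.kind == "update" and record.txid in committed:
            recovered[record.key] = record.value
    return recovered


# Metadata as found on disk after a crash (illustrative values only).
on_disk = {"dir:/tmp": "a.txt", "inode:7": "allocated"}

log = [
    LogRecord(1, "update", "dir:/tmp", ""),          # transaction 1: unlink a.txt
    LogRecord(1, "update", "inode:7", "free"),
    LogRecord(1, "commit"),
    LogRecord(2, "update", "inode:8", "allocated"),  # crash before commit
]

recovered = replay(on_disk, log)
```

Only the four log records are examined, no matter how much metadata the file system holds, which is the property that makes log-based recovery fast. Transaction 2 never committed, so its update is ignored.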

Several other aspects of log-based recovery are worth noting. First, JFS logs only operations on metadata, so replaying the log restores only the consistency of structural relationships and resource allocation states within the file system. It does not log file data or recover that data to a consistent state. Consequently, some file data may be lost or stale after recovery, and users with a critical need for data consistency should use synchronous I/O.
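
For applications that need this guarantee for file data, one common approach is to flush explicitly before treating a write as complete. The sketch below uses Python's standard `os.fsync` call. The helper name `write_durably` and the file name are hypothetical, and opening the file with a synchronous flag such as `O_SYNC` would be an alternative on platforms that provide it.

```python
import os
import tempfile


def write_durably(path, data):
    """Write data, then block until the OS reports it has reached the device."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()              # push Python's user-space buffer to the kernel
        os.fsync(f.fileno())   # ask the kernel to write the file data to disk


# Illustrative use with a temporary location.
demo_path = os.path.join(tempfile.mkdtemp(), "critical.dat")
write_durably(demo_path, b"payment committed\n")
```

The cost is latency: each call waits for the device, which is why it is left to the applications that need it rather than applied to all file data.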

Logging is not particularly effective in the face of media errors. Specifically, an I/O error while the log or metadata is being written to disk means that, after a system crash, a time-consuming and potentially intrusive full integrity check is required to restore the file system to a consistent state. This implies that bad block relocation is a key feature of any storage manager or device residing below JFS.

The semantics of JFS logging are as follows: when a file system operation that changes metadata, for example unlink(), returns a successful return code, the effects of the operation have been committed to the file system and will be seen even if the system crashes. For example, once a file has been successfully removed, it remains removed and does not reappear if the system crashes and is restarted.

This logging style introduces a synchronous write to the log disk into each inode or vfs operation that modifies metadata. (For database experts, this is a redo-only, physical after-image, write-ahead logging protocol that uses a no-steal buffer policy.) In terms of performance, this compares well with many non-journaling file systems that rely on multiple careful synchronous metadata writes for consistency. However, it is a disadvantage compared with other journaling file systems, such as Veritas VxFS and Transarc Episode, which use different logging styles and write log data to disk lazily. In a server environment that performs many concurrent operations, this cost is reduced by group commit, which combines multiple synchronous write operations into a single write operation. The JFS logging style has been improved over time and now provides asynchronous logging, which increases file system performance.
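
The effect of group commit can be sketched with a toy model in Python. `LogDevice`, `commit_individually`, and `group_commit` are hypothetical names, and the fixed `group_size` is an assumption made for illustration. The source does not describe how JFS forms its groups.

```python
class LogDevice:
    """Counts the synchronous writes issued to a hypothetical log disk."""

    def __init__(self):
        self.sync_writes = 0
        self.records = []

    def sync_write(self, batch):
        # One call stands for one synchronous I/O to the log.
        self.records.extend(batch)
        self.sync_writes += 1


def commit_individually(device, transactions):
    """One synchronous log write per transaction."""
    for tx in transactions:
        device.sync_write([tx])


def group_commit(device, transactions, group_size):
    """Combine up to group_size pending transactions into one log write."""
    for start in range(0, len(transactions), group_size):
        device.sync_write(transactions[start:start + group_size])


pending = [f"tx{i}" for i in range(10)]

one_by_one = LogDevice()
commit_individually(one_by_one, pending)

grouped = LogDevice()
group_commit(grouped, pending, group_size=4)
```

Both devices end up holding the same ten records in the same order, but the grouped device performed three synchronous writes instead of ten.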

Extent-based addressing structures
JFS uses extent-based addressing structures, along with aggressive block allocation policies, to produce compact, efficient, and scalable structures that map logical offsets within files to physical addresses on disk. An extent is a sequence of contiguous blocks allocated to a file as a unit, and is described by a triple of the form <logical offset, length, physical address>. The addressing structure is a B+ tree populated with extent descriptors (the triples above), rooted in the inode and keyed by logical offset within the file.
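
The mapping from a logical offset to a physical address can be sketched as follows. This is a simplified illustration, not the JFS structure: a sorted Python list searched with `bisect` stands in for the B+ tree, all values are in blocks, and the `Extent` and `lookup` names are hypothetical.

```python
from bisect import bisect_right
from typing import NamedTuple, Optional


class Extent(NamedTuple):
    logical_offset: int    # first file block covered by the extent
    length: int            # number of contiguous blocks
    physical_address: int  # first disk block of the extent


def lookup(extents, file_block) -> Optional[int]:
    """Return the disk block holding file_block, or None if it is unmapped.

    extents must be sorted by logical_offset, as the keys of the tree would be.
    """
    keys = [e.logical_offset for e in extents]
    i = bisect_right(keys, file_block) - 1   # last extent starting at or before
    if i < 0:
        return None
    extent = extents[i]
    if file_block >= extent.logical_offset + extent.length:
        return None                          # falls in a gap between extents
    return extent.physical_address + (file_block - extent.logical_offset)


extents = [Extent(0, 4, 1000), Extent(4, 8, 5000), Extent(20, 2, 9000)]
```

With these three descriptors, `lookup(extents, 5)` resolves to disk block 5001, while `lookup(extents, 15)` returns `None` because no extent covers that block. Three triples describe 14 file blocks, which is why extents are more compact than a per-block map.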

Variable block size
JFS supports block sizes of 512, 1024, 2048, and 4096 bytes on a per-file-system basis, so that users can optimize space utilization for their application environment. Smaller block sizes reduce internal fragmentation within files and directories and are more space efficient. However, small blocks can increase path length, because block allocation activity occurs more often than with a larger block size. The default block size is 4096 bytes, because server systems generally put performance ahead of space utilization.
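
The trade-off can be made concrete with a little arithmetic. The sketch below computes, for a hypothetical 5000-byte file, how many blocks each supported block size needs and how many bytes are left unused in the last block. The `allocation` helper and the file size are illustrative choices.

```python
BLOCK_SIZES = (512, 1024, 2048, 4096)  # sizes listed in the text


def allocation(file_size, block_size):
    """Return (blocks needed, bytes wasted in the last block)."""
    blocks = -(-file_size // block_size)        # ceiling division
    wasted = blocks * block_size - file_size    # internal fragmentation
    return blocks, wasted


table = {bs: allocation(5000, bs) for bs in BLOCK_SIZES}
# table maps block size to (blocks, wasted bytes):
#   512 -> (10, 120), 1024 -> (5, 120), 2048 -> (3, 1144), 4096 -> (2, 3192)
```

The 512-byte block size wastes 120 bytes but needs ten blocks, while the 4096-byte block size needs only two blocks and wastes 3192 bytes.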

Dynamic disk inode allocation
JFS dynamically allocates space for disk inodes as needed, and frees the space when it is no longer required. This avoids the traditional approach of reserving a fixed amount of space for disk inodes when the file system is created, so users no longer need to estimate the maximum number of files and directories that a file system will contain. In addition, this feature decouples disk inodes from fixed disk locations.

Directory organization
JFS provides two different directory organizations. The first is used for small directories and stores the directory contents within the directory's inode. This eliminates the need for separate directory block I/O and the need to allocate separate storage. Up to eight entries can be stored inline within the inode, not counting the self (.) and parent (..) directory entries, which are stored in a separate area of the inode.

The second organization is used for larger directories and represents each directory as a B+ tree keyed on name. Compared with traditional unsorted directory organizations, it provides faster directory lookup, insertion, and deletion.
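
The two organizations can be illustrated with a small model. This is a sketch under stated assumptions: the `Directory` class is hypothetical, a sorted list searched with `bisect` stands in for the B+ tree, duplicate names and deletion are not handled, and the switch happens when a ninth entry is added.

```python
from bisect import bisect_left, insort

INLINE_LIMIT = 8  # from the text; "." and ".." are not counted


class Directory:
    def __init__(self):
        self.entries = []      # (name, inode_number) pairs
        self.is_inline = True  # True while the entries fit in the inode

    def add(self, name, inode_number):
        if self.is_inline and len(self.entries) < INLINE_LIMIT:
            self.entries.append((name, inode_number))  # unsorted, inline
            return
        if self.is_inline:
            self.entries.sort()                        # convert once
            self.is_inline = False
        insort(self.entries, (name, inode_number))     # keep name order

    def lookup(self, name):
        if self.is_inline:
            for entry_name, inode_number in self.entries:  # linear scan
                if entry_name == name:
                    return inode_number
            return None
        i = bisect_left(self.entries, (name,))             # binary search
        if i < len(self.entries) and self.entries[i][0] == name:
            return self.entries[i][1]
        return None


small = Directory()
for n in range(8):
    small.add(f"file{n}", 100 + n)

large = Directory()
for n in range(9):
    large.add(f"file{n}", 100 + n)
```

Here `small` stays inline with its eight entries, while `large` has switched to the name-ordered organization, where a lookup is a search over sorted keys rather than a scan.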

Sparse and dense files
JFS supports both sparse and dense files, on a per-file-system basis.

Sparse files allow data to be written to arbitrary locations within a file without instantiating the previously unwritten intervening file blocks. The reported file size is the highest location that has been written, but a given block in the file is actually allocated only when it is written. For example, suppose a new file is created in a file system designated for sparse files, and an application writes a block of data to block 100 of the file. JFS reports the size of the file as 100 blocks, although only one block of disk space has been allocated to it. If the application next reads block 50 of the file, JFS returns a block of zero-filled bytes. If the application then writes a block of data to block 50, JFS still reports the file size as 100 blocks, and two blocks of disk space are now allocated. Sparse files suit applications that need a large logical space but use only a small subset of it.

For dense files, disk resources are allocated to cover the full file size. In the preceding example, the first write (a block of data to block 100 of the file) causes 100 blocks of disk space to be allocated to the file. A read of any block that has been implicitly written returns a block of zero-filled bytes, just as in the sparse case.
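
The example above can be traced with a small in-memory model. `ModelFile` is a hypothetical class that only counts allocated blocks. It numbers blocks from 1 so that writing block 100 yields a size of 100 blocks, as in the text, and the 4096-byte block size is an assumption.

```python
BLOCK_SIZE = 4096  # assumed block size for the model


class ModelFile:
    """Tracks reported size and allocated blocks for one file."""

    def __init__(self, sparse):
        self.sparse = sparse
        self.size_in_blocks = 0
        self.allocated = set()  # block numbers that have disk space
        self.data = {}          # block number -> bytes written

    def write_block(self, number, payload):
        if not self.sparse:
            # Dense: back every block up to the one being written.
            self.allocated.update(range(self.size_in_blocks + 1, number + 1))
        self.allocated.add(number)
        self.data[number] = payload
        self.size_in_blocks = max(self.size_in_blocks, number)

    def read_block(self, number):
        # Blocks never written explicitly read back as zeros in both modes.
        return self.data.get(number, bytes(BLOCK_SIZE))


sparse = ModelFile(sparse=True)
sparse.write_block(100, b"x" * BLOCK_SIZE)
sparse_blocks_after_first_write = len(sparse.allocated)
hole = sparse.read_block(50)
sparse.write_block(50, b"y" * BLOCK_SIZE)

dense = ModelFile(sparse=False)
dense.write_block(100, b"x" * BLOCK_SIZE)
```

After these steps the sparse file reports 100 blocks with two allocated, and the dense file reports 100 blocks with all 100 allocated.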

JFS internal (potential) limits

JFS is a full 64-bit file system. All of the relevant file system structure fields are 64 bits in size. This allows JFS to support both large files and large partitions.

File system size
The minimum file system size supported by JFS is 16 MB. The maximum file system size is a function of the file system block size and the maximum number of blocks supported by the file system metadata structures. JFS supports a maximum size of 512 terabytes (TB) with a 512-byte block size, up to 4 petabytes (PB) with a 4 KB block size.
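
The relationship between block size and maximum size can be checked with simple arithmetic. The block-count limit of 2^40 used below is an assumption inferred from the 512 TB figure at a 512-byte block size, and binary units are assumed. The source does not state the limit directly.

```python
MAX_BLOCKS = 2 ** 40  # assumption: 512 TB / 512 bytes per block

TB = 2 ** 40  # binary terabyte
PB = 2 ** 50  # binary petabyte


def max_fs_bytes(block_size):
    """Maximum size = block size x maximum number of blocks."""
    return block_size * MAX_BLOCKS


smallest_blocks = max_fs_bytes(512) // TB   # size in TB with 512-byte blocks
largest_blocks = max_fs_bytes(4096) // PB   # size in PB with 4 KB blocks
```

Under that assumption, a 512-byte block size gives 512 TB, and a 4 KB block size, which is eight times larger, gives 4 PB.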

File size
The maximum file size is the largest file size that the host's virtual file system supports. For example, if the host supports only 32 bits, the file size is limited accordingly.

Removable media
JFS does not support diskettes as an underlying file system device.

Standard administrative utilities

JFS provides standard management utilities for creating and maintaining file systems.

Create a file system
This utility provides the JFS-specific portion of the mkfs command and initializes a JFS file system on a specified drive. The utility operates at a low level and assumes that any creation or initialization of the volume on which the file system will reside is handled at a higher level by another utility.

Check/repair a file system
This utility provides the JFS-specific portion of the fsck command. It checks the file system for consistency and repairs the problems it finds. It also replays the log and applies committed changes to the file system metadata. If the file system is declared clean as a result of the log replay, no further action is taken. If the file system is not deemed clean, which means that for some reason the log was not replayed completely and correctly, or that the file system could not be restored to a consistent state by log replay alone, a full pass over the file system is performed.

When performing a full integrity check, the primary goal of the check/repair utility is to reach a reliable file system state that prevents future file system corruption or failures. Preserving data in the face of corruption is a secondary goal. This means that the utility may discard data in the interest of file system consistency. Specifically, data is discarded when the utility does not have the information it needs to restore a structurally inconsistent file or directory to a consistent state without making assumptions. When an inconsistent file or directory is encountered, the entire file or directory is discarded, with no attempt to save any part of it. Any files or subdirectories orphaned by the deletion of a corrupted directory are placed in the lost+found directory at the root of the file system.

An important consideration for a file system check/repair utility is the amount of virtual memory it requires. Traditionally, the amount of virtual memory these utilities need depends on the file system size, because most of it is used to track the allocation state of the individual blocks in the file system. As file systems grow, the number of blocks increases, and so does the amount of virtual memory needed to track them.

The design of the JFS check/repair utility differs in that its virtual memory requirements are determined by the number of files and directories in the file system, rather than by the number of blocks. The JFS check/repair utility needs about 32 bytes of virtual memory per file or directory, or about 32 MB for a file system that contains one million files and directories, regardless of the file system size. Like the utilities of all other file systems, the JFS utility needs to track block allocation states, but instead of using virtual memory it uses a small reserved work area located within the file system itself.
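
The difference in scaling can be illustrated with a rough estimate. The 32 bytes per file or directory comes from the text. The one-bit-per-block cost for a block-tracking utility is a hypothetical figure chosen only to show the trend and does not describe any particular fsck implementation.

```python
BYTES_PER_OBJECT = 32  # per file or directory, from the text


def jfs_check_memory(num_objects):
    """Memory that grows with the number of files and directories."""
    return num_objects * BYTES_PER_OBJECT


def block_tracking_memory(fs_bytes, block_size):
    """Memory for a hypothetical one-bit-per-block allocation bitmap."""
    blocks = fs_bytes // block_size
    return -(-blocks // 8)  # ceiling division: bits to bytes


one_million_objects = jfs_check_memory(1_000_000)     # 32,000,000 bytes
bitmap_for_1_tib = block_tracking_memory(2 ** 40, 4096)
bitmap_for_1_pib = block_tracking_memory(2 ** 50, 4096)
```

For one million files and directories the first estimate stays at about 32 MB whatever the volume size, while the block-based estimate grows by a factor of 1024 between a 1 TiB and a 1 PiB file system.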


JFS is a key technology for Internet file servers because it provides fast file system restart after a system crash. Using database journaling techniques, JFS can restore a file system to a consistent state within seconds or minutes. In a non-journaling file system, recovery can take hours or days. Most file server users cannot tolerate the downtime associated with non-journaling file systems. Only a shift to journaling technology lets a file system avoid the time-consuming process of examining all of its metadata to verify the file system or restore it to a consistent state.


* For more information, see the original article on the developerWorks global site.

* JFS open source, on the developerWorks website

* IBM makes JFS technology available for Linux, dW special report

About the author

Steve Best works in the IBM Software Solutions & Strategy Division in Austin, Texas, as a member of the file system development department. Steve has worked on operating system development in the areas of file systems, internationalization, and security. He is currently working on porting JFS to Linux. Contact him at sbest@us.ibm.com.