A journaling file system can preserve the integrity of file system data across power failures and other system faults. Linux supports an unusually wide range of file systems. This article focuses on the journaling techniques of the common Linux journaling file systems ext3, reiserfs, XFS, and JFS, tests them with the standard benchmark tools PostMark and Bonnie++, and provides a detailed performance analysis that should be a useful reference for Linux server applications.
I. Overview
A journaling file system (also called a log file system) builds on a traditional file system by adding a journal that records changes to the file system. The design idea is to track changes to the file system and record them in a journal, which is stored on a disk partition. Write operations are first recorded in the journal. If a write is interrupted for some reason (such as a power loss), then when the system restarts, the interrupted write is recovered from the journal records. All changes to the file system are recorded in the journal, and at regular intervals the file system writes the updated metadata and file contents to disk. Before making any change to metadata, the file system driver writes an entry to the journal describing what it is about to do, and only then modifies the metadata.
Currently, the main Linux journaling file systems are ext3, developed on the basis of ext2; reiserfs, designed around object-oriented ideas; XFS, ported from SGI's IRIX; and JFS, ported from IBM's AIX. ext3 is fully compatible with ext2 and has the same on-disk structure, with journaling added; the other three file systems use B-tree structures to improve file system efficiency.
II. ext3
The ext3 file system was developed directly from ext2. It is now very stable and reliable, and because it is fully compatible with ext2, an existing ext2 file system can be migrated smoothly to one with sound journaling support. The idea behind ext3 journaling is to perform every high-level modification to the file system in two steps. First, a copy of each block to be written is stored in the journal. Second, once the I/O transfer to the journal has completed (that is, the data is committed to the journal), the blocks are written to the file system. When the I/O transfer to the file system has completed (that is, the data is committed to the file system), the copies of the blocks in the journal are discarded.
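The two-step scheme can be sketched as a toy model in Python. This is only an illustration under simplifying assumptions, not ext3 code: the class `JournaledDisk` and its method names are hypothetical, blocks are held in dictionaries instead of on disk, and a crash is simulated by calling `recover()` before a transaction has been committed.

```python
class JournaledDisk:
    """Toy model of block journaling: journal first, then the file system."""

    def __init__(self):
        self.blocks = {}     # block number -> bytes (the file system area)
        self.journal = []    # committed transactions not yet written back
        self.pending = None  # transaction still being written to the journal

    def begin(self):
        self.pending = {}

    def log_block(self, block_no, data):
        # Step 1: store a copy of the block in the journal.
        self.pending[block_no] = data

    def commit(self):
        # Journal I/O has completed: the transaction is now durable.
        self.journal.append(self.pending)
        self.pending = None

    def checkpoint(self):
        # Step 2: write committed blocks to the file system,
        # then discard their copies in the journal.
        for txn in self.journal:
            self.blocks.update(txn)
        self.journal.clear()

    def recover(self):
        # After a crash: drop the uncommitted transaction, replay the rest.
        self.pending = None
        self.checkpoint()


disk = JournaledDisk()
disk.begin()
disk.log_block(7, b"inode table v2")
disk.commit()                        # committed to the journal

disk.begin()
disk.log_block(9, b"half-written")   # simulated crash before commit
disk.recover()
```

After recovery, block 7 has reached the file system because its transaction was committed, while the uncommitted write to block 9 has left no trace.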
2.1 ext3 journaling modes
ext3 can journal metadata only, or both metadata and file data blocks. Specifically, it provides the following three journaling modes:
Journal
All data and metadata changes in the file system are logged. This mode reduces the chance of losing the modifications made to each file, but it requires a lot of extra disk access. For example, when a new file is created, all its data blocks must be copied as a log record. This is the safest and slowest ext3 log mode.
Ordered
Only changes to file system metadata are logged. However, ext3 groups metadata with the related data blocks so that the data blocks are written to disk before the metadata. This reduces the chance of data corruption inside files; for example, every write that enlarges a file is guaranteed to be fully protected by the journal. This is the default ext3 journaling mode.
Writeback
Only changes to file system metadata are logged. This is the method used by other journaling file systems, and it is the fastest mode.
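On Linux the mode is selected with the ext3 mount option data=journal, data=ordered, or data=writeback. The difference between the three modes is mainly one of write ordering, which the following Python sketch tries to make explicit. The function `write_plan` and its step labels are hypothetical and describe a single file-extending write in a simplified way; the real kernel batches and interleaves these steps.

```python
def write_plan(mode):
    """Return the I/O steps for one file-extending write (simplified)."""
    if mode == "journal":
        # Data and metadata both go through the journal: safest, slowest.
        return ["data -> journal", "metadata -> journal", "commit",
                "data -> file system", "metadata -> file system"]
    if mode == "ordered":
        # Data reaches its final location before the metadata is committed.
        return ["data -> file system", "metadata -> journal", "commit",
                "metadata -> file system"]
    if mode == "writeback":
        # Only metadata is journaled; data is written with no ordering
        # guarantee relative to the commit.
        return ["metadata -> journal", "commit", "metadata -> file system",
                "data -> file system (any time)"]
    raise ValueError("unknown journaling mode: " + mode)


for mode in ("journal", "ordered", "writeback"):
    print(mode, write_plan(mode))
```

In this model, only the journal mode writes file data twice, and only the writeback mode leaves the position of the data write undefined.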
2.2 Journaling block device (JBD)
The ext3 file system does not handle journaling itself; it relies on a general kernel layer called the Journaling Block Device, or JBD. ext3 calls JBD routines to ensure that its subsequent operations will not corrupt on-disk data structures if the system fails. The interaction between ext3 and JBD is based on three basic units: log records, atomic operations, and transactions.
A log record is essentially a description of a low-level operation that the file system is about to perform. In some journaling file systems, a log record contains only the byte range modified by the operation and its starting position within the file system. The log records used by the JBD layer, however, consist of the entire buffer modified by the low-level operation. This approach can waste a lot of journal space (for example, when a low-level operation changes a single bit of a bitmap), but it is still quite fast, because the JBD layer works directly with buffers and their buffer heads.
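The space trade-off in the bitmap example can be made concrete with a small Python sketch. The 4096-byte buffer size and the helper names `physical_record` and `logical_record` are assumptions chosen for illustration; they are not part of the JBD interface.

```python
BLOCK_SIZE = 4096  # assumed buffer size in bytes


def physical_record(buffer):
    # Whole-buffer record: the entire modified buffer is logged.
    return bytes(buffer)


def logical_record(offset, new_bytes):
    # Byte-range record: starting position plus the modified bytes only.
    return (offset, bytes(new_bytes))


bitmap = bytearray(BLOCK_SIZE)
bitmap[10] |= 0b00000100          # a low-level operation flips one bit

phys = physical_record(bitmap)
logi = logical_record(10, bitmap[10:11])

phys_size = len(phys)             # bytes logged for the whole buffer
logi_size = len(logi[1])          # bytes logged for the byte range
print(phys_size, logi_size)
```

The whole-buffer record costs a full block of journal space for a one-bit change, but building it requires no bookkeeping about which bytes changed.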
Any system call that modifies the file system is usually split into a series of low-level operations that manipulate on-disk data structures. If the system crashes before all of these low-level operations have completed, the data on disk can be left corrupted. To prevent this, ext3 must ensure that each system call is handled atomically. An atomic operation is a set of low-level operations on on-disk data structures that together correspond to a single high-level operation.
For efficiency, the JBD layer groups the log records of several atomic operations into a single transaction. All log records belonging to one atomic operation must be included in the same transaction. All log records of a transaction are stored in consecutive blocks of the journal, and the JBD layer handles each transaction as a whole. For example, the journal blocks used by a transaction are reclaimed only after all the data in its log records has been committed to the file system.
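The relationship between log records, atomic operations, and transactions can be sketched as follows. This is a hedged illustration in Python, not the JBD data structures: the class `Transaction`, the string handle identifiers, and the `can_reclaim` check are all hypothetical names.

```python
class Transaction:
    """Groups the log records of several atomic operations (toy model)."""

    def __init__(self):
        self.records = {}     # atomic operation id -> [(block_no, buffer)]
        self.written = set()  # block numbers already written to the file system

    def add_record(self, op_id, block_no, buffer):
        # Every record of one atomic operation stays in this transaction.
        self.records.setdefault(op_id, []).append((block_no, buffer))

    def blocks(self):
        return {block_no
                for recs in self.records.values()
                for block_no, _ in recs}

    def mark_written(self, block_no):
        self.written.add(block_no)

    def can_reclaim(self):
        # Journal space is reusable only once every logged block
        # has reached the file system.
        return self.blocks() <= self.written


txn = Transaction()
txn.add_record("op-1", 12, b"inode")
txn.add_record("op-1", 3, b"bitmap")
txn.add_record("op-2", 3, b"bitmap, updated again")

txn.mark_written(12)
early = txn.can_reclaim()   # block 3 is still outstanding
txn.mark_written(3)
late = txn.can_reclaim()    # everything has been written back
```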
III. reiserfs
reiserfs is a very capable file system with an energetic development team, and it was designed entirely from scratch. reiserfs can easily manage file systems of hundreds of gigabytes, which matters in enterprise applications. Its design is object-oriented and consists of a semantic layer and a storage layer. The semantic layer manages the object namespace and defines the object interfaces that determine what an object can do. The storage layer manages disk space. The two layers are connected through keys: the semantic layer resolves an object name into a key, and the storage layer uses the key to locate the object's storage on disk. Key values are globally unique.
3.1 Main semantic layer interfaces
1) File interface: each file has an interface ID that identifies a set of methods. This set contains all the methods for accessing a reiserfs file.
2) Property interface: reiserfs treats each property of a file as a file in its own right, with the property value as its content, so that file properties can be accessed in the same way as directory entries.
3) Hash interface: a directory is a mapping table from file names to files, and reiserfs implements this table with a B+ tree. Because file names vary in length and can be long, they are unsuitable as key values, so a hash function is used to generate the key.
4) Security interface: handles all security checks, usually triggered by the file interface. Taking a file read as an example: before reading file data, the read method of the file interface calls the read-check method of the security interface, which in turn calls the read method of the property file to read the file properties for the check.
5) Item interface: mainly used for balancing items, including splitting, evaluating, overwriting, and appending items, as well as item deletion, insertion, and search.
6) Key assignment interface: triggered when a key is assigned to an item. Each item has a key assignment method.
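The role of the hash interface, which turns a variable-length file name into a fixed-size key that the storage layer can look up, can be sketched as below. This is an illustration under stated assumptions: SHA-256 stands in for whatever hash function reiserfs actually uses, a Python dictionary stands in for the B+ tree, and `name_to_key` and `StorageLayer` are hypothetical names.

```python
import hashlib


def name_to_key(parent_id, name):
    """Semantic layer: map (directory id, file name) to a fixed-size key."""
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    # Keep 64 bits of the digest so every key has the same size,
    # however long the file name is.
    return (parent_id, int.from_bytes(digest[:8], "big"))


class StorageLayer:
    """Storage layer: locate an object by key (dict instead of a B+ tree)."""

    def __init__(self):
        self._items = {}

    def insert(self, key, body):
        self._items[key] = body

    def lookup(self, key):
        return self._items.get(key)


storage = StorageLayer()
short_key = name_to_key(2, "a.txt")
long_key = name_to_key(2, "a-very-long-file-name-" * 10 + ".txt")

storage.insert(short_key, b"contents of a.txt")
found = storage.lookup(name_to_key(2, "a.txt"))
missing = storage.lookup(long_key)
```

A real implementation also has to deal with two names hashing to the same value, which this sketch ignores.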
3.2 Storage layer
reiserfs stores data in a B+ tree, whose structure is shown in Figure 1.
Figure 1: reiserfs B+ tree
Each node in the B+ tree holds data structures called items. An item is a data container; it belongs to exactly one node and is the basic unit in which a node manages its space. An item includes the following:
1) item_body: the data field of the item.
2) item_key: the key value of the item.
3) item_offset: the offset of the start of the data field within the node.
4) item_length: the length of the data field.
5) item_plugin_id: the ID of the item interface.
Figure 2: reiserfs item structure
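The item fields listed above can be modeled with a short Python sketch. The field names follow the list; everything else is an assumption made for illustration, including the 4096-byte node size, the append-only packing, and the choice to keep the item body in the node's data area and expose it through the hypothetical `Node.item_body` method.

```python
from dataclasses import dataclass


@dataclass
class Item:
    item_key: tuple       # key value of the item
    item_offset: int      # start of the data field within the node
    item_length: int      # length of the data field
    item_plugin_id: int   # ID of the item interface


class Node:
    """A tree node that packs item bodies into one data area (toy model)."""

    def __init__(self, size=4096):
        self.data = bytearray(size)
        self.items = []
        self.used = 0

    def add_item(self, key, body, plugin_id):
        if self.used + len(body) > len(self.data):
            raise ValueError("node is full")
        self.data[self.used:self.used + len(body)] = body
        item = Item(key, self.used, len(body), plugin_id)
        self.items.append(item)
        self.used += len(body)
        return item

    def item_body(self, item):
        start = item.item_offset
        return bytes(self.data[start:start + item.item_length])


node = Node()
first = node.add_item((1, 0), b"hello", plugin_id=1)
second = node.add_item((1, 1), b"world!", plugin_id=2)
```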
reiserfs defines several kinds of items to store different data, mainly the following:
1) static_stat_data: static stat data, including the file owner, access permissions, creation time, last modification time, and link count.
2) cmpnd_dir_item: contains directory entries.
3) extent_pointers: point to a disk area (an extent).
4) node_pointers: point to a node.
5) bodies: contain a small part of the file data.
3.3 reiserfs journaling
Like ext3, reiserfs has three journaling modes: journal, ordered, and writeback. In addition, reiserfs introduces two journaling optimizations, copy-on-capture and steal-on-capture. Copy-on-capture: when a block that a transaction wants to modify belongs to another transaction that has not yet committed, the block is copied so that the two transactions can proceed concurrently. Steal-on-capture: when a block has been modified by several transactions, only the transaction that commits last actually writes the block to the file system; the other transactions do not write it.
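The two optimizations can be illustrated with a deliberately small Python sketch. It models only the rules as stated here and makes no claim about how reiserfs implements them; the functions `capture` and `plan_writes` and their arguments are hypothetical.

```python
def capture(block, held_by_uncommitted_txn):
    """Copy-on-capture: give the caller a private copy of a block that
    another uncommitted transaction already holds."""
    return bytearray(block) if held_by_uncommitted_txn else block


def plan_writes(commit_order):
    """Steal-on-capture: for each block, only the last transaction to
    commit writes it. commit_order is a list of (txn_name, {block_no: data})
    in commit order; the result maps block_no to (txn_name, data)."""
    final = {}
    for name, blocks in commit_order:
        for block_no, data in blocks.items():
            final[block_no] = (name, data)   # later commits replace earlier ones
    return final


shared = bytearray(b"v0")
mine = capture(shared, held_by_uncommitted_txn=True)
mine[1:2] = b"1"          # the other transaction's buffer is untouched

writes = plan_writes([
    ("t1", {5: b"v1", 6: b"x"}),
    ("t2", {5: b"v2"}),
])
```

Here block 5 is written once, by t2, instead of once per transaction, while block 6 is still written by t1 because no later transaction touched it.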
IV. XFS
XFS is a high-performance 64-bit file system that SGI developed to replace its earlier EFS file system. XFS provides low-latency, high-bandwidth access to file system data by maintaining cache consistency, locating data efficiently, and distributing disk requests. SGI has since ported XFS from IRIX to Linux.
4.1 Allocation groups
When an XFS file system is created, the underlying block device is divided into eight or more equal-sized linear regions. You can think of them as "chunks" or "linear ranges", but in XFS each region is called an "allocation group". Allocation groups are distinctive in that each one manages its own inodes and free space, which in effect turns it into a sub-file-system that exists transparently inside the XFS file system. With allocation groups, the XFS code lets multiple threads and processes operate in parallel, even when many of them are performing heavy I/O on the same file system. Combined with high-end hardware, XFS can therefore deliver high performance without the file system becoming a bottleneck. Internally, allocation groups use efficient B+ trees to track key data such as free space and inodes, which gives excellent performance and scalability.
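Why per-group bookkeeping helps parallelism can be sketched in Python. This is a toy model, not XFS code: each hypothetical `AllocationGroup` has its own lock and a simple "next free block" counter in place of the B+ trees, and the figure of 8000 blocks is arbitrary.

```python
import threading


class AllocationGroup:
    """One linear region with its own free-space state and its own lock."""

    def __init__(self, index, first_block, n_blocks):
        self.index = index
        self.next_free = first_block
        self.end = first_block + n_blocks
        self.lock = threading.Lock()

    def allocate(self):
        # Only this group's lock is taken, so allocations in other
        # groups can proceed at the same time.
        with self.lock:
            if self.next_free == self.end:
                return None
            block = self.next_free
            self.next_free += 1
            return block


def make_groups(total_blocks, n_groups=8):
    size = total_blocks // n_groups
    return [AllocationGroup(i, i * size, size) for i in range(n_groups)]


groups = make_groups(total_blocks=8000)
results = {}


def worker(ag):
    results[ag.index] = [ag.allocate() for _ in range(3)]


threads = [threading.Thread(target=worker, args=(ag,)) for ag in groups]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Each thread works in a different group, so no two threads ever wait on the same lock; with a single global free-space structure they would all serialize on it.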
4.2 Journaling
XFS is also a journaling file system, which allows quick recovery after an unexpected reboot. Like reiserfs, XFS uses a logical journal: unlike ext3, it does not write literal file system blocks to the journal, but uses an efficient on-disk format to record metadata changes. Logical journaling suits XFS well, because on high-end hardware the journal is often the most contended resource in the whole file system.