EXT3 file system and JBD introduction

Source: Internet
Author: User

EXT3 Introduction

In the EXT3 file system, disk space is divided into a series of block groups, each of which has bitmaps that track the allocation of its inodes and data blocks. The physical layout is as follows:

Superblock: Located in block 0 of group 0. For compatibility, the first 1024 bytes of the device are reserved; the superblock is stored starting at byte offset 1024 and is 1024 bytes long. It stores information about the file system as a whole and has backup copies in several groups (0, 1, 3, 5, 7, 9, 25, 27, 49, 81, and so on, that is, groups 0 and 1 plus the powers of 3, 5, and 7). Most of this information is fixed when the file system is formatted and is read-only afterwards. It can be viewed with the dumpe2fs command.
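
The list of backup groups follows a simple rule, which can be sketched as below. This is an illustration only: it assumes the sparse layout in which groups 0 and 1 and the powers of 3, 5, and 7 carry a copy, and the function names are hypothetical, not taken from any tool.

```python
def is_power_of(n: int, base: int) -> bool:
    """Return True if n equals base**k for some integer k >= 0."""
    while n > 1 and n % base == 0:
        n //= base
    return n == 1


def has_superblock_backup(group: int) -> bool:
    """Groups 0 and 1, plus powers of 3, 5 and 7, hold a superblock copy."""
    if group in (0, 1):
        return True
    return any(is_power_of(group, base) for base in (3, 5, 7))


# All groups below 100 that carry a copy of the superblock.
backup_groups = [g for g in range(100) if has_superblock_backup(g)]
print(backup_groups)
```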

Group descriptor: Stored in the block that follows the superblock (block 1 of the group when the block size is 4096 bytes). It describes the group, including the locations of the inode bitmap, the data block bitmap, and the inode table. The group descriptors also have backup copies in several groups. To keep the blocks of a file as contiguous as possible when it is written, EXT3 combines 32768 (0x8000) blocks into one group when the block size is 4096 bytes, because a one-block bitmap has 4096 * 8 = 32768 bits. Each group is described by one group descriptor, and all the descriptors together form the group descriptor table. Each descriptor is 32 bytes, so one 4096-byte block of the table holds 4096 B / 32 B = 128 descriptors; a file system with more than 128 groups needs more than one block for the table.
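
The arithmetic behind these figures can be checked with a short sketch. It assumes a 4096-byte block and a 32-byte descriptor as in the text; the function names are hypothetical, and the group count ignores the first-data-block offset that matters only for 1024-byte blocks.

```python
DESC_SIZE = 32  # bytes per group descriptor


def blocks_per_group(block_size: int) -> int:
    # The block bitmap is one block long and has one bit per block.
    return block_size * 8


def descriptors_per_block(block_size: int, desc_size: int = DESC_SIZE) -> int:
    # How many descriptors fit into one block of the descriptor table.
    return block_size // desc_size


def group_count(total_blocks: int, block_size: int) -> int:
    # Round up: a partially filled last group still needs a descriptor.
    per_group = blocks_per_group(block_size)
    return (total_blocks + per_group - 1) // per_group


# Example: a 16 GiB file system with 4 KiB blocks has 4194304 blocks.
total = 16 * 1024**3 // 4096
print(blocks_per_group(4096), descriptors_per_block(4096), group_count(total, 4096))
```

In this example the descriptors of all groups fit exactly into one 4096-byte block of the table.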

Block bitmap: Occupies 1 block and records which blocks in the group are in use; it is used to allocate blocks within the group.

Inode bitmap: Occupies 1 block and records which inodes in the group are in use; it is used to allocate inodes within the group.

Inode table: Occupies multiple blocks and stores the inodes. Each inode is 128 bytes. An inode is data that describes data, that is, file system metadata, and it is the most important part of the layout: it holds the file's permissions, owner, and timestamps, as well as the locations of the blocks that store the file's data. The inode manages its blocks through a multilevel index table. When an inode is allocated, the inode bitmap is updated.
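
As an illustration of the multilevel index, the sketch below classifies a logical block number of a file by the index level that maps it. It assumes the classic ext2/ext3 inode layout of 12 direct pointers followed by one single, one double, and one triple indirect pointer, with 4-byte block pointers; these details are not stated in the text above, and the function name is hypothetical.

```python
DIRECT_POINTERS = 12   # direct block pointers held in the inode itself
POINTER_SIZE = 4       # bytes per block pointer


def index_level(logical_block: int, block_size: int = 4096) -> str:
    """Return which level of the index maps the given logical block."""
    if logical_block < 0:
        raise ValueError("logical block must be non-negative")
    ptrs = block_size // POINTER_SIZE  # pointers per indirect block
    if logical_block < DIRECT_POINTERS:
        return "direct"
    logical_block -= DIRECT_POINTERS
    if logical_block < ptrs:
        return "single indirect"
    logical_block -= ptrs
    if logical_block < ptrs ** 2:
        return "double indirect"
    logical_block -= ptrs ** 2
    if logical_block < ptrs ** 3:
        return "triple indirect"
    raise ValueError("block is beyond the reach of the index")


print(index_level(5), "/", index_level(500), "/", index_level(50000))
```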

Data blocks: Occupy multiple blocks and hold file contents. When a block is allocated, both the inode's index table and the block bitmap are updated.
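
How a bitmap drives allocation inside a group can be sketched as follows. This is a deliberately simplified first-fit search over a bitmap in which bit 0 is the least significant bit of byte 0; the real allocator also considers locality, and the function names here are hypothetical.

```python
def find_first_zero_bit(bitmap: bytearray) -> int:
    """Return the index of the first clear bit, or -1 if every bit is set."""
    for byte_index, byte in enumerate(bitmap):
        if byte != 0xFF:  # at least one free entry in this byte
            for bit in range(8):
                if not byte & (1 << bit):
                    return byte_index * 8 + bit
    return -1


def allocate(bitmap: bytearray) -> int:
    """Mark the first free block (or inode) as used and return its index."""
    index = find_first_zero_bit(bitmap)
    if index < 0:
        raise RuntimeError("group is full")
    bitmap[index // 8] |= 1 << (index % 8)
    return index


# Entries 0..10 are already in use; the next two allocations follow them.
bitmap = bytearray([0xFF, 0b00000111, 0x00])
first = allocate(bitmap)
second = allocate(bitmap)
print(first, second)
```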

JBD Introduction

The EXT3 file system, as a journaling file system, does not handle the log itself. Instead, it uses a generic kernel layer called the Journaling Block Device (JBD). JBD has 3 core concepts: log records, atomic operations, and transactions.

Log records (journal): A log record is essentially a description of a low-level operation that the file system is about to issue. In some journaling file systems, a log record contains only the byte range modified by the operation and the starting position of those bytes in the file system. The log records used by the JBD layer, however, consist of the entire buffer modified by the low-level operation. This approach can waste a lot of log space (for example, when a low-level operation changes only one bit of a bitmap), but it is quite fast, because the JBD layer operates directly on buffers and buffer heads.

Atomic operations (handle): Any system call that modifies the file system is typically broken into a series of low-level operations that manipulate on-disk data structures. If the system fails unexpectedly before all of these low-level operations complete, the on-disk data can be left corrupted. To prevent this, EXT3 must ensure that each system call is processed atomically. A set of modifications that must be applied atomically is called an atomic operation.

Transactions (transaction): Writing each atomic operation to the log individually would be inefficient. For higher performance, JBD packs a set of atomic operations into one transaction and writes the transaction to the log in one go. All log records of a transaction are stored in contiguous blocks of the log. The transaction is the unit on which JBD operates.
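
The benefit of batching can be illustrated with a toy model. It assumes, as described above, that a log record is a whole modified buffer, so several atomic operations that touch the same block contribute only one block to the log. The class and its methods are hypothetical and do not mirror the kernel's data structures.

```python
class ToyTransaction:
    """Collects the buffers modified by a batch of atomic operations."""

    def __init__(self, tid: int):
        self.tid = tid
        self.handles = 0
        self.buffers = {}  # block number -> latest full block contents

    def add_handle(self, modified: dict) -> None:
        """Merge one atomic operation: block number -> new block contents."""
        self.handles += 1
        self.buffers.update(modified)

    def log_blocks(self) -> int:
        """Number of data blocks that will be written to the log."""
        return len(self.buffers)


txn = ToyTransaction(tid=1)
# Three file creations: each touches the shared bitmap block 5
# and a different inode table block.
txn.add_handle({5: b"bitmap v1", 100: b"inode A"})
txn.add_handle({5: b"bitmap v2", 101: b"inode B"})
txn.add_handle({5: b"bitmap v3", 102: b"inode C"})
print(txn.handles, txn.log_blocks())
```

Logged one by one, the three operations would write six blocks; batched into one transaction they write four, and only the final state of the bitmap block reaches the log.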

Over its life cycle, up to and after commit, a transaction passes through the following states:

    1. Run (running): The transaction is currently in memory and can accept new atomic operations. A journal has at most one transaction in the running state at any time.
    2. Lock (locked): The transaction no longer accepts new atomic operations, but existing atomic operations are not yet complete. Once all the atomic operations are complete, the transaction moves to the next state.
    3. Flush (flush): All atomic operations in the transaction are complete, and the transaction is being written to the log.
    4. Commit (commit): The transaction has been written to the log. A commit block is then written to indicate that the transaction's log records are completely in the log.
    5. Finish (finished): After the transaction is written to the log, it remains there until all of its blocks have been updated at their actual locations on disk.
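
The states above can be modeled as a small state machine. This is a minimal sketch that follows the list only; the class and method names are hypothetical and are not the kernel's.

```python
from enum import Enum


class TxnState(Enum):
    RUNNING = 1
    LOCKED = 2
    FLUSH = 3
    COMMIT = 4
    FINISHED = 5


class TxnLifecycle:
    def __init__(self):
        self.state = TxnState.RUNNING
        self.open_handles = 0

    def start_handle(self) -> None:
        # Only a running transaction accepts new atomic operations.
        if self.state is not TxnState.RUNNING:
            raise RuntimeError("transaction no longer accepts new handles")
        self.open_handles += 1

    def stop_handle(self) -> None:
        self.open_handles -= 1

    def advance(self) -> TxnState:
        if self.state is TxnState.FINISHED:
            raise RuntimeError("transaction already finished")
        # A locked transaction waits until its existing handles complete.
        if self.state is TxnState.LOCKED and self.open_handles > 0:
            raise RuntimeError("existing handles must complete first")
        self.state = TxnState(self.state.value + 1)
        return self.state


txn = TxnLifecycle()
txn.start_handle()
txn.advance()       # RUNNING -> LOCKED, one handle still open
txn.stop_handle()   # the last atomic operation completes
txn.advance()       # LOCKED -> FLUSH
print(txn.state.name)
```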

EXT3 Log Mode

EXT3 can log both the metadata and the file data block at the same time.

Log writes are divided into 3 stages:

    • Journal write: The transaction is written to the log space;
    • Journal commit: A commit block is written. A transaction that is fully committed to the log area begins with a JFS_DESCRIPTOR_BLOCK and ends with a JFS_COMMIT_BLOCK;
    • Checkpoint write: The transaction is written to its fixed location on disk, and its space in the log is reclaimed. Checkpointing has many triggers, such as the file system cache reaching a threshold, the free log space falling to a threshold, or a timer expiring.
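
The three stages can be sketched with a toy in-memory journal. It is an illustration under simplifying assumptions: the disk is a dictionary, log records are tuples, and the class and method names are hypothetical. It shows why the commit block matters: only a transaction whose commit record is present is ever copied to the fixed locations.

```python
class ToyJournal:
    def __init__(self):
        self.disk = {}  # block number -> contents at the fixed location
        self.log = []   # records: (kind, transaction id, blocks or None)

    def journal_write(self, tid: int, blocks: dict) -> None:
        # Stage 1: the transaction's blocks go to the log space.
        self.log.append(("descriptor", tid, dict(blocks)))

    def journal_commit(self, tid: int) -> None:
        # Stage 2: the commit block marks the transaction as complete.
        self.log.append(("commit", tid, None))

    def committed(self) -> list:
        pending, done = {}, []
        for kind, tid, blocks in self.log:
            if kind == "descriptor":
                pending[tid] = blocks
            elif kind == "commit" and tid in pending:
                done.append((tid, pending.pop(tid)))
        return done

    def checkpoint(self) -> None:
        # Stage 3: copy committed blocks home and reclaim their log space.
        finished = set()
        for tid, blocks in self.committed():
            self.disk.update(blocks)
            finished.add(tid)
        self.log = [rec for rec in self.log if rec[1] not in finished]


journal = ToyJournal()
journal.journal_write(1, {10: b"A"})
journal.journal_commit(1)
journal.journal_write(2, {10: b"B", 11: b"C"})  # no commit block yet
journal.checkpoint()
print(journal.disk, len(journal.log))
```

Transaction 2 has no commit record, so none of its blocks reach the disk and its record stays in the log.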

At the same time, EXT3 provides three logging modes:

    • Writeback

Only changes to file system metadata are logged, which makes this the fastest mode. Data blocks are written directly to their real (fixed) locations on disk, and the mode does not guarantee any ordering between log writes and data writes. Writeback offers the weakest consistency of the three modes: it guarantees the consistency of file system metadata only, not of file data.

    • Ordered

Only file system metadata is written to the log. However, data blocks are guaranteed to be written to their real locations before the related metadata is written to the log. This mode provides stronger consistency than writeback: both data and metadata are guaranteed to be consistent.

    • Journal

All changes to both file data and metadata are recorded in the log. This means that every data block is written twice: once to the log and then once to its real (fixed) location on disk. Journal mode (also called data mode) provides the same level of consistency protection as ordered mode.
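
The difference between the three modes comes down to what is logged and in which order writes are issued. The sketch below restates the descriptions above as data; the step wording and function names are hypothetical, and the mode names match the values of the `data=` mount option (`writeback`, `ordered`, `journal`).

```python
MODES = ("writeback", "ordered", "journal")


def write_steps(mode: str) -> list:
    """Ordered steps for one update; data writes appear only if ordered."""
    if mode == "journal":
        return ["log data and metadata",
                "log commit block",
                "checkpoint data and metadata"]
    if mode == "ordered":
        return ["write data to fixed location",
                "log metadata",
                "log commit block",
                "checkpoint metadata"]
    if mode == "writeback":
        # Data is written to its fixed location at an unspecified time.
        return ["log metadata",
                "log commit block",
                "checkpoint metadata"]
    raise ValueError(f"unknown mode: {mode}")


def data_block_writes(mode: str) -> int:
    """How many times each data block is written to the device."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode}")
    return 2 if mode == "journal" else 1


for mode in MODES:
    print(mode, data_block_writes(mode), write_steps(mode))
```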

Log Mode comparison analysis:

    • Compared with a file system without a log, the logging modes perform well in random-write scenarios;
    • Writeback and ordered perform well for sequential writes to large files;
    • Data mode turns random writes into sequential log writes and so gains the performance of sequential writing; it therefore performs well for asynchronous random writes to small files;
    • Data and ordered provide the same consistency protection;
    • Data mode performs better in some scenarios and ordered mode in others;
    • In ordered mode, the fixed-location data write, the journal inode write, and the journal commit write are issued strictly one after another. When the log is stored on a separate device, this restriction degrades performance unnecessarily;
    • In scenarios with many temporary files, data and ordered perform poorly: when the timer flushes metadata to the log, the corresponding data must also be written, so temporary files are written unnecessarily and the I/O load increases.

EXT3 Log View

A log is an internal record that manages updates to a block device. Updates are first placed in the log and only then written to their real locations on disk. The EXT3 log (journal) can be viewed as a file whose inode number is fixed at 8 and which resides in the first group. Its physical layout consists of a superblock, descriptor blocks, commit blocks, and so on.
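
Why inode 8 falls in the first group can be seen from the usual mapping of an inode number to a group and a slot in that group's inode table. The sketch assumes inode numbers start at 1 and uses the 128-byte inode size mentioned earlier; the inodes-per-group value of 8192 is only an example, and the function name is hypothetical.

```python
INODE_SIZE = 128  # bytes per inode


def locate_inode(ino: int, inodes_per_group: int,
                 inode_size: int = INODE_SIZE) -> tuple:
    """Return (group, slot in the group's inode table, byte offset in table)."""
    if ino < 1:
        raise ValueError("inode numbers start at 1")
    index = ino - 1
    group = index // inodes_per_group
    slot = index % inodes_per_group
    return group, slot, slot * inode_size


# The journal inode, assuming 8192 inodes per group.
print(locate_inode(8, 8192))
```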

In detail, the EXT3 log starts with the log superblock, followed by the transactions, each with its descriptor block and then its data blocks. A complete transaction is divided into 3 parts: the transaction start (descriptor) block, the data blocks it indexes, and the transaction commit block.
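
Each metadata block in the log (superblock, descriptor, commit) starts with a small common header, which is how the parts of a transaction can be told apart. The sketch below decodes that header, assuming the JBD on-disk format of three big-endian 32-bit fields (magic, block type, sequence number) and the magic value 0xC03B3998; the function name is hypothetical.

```python
import struct

JFS_MAGIC_NUMBER = 0xC03B3998
BLOCK_TYPES = {
    1: "descriptor",
    2: "commit",
    3: "superblock v1",
    4: "superblock v2",
    5: "revoke",
}


def parse_journal_header(block: bytes):
    """Return (block type, transaction sequence) or None for a data block."""
    if len(block) < 12:
        return None
    magic, blocktype, sequence = struct.unpack(">III", block[:12])
    if magic != JFS_MAGIC_NUMBER:
        return None  # no journal header: treat as a logged data block
    return BLOCK_TYPES.get(blocktype, "unknown"), sequence


# A synthetic descriptor block for transaction 42, padded to 4096 bytes.
descriptor = struct.pack(">III", JFS_MAGIC_NUMBER, 1, 42) + bytes(4084)
print(parse_journal_header(descriptor))
```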

Reference documents:

Linux Kernel 2.6.32

Analysis and Evolution of journaling File Systems

Ext3 journaling FileSystem (Stephen C. Tweedie)

Journal block device source code analysis
