Hard disk layout of the ext2 file system


June 01, 2002. This article describes the detailed on-disk layout of the widely used Linux ext2 file system. Its successor with journaling support is ext3, which uses the same on-disk layout; the only difference is that ext3 has a special inode on disk (which can be thought of as a special file) that records the file system's log, also known as the journal. Since this article does not discuss the journal, its contents apply to both ext2 and ext3.

Preface

The source code this article refers to is the ext3 file system in the Linux kernel. To make it easier for readers to consult that source code, key technical terms are kept as the English words used in the kernel source rather than translated into Chinese. (Readers are welcome to comment on whether this approach is appropriate.)






A rough description

In an ext2 file system, the hard disk partition is first divided into blocks. Every block on a given ext2 file system has the same size, but the block size can differ from one ext2 file system to another. Typical block sizes are 1024 bytes and 4096 bytes. The size is fixed when the file system is created: the system administrator can specify it, or the file system creation program can pick a reasonable value based on the size of the partition. The blocks are then gathered into a number of large block groups. The number of blocks in each block group is fixed.

Each block group has a corresponding group descriptor, and these group descriptors are stored together near the beginning of the hard disk partition, right after the super block. The super block itself is discussed below. Each descriptor contains several important block pointers. A block pointer here simply means a block number on the hard disk partition: a pointer with value 0 points to block 0 of the partition, and a pointer with value 1023 points to block 1023. Note that block numbers on a partition start at 0 and are global to the whole partition.

One block pointer in a block group's group descriptor points to the block bitmap of that group. Each bit in the block bitmap represents one block: if the bit is 1, the block is in use; if the bit is 0, the block is free. Note that the block bitmap itself occupies exactly one block. If the block size is S bytes, the block bitmap can describe at most 8*S blocks (one byte is 8 bits, and each bit corresponds to one block). In other words, a block group can be at most 8*S*S bytes in size.
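
As a minimal sketch of how such a bitmap might be consulted, the helper below tests one bit and computes the 8*S*S bound. It assumes the usual ext2 convention that a set bit (1) means "in use", and that bit n is stored in byte n / 8 at bit position n % 8, least significant bit first. The function names are hypothetical and not taken from the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if block `index` (counted from the start of its block group)
 * is marked in use, 0 if it is free. Assumes bit n lives in byte n / 8
 * at bit position n % 8, least significant bit first. */
static int block_in_use(const uint8_t *bitmap, size_t bitmap_bytes, size_t index)
{
    assert(index < bitmap_bytes * 8);   /* one bitmap block covers 8 * S blocks */
    return (bitmap[index / 8] >> (index % 8)) & 1;
}

/* Upper bound on the size of one block group: 8 * S blocks of S bytes each. */
static uint64_t max_group_bytes(uint64_t block_size)
{
    return 8 * block_size * block_size;
}
```

With a 1024-byte block this bound is 8 MiB per group, and with a 4096-byte block it is 128 MiB.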

Another block pointer in the group descriptor points to the inode bitmap. This bitmap is also exactly one block in size, and each of its bits corresponds to one inode. An inode on disk corresponds roughly to a file or directory in the file system. We will say more about inodes below.

A third important block pointer in the group descriptor points to the inode table. Unlike the bitmaps, the inode table may occupy more than one block. It is formed by all the inodes of this block group stored together.

The most critical information recorded in an inode is where the user data belonging to that inode is stored. As mentioned earlier, an inode corresponds roughly to a file in the file system, so the question of where the contents of a user's file are stored is answered by its inode. The inode answers it by providing a series of block pointers, which point to the blocks that hold the contents of the file.

2.1 Review

Now let's look back. The hard disk partition is first divided into many blocks. These blocks are gathered into groups, called block groups. Each block group has a group descriptor. All the descriptors are stored together near the beginning of the partition, right after the super block. From a group descriptor we can follow its block pointers to the inode table, the block bitmap and so on for that block group. In the inode table we find the inodes. From an inode, we can then follow its block pointers to the blocks that hold the user's data. We should also mention that block pointers do not point to arbitrary places. The block bitmap, inode bitmap and inode table of a block group are stored one after another at the beginning of the group, and the blocks that hold user data come immediately after them. When one block group ends, the next block group begins.






Detailed layout

3.1 Super block

The super block of an ext2 file system is a region of data that starts at byte 1024, counting from the beginning of the hard disk partition (byte 0 is the first byte). Because the minimum block size is 1024 bytes, the super block is in block 1 when the block size is exactly 1024 bytes, and in block 0 otherwise.

The details of the super block of an ext3 file system on disk are listed below. Here __u32 is an unsigned 32-bit integer type, and the other types follow the same pattern (__u16 is unsigned 16-bit, __s32 is signed 32-bit, and so on). These are the types used in the Linux kernel; a user-space program can replace them with types such as unsigned long, depending on the platform (the fixed-width types in <stdint.h>, such as uint32_t, avoid that platform dependence). The fields concerning fragments in the listing can be ignored, because ext3 on Linux does not implement fragments. Also note that data in an ext3 file system is stored on disk in Intel's little-endian format. Keep this in mind especially if you develop ext3-related programs on a platform other than the PC; if you only develop on a PC, no special care is needed.
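
One way to stay independent of the host's byte order is to assemble each field from individual bytes instead of casting the raw buffer to a structure. The sketch below shows this for 16-bit and 32-bit little-endian values; the helper names get_le16 and get_le32 are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode a 16-bit little-endian value: the low byte comes first. */
static uint16_t get_le16(const uint8_t *p)
{
    assert(p != NULL);
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Decode a 32-bit little-endian value, independent of host byte order. */
static uint32_t get_le32(const uint8_t *p)
{
    assert(p != NULL);
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}
```

On a little-endian PC these helpers return the same values as reading the fields directly, which is why PC-only code can get away without them.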

struct ext3_super_block {
/*00*/  __u32 s_inodes_count;           /* Inodes count */
        __u32 s_blocks_count;           /* Blocks count */
        __u32 s_r_blocks_count;         /* Reserved blocks count */
        __u32 s_free_blocks_count;      /* Free blocks count */
/*10*/  __u32 s_free_inodes_count;      /* Free inodes count */
        __u32 s_first_data_block;       /* First data block */
        __u32 s_log_block_size;         /* Block size */
        __s32 s_log_frag_size;          /* Can ignore */
/*20*/  __u32 s_blocks_per_group;       /* Number of blocks per block group */
        __u32 s_frags_per_group;        /* Can ignore */
        __u32 s_inodes_per_group;       /* Number of inodes per block group */
        __u32 s_mtime;                  /* Mount time */
/*30*/  __u32 s_wtime;                  /* Write time */
        __u16 s_mnt_count;              /* Mount count */
        __s16 s_max_mnt_count;          /* Maximal mount count */
        __u16 s_magic;                  /* Magic signature */
        __u16 s_state;                  /* File system state */
        __u16 s_errors;                 /* Behaviour when detecting errors */
        __u16 s_minor_rev_level;        /* Minor revision level */
/*40*/  __u32 s_lastcheck;              /* Time of last check */
        __u32 s_checkinterval;          /* Max. time between checks */
        __u32 s_creator_os;             /* Can ignore */
        __u32 s_rev_level;              /* Revision level */
/*50*/  __u16 s_def_resuid;             /* Default uid for reserved blocks */
        __u16 s_def_resgid;             /* Default gid for reserved blocks */
        __u32 s_first_ino;              /* First non-reserved inode */
        __u16 s_inode_size;             /* Size of inode structure */
        __u16 s_block_group_nr;         /* Block group # of this superblock */
        __u32 s_feature_compat;         /* Compatible feature set */
/*60*/  __u32 s_feature_incompat;       /* Incompatible feature set */
        __u32 s_feature_ro_compat;      /* Readonly-compatible feature set */
/*68*/  __u8  s_uuid[16];               /* 128-bit uuid for volume */
/*78*/  char  s_volume_name[16];        /* Volume name */
/*88*/  char  s_last_mounted[64];       /* Directory where last mounted */
/*C8*/  __u32 s_algorithm_usage_bitmap; /* Can ignore */
        __u8  s_prealloc_blocks;        /* Can ignore */
        __u8  s_prealloc_dir_blocks;    /* Can ignore */
        __u16 s_padding1;               /* Can ignore */
/*D0*/  __u8  s_journal_uuid[16];       /* uuid of journal superblock */
/*E0*/  __u32 s_journal_inum;           /* Inode number of the log file */
        __u32 s_journal_dev;            /* Device number of the log file */
        __u32 s_last_orphan;            /* Start of list of inodes to delete */
/*EC*/  __u32 s_reserved[197];          /* Can ignore */
};

We can see that the super block is 1024 bytes in size. The first field we care about in the super block is the magic signature. For both ext2 and ext3 file systems, the value of this field must be exactly 0xEF53. If it is not, the partition is definitely not a normal ext2 or ext3 file system. From this we can also infer that ext2 and ext3 must be highly compatible; otherwise, the Linux kernel developers would have chosen a different magic signature for ext3.

Another important field in the super block is s_log_block_size, from which we can derive the real block size. Calling the real block size B, we have B = 1 << (s_log_block_size + 10), in bytes. For example, if this field is 0, the block size is 1024 bytes, which is exactly the minimum block size; if this field is 2, the block size is 4096 bytes. This gives us the block size, a very important piece of data.
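
A magic check might look like the sketch below. It assumes the caller has already read the 1024 bytes of the super block into a buffer, and it places s_magic at byte offset 0x38, which is derived from the /*30*/ offset marker in the structure listing plus the sizes of s_wtime, s_mnt_count and s_max_mnt_count. The macro and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SUPERBLOCK_SIZE 1024   /* size of the on-disk super block */
#define S_MAGIC_OFFSET  0x38   /* 0x30 + s_wtime (4) + two 16-bit counters (4) */
#define EXT2_MAGIC      0xEF53 /* shared by ext2 and ext3 */

/* Returns 1 if the buffer carries the ext2/ext3 magic signature, else 0.
 * The field is decoded byte by byte because it is little-endian on disk. */
static int has_ext2_magic(const uint8_t *sb)
{
    assert(sb != NULL);
    uint16_t magic = (uint16_t)(sb[S_MAGIC_OFFSET] | (sb[S_MAGIC_OFFSET + 1] << 8));
    return magic == EXT2_MAGIC;
}
```

A passing check only says the partition might be ext2 or ext3; it does not distinguish between the two.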
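
The block size formula, together with the earlier rule for locating the super block, can be sketched as follows. The function names are hypothetical, and the upper limit checked by the assertion is only a sanity bound chosen for this example, not something stated in this article.

```c
#include <assert.h>
#include <stdint.h>

/* B = 1 << (s_log_block_size + 10), in bytes. */
static uint32_t block_size_bytes(uint32_t s_log_block_size)
{
    assert(s_log_block_size <= 6);  /* hypothetical sanity limit (64 KiB) */
    return (uint32_t)1 << (s_log_block_size + 10);
}

/* Number of the block that contains byte 1024, where the super block starts. */
static uint32_t super_block_number(uint32_t block_size)
{
    assert(block_size >= 1024);
    return 1024 / block_size;
}
```

For a field value of 0 this gives 1024-byte blocks with the super block in block 1; for a value of 2 it gives 4096-byte blocks with the super block in block 0.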

3.2 Group Descriptors

Now let's look at the group descriptors that follow the super block. First, recall that the super block starts at byte 1024 and is 1024 bytes long. The group descriptors begin with the first block after the one containing the super block. In other words, if the super block is in block 0, the group descriptors start at block 1; if the super block is in block 1, they start at block 2. Because the super block is only 1024 bytes, it never crosses a block boundary. If the block size is exactly 1024 bytes, the group descriptors immediately follow the super block with no gap. If the block size is 4096 bytes, the super block ends at byte 2048 and the group descriptors start at byte 4096, leaving a gap of 4096 - 2048 = 2048 bytes.

So how many block groups, and therefore how many group descriptors, are there on the partition? The answer is in the super block. s_blocks_count records the total number of blocks on the partition, and s_blocks_per_group records how many blocks are in each group. The number of block groups, which we call G, is therefore G = (s_blocks_count - s_first_data_block - 1) / s_blocks_per_group + 1, using integer division. We subtract s_first_data_block because s_blocks_count counts every block on the partition, and the blocks before s_first_data_block do not belong to any block group. We subtract 1 and then add 1 so that the division rounds up: any blocks left over at the tail are placed in a final, smaller group.

Note that the group descriptors are stored one after another starting at that block. They do not have to fit in a single block: when groups_count * descriptor_size is larger than block_size, the descriptor table simply continues into the following blocks.

Knowing how many block groups there are on the partition, we can read that many group descriptors. Let's see what a group descriptor looks like.
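
The placement rule can be checked with a few lines of arithmetic. The sketch below computes the block where the group descriptors start and the size of the gap left after the super block, assuming only what is stated above (super block at bytes 1024 to 2047, descriptors in the next block). The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Block number where the group descriptors start: the block right after
 * the one holding the super block (which begins at byte 1024). */
static uint32_t gdt_first_block(uint32_t block_size)
{
    assert(block_size >= 1024);
    return 1024 / block_size + 1;
}

/* Unused bytes between the end of the super block (byte 2048) and the
 * first group descriptor. */
static uint32_t gap_after_super_block(uint32_t block_size)
{
    return gdt_first_block(block_size) * block_size - 2048;
}
```

With 1024-byte blocks the descriptors start at block 2 with no gap; with 4096-byte blocks they start at block 1 after a 2048-byte gap.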
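
The group count formula translates directly into integer arithmetic. This is a minimal sketch with a hypothetical function name; the parameters are the three super block fields used in the formula.

```c
#include <assert.h>
#include <stdint.h>

/* G = (s_blocks_count - s_first_data_block - 1) / s_blocks_per_group + 1.
 * Integer division plus one rounds up, so a partial group at the tail
 * still counts as a group. */
static uint32_t group_count(uint32_t s_blocks_count,
                            uint32_t s_first_data_block,
                            uint32_t s_blocks_per_group)
{
    assert(s_blocks_per_group > 0);
    assert(s_blocks_count > s_first_data_block);
    return (s_blocks_count - s_first_data_block - 1) / s_blocks_per_group + 1;
}
```

For example, 20000 blocks with s_first_data_block = 1 and 8192 blocks per group yield 3 groups: two full ones and a smaller one at the end.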

struct ext3_group_desc
{
        __u32 bg_block_bitmap;        /* Block pointer to block bitmap */
        __u32 bg_inode_bitmap;        /* Block pointer to inode bitmap */
        __u32 bg_inode_table;         /* Block pointer to inode table */
        __u16 bg_free_blocks_count;   /* Free blocks count */
        __u16 bg_free_inodes_count;   /* Free inodes count */
        __u16 bg_used_dirs_count;     /* Directories count */
        __u16 bg_pad;                 /* Can ignore */
        __u32 bg_reserved[3];         /* Can ignore */
};

Each group descriptor is 32 bytes in size. Above we see the three key block pointers that we mentioned earlier.
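
Reading one descriptor out of a raw buffer could be sketched as below. It assumes each descriptor occupies 32 bytes (the sum of the field sizes in the structure), that the fields are little-endian, and that the caller has already loaded the descriptor table into memory. The struct group_desc type and the helper names are hypothetical, not kernel definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define GROUP_DESC_SIZE 32

struct group_desc {              /* in-memory copy; name is hypothetical */
    uint32_t block_bitmap;       /* block number of the block bitmap */
    uint32_t inode_bitmap;       /* block number of the inode bitmap */
    uint32_t inode_table;        /* first block of the inode table */
    uint16_t free_blocks_count;
    uint16_t free_inodes_count;
    uint16_t used_dirs_count;
};

static uint16_t le16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }

static uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Decode descriptor number `group` from a buffer holding the descriptor table. */
static struct group_desc read_group_desc(const uint8_t *table, uint32_t group)
{
    assert(table != NULL);
    const uint8_t *p = table + (size_t)group * GROUP_DESC_SIZE;
    struct group_desc gd;
    gd.block_bitmap      = le32(p + 0);
    gd.inode_bitmap      = le32(p + 4);
    gd.inode_table       = le32(p + 8);
    gd.free_blocks_count = le16(p + 12);
    gd.free_inodes_count = le16(p + 14);
    gd.used_dirs_count   = le16(p + 16);
    return gd;
}
```

The padding and reserved fields at offsets 18 to 31 are skipped, since they can be ignored.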

3.3 Inode

With this groundwork in place, we can finally start reading files. The first thing to read, of course, is the root directory of the file system. Note that this root directory is relative to this file system or hard disk partition; it is not necessarily the root directory of the whole Linux system. The root directory is stored in a fixed inode, inode 2 of the file system. Like block numbers, inode numbers are global to the file system. Note in particular that inode numbers start at 1, whereas block numbers, as mentioned earlier, start at 0. This difference needs special attention when writing programs. (This odd way of numbering inodes caused the author quite a headache.)

Now let's see how to read the user data of an inode once we have its inode number. The super block field s_inodes_per_group records the number of inodes in each block group. Dividing the inode number (less 1, because numbering starts at 1) by s_inodes_per_group tells us which block group the inode is in, and the remainder tells us its position within that block group. We then find the group descriptor for that block group, follow it to the group's inode table, pick the inode at that position in the table, and start reading the user data the inode refers to.

The formulas are: block_group = (ino - 1) / s_inodes_per_group, where ino is the inode number, and offset = (ino - 1) % s_inodes_per_group, where offset is the index of the inode within its block group.

Having found the inode, let's see what an inode looks like.
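
The two formulas, plus the step from an index to a position on disk, can be sketched as follows. The byte offset calculation assumes the inodes in the inode table are stored back to back, each s_inode_size bytes long, starting at the block named by bg_inode_table. The type and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

struct inode_location {      /* hypothetical helper type */
    uint32_t group;          /* block group holding the inode */
    uint32_t index;          /* position inside that group's inode table */
};

static struct inode_location locate_inode(uint32_t ino, uint32_t s_inodes_per_group)
{
    assert(ino >= 1);                 /* inode numbers start at 1 */
    assert(s_inodes_per_group > 0);
    struct inode_location loc;
    loc.group = (ino - 1) / s_inodes_per_group;
    loc.index = (ino - 1) % s_inodes_per_group;
    return loc;
}

/* Byte offset of the inode from the start of the partition, given the
 * group's bg_inode_table block pointer. */
static uint64_t inode_byte_offset(uint32_t bg_inode_table, uint32_t index,
                                  uint32_t block_size, uint32_t inode_size)
{
    return (uint64_t)bg_inode_table * block_size + (uint64_t)index * inode_size;
}
```

For the root directory (inode 2) this gives group 0, index 1, that is, the second entry of the first group's inode table.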

struct ext3_inode {
        __u16 i_mode;                   /* File mode */
        __u16 i_uid;                    /* Low 16 bits of owner uid */
        __u32 i_size;                   /* File size in bytes */
        __u32 i_atime;                  /* Access time */
        __u32 i_ctime;                  /* Creation time */
        __u32 i_mtime;                  /* Modification time */
        __u32 i_dtime;                  /* Deletion time */
        __u16 i_gid;                    /* Low 16 bits of group id */
        __u16 i_links_count;            /* Links count */
        __u32 i_blocks;                 /* Blocks count */
        __u32 i_flags;                  /* File flags */
        __u32 l_i_reserved1;            /* Can ignore */
        __u32 i_block[EXT3_N_BLOCKS];   /* A group of block pointers */
        __u32 i_generation;             /* Can ignore */
        __u32 i_file_acl;               /* Can ignore */
        __u32 i_dir_acl;                /* Can ignore */
        __u32 i_faddr;                  /* Can ignore */
        __u8  l_i_frag;                 /* Can ignore */
        __u8  l_i_fsize;                /* Can ignore */
        __u16 i_pad1;                   /* Can ignore */
        __u16 l_i_uid_high;             /* Can ignore */
        __u16 l_i_gid_high;             /* Can ignore */
        __u32 l_i_reserved2;            /* Can ignore */
};
 

We see that an inode can hold EXT3_N_BLOCKS (= 15) block pointers. User data is reached through these pointers. Fifteen blocks are not necessarily enough to hold all the user data, so ext3 uses a layered structure. The first 12 of the 15 block pointers are direct blocks: they point to blocks that hold user data directly. The 13th pointer is the indirect block: the block it points to is filled with block pointers, each pointing to a block of user data. The 14th pointer is the double indirect block: the block it points to is filled with block pointers, each of which points to a block that is in turn filled with block pointers to user data. The 15th pointer is the triple indirect block, which adds one more layer of block pointers on top of the double indirect scheme. As an exercise, the reader can calculate how many bytes of user data one inode can address with this layered structure. (Consider whether the information given so far is sufficient, and if not, which key data are missing.)

How many blocks an inode's data actually occupies can be computed from the inode field i_size. i_size records the actual size of the file or directory in bytes; dividing it by the block size and rounding up gives the number of blocks. Note the i_blocks field above: a careless reader might assume that it records how many blocks the inode actually uses. What this field is really for is left for readers to discover, as an opportunity to enjoy reading the Linux kernel source code. ;-)
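
For readers who want to check their answer to the exercise, the sketch below counts how many data blocks the 15 pointers can reach and classifies a logical block number by the level of indirection needed. It assumes each on-disk block pointer is 4 bytes (a __u32), so a block of size B holds B / 4 pointers. The function names are hypothetical, and other limits, such as the 32-bit i_size field, may cap the file size before this addressing limit is reached.

```c
#include <assert.h>
#include <stdint.h>

#define N_DIRECT 12   /* number of direct block pointers in i_block[] */

/* Total data blocks reachable through the 15 pointers, assuming each
 * block pointer on disk is 4 bytes (a __u32). */
static uint64_t max_data_blocks(uint32_t block_size)
{
    uint64_t p = block_size / 4;      /* pointers per block */
    return N_DIRECT + p + p * p + p * p * p;
}

/* 0 = direct, 1 = indirect, 2 = double indirect, 3 = triple indirect,
 * -1 = beyond what one inode can address. `n` is the logical block
 * number within the file, counted from 0. */
static int pointer_level(uint64_t n, uint32_t block_size)
{
    assert(block_size >= 1024);
    uint64_t p = block_size / 4;
    if (n < N_DIRECT) return 0;
    n -= N_DIRECT;
    if (n < p) return 1;
    n -= p;
    if (n < p * p) return 2;
    n -= p * p;
    if (n < p * p * p) return 3;
    return -1;
}
```

Multiplying max_data_blocks(B) by B gives the addressable byte count for block size B.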
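
The division by the block size has to round up, because a partly filled last block still occupies a whole block. A minimal sketch, with a hypothetical function name:

```c
#include <assert.h>
#include <stdint.h>

/* Number of blocks needed to hold i_size bytes of data. */
static uint32_t blocks_for_size(uint32_t i_size, uint32_t block_size)
{
    assert(block_size > 0);
    /* Round up so that a partially filled last block is counted. */
    return i_size / block_size + (i_size % block_size != 0);
}
```

A 1025-byte file on a file system with 1024-byte blocks therefore needs 2 blocks.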

3.4 Directory structure of the file system

Now that we can read the contents of an inode, we can read the files and directories on the file system. Reading a file just means reading out the contents of its inode. A directory is simply a file with a fixed format, which records what entries the directory contains, along with their file names, inode numbers and so on.

struct ext3_dir_entry_2 {
        __u32 inode;                  /* Inode number */
        __u16 rec_len;                /* Directory entry length */
        __u8  name_len;               /* Name length */
        __u8  file_type;
        char  name[EXT3_NAME_LEN];    /* File name */
};

The EXT3_NAME_LEN used above is 255. Note that directory entries on disk are not fixed-length; the length of each entry is recorded in its rec_len field.
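
Because the entries are variable-length, a directory block has to be walked by adding each entry's rec_len to the current position. The sketch below looks up a name in one directory block that has already been read into memory. It assumes an 8-byte fixed header before the name (inode, rec_len, name_len, file_type), little-endian fields, and that an entry with inode number 0 is unused. The function name dir_lookup is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fixed header: inode (4) + rec_len (2) + name_len (1) + file_type (1). */
#define DIRENT_HEADER 8

/* Look up `name` in one directory block. Returns the inode number,
 * or 0 if not found or the block looks malformed. */
static uint32_t dir_lookup(const uint8_t *block, size_t size, const char *name)
{
    assert(block != NULL && name != NULL);
    size_t want = strlen(name);
    size_t pos = 0;
    while (pos + DIRENT_HEADER <= size) {
        const uint8_t *e = block + pos;
        uint32_t inode = (uint32_t)e[0] | ((uint32_t)e[1] << 8) |
                         ((uint32_t)e[2] << 16) | ((uint32_t)e[3] << 24);
        size_t rec_len = (size_t)e[4] | ((size_t)e[5] << 8);
        size_t name_len = e[6];
        if (rec_len < DIRENT_HEADER + name_len || pos + rec_len > size)
            return 0;                        /* malformed entry: stop */
        /* Treating inode 0 as an unused entry is an assumption here. */
        if (inode != 0 && name_len == want &&
            memcmp(e + DIRENT_HEADER, name, want) == 0)
            return inode;
        pos += rec_len;                      /* rec_len leads to the next entry */
    }
    return 0;
}
```

The name stored on disk is not NUL-terminated, which is why the comparison uses name_len and memcmp rather than strcmp.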






Summary

With this information, we can read the entire contents of an ext3 file system. A reader with Windows driver development experience could use the information in this article to develop a read-only ext3 file system driver for Windows. Supporting both reading and writing, however, requires understanding the structure of the ext3 journal, which this article does not cover for reasons of space.


Resources

1 Rémy Card, Theodore Ts'o, Stephen Tweedie, Design and Implementation of the Second Extended Filesystem, http://web.mit.edu/tytso/www/linux/ext2intro.html

2 Linux Kernel 2.4.18 Source Code, http://lxr.linux.no/source/fs/ext3/
