[Ext4] 12: Block allocation mechanism, key data structures

Source: Internet
Author: User

The block allocation mechanism involves several major data structures.

A block request is described by ext4_allocation_request. Whether block allocation is actually performed is then decided from the result of the block lookup, that is, from what the upper layer requires.

During allocation, some state has to be recorded so that the allocation can be carried out properly. This allocation behavior is described by struct ext4_allocation_context.

Pre-allocated space may be used while searching for free space, so a descriptor is also needed for the pre-allocated space and its attributes: ext4_prealloc_space.

Next, we will analyze each key struct in detail.

1. Block request descriptor ext4_allocation_request

The attributes of a block allocation request are described by the request descriptor ext4_allocation_request:

    struct ext4_allocation_request {
        /* target inode for block we're allocating */
        struct inode *inode;
        /* how many blocks we want to allocate */
        unsigned int len;
        /* logical block in target inode */
        ext4_lblk_t logical;
        /* the closest logical allocated block to the left */
        ext4_lblk_t lleft;
        /* the closest logical allocated block to the right */
        ext4_lblk_t lright;
        /* phys. target (a hint) */
        ext4_fsblk_t goal;
        /* phys. block for the closest logical allocated block to the left */
        ext4_fsblk_t pleft;
        /* phys. block for the closest logical allocated block to the right */
        ext4_fsblk_t pright;
        /* flags. see above EXT4_MB_HINT_* */
        unsigned int flags;
    };

The request descriptor is initialized in ext4_ext_map_blocks(). (Note: ext4_ext_map_blocks() looks up the requested blocks, or allocates them, and sets up the mapping.)

Of the members above, goal deserves a closer look. It records a physical block number, and it matters more than it seems: although goal is only a hint, a well-chosen value largely determines whether the file keeps its locality and whether its physical blocks stay contiguous.

The goal is computed by the ext4_ext_find_goal() function:

    static ext4_fsblk_t ext4_ext_find_goal(struct inode *inode,
                                           struct ext4_ext_path *path,
                                           ext4_lblk_t block)
    {
        if (path) {
            int depth = path->p_depth;
            struct ext4_extent *ex;

            /*
             * Try to predict block placement assuming that we are
             * filling in a file which will eventually be
             * non-sparse --- i.e., in the case of libbfd writing
             * an ELF object sections out-of-order but in a way
             * that eventually results in a contiguous object or
             * executable file, or some database extending a table
             * space file.  However, this is actually somewhat
             * non-ideal if we are writing a sparse file such as
             * qemu or KVM writing a raw image file that is going
             * to stay fairly sparse, since it will end up
             * fragmenting the file system's free space.  Maybe we
             * should have some heuristics or some way to allow
             * userspace to pass a hint to file system,
             * especially if the latter case turns out to be
             * common.
             */
            ex = path[depth].p_ext;
            if (ex) {
                ext4_fsblk_t ext_pblk = ext4_ext_pblock(ex);
                ext4_lblk_t ext_block = le32_to_cpu(ex->ee_block);

                if (block > ext_block)
                    return ext_pblk + (block - ext_block);
                else
                    return ext_pblk - (ext_block - block);
            }

            /* it looks like index is empty;
             * try to find starting block from index itself */
            if (path[depth].p_bh)
                return path[depth].p_bh->b_blocknr;
        }

        /* OK. use inode's group */
        return ext4_inode_to_goal_block(inode);
    }
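The extent branch of the function above is plain arithmetic and can be tried outside the kernel. The following is a minimal user-space sketch, not kernel code: the struct toy_extent type, the lblk_t and fsblk_t typedefs, and the function name goal_from_extent() are all made up for illustration and only stand in for the kernel types used above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t lblk_t;   /* stand-in for ext4_lblk_t  (logical block)  */
typedef uint64_t fsblk_t;  /* stand-in for ext4_fsblk_t (physical block) */

/* Hypothetical, simplified leaf extent: only the two values that
 * the goal computation reads. */
struct toy_extent {
    lblk_t  first_lblk;  /* first logical block covered (ee_block)    */
    fsblk_t first_pblk;  /* physical block it maps to (ext4_ext_pblock) */
};

/* Same arithmetic as the extent branch of ext4_ext_find_goal():
 * keep the physical distance equal to the logical distance. */
static fsblk_t goal_from_extent(const struct toy_extent *ex, lblk_t block)
{
    assert(ex != NULL);
    if (block > ex->first_lblk)
        return ex->first_pblk + (block - ex->first_lblk);
    return ex->first_pblk - (ex->first_lblk - block);
}
```

For an extent that maps logical block 100 to physical block 5000, a request for logical block 110 gives the goal 5010, and a request for logical block 90 gives 4990.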

Let us analyze the code in detail. If a path from the root of the extent tree to the requested logical block exists, the target physical block is calculated from that path.

(1) If the path ends in a data extent, that is, the path runs from the root down to a leaf: let ext_block be the starting logical block number of the leaf extent and ext_pblk the physical block it maps to. When the requested block number is greater than ext_block, the logical distance is (block - ext_block); to keep the physical addresses as contiguous as possible, the allocator should return the free physical block closest to ext_pblk + (block - ext_block). When the requested block number is smaller than ext_block, it should look for the free physical block closest to ext_pblk - (ext_block - block). The goal is therefore set to ext_pblk + (block - ext_block) or ext_pblk - (ext_block - block), respectively.

(2) If the path exists but has no leaf extent, things are simple: the goal is set to the physical block number of the block that holds the last node on the path (path[depth].p_bh->b_blocknr).

(3) The remaining case is that no path is provided (or the path yields neither an extent nor a buffer). I personally think this only happens when the inode has just been created. A dedicated function, ext4_inode_to_goal_block(), handles it:

    ext4_fsblk_t ext4_inode_to_goal_block(struct inode *inode)
    {
        struct ext4_inode_info *ei = EXT4_I(inode);
        ext4_group_t block_group;
        ext4_grpblk_t colour;
        int flex_size = ext4_flex_bg_size(EXT4_SB(inode->i_sb));
        ext4_fsblk_t bg_start;
        ext4_fsblk_t last_block;

        block_group = ei->i_block_group;
        if (flex_size >= EXT4_FLEX_SIZE_DIR_ALLOC_SCHEME) {
            /*
             * If there are at least EXT4_FLEX_SIZE_DIR_ALLOC_SCHEME
             * block groups per flexgroup, reserve the first block
             * group for directories and special files.  Regular
             * files will start at the second block group.  This
             * tends to speed up directory access and improves
             * fsck times.
             */
            block_group &= ~(flex_size - 1);
            if (S_ISREG(inode->i_mode))
                block_group++;
        }
        bg_start = ext4_group_first_block_no(inode->i_sb, block_group);
        last_block = ext4_blocks_count(EXT4_SB(inode->i_sb)->s_es) - 1;

        /*
         * If we are doing delayed allocation, we don't need to take
         * colour into account.
         */
        if (test_opt(inode->i_sb, DELALLOC))
            return bg_start;

        if (bg_start + EXT4_BLOCKS_PER_GROUP(inode->i_sb) <= last_block)
            colour = (current->pid % 16) *
                     (EXT4_BLOCKS_PER_GROUP(inode->i_sb) / 16);
        else
            colour = (current->pid % 16) * ((last_block - bg_start) / 16);
        return bg_start + colour;
    }

The idea is: if a flex group contains at least EXT4_FLEX_SIZE_DIR_ALLOC_SCHEME block groups, then for a regular file the first block of the second block group in the inode's flex_group is taken as the starting physical block number bg_start (directories and special files use the first block group).

Of course, if every file in the flex_group used bg_start as its goal, they would certainly compete. The colour is added as an offset to reduce this contention: it is derived from the PID of the current process (pid % 16), so different processes start at different sixteenths of the block group.

In this case, therefore, the goal is bg_start plus a PID-dependent offset inside the chosen block group. With delayed allocation (DELALLOC) enabled, no colour is applied and the goal is simply bg_start.

[Note: if flex_size is not smaller than EXT4_FLEX_SIZE_DIR_ALLOC_SCHEME, the first group of the flex_group is set aside for directories and special files, and regular files are allocated starting from the second group. This speeds up directory access and improves fsck times.]
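To make the group selection and the colour offset concrete, here is a user-space sketch of the same computation as ext4_inode_to_goal_block(). It is an illustration under assumptions, not kernel code: struct toy_fs and toy_inode_goal() are invented names, the threshold value 4 for TOY_FLEX_DIR_ALLOC_SCHEME is assumed rather than taken from the article, and the first block of a group is assumed to be first_data_block + group * blocks_per_group.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_FLEX_DIR_ALLOC_SCHEME 4  /* assumed threshold value */

/* Hypothetical summary of the superblock fields the goal needs. */
struct toy_fs {
    uint64_t first_data_block;  /* block number where group 0 starts   */
    uint64_t blocks_per_group;
    uint64_t blocks_count;      /* total number of blocks              */
    uint32_t flex_size;         /* groups per flex group, power of two */
    int      delalloc;          /* non-zero: delayed allocation is on  */
};

static uint64_t toy_inode_goal(const struct toy_fs *fs, uint32_t inode_group,
                               int is_regular, uint32_t pid)
{
    uint32_t group = inode_group;
    uint64_t bg_start, last_block, colour;

    /* The mask below only works for a power-of-two flex size. */
    assert(fs->flex_size > 0 && (fs->flex_size & (fs->flex_size - 1)) == 0);

    if (fs->flex_size >= TOY_FLEX_DIR_ALLOC_SCHEME) {
        group &= ~(fs->flex_size - 1);  /* first group of the flex group */
        if (is_regular)
            group++;                    /* regular files: second group   */
    }
    bg_start = fs->first_data_block + (uint64_t)group * fs->blocks_per_group;
    last_block = fs->blocks_count - 1;

    if (fs->delalloc)
        return bg_start;                /* no colour with delalloc */

    if (bg_start + fs->blocks_per_group <= last_block)
        colour = (pid % 16) * (fs->blocks_per_group / 16);
    else
        colour = (pid % 16) * ((last_block - bg_start) / 16);
    return bg_start + colour;
}
```

With 32768 blocks per group, a flex size of 16, and an inode in group 21, a regular file written by PID 5 starts in group 17 at block 557056 and gets a colour of 5 * 2048 = 10240, so the goal is 567296. A directory under the same conditions stays in group 16, and its goal is 534528.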

2. Allocation context descriptor ext4_allocation_context

To carry out the allocation and record its state, the allocation behavior is described by struct ext4_allocation_context:

    struct ext4_allocation_context {
        struct inode *ac_inode;
        struct super_block *ac_sb;

        /* original request */
        struct ext4_free_extent ac_o_ex;

        /* goal request (normalized ac_o_ex) */
        struct ext4_free_extent ac_g_ex;

        /* the best found extent */
        struct ext4_free_extent ac_b_ex;

        /* copy of the best found extent taken before preallocation efforts */
        struct ext4_free_extent ac_f_ex;

        __u16 ac_groups_scanned;
        __u16 ac_found;
        __u16 ac_tail;
        __u16 ac_buddy;
        __u16 ac_flags;     /* allocation hints */
        __u8 ac_status;
        __u8 ac_criteria;
        __u8 ac_2order;     /* if request is to allocate 2^N blocks and
                             * N > 0, the field stores N, otherwise 0 */
        __u8 ac_op;         /* operation, for history only */
        struct page *ac_bitmap_page;
        struct page *ac_buddy_page;
        struct ext4_prealloc_space *ac_pa;
        struct ext4_locality_group *ac_lg;
    };

This data structure describes the attributes of the allocation context. The ext4_allocation_context struct is initialized by the ext4_mb_initialize_context() function.

ext4_mb_initialize_context(): initializes ac->ac_o_ex from the request descriptor: the requested logical block number fe_logical, the group in which goal lies, and the cluster number of goal within that group (for now, think of it as a physical block number). It then copies ac_o_ex into ac_g_ex.
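The initialization step can be sketched in user space as follows. This is only an illustration under assumptions: all toy_* names are invented, one cluster is assumed to equal one block, and the fallback to first_data_block for an out-of-range goal is an assumption of this sketch rather than something stated in the text above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for ext4_free_extent, the request and the context. */
struct toy_free_extent {
    uint32_t fe_logical;  /* logical block in the file      */
    uint32_t fe_start;    /* offset of the goal in its group */
    uint32_t fe_group;    /* group that contains the goal    */
    uint32_t fe_len;      /* requested length in blocks      */
};

struct toy_request {
    uint32_t logical;
    uint32_t len;
    uint64_t goal;        /* physical block number (a hint)  */
};

struct toy_context {
    struct toy_free_extent o_ex;  /* original request            */
    struct toy_free_extent g_ex;  /* goal request, normalized later */
};

static void toy_init_context(struct toy_context *ac,
                             const struct toy_request *ar,
                             uint64_t first_data_block,
                             uint64_t blocks_per_group,
                             uint64_t blocks_count)
{
    uint64_t goal = ar->goal;

    assert(ac != NULL && blocks_per_group > 0);

    /* Assumption: an out-of-range hint falls back to the first data block. */
    if (goal < first_data_block || goal >= blocks_count)
        goal = first_data_block;
    goal -= first_data_block;

    ac->o_ex.fe_logical = ar->logical;
    ac->o_ex.fe_group   = (uint32_t)(goal / blocks_per_group);
    ac->o_ex.fe_start   = (uint32_t)(goal % blocks_per_group);
    ac->o_ex.fe_len     = ar->len;

    /* The goal request starts out as a copy of the original request. */
    ac->g_ex = ac->o_ex;
}
```

For a goal of 70000 with 32768 blocks per group, the original extent ends up in group 2 at offset 4464, and the goal extent is an identical copy until normalization changes it.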

ext4_mb_normalize_request():

1. Calculate the file size. The size is the larger of i_size_read(ac->ac_inode) and the byte offset of the end of the request, where the offset is converted from the requested block number and length.

2. Estimate the likely file size using a fixed table:

    #define NRL_CHECK_SIZE(req, size, max, chunk_size) \
            (req <= (size) || max <= (chunk_size))

        /* first, try to predict filesize */
        /* XXX: should this table be tunable? */
        start_off = 0;
        if (size <= 16 * 1024) {
            size = 16 * 1024;
        } else if (size <= 32 * 1024) {
            size = 32 * 1024;
        } else if (size <= 64 * 1024) {
            size = 64 * 1024;
        } else if (size <= 128 * 1024) {
            size = 128 * 1024;
        } else if (size <= 256 * 1024) {
            size = 256 * 1024;
        } else if (size <= 512 * 1024) {
            size = 512 * 1024;
        } else if (size <= 1024 * 1024) {
            size = 1024 * 1024;
        } else if (NRL_CHECK_SIZE(size, 4 * 1024 * 1024, max, 2 * 1024)) {
            start_off = ((loff_t)ac->ac_o_ex.fe_logical >>
                         (21 - bsbits)) << 21;
            size = 2 * 1024 * 1024;
        } else if (NRL_CHECK_SIZE(size, 8 * 1024 * 1024, max, 4 * 1024)) {
            start_off = ((loff_t)ac->ac_o_ex.fe_logical >>
                         (22 - bsbits)) << 22;
            size = 4 * 1024 * 1024;
        } else if (NRL_CHECK_SIZE(ac->ac_o_ex.fe_len,
                                  (8 << 20) >> bsbits, max, 8 * 1024)) {
            start_off = ((loff_t)ac->ac_o_ex.fe_logical >>
                         (23 - bsbits)) << 23;
            size = 8 * 1024 * 1024;
        } else {
            start_off = (loff_t)ac->ac_o_ex.fe_logical << bsbits;
            size = ac->ac_o_ex.fe_len << bsbits;
        }
        size = size >> bsbits;
        start = start_off >> bsbits;

As can be seen, the normalized size is typically larger than the original request, while start is either 0 or fe_logical aligned down to the chunk boundary (in the last branch the request is kept as it is).
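The table can be exercised in user space with a small sketch. This is an illustration, not the kernel function: toy_normalize() and struct toy_range are invented names, the seven small-file buckets are folded into a loop for brevity, and max is simply passed in as a parameter because its definition is not part of the excerpt above.

```c
#include <assert.h>
#include <stdint.h>

struct toy_range {
    uint64_t start;  /* first logical block of the normalized range */
    uint64_t len;    /* length of the normalized range, in blocks   */
};

#define NRL_CHECK_SIZE(req, size, max, chunk_size) \
    ((req) <= (size) || (max) <= (chunk_size))

/* fe_logical and fe_len are in blocks, size is in bytes,
 * bsbits is log2(block size). */
static struct toy_range toy_normalize(uint64_t fe_logical, uint64_t fe_len,
                                      uint64_t size, uint64_t max,
                                      unsigned bsbits)
{
    uint64_t start_off = 0;
    struct toy_range r;

    assert(bsbits >= 10 && bsbits <= 16);  /* 1 KiB to 64 KiB blocks */

    if (size <= 1024 * 1024) {
        /* Round up to the next bucket: 16K, 32K, ..., 1M. */
        uint64_t bucket = 16 * 1024;
        while (bucket < size)
            bucket <<= 1;
        size = bucket;
    } else if (NRL_CHECK_SIZE(size, 4 * 1024 * 1024, max, 2 * 1024)) {
        start_off = (fe_logical >> (21 - bsbits)) << 21;
        size = 2 * 1024 * 1024;
    } else if (NRL_CHECK_SIZE(size, 8 * 1024 * 1024, max, 4 * 1024)) {
        start_off = (fe_logical >> (22 - bsbits)) << 22;
        size = 4 * 1024 * 1024;
    } else if (NRL_CHECK_SIZE(fe_len, (8u << 20) >> bsbits, max, 8 * 1024)) {
        start_off = (fe_logical >> (23 - bsbits)) << 23;
        size = 8 * 1024 * 1024;
    } else {
        start_off = fe_logical << bsbits;
        size = fe_len << bsbits;
    }

    r.start = start_off >> bsbits;
    r.len = size >> bsbits;
    return r;
}
```

With 4 KiB blocks (bsbits = 12), a 20000-byte file is normalized to 8 blocks starting at block 0. A 3 MiB file with a request at logical block 1000 is normalized to the 2 MiB chunk that contains it, that is, 512 blocks starting at block 512.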

3. Check whether the normalized range overlaps existing prealloc space (if an existing preallocation already covers the requested block, it is a BUG);

4. Update ac_g_ex according to the size and start obtained in step 2:

    ac->ac_g_ex.fe_logical = start;
    ac->ac_g_ex.fe_len = EXT4_NUM_B2C(sbi, size);

As shown above, the ac->ac_g_ex member is updated by the ext4_mb_normalize_request() function.

ac->ac_b_ex, in turn, is set in the ext4_mb_regular_allocator() function and holds the best extent that can be allocated; implicitly, it is the extent from which the blocks will actually be taken.

ac->ac_f_ex is a copy of ac_b_ex saved before the prealloc space is set up. It is assigned in ext4_mb_new_inode_pa() or ext4_mb_new_group_pa().

3. Pre-allocated space descriptor ext4_prealloc_space

The descriptor ext4_prealloc_space describes attributes of pre-allocated space such as its size:

    struct ext4_prealloc_space {
        struct list_head pa_inode_list;
        struct list_head pa_group_list;
        union {
            struct list_head pa_tmp_list;
            struct rcu_head pa_rcu;
        } u;
        spinlock_t pa_lock;
        atomic_t pa_count;
        unsigned pa_deleted;
        ext4_fsblk_t pa_pstart;     /* phys. block */
        ext4_lblk_t pa_lstart;      /* log. block */
        ext4_grpblk_t pa_len;       /* len of preallocated chunk */
        ext4_grpblk_t pa_free;      /* how many blocks are free */
        unsigned short pa_type;     /* pa type. inode or group */
        spinlock_t *pa_obj_lock;
        struct inode *pa_inode;     /* hack, for history only */
    };

Four members are especially important:

pa_lstart -> the starting logical address of the prealloc space (relative to the file);

pa_pstart -> the starting physical address of the prealloc space;

pa_len -> the length of the prealloc space;

pa_free -> the number of blocks still free in the prealloc space.

This struct is initialized in the ext4_mb_new_inode_pa() or ext4_mb_new_group_pa() function.
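The four members fit together as a window that maps a logical range of the file onto a physical range. A minimal user-space sketch of that mapping for a per-inode preallocation is given below. struct toy_pa and toy_pa_map() are invented for illustration, and the rule "inside the window and at least one free block left" is a simplification of what the allocator really checks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reduced form of ext4_prealloc_space. */
struct toy_pa {
    uint64_t pa_pstart;  /* first physical block of the window */
    uint32_t pa_lstart;  /* first logical block of the window  */
    uint32_t pa_len;     /* window length in blocks            */
    uint32_t pa_free;    /* blocks still unused in the window  */
};

/* Returns 1 and stores the physical block in *pblk when lblk lies in
 * the preallocated window and free blocks remain; returns 0 otherwise. */
static int toy_pa_map(const struct toy_pa *pa, uint32_t lblk, uint64_t *pblk)
{
    assert(pa != NULL && pblk != NULL);

    if (pa->pa_free == 0)
        return 0;
    if (lblk < pa->pa_lstart ||
        lblk >= (uint64_t)pa->pa_lstart + pa->pa_len)
        return 0;

    /* Same logical distance, same physical distance. */
    *pblk = pa->pa_pstart + (lblk - pa->pa_lstart);
    return 1;
}
```

For a window with pa_lstart = 100, pa_pstart = 5000 and pa_len = 16, logical block 104 maps to physical block 5004, while logical blocks 99 and 116 fall outside the window.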

That is all for the analysis of these structures for now.

Author: Younger Liu

This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.
