Linux kernel DM Thin pool analysis

Source: Internet
Author: User

I. Introduction

Docker is widely used, and the device-mapper (DM) thin pool is one of its most commonly used storage backends. A thin pool consists of a metadata device and a data device. "Thin" means that data blocks are allocated on demand and reclaimed when data is deleted, which improves storage space utilization.

A DM thin pool is created with the following commands.

First, zero the first page of the metadata device. Why? The first page holds the thin-pool superblock, and the thin-pool kernel module checks whether the superblock is all zeros to decide whether to format a new thin pool or to open an existing one.

dd if=/dev/zero of=$metadata_dev bs=4096 count=1
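The format-or-open decision described above can be sketched in C as follows. This is a minimal illustration, not the kernel's code: the names `superblock_all_zeroes`, `choose_pool_action`, and `enum pool_action` are hypothetical, and the only assumption taken from the text is that the superblock occupies the first 4096-byte block.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define METADATA_BLOCK_SIZE 4096 /* the superblock occupies the first 4 KB block */

/* Returns true when every byte of the first metadata block is zero. */
static bool superblock_all_zeroes(const uint8_t *block)
{
    for (size_t i = 0; i < METADATA_BLOCK_SIZE; i++) {
        if (block[i] != 0)
            return false;
    }
    return true;
}

/* Hypothetical result type for this sketch. */
enum pool_action { POOL_FORMAT_NEW, POOL_OPEN_EXISTING };

/* A zeroed superblock means "format a new pool"; anything else means "open". */
static enum pool_action choose_pool_action(const uint8_t *block)
{
    return superblock_all_zeroes(block) ? POOL_FORMAT_NEW : POOL_OPEN_EXISTING;
}
```

A caller would read the first 4096 bytes of the metadata device into a buffer and pass it to `choose_pool_action`.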

Next, create the pool. In the --table argument, 0 is the starting sector and 20971520 is the length in sectors, where a sector is 512 bytes. data_block_size is the data block size, also in 512-byte sectors; the minimum is 128 (64 KB) and the maximum is 2097152 (1 GB).

low_water_mark is expressed in data blocks: when the number of free data blocks falls below low_water_mark, the DM thin-pool kernel module sends a notification event to the user-space daemon.

dmsetup create pool \
    --table "0 20971520 thin-pool $metadata_dev $data_dev \
             $data_block_size $low_water_mark"
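The arithmetic behind these table parameters can be checked with a short C sketch. The helper names are hypothetical, and the block count assumes the data device is exactly as long as the table length (20971520 sectors); the limits 128 and 2097152 are the ones quoted in the text.

```c
#include <stdbool.h>
#include <stdint.h>

#define SECTOR_SIZE 512ULL
#define DATA_BLOCK_SIZE_MIN_SECTORS 128ULL     /* 64 KB */
#define DATA_BLOCK_SIZE_MAX_SECTORS 2097152ULL /* 1 GB */

/* Range check for data_block_size, using the limits quoted in the text. */
static bool data_block_size_valid(uint64_t sectors)
{
    return sectors >= DATA_BLOCK_SIZE_MIN_SECTORS &&
           sectors <= DATA_BLOCK_SIZE_MAX_SECTORS;
}

/* Convert a length in 512-byte sectors to bytes. */
static uint64_t sectors_to_bytes(uint64_t sectors)
{
    return sectors * SECTOR_SIZE;
}

/* Number of whole data blocks that fit in pool_sectors (assumed data device size). */
static uint64_t pool_data_blocks(uint64_t pool_sectors, uint64_t data_block_size)
{
    return pool_sectors / data_block_size;
}

/* True when the pool should raise its low-water-mark event. */
static bool below_low_water_mark(uint64_t free_blocks, uint64_t low_water_mark)
{
    return free_blocks < low_water_mark;
}
```

With the values from the command above, 20971520 sectors is 10 GiB, and a data_block_size of 128 sectors gives 163840 data blocks.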

Send a message to the pool device to create a thin device. The 0 in "create_thin 0" is the ID of the thin device: a single pool can hold many thin devices, so an ID is needed to tell them apart.

dmsetup message /dev/mapper/pool 0 "create_thin 0"

Activate the thin device created above, which creates its /dev/dm-X device node.

dmsetup create thin --table "0 2097152 thin /dev/mapper/pool 0"

II. Thin-pool disk layout

a) Data device

The data device is divided into blocks of data_block_size sectors, and space is managed block by block. For example, with data_block_size = 128 each block is 128 * 512 B = 64 KB. The data device stores only data blocks and no metadata, so its disk layout is not described further.
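As an illustration of block-granular management, the following C sketch converts a byte offset into a block index and an offset inside that block. The struct and function names are hypothetical; the only fact taken from the text is that a block is data_block_size sectors of 512 bytes.

```c
#include <stdint.h>

/* Hypothetical result type: which block, and where inside it. */
struct block_addr {
    uint64_t block;  /* block index */
    uint64_t offset; /* byte offset inside that block */
};

/* Split a byte offset using a block size given in 512-byte sectors. */
static struct block_addr locate(uint64_t byte_offset, uint32_t data_block_size)
{
    uint64_t block_bytes = (uint64_t)data_block_size * 512;
    struct block_addr addr = {
        byte_offset / block_bytes,
        byte_offset % block_bytes,
    };
    return addr;
}
```

For example, with data_block_size = 128 (64 KB blocks), byte offset 200000 falls in block 3 at offset 3392.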

b) Metadata device

c) The disk layout of the metadata device is as shown in the layout diagram. The metadata device uses a block size of 4 KB, and the first block is the superblock, which is defined in the code as follows:

/*
 * Little endian on-disk superblock and device details.
 */
struct thin_disk_superblock {
	__le32 csum;	/* Checksum of superblock except for this field. */
	__le32 flags;
	__le64 blocknr;	/* This block number, dm_block_t. */

	__u8 uuid[16];
	__le64 magic;
	__le32 version;
	__le32 time;

	__le64 trans_id;

	/*
	 * Root held by userspace transactions.
	 */
	__le64 held_root;

	__u8 data_space_map_root[SPACE_MAP_ROOT_SIZE];
	__u8 metadata_space_map_root[SPACE_MAP_ROOT_SIZE];

	/*
	 * 2-level btree mapping (dev_id, (dev block, time)) -> data_block
	 */
	__le64 data_mapping_root;

	/*
	 * Device detail root mapping dev_id -> device_details
	 */
	__le64 device_details_root;

	__le32 data_block_size;		/* In 512-byte sectors. */

	__le32 metadata_block_size;	/* In 512-byte sectors. */
	__le64 metadata_nr_blocks;

	__le32 compat_flags;
	__le32 compat_ro_flags;
	__le32 incompat_flags;
} __packed;

d) Important fields:

• magic: the magic number of the pool device, 27022010

• data_space_map_root: the map root structure used to manage space usage on the data device (using bitmaps)

• metadata_space_map_root: the map root structure used to manage space usage on the metadata device (using bitmaps)

• data_mapping_root: the root block number of the btree that maps a thin device's virtual block addresses to actual addresses on the data device

• device_details_root: the root block of the btree that stores the information for all thin devices

• data_block_size: the data device block size in bytes is data_block_size * 512 B

• metadata_nr_blocks: the total number of blocks on the metadata device; note that the metadata block size is 4 KB
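To make the field descriptions concrete, here is a small userspace C sketch that checks the magic number in a raw superblock buffer and converts the two size fields to bytes. It is an illustration under stated assumptions: the function names are hypothetical, and the magic offset of 32 is derived from the packed struct layout shown above (csum 4 bytes, flags 4, blocknr 8, uuid 16).

```c
#include <stdbool.h>
#include <stdint.h>

#define THIN_SUPERBLOCK_MAGIC 27022010ULL
#define SUPERBLOCK_MAGIC_OFFSET 32 /* csum(4) + flags(4) + blocknr(8) + uuid(16) */
#define SECTOR_SIZE 512ULL
#define METADATA_BLOCK_BYTES 4096ULL

/* Decode a little-endian 64-bit value regardless of host byte order. */
static uint64_t read_le64(const uint8_t *p)
{
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/* True when the buffer carries the thin-pool magic at the expected offset. */
static bool superblock_magic_ok(const uint8_t *superblock)
{
    return read_le64(superblock + SUPERBLOCK_MAGIC_OFFSET) == THIN_SUPERBLOCK_MAGIC;
}

/* data_block_size is stored in 512-byte sectors. */
static uint64_t data_block_bytes(uint32_t data_block_size)
{
    return data_block_size * SECTOR_SIZE;
}

/* metadata_nr_blocks counts 4 KB metadata blocks. */
static uint64_t metadata_device_bytes(uint64_t metadata_nr_blocks)
{
    return metadata_nr_blocks * METADATA_BLOCK_BYTES;
}
```

The buffer passed to `superblock_magic_ok` would be the first 4096 bytes read from the metadata device.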

III. Addressing

To translate a virtual block address on a thin device into an actual block address on the data device, the thin pool performs the following lookup:

a) First, using the thin device's dev_id as the key, it searches the btree whose root node is in the block pointed to by data_mapping_root, and finds the root block number of that thin device's data mapping btree.

b) Then, using the virtual block number as the key, it searches the thin device's data mapping btree. The value found is the actual block number on the data device.

c) The whole process in pseudo-code:

block_t find_thin_block(dev_t dev_id, block_t block)
{
    block_t thin_map_root;
    btree_t thin_map_tree;

    /*
     * Look up dev_id in the data_mapping_root btree. The result is the
     * block that holds the root node of that thin device's mapping btree.
     */
    thin_map_root = btree_lookup(pool_data_map_root, dev_id);

    /* Read the root node from the metadata device. */
    read_block(meta_dev, thin_map_root, &thin_map_tree);

    /*
     * Look up the thin device's virtual block in its btree to find the
     * corresponding actual block on the data device.
     */
    return btree_lookup(&thin_map_tree, block);
}
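The two-level lookup above can also be tried out as a compilable toy model. In this sketch, sorted arrays with binary search stand in for the on-disk btrees, and all type and function names (`map_entry`, `sorted_map`, `toy_pool`, `map_lookup`) are hypothetical. It only mirrors the order of the two lookups, not the kernel's btree format.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct map_entry {
    uint64_t key;
    uint64_t value;
};

/* Stand-in for one btree: entries sorted by key. */
struct sorted_map {
    const struct map_entry *entries;
    size_t count;
};

/* Binary search; returns false when the key is not mapped. */
static bool map_lookup(const struct sorted_map *m, uint64_t key, uint64_t *value)
{
    size_t lo = 0, hi = m->count;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (m->entries[mid].key == key) {
            *value = m->entries[mid].value;
            return true;
        }
        if (m->entries[mid].key < key)
            lo = mid + 1;
        else
            hi = mid;
    }
    return false;
}

struct toy_pool {
    struct sorted_map top;            /* dev_id -> index into per_dev[] */
    const struct sorted_map *per_dev; /* virtual block -> data block */
};

/* Level 1: find the device's map. Level 2: find the data block. */
static bool find_thin_block(const struct toy_pool *pool, uint64_t dev_id,
                            uint64_t virt_block, uint64_t *data_block)
{
    uint64_t idx;

    if (!map_lookup(&pool->top, dev_id, &idx))
        return false;
    return map_lookup(&pool->per_dev[idx], virt_block, data_block);
}
```

A failed second-level lookup corresponds to a virtual block that has no data block allocated yet, which is the on-demand allocation behaviour described in the introduction.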

IV. Space management

a) Throughout the thin-pool module, all space management is based on btrees.

b) Space management of the metadata device itself: this is handled by the btree whose root block is given by metadata_space_map_root. The leaf nodes store the bitmaps that track space usage. Unlike an ordinary bitmap, the thin-pool bitmap uses 2 bits per block, which gives 4 states: 0 means free, 1 means a reference count of 1, 2 means a reference count of 2, and 3 means a reference count greater than 2, in which case the actual reference count has to be looked up in another btree.

c) Space management of the data device: as with the metadata device, the data device is managed by a btree whose leaf nodes point to the blocks that contain the bitmaps. These blocks are, of course, stored on the metadata device.

d) Releasing space: when a block's reference count in the bitmap drops to 0, the block becomes free. This is what makes the device "thin". However, it requires discard support from the file system above: the thin pool needs to know which blocks the upper layer no longer needs so that it can decrement their reference counts and release them.
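The 2-bit reference counting described above can be sketched in C as follows. This is a toy model under stated assumptions: the bit order inside each byte is chosen for readability and is not claimed to match the on-disk format, a plain array `overflow[]` stands in for the separate btree that holds counts greater than 2, and all names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ENTRIES_PER_BYTE 4 /* 2 bits per block */

/* Read the 2-bit state (0..3) for a block. */
static unsigned bitmap_get(const uint8_t *bitmap, size_t block)
{
    unsigned shift = (unsigned)(block % ENTRIES_PER_BYTE) * 2;

    return (bitmap[block / ENTRIES_PER_BYTE] >> shift) & 0x3u;
}

/* Write the 2-bit state for a block without touching its neighbours. */
static void bitmap_set(uint8_t *bitmap, size_t block, unsigned state)
{
    unsigned shift = (unsigned)(block % ENTRIES_PER_BYTE) * 2;
    uint8_t *byte = &bitmap[block / ENTRIES_PER_BYTE];

    *byte = (uint8_t)((*byte & ~(0x3u << shift)) | ((state & 0x3u) << shift));
}

struct toy_space_map {
    uint8_t *bitmap;    /* 2-bit state per block */
    uint32_t *overflow; /* stand-in for the btree holding counts > 2 */
};

/* States 0..2 are the count itself; state 3 means "look it up elsewhere". */
static uint32_t sm_get_count(const struct toy_space_map *sm, size_t block)
{
    unsigned state = bitmap_get(sm->bitmap, block);

    return state < 3 ? state : sm->overflow[block];
}

static void sm_set_count(struct toy_space_map *sm, size_t block, uint32_t count)
{
    if (count < 3) {
        bitmap_set(sm->bitmap, block, (unsigned)count);
        sm->overflow[block] = 0;
    } else {
        bitmap_set(sm->bitmap, block, 3);
        sm->overflow[block] = count;
    }
}

/* Drop one reference; returns true when the block has just become free. */
static bool sm_dec(struct toy_space_map *sm, size_t block)
{
    uint32_t count = sm_get_count(sm, block);

    if (count == 0)
        return false; /* already free, nothing to release */
    sm_set_count(sm, block, count - 1);
    return count == 1;
}
```

In this model, a discard from the upper layer would translate into an `sm_dec` call for each affected block, and a `true` return marks the point where the block can be reused.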

V. Summary

The thin-pool storage format is fairly clear and simple, and the layout diagram above already gives a good overall picture of it.
