Docker Foundation Technology: Devicemapper

Source: Internet
Author: User

Device Mapper Introduction

Device Mapper was introduced into the Linux kernel in 2.6 and is one of its most important storage technologies. It provides a generic device-mapping mechanism in the kernel for logical volume management, with a highly modular architecture for implementing block device drivers for storage resource management. It is built around three key object concepts: the mapped device, the mapping table, and the target device.

A mapped device is a logical abstraction: a logical device that the kernel exposes to the outside. It is mapped onto target devices through the relationships described in the mapping table. A target device represents a segment of physical space that the mapped device maps to; from the mapped device's point of view, it is the physical device that the logical device is mapped onto.

The mapping table holds, for each mapped segment, the starting address and range within the mapped device, the target type, and the address offset within the target device. (Note: these addresses and offsets are measured in disk sectors of 512 bytes each, so when you see 128, it actually means 128*512 = 64K.)
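To make the sector units concrete, here is a minimal sketch that splits one table line into its fields and converts the length to bytes. The table line itself is a hypothetical example (a `linear` target on an assumed `/dev/sdb`); it is not taken from the demo that follows.

```shell
# A hypothetical one-line mapping table:
#   <start sector> <length in sectors> <target type> <target arguments...>
line="0 128 linear /dev/sdb 2048"

# Split the line into its fields.
read -r start length target args <<EOF
$line
EOF

# Sectors are 512 bytes each.
bytes=$(( length * 512 ))
echo "target=$target start=$start length=$length sectors ($bytes bytes)"
```

For the line above, the length of 128 sectors corresponds to 65536 bytes, that is, 64K.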

In Device Mapper, a mapped device can map not only one or more physical target devices but also another mapped device, which makes the structure iterative or recursive. Just as a directory in a file system can contain other directories as well as files, the nesting can in theory go on indefinitely.

Device Mapper filters or redirects IO requests in the kernel through modular target-driver plug-ins. The plug-ins implemented so far include software RAID, encryption, multipath, mirroring, and snapshots. This reflects the principle of separating policy from mechanism in Linux kernel design. As the figure shows, Device Mapper is just a framework into which a variety of policies can be plugged (which naturally brings the object-oriented Strategy pattern to mind). Among these plug-ins is one called Thin Provisioning Snapshot, which is the most important module in Docker's use of Device Mapper.

Image source: http://people.redhat.com/agk/talks/FOSDEM_2005/

Thin Provisioning Introduction

Thin provisioning is a real headache to translate into Chinese, so I will leave it untranslated. It is one of the virtualization technologies. What does it mean? Think of memory management in a computer, that is, virtual memory. The operating system gives each process far more address space than it will use (with 32-bit addresses, each process can have up to 2GB of memory space), but we know there is not that much physical memory. If every process's memory were mapped one-to-one onto physical memory, how much physical memory would we need? So the operating system introduces virtual memory: logically it gives you a practically unlimited amount of memory, but in reality memory is handed out only as it is actually used, because the system knows you will not use that much. The effect is higher memory utilization. (Much of the so-called virtualization in cloud computing today is thin provisioning of this kind, similar to virtual memory: so-called over-provisioning, or overselling.)

Back to the topic: what we are talking about here is storage. Look at the following two figures (picture source). The first is fat provisioning and the second is thin provisioning. They illustrate the idea well (it is the same concept as virtual memory).

So how does Docker use thin provisioning to build layered images the way UnionFS does? The answer is that Docker uses thin provisioning's snapshot feature. Let's look at thin provisioning snapshots next.

Thin Provisioning Snapshot Demo

Below, we use a series of commands to demonstrate how Device Mapper's thin provisioning snapshots work.

First, we need to create two files: data.img and meta.data.img:

~hchen$ sudo dd if=/dev/zero of=/tmp/data.img bs=1K count=1 seek=10M
1+0 records in
1+0 records out
1024 bytes (1.0 kB) copied, 0.000621428 s, 1.6 MB/s
~hchen$ sudo dd if=/dev/zero of=/tmp/meta.data.img bs=1K count=1 seek=1G
1+0 records in
1+0 records out
1024 bytes (1.0 kB) copied, 0.000140858 s, 7.3 MB/s

Note the seek option in the command: it tells dd to skip that many blocks (each of size bs) at the start of the output file before writing. Because bs is 1K, seek=10M yields a 10G file, but the skipped region takes no space on disk; only the 1K that was written occupies space. Disk space is allocated only when content is actually written. With the ls command we can see that the space actually allocated is 12K and 4K.

~hchen$ sudo ls -lsh /tmp/data.img
12K -rw-r--r--. 1 root root 11G 23:01 /tmp/data.img
~hchen$ sudo ls -slh /tmp/meta.data.img
4.0K -rw-r--r--. 1 root root 101M 23:17 /tmp/meta.data.img
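The same sparse-file effect can be reproduced without root on a much smaller scale. The sketch below is only an illustration: the temporary path comes from mktemp and the sizes are scaled down from the demo's values. It compares a file's apparent size with the space actually allocated.

```shell
# Create a sparse file: skip 1024 blocks of 1024 bytes (1M), then write one 1K block.
# The path is a throwaway temp file, not one of the demo's image files.
img=$(mktemp)
dd if=/dev/zero of="$img" bs=1024 count=1 seek=1024 2>/dev/null

# Apparent size in bytes: 1024*1024 skipped plus 1024 written.
apparent=$(wc -c < "$img" | tr -d ' ')
# Space actually allocated, in 1K units (the exact value depends on the filesystem).
allocated_kb=$(du -k "$img" | cut -f1)

echo "apparent size: $apparent bytes"
echo "allocated:     $allocated_kb KB"
rm -f "$img"
```

On a filesystem that supports sparse files, the allocated figure should be far smaller than the apparent size, which is the same gap that ls -s shows above.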

Then we create loopback devices for these files. (loop2015 and loop2016 are just two names I picked at random.)

~hchen$ sudo losetup /dev/loop2015 /tmp/data.img
~hchen$ sudo losetup /dev/loop2016 /tmp/meta.data.img
~hchen$ sudo losetup -a
/dev/loop2015: [64768]:103991768 (/tmp/data.img)
/dev/loop2016: [64768]:103991765 (/tmp/meta.data.img)

Now let's create a thin provisioning pool on these devices with the dmsetup command:

~hchen$ sudo dmsetup create hchen-thin-pool --table "0 20971522 thin-pool /dev/loop2016 /dev/loop2015 128 65536 1 skip_block_zeroing"

The parameters are explained below (more information can be found in the thin provisioning documentation):

    • dmsetup create is the command that creates the thin pool
    • hchen-thin-pool is a custom pool name; just make sure it does not conflict with an existing one.
    • --table is the parameter setting for this pool
      • 0 is the starting sector
      • 20971522 is the number of sectors. As mentioned earlier, a sector is 512 bytes, so 20971522 sectors is 10GB plus the 1K block written by dd, which is the size of data.img
      • /dev/loop2016 is the device for the metadata file (we created it earlier)
      • /dev/loop2015 is the device for the data file (we created it earlier)
      • 128 is the smallest allocatable unit, in sectors
      • 65536 is the low water mark for free space, which is a threshold
      • 1 means that one additional parameter follows
      • skip_block_zeroing is that additional parameter; it means newly allocated blocks are not filled with zeros
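As a rough sketch of where the numbers in that table come from, the snippet below recomputes the sector count from the size of data.img and assembles the same table string. It only prints the table and does not call dmsetup; the variable names are illustrative.

```shell
# Size of data.img from the demo: 10M blocks of 1K skipped, plus the 1K block written.
data_bytes=$(( 10 * 1024 * 1024 * 1024 + 1024 ))
sector_size=512
data_sectors=$(( data_bytes / sector_size ))

# Field order, following the parameter list above:
#   <start> <length> thin-pool <metadata dev> <data dev>
#   <block size in sectors> <low water mark> <number of extra args> <extra args...>
table="0 $data_sectors thin-pool /dev/loop2016 /dev/loop2015 128 65536 1 skip_block_zeroing"
echo "$table"
```

The computed length comes out to 20971522 sectors, matching the value passed to dmsetup create.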

Then we can see a Device Mapper device:

~hchen$ sudo ll /dev/mapper/hchen-thin-pool
lrwxrwxrwx. 1 root root 7 23:24 /dev/mapper/hchen-thin-pool -> ../dm-4

Initialization is not finished yet. Next we need to create a thin provisioning volume:

~hchen$ sudo dmsetup message /dev/mapper/hchen-thin-pool 0 "create_thin 0"
~hchen$ sudo dmsetup create hchen-thin-volumn-001 --table "0 2097152 thin /dev/mapper/hchen-thin-pool 0"

Here:

    • create_thin in the first command is the keyword, and the 0 after it is the ID of the volume device
    • The second command actually creates a mountable device for this volume, named hchen-thin-volumn-001. 2097152 sectors is only 1GB

Before mounting it, we need to format it:

~hchen$ sudo mkfs.ext4 /dev/mapper/hchen-thin-volumn-001
mke2fs 1.42.9 (28-Dec-2013)
Discarding device blocks: done
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=16 blocks, Stripe width=16 blocks
65536 inodes, 262144 blocks
13107 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=268435456
8 block groups
32768 blocks per group, 32768 fragments per group
8192 inodes per group
Superblock backups stored on blocks:
    32768, 98304, 163840, 229376

Allocating group tables: done
Writing inode tables: done
Creating journal (8192 blocks): done
Writing superblocks and filesystem accounting information: done

Now we can mount it (in the commands below I also create a file):

~hchen$ sudo mkdir -p /mnt/base
~hchen$ sudo mount /dev/mapper/hchen-thin-volumn-001 /mnt/base
~hchen$ sudo sh -c 'echo "hello world, I am a base" > /mnt/base/id.txt'
~hchen$ sudo cat /mnt/base/id.txt
hello world, I am a base

OK, next let's see how snapshots work:

~hchen$ sudo dmsetup message /dev/mapper/hchen-thin-pool 0 "create_snap 1 0"
~hchen$ sudo dmsetup create mysnap1 --table "0 2097152 thin /dev/mapper/hchen-thin-pool 1"
~hchen$ sudo ll /dev/mapper/mysnap1
lrwxrwxrwx. 1 root root 7 23:49 /dev/mapper/mysnap1 -> ../dm-5

In the commands above:

    • The first command sends a create_snap message to hchen-thin-pool, followed by two IDs: the first is the new device ID, and the second is the existing device ID to snapshot (device ID 0 was created earlier)
    • The second command creates a device named mysnap1 that can be mounted.
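The two-step pattern (send create_snap to the pool, then create a thin device for the new ID) can be sketched as a small helper. This is a dry-run illustration only: the function name snapshot_cmds is made up here, and it prints the commands rather than running them.

```shell
# Print (do not run) the two dmsetup commands needed to snapshot a thin device.
# Arguments: pool device path, new device ID, origin device ID,
#            size in sectors, name of the new mapped device.
# snapshot_cmds is a hypothetical helper name used only for this sketch.
snapshot_cmds() {
    pool=$1 new_id=$2 origin_id=$3 sectors=$4 name=$5
    printf 'dmsetup message %s 0 "create_snap %s %s"\n' "$pool" "$new_id" "$origin_id"
    printf 'dmsetup create %s --table "0 %s thin %s %s"\n' "$name" "$sectors" "$pool" "$new_id"
}

# Same values as the demo: snapshot device ID 0 as device ID 1, exposed as mysnap1.
out=$(snapshot_cmds /dev/mapper/hchen-thin-pool 1 0 2097152 mysnap1)
echo "$out"
```

Calling it again with new ID 2 and origin ID 1 would print the commands for a snapshot of the snapshot, which is how layers stack up.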

Let's take a look:

~hchen$ sudo mkdir -p /mnt/mysnap1
~hchen$ sudo mount /dev/mapper/mysnap1 /mnt/mysnap1
~hchen$ sudo ll /mnt/mysnap1/
total 20
-rw-r--r--. 1 root 23:46 id.txt
drwx------. 2 root root 16384 23:43 lost+found
~hchen$ sudo cat /mnt/mysnap1/id.txt
hello world, I am a base

Let's modify /mnt/mysnap1/id.txt and add a snap1.txt file:

~hchen$ sudo sh -c 'echo "I am snap1" >> /mnt/mysnap1/id.txt'
~hchen$ sudo sh -c 'echo "I am snap1" > /mnt/mysnap1/snap1.txt'
~hchen$ sudo cat /mnt/mysnap1/id.txt
hello world, I am a base
I am snap1
~hchen$ sudo cat /mnt/mysnap1/snap1.txt
I am snap1

Now look at /mnt/base again and you will find that nothing has changed:

~hchen$ sudo ls /mnt/base
id.txt      lost+found
~hchen$ sudo cat /mnt/base/id.txt
hello world, I am a base

Can you see what a layered image looks like now?

Shall we go on and take a snapshot of the snapshot we just made?

~hchen$ sudo dmsetup message /dev/mapper/hchen-thin-pool 0 "create_snap 2 1"
~hchen$ sudo dmsetup create mysnap2 --table "0 2097152 thin /dev/mapper/hchen-thin-pool 2"
~hchen$ sudo ll /dev/mapper/mysnap2
lrwxrwxrwx. 1 root root 7 23:52 /dev/mapper/mysnap2 -> ../dm-7
~hchen$ sudo mkdir -p /mnt/mysnap2
~hchen$ sudo mount /dev/mapper/mysnap2 /mnt/mysnap2
~hchen$ sudo ls /mnt/mysnap2
id.txt  lost+found  snap1.txt

Well, I'm sure you can now see what a layered image looks like.

After the demo, let's go over a bit of theory:

    • Snapshots come from LVM (Logical Volume Manager), which can take a snapshot of a device without interrupting service.
    • Snapshots are copy-on-write: the corresponding storage is copied only when a change occurs.

In addition, the article "Storage thin provisioning benefits and challenges" is worth reading first.

Docker's Devicemapper

The above is essentially how Docker does it. We can look at Docker's loopback devices:

~hchen $ sudo losetup -a
/dev/loop0: [64768]:38050288 (/var/lib/docker/devicemapper/devicemapper/data)
/dev/loop1: [64768]:38050289 (/var/lib/docker/devicemapper/devicemapper/metadata)

Here data is 100GB and metadata is 2.0GB.

Below are the related thin-pool devices. The device with the long hash in its name is the container that is currently running:

~hchen $ sudo ll /dev/mapper/dock*
lrwxrwxrwx. 1 root root 7 07:57 /dev/mapper/docker-253:0-104108535-pool -> ../dm-2
lrwxrwxrwx. 1 root root 7 11:13 /dev/mapper/docker-253:0-104108535-deefcd630a60aa5ad3e69249f58a68e717324be4258296653406ff062f605edf -> ../dm-3

We can look up its device ID (Docker keeps a record of them):

~hchen $ sudo cat /var/lib/docker/devicemapper/metadata/deefcd630a60aa5ad3e69249f58a68e717324be4258296653406ff062f605edf
{"device_id": 24, "size": 10737418240, "transaction_id": ..., "initialized": false}

device_id is 24 and size is 10737418240, which divided by 512 is 20971520 sectors. Let's use this information to make a snapshot and take a look (note: I used a fairly large device ID, 1024):

~hchen$ sudo dmsetup message "/dev/mapper/docker-253:0-104108535-pool" 0 "create_snap 1024 24"
~hchen$ sudo dmsetup create dockersnap --table "0 20971520 thin /dev/mapper/docker-253:0-104108535-pool 1024"
~hchen$ sudo mkdir /mnt/docker
~hchen$ sudo mount /dev/mapper/dockersnap /mnt/docker/
~hchen$ sudo ls /mnt/docker/
id lost+found rootfs
~hchen$ sudo ls /mnt/docker/rootfs/
bin dev etc home lib lib64 lost+found media mnt opt proc root run sbin srv sys tmp usr var
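The values fed to dmsetup here can be derived from Docker's metadata file rather than typed by hand. The following is a minimal sketch under assumptions: the sample JSON contains only the two fields stated in the text (device_id and size), the fields are pulled out with sed for brevity, and nothing is executed against a real pool.

```shell
# Sample metadata holding only the two fields the text states explicitly.
meta='{"device_id": 24, "size": 10737418240}'

# Extract the numeric fields with sed (a real script might prefer a JSON parser).
device_id=$(printf '%s\n' "$meta" | sed -n 's/.*"device_id": *\([0-9][0-9]*\).*/\1/p')
size=$(printf '%s\n' "$meta" | sed -n 's/.*"size": *\([0-9][0-9]*\).*/\1/p')

# Convert the size in bytes to 512-byte sectors.
sectors=$(( size / 512 ))

# The message and table strings, using 1024 as the new device ID as in the text.
echo "create_snap 1024 $device_id"
echo "0 $sectors thin /dev/mapper/docker-253:0-104108535-pool 1024"
```

With the values above, the sector count works out to 20971520, the figure used in the table.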

We can also see the related mounts inside the Docker container with the findmnt command (the output is long, so this is only an excerpt):

# findmnt
TARGET                SOURCE
/                     /dev/mapper/docker-253:0-104108535-deefcd630a60[/rootfs]
/etc/resolv.conf      /dev/mapper/centos-root[/var/lib/docker/containers/deefcd630a60/resolv.conf]
/etc/hostname         /dev/mapper/centos-root[/var/lib/docker/containers/deefcd630a60/hostname]
/etc/hosts            /dev/mapper/centos-root[/var/lib/docker/containers/deefcd630a60/hosts]

Is Device Mapper OK?

The thin provisioning documentation says that it is still in the experimental stage and should not be used in production.

These targets are very much still in the EXPERIMENTAL state. Please do not yet rely on them in production.

In addition, Jeff Atwood posted a tweet about this on Twitter.

The discussion the tweet links to, which points to this code diff, basically says that Device Mapper causes too many problems and should be blacklisted. Docker's founder also replied.

So if you are using devicemapper on loopback devices, the correct fix when your storage runs into problems is:

rm -rf /var/lib/docker

Article reposted from: http://coolshell.cn/articles/17200.html
