How Docker's overlay storage driver stores new images, and the inode exhaustion problem

Source: Internet
Author: User

Docker downloads and manages images layer by layer. While a new image is being pulled, each downloaded file is stored temporarily in /var/lib/docker/tmp under a name of the form GetImageBlobxxx (where xxx is a string of random digits). Each temporary file is one layer, packaged as a compressed archive such as tar.gz. The temporary file is first decompressed into a tar archive held in cache, which is then registered with the system by the Register function of layerStore in docker\layer\layer_store.go. Finally, the temporary file is deleted.
In docker\distribution\pull_v2.go:

func (ld *v2LayerDescriptor) Download(ctx context.Context, progressOutput progress.Output) (io.ReadCloser, int64, error) {

...

    return ioutils.NewReadCloserWrapper(tmpFile, func() error {
        tmpFile.Close()
        // Delete the temporary file once it has been closed.
        err := os.RemoveAll(tmpFile.Name())
        if err != nil {
            logrus.Errorf("Failed to remove temp file: %s", tmpFile.Name())
        }
        return err
    }), size, nil
}
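
The cleanup-on-close pattern in the excerpt above can be reproduced with the standard library alone. The following is a minimal sketch, not Docker's implementation: selfDeletingFile, stageBlob, and roundTrip are hypothetical names, and the temporary file is created in the system default temp directory rather than /var/lib/docker/tmp.

```go
package main

import (
	"fmt"
	"io"
	"os"
)

// selfDeletingFile is a hypothetical stand-in for the wrapper built by
// ioutils.NewReadCloserWrapper: reads go to the file, Close also removes it.
type selfDeletingFile struct {
	*os.File
}

// Close closes the underlying file and then deletes it from disk.
func (f selfDeletingFile) Close() error {
	closeErr := f.File.Close()
	if err := os.Remove(f.File.Name()); err != nil {
		return err
	}
	return closeErr
}

// stageBlob writes data to a temporary file in dir and returns a reader
// positioned at the start of the file, plus the file's path.
func stageBlob(dir string, data []byte) (io.ReadCloser, string, error) {
	tmp, err := os.CreateTemp(dir, "GetImageBlob")
	if err != nil {
		return nil, "", err
	}
	if _, err := tmp.Write(data); err == nil {
		_, err = tmp.Seek(0, io.SeekStart)
		if err == nil {
			return selfDeletingFile{tmp}, tmp.Name(), nil
		}
	}
	// Writing or seeking failed: do not leave the file behind.
	tmp.Close()
	os.Remove(tmp.Name())
	return nil, "", fmt.Errorf("staging blob failed")
}

// roundTrip stages data, reads it back, closes the reader, and reports the
// bytes read and whether the temporary file still exists afterwards.
func roundTrip(data []byte) (string, bool, error) {
	rc, path, err := stageBlob("", data)
	if err != nil {
		return "", false, err
	}
	got, err := io.ReadAll(rc)
	if err != nil {
		rc.Close()
		return "", false, err
	}
	if err := rc.Close(); err != nil {
		return "", false, err
	}
	_, statErr := os.Lstat(path)
	return string(got), statErr == nil, nil
}

func main() {
	got, exists, err := roundTrip([]byte("layer bytes"))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("read %q, temp file still exists: %v\n", got, exists)
}
```

Tying deletion to Close means the consumer of the stream does not need to know that a temporary file is involved at all.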

Registration means writing the layer's actual data to the file system. The process has three steps:
First, create a unique directory for the image layer.
Second, write the extracted tar data into that layer's unique directory.
Third, insert the image layer object into the layer management map.

func (ls *layerStore) Register(ts io.Reader, parent ChainID) (Layer, error) {
    logrus.Debugf("Register parent: %s", parent)
    return ls.registerWithDescriptor(ts, parent, distribution.Descriptor{})
}

func (ls *layerStore) registerWithDescriptor(ts io.Reader, parent ChainID, descriptor distribution.Descriptor) (Layer, error) {
    // err is used to hold the error which will always trigger
    // cleanup of created sources but may not be an error returned
    // to the caller (already exists).
    var err error
    var pid string
    var p *roLayer
    if string(parent) != "" {
        // Look the parent up directly in the layer map.
        p = ls.get(parent)
        if p == nil {
            return nil, ErrLayerDoesNotExist
        }
        pid = p.cacheID
        // Release parent chain if error
        defer func() {
            if err != nil {
                ls.layerL.Lock()
                ls.releaseLayer(p)
                ls.layerL.Unlock()
            }
        }()
        if p.depth() >= maxLayerDepth {
            err = ErrMaxDepthExceeded
            return nil, err
        }
    }

    // Create new roLayer
    layer := &roLayer{
        parent:         p,
        cacheID:        stringid.GenerateRandomID(),
        referenceCount: 1,
        layerStore:     ls,
        references:     map[Layer]struct{}{},
        descriptor:     descriptor,
    }

    // If the parent is empty, pid is empty and a root directory is created.
    // If the parent's root exists, an overlay is set up on top of it.
    if err = ls.driver.Create(layer.cacheID, pid, "", nil); err != nil {
        return nil, err
    }

    tx, err := ls.store.StartTransaction()
    if err != nil {
        return nil, err
    }

    defer func() {
        if err != nil {
            logrus.Debugf("Cleaning up layer %s: %v", layer.cacheID, err)
            if err := ls.driver.Remove(layer.cacheID); err != nil {
                logrus.Errorf("Error cleaning up cache layer %s: %v", layer.cacheID, err)
            }
            if err := tx.Cancel(); err != nil {
                logrus.Errorf("Error canceling metadata transaction %q: %s", tx.String(), err)
            }
        }
    }()

    // Apply the tar archive.
    if err = ls.applyTar(tx, ts, pid, layer); err != nil {
        return nil, err
    }

    if layer.parent == nil {
        layer.chainID = ChainID(layer.diffID)
    } else {
        layer.chainID = createChainIDFromParent(layer.parent.chainID, layer.diffID)
    }

    if err = storeLayer(tx, layer); err != nil {
        return nil, err
    }

    ls.layerL.Lock()
    defer ls.layerL.Unlock()

    if existingLayer := ls.getWithoutLock(layer.chainID); existingLayer != nil {
        // Set error for cleanup, but do not return the error
        err = errors.New("layer already exists")
        return existingLayer.getReference(), nil
    }

    if err = tx.Commit(layer.chainID); err != nil {
        return nil, err
    }

    // Register the layer.
    ls.layerMap[layer.chainID] = layer

    return layer.getReference(), nil
}
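
The listing above keys the layer map by chain ID, which it derives from the parent's chain ID and the layer's own diff ID. The sketch below shows one way such an ID can be computed, following the definition in the OCI image specification (the digest of the parent chain ID, a space, and the diff ID). The function names chainID and chainIDs are hypothetical, the diff IDs are placeholders, and this is an illustration rather than a copy of createChainIDFromParent.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// chainID returns the chain ID of a layer with the given diff ID stacked on
// a parent chain. An empty parent means the layer is a base layer, whose
// chain ID is simply its diff ID.
func chainID(parent, diffID string) string {
	if parent == "" {
		return diffID
	}
	sum := sha256.Sum256([]byte(parent + " " + diffID))
	return fmt.Sprintf("sha256:%x", sum)
}

// chainIDs folds a bottom-to-top list of diff IDs into one chain ID per layer.
func chainIDs(diffIDs []string) []string {
	ids := make([]string, 0, len(diffIDs))
	parent := ""
	for _, d := range diffIDs {
		parent = chainID(parent, d)
		ids = append(ids, parent)
	}
	return ids
}

func main() {
	// Placeholder diff IDs; real ones are sha256 digests of each layer's tar.
	for i, id := range chainIDs([]string{"sha256:aaa", "sha256:bbb", "sha256:ccc"}) {
		fmt.Printf("layer %d chain ID: %s\n", i, id)
	}
}
```

Because the chain ID depends on every layer beneath it, two layers with identical content but different parents get different chain IDs, which is why the "layer already exists" check in the listing compares chain IDs rather than diff IDs.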

The first two steps depend on the storage driver. For the overlay driver there are two cases:
First, the image layer has a parent layer.
Second, the image layer has no parent layer, that is, it is a base image layer.
Let us first look at how the overlay driver is initialized:

// Init returns the NaiveDiffDriver, a native diff driver for overlay filesystem.
// If overlay filesystem is not supported on the host, graphdriver.ErrNotSupported is returned as error.
// If an overlay filesystem is not supported over an existing filesystem then error graphdriver.ErrIncompatibleFS is returned.
func Init(home string, options []string, uidMaps, gidMaps []idtools.IDMap) (graphdriver.Driver, error) {

    if err := supportsOverlay(); err != nil {
        return nil, graphdriver.ErrNotSupported
    }

    fsMagic, err := graphdriver.GetFSMagic(home)
    if err != nil {
        return nil, err
    }
    if fsName, ok := graphdriver.FsNames[fsMagic]; ok {
        backingFs = fsName
    }

    switch fsMagic {
    case graphdriver.FsMagicAufs, graphdriver.FsMagicBtrfs, graphdriver.FsMagicOverlay, graphdriver.FsMagicZfs, graphdriver.FsMagicEcryptfs:
        logrus.Errorf("'overlay' is not supported over %s", backingFs)
        return nil, graphdriver.ErrIncompatibleFS
    }

    rootUID, rootGID, err := idtools.GetRootUIDGID(uidMaps, gidMaps)
    if err != nil {
        return nil, err
    }
    // Create the driver home dir
    if err := idtools.MkdirAllAs(home, 0700, rootUID, rootGID); err != nil && !os.IsExist(err) {
        return nil, err
    }

    if err := mount.MakePrivate(home); err != nil {
        return nil, err
    }

    d := &Driver{
        home:    home,
        uidMaps: uidMaps,
        gidMaps: gidMaps,
        ctr:     graphdriver.NewRefCounter(graphdriver.NewFsChecker(graphdriver.FsMagicOverlay)),
    }

    return NaiveDiffDriverWithApply(d, uidMaps, gidMaps), nil
}

// NaiveDiffDriverWithApply returns a NaiveDiff driver with custom ApplyDiff.
func NaiveDiffDriverWithApply(driver ApplyDiffProtoDriver, uidMaps, gidMaps []idtools.IDMap) graphdriver.Driver {
    return &naiveDiffDriverWithApply{
        Driver:    graphdriver.NewNaiveDiffDriver(driver, uidMaps, gidMaps),
        applyDiff: driver,
    }
}

As you can see, Init returns a naiveDiffDriverWithApply, which contains two members: Driver and applyDiff.
Because Go promotes the methods of an embedded field to the enclosing struct, the call in the first step

    if err = ls.driver.Create(layer.cacheID, pid, "", nil); err != nil {
        return nil, err
    }

is served by the embedded Driver. The Driver type in docker\daemon\graphdriver\overlay\overlay.go implements Create, so that Create function is the one called:

// Create is used to create the upper, lower, and merge directories required for overlay fs for a given id.
// The parent filesystem is used to configure these directories for the overlay.
func (d *Driver) Create(id, parent, mountLabel string, storageOpt map[string]string) (retErr error) {

    if len(storageOpt) != 0 {
        return fmt.Errorf("--storage-opt is not supported for overlay")
    }

    dir := d.dir(id)

    rootUID, rootGID, err := idtools.GetRootUIDGID(d.uidMaps, d.gidMaps)
    if err != nil {
        return err
    }
    // path.Dir(dir) returns the path without its last element.
    if err := idtools.MkdirAllAs(path.Dir(dir), 0700, rootUID, rootGID); err != nil {
        return err
    }
    // Create a unique directory for the image layer.
    if err := idtools.MkdirAs(dir, 0700, rootUID, rootGID); err != nil {
        return err
    }

    defer func() {
        // Clean up on failure
        if retErr != nil {
            os.RemoveAll(dir)
        }
    }()

    // Toplevel images are just a "root" dir.
    // If there is no parent layer, create a root directory in the layer directory and return.
    if parent == "" {
        if err := idtools.MkdirAs(path.Join(dir, "root"), 0755, rootUID, rootGID); err != nil {
            return err
        }
        return nil
    }
    // Otherwise create directories such as upper and merged.
    logrus.Debugf("Make layer dir")
    parentDir := d.dir(parent)

    // Ensure parent exists
    if _, err := os.Lstat(parentDir); err != nil {
        return err
    }

    // If parent has a root, just do an overlay to it.
    // That is, if the parent layer has a root directory, create upper and the other directories.
    parentRoot := path.Join(parentDir, "root")
    // The parent's root exists.
    if s, err := os.Lstat(parentRoot); err == nil {
        if err := idtools.MkdirAs(path.Join(dir, "upper"), s.Mode(), rootUID, rootGID); err != nil {
            return err
        }
        if err := idtools.MkdirAs(path.Join(dir, "work"), 0700, rootUID, rootGID); err != nil {
            return err
        }
        if err := idtools.MkdirAs(path.Join(dir, "merged"), 0700, rootUID, rootGID); err != nil {
            return err
        }
        if err := ioutil.WriteFile(path.Join(dir, "lower-id"), []byte(parent), 0666); err != nil {
            return err
        }
        return nil
    }

    // Otherwise, copy the upper and the lower-id from the parent
    lowerID, err := ioutil.ReadFile(path.Join(parentDir, "lower-id"))
    if err != nil {
        return err
    }

    if err := ioutil.WriteFile(path.Join(dir, "lower-id"), lowerID, 0666); err != nil {
        return err
    }

    parentUpperDir := path.Join(parentDir, "upper")
    s, err := os.Lstat(parentUpperDir)
    if err != nil {
        return err
    }

    upperDir := path.Join(dir, "upper")
    if err := idtools.MkdirAs(upperDir, s.Mode(), rootUID, rootGID); err != nil {
        return err
    }
    if err := idtools.MkdirAs(path.Join(dir, "work"), 0700, rootUID, rootGID); err != nil {
        return err
    }
    if err := idtools.MkdirAs(path.Join(dir, "merged"), 0700, rootUID, rootGID); err != nil {
        return err
    }

    // This should copy all of the parent layer's data into the child layer.
    return copyDir(parentUpperDir, upperDir, 0)
}

The code shows that if there is a parent layer (and the parent's directory contains a root directory, which appears always to be the case here), the upper, work, and merged directories and the lower-id file are created in this layer's directory, and the function returns. If there is no parent layer, the layer is itself a base image layer: a root subdirectory is created directly in the layer's directory, and the function returns.
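
The two outcomes can be reproduced in a scratch directory. The sketch below is a simplified illustration under stated assumptions: createLayerDir, listDir, and demoLayout are hypothetical names, uid/gid mapping is ignored, and the third branch of Create (a parent without a root directory) is left out.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// createLayerDir is a simplified, hypothetical version of the driver's
// Create. It handles only a base layer or a child of a layer with "root".
func createLayerDir(home, id, parent string) error {
	dir := filepath.Join(home, id)
	if err := os.MkdirAll(dir, 0700); err != nil {
		return err
	}
	if parent == "" {
		// Base layer: just a "root" directory.
		return os.Mkdir(filepath.Join(dir, "root"), 0755)
	}
	// The parent must already exist and contain a "root" directory.
	if _, err := os.Lstat(filepath.Join(home, parent, "root")); err != nil {
		return err
	}
	for _, name := range []string{"upper", "work", "merged"} {
		if err := os.Mkdir(filepath.Join(dir, name), 0700); err != nil {
			return err
		}
	}
	// lower-id is a regular file that records the parent's ID.
	return os.WriteFile(filepath.Join(dir, "lower-id"), []byte(parent), 0666)
}

// listDir returns the entry names inside dir, sorted and space-separated.
func listDir(dir string) (string, error) {
	entries, err := os.ReadDir(dir) // os.ReadDir sorts by file name
	if err != nil {
		return "", err
	}
	names := make([]string, 0, len(entries))
	for _, e := range entries {
		names = append(names, e.Name())
	}
	return strings.Join(names, " "), nil
}

// demoLayout creates a base layer and a child layer under a scratch
// directory and returns the contents of each layer directory.
func demoLayout() (base, child string, err error) {
	home, err := os.MkdirTemp("", "overlay-layout")
	if err != nil {
		return "", "", err
	}
	defer os.RemoveAll(home)
	if err = createLayerDir(home, "base", ""); err != nil {
		return "", "", err
	}
	if err = createLayerDir(home, "child", "base"); err != nil {
		return "", "", err
	}
	if base, err = listDir(filepath.Join(home, "base")); err != nil {
		return "", "", err
	}
	child, err = listDir(filepath.Join(home, "child"))
	return base, child, err
}

func main() {
	base, child, err := demoLayout()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("base layer: ", base)  // root
	fmt.Println("child layer:", child) // lower-id merged upper work
}
```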

Again because of how Go resolves methods on embedded fields, the call in the second step

    // Apply the tar archive.
    if err = ls.applyTar(tx, ts, pid, layer); err != nil {
        return nil, err
    }

ends up in the ApplyDiff method of naiveDiffDriverWithApply in docker\daemon\graphdriver\overlay\overlay.go:

// ApplyDiff creates a diff layer with either the NaiveDiffDriver or with a fallback.
func (d *naiveDiffDriverWithApply) ApplyDiff(id, parent string, diff archive.Reader) (int64, error) {

    b, err := d.applyDiff.ApplyDiff(id, parent, diff)
    if err == ErrApplyDiffFallback {
        // d.Driver is initialized in the NaiveDiffDriverWithApply function (line 47).
        // It is implemented in docker\daemon\graphdriver\fsdiff.go.
        return d.Driver.ApplyDiff(id, parent, diff)
    }
    return b, err
}
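
The wrapper above combines two Go idioms: struct embedding, which promotes the embedded value's methods, and a sentinel error that signals "use the generic path instead". The sketch below isolates that combination. All type and variable names (naive, fast, withApply, errFallback) are hypothetical, and the string return values stand in for real work.

```go
package main

import (
	"errors"
	"fmt"
)

// errFallback plays the role of ErrApplyDiffFallback: a sentinel that asks
// the caller to use the generic implementation.
var errFallback = errors.New("fall back to naive diff")

// naive is the generic driver that will be embedded.
type naive struct{}

func (naive) ApplyDiff(id, parent string) (string, error) { return "naive:" + id, nil }
func (naive) Name() string                                 { return "naive" }

// fast is the optimized implementation. It only handles layers with a parent.
type fast struct{}

func (fast) ApplyDiff(id, parent string) (string, error) {
	if parent == "" {
		return "", errFallback
	}
	return "fast:" + id, nil
}

// withApply embeds naive, so naive's methods are promoted to withApply,
// and keeps the optimized implementation in a named field.
type withApply struct {
	naive
	apply fast
}

// ApplyDiff shadows the promoted naive.ApplyDiff: it tries the optimized
// path first and falls back only on the sentinel error.
func (w withApply) ApplyDiff(id, parent string) (string, error) {
	out, err := w.apply.ApplyDiff(id, parent)
	if errors.Is(err, errFallback) {
		return w.naive.ApplyDiff(id, parent)
	}
	return out, err
}

func main() {
	w := withApply{}
	base, _ := w.ApplyDiff("a", "")   // no parent: falls back
	child, _ := w.ApplyDiff("b", "a") // has parent: optimized path
	fmt.Println(base, child, w.Name()) // Name is promoted from naive
}
```

Any method the outer type does not define, such as Name here or Create in the driver, is served by the embedded value, which is why only ApplyDiff needs special handling.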

As you can see, naiveDiffDriverWithApply.ApplyDiff first tries d.applyDiff.ApplyDiff and, if that returns ErrApplyDiffFallback, calls d.Driver.ApplyDiff instead.
d.applyDiff.ApplyDiff is the ApplyDiff method of Driver in docker\daemon\graphdriver\overlay\overlay.go:

// ApplyDiff applies the new layer on top of the root; if parent does not exist it will return an ErrApplyDiffFallback error.
func (d *Driver) ApplyDiff(id string, parent string, diff archive.Reader) (size int64, err error) {
    dir := d.dir(id)

    if parent == "" {
        logrus.Debugf("Applied tar on err, no parent")
        return 0, ErrApplyDiffFallback
    }
    logrus.Debugf("Applied tar on parent: %s", parent)
    // Continue only if the parent layer's root exists.
    parentRootDir := path.Join(d.dir(parent), "root")
    if _, err := os.Stat(parentRootDir); err != nil {
        return 0, ErrApplyDiffFallback
    }

    // We now know there is a parent, and it has a "root" directory containing
    // the full root filesystem. We can just hardlink it and apply the
    // layer. This relies on two things:
    // 1) ApplyDiff is only run once on a clean (no writes to upper layer) container
    // 2) ApplyDiff doesn't do any in-place writes to files (would break hardlinks)
    // These are all currently true and are not expected to break

    // First create a temporary directory, tmproot.
    tmpRootDir, err := ioutil.TempDir(dir, "tmproot")
    if err != nil {
        return 0, err
    }
    // At the end, delete temporary directories such as upper.
    defer func() {
        if err != nil {
            os.RemoveAll(tmpRootDir)
        } else {
            os.RemoveAll(path.Join(dir, "upper"))
            os.RemoveAll(path.Join(dir, "work"))
            os.RemoveAll(path.Join(dir, "merged"))
            os.RemoveAll(path.Join(dir, "lower-id"))
        }
    }()

    // tmproot mirrors the parent layer's root: hard-link all of the lower layer's content into it.
    // When the diff is applied, the original inode is preserved;
    // only the directory entry with the same name is pointed at the new inode.
    if err = copyDir(parentRootDir, tmpRootDir, copyHardlink); err != nil {
        return 0, err
    }

    options := &archive.TarOptions{UIDMaps: d.uidMaps, GIDMaps: d.gidMaps}
    // This ultimately calls applyLayerHandler, implemented in docker\docker\pkg\chrootarchive\diff_unix.go.
    // Why does this overwrite the parent layer's files?
    if size, err = graphdriver.ApplyUncompressedLayer(tmpRootDir, diff, options); err != nil {
        return 0, err
    }
    // It is unclear why the directory is not named root from the start and is only renamed to root here.
    rootDir := path.Join(dir, "root")
    if err := os.Rename(tmpRootDir, rootDir); err != nil {
        return 0, err
    }

    return
}

For an image layer that has a parent, a tmproot directory is created in the layer's directory, and everything under the parent's root directory is hard-linked into it. The layer's new data is then written over these hard links. Once that is done, directories such as upper are deleted and tmproot is renamed to root (it is odd to create directories only to delete them, and to create one only to rename it). Because of how hard links work on Linux, for a file with the same name in both layers, the file name (the directory entry) in the child now points to the new file (the child layer's inode), while every other entry still points to the parent's inode. This completes the merge of the image layers.
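
The hard-link behaviour this relies on can be checked with a single file. In the sketch below (demoHardlink and sameInode are hypothetical names), a "parent" file is hard-linked under a second name, and the second name is then replaced by removing the link and creating a new file, never by writing in place.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// result records what the demo observed.
type result struct {
	parentContent, childContent string
	sharedBefore, sharedAfter   bool
}

// sameInode reports whether two paths refer to the same underlying file.
func sameInode(a, b string) (bool, error) {
	ai, err := os.Stat(a)
	if err != nil {
		return false, err
	}
	bi, err := os.Stat(b)
	if err != nil {
		return false, err
	}
	return os.SameFile(ai, bi), nil
}

// demoHardlink links a parent file into a child name, then replaces the
// child entry without touching the parent's inode.
func demoHardlink() (result, error) {
	var r result
	base, err := os.MkdirTemp("", "hardlink-demo")
	if err != nil {
		return r, err
	}
	defer os.RemoveAll(base)
	parentFile := filepath.Join(base, "parent-file")
	childFile := filepath.Join(base, "child-file")

	if err := os.WriteFile(parentFile, []byte("v1"), 0644); err != nil {
		return r, err
	}
	// Step 1: the child starts out as a hard link to the parent's file.
	if err := os.Link(parentFile, childFile); err != nil {
		return r, err
	}
	if r.sharedBefore, err = sameInode(parentFile, childFile); err != nil {
		return r, err
	}
	// Step 2: apply the "diff" by unlinking the name and creating a new
	// file, so the parent's inode is never written to.
	if err := os.Remove(childFile); err != nil {
		return r, err
	}
	if err := os.WriteFile(childFile, []byte("v2"), 0644); err != nil {
		return r, err
	}
	if r.sharedAfter, err = sameInode(parentFile, childFile); err != nil {
		return r, err
	}
	p, err := os.ReadFile(parentFile)
	if err != nil {
		return r, err
	}
	c, err := os.ReadFile(childFile)
	if err != nil {
		return r, err
	}
	r.parentContent, r.childContent = string(p), string(c)
	return r, nil
}

func main() {
	r, err := demoHardlink()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("shared before: %v, shared after: %v\n", r.sharedBefore, r.sharedAfter)
	fmt.Printf("parent: %q, child: %q\n", r.parentContent, r.childContent)
}
```

An in-place write to the child name in step 2 would have changed the parent's content as well, which is exactly the condition the source comment warns would break the hard links.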

An image layer without a parent is simpler: the function above returns ErrApplyDiffFallback, and the ApplyDiff method of NaiveDiffDriver in docker\daemon\graphdriver\fsdiff.go is called instead:

// ApplyDiff extracts the changeset from the given diff into the
// layer with the specified id and parent, returning the size of the
// new layer in bytes.
func (gdw *NaiveDiffDriver) ApplyDiff(id, parent string, diff archive.Reader) (size int64, err error) {
    driver := gdw.ProtoDriver

    // Mount the root filesystem so we can apply the diff/layer.
    // In practice this returns a path: if the layer has a root directory, that root is returned directly.
    // Looking up the layer's root by ID yields the root directory inside the layer's directory.
    layerFs, err := driver.Get(id, "")
    if err != nil {
        return
    }
    defer driver.Put(id)

    options := &archive.TarOptions{UIDMaps: gdw.uidMaps,
        GIDMaps: gdw.gidMaps}
    start := time.Now().UTC()
    logrus.Debugf("ApplyUncompressedLayer to: %s", layerFs)
    if size, err = ApplyUncompressedLayer(layerFs, diff, options); err != nil {
        return
    }
    logrus.Debugf("Untar time: %vs", time.Now().UTC().Sub(start).Seconds())

    return
}
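
At its core, this fallback path unpacks an uncompressed tar stream into a directory and reports how many bytes it wrote. The sketch below shows that core with the standard library's archive/tar. It is not ApplyUncompressedLayer: untar, buildTar, and demoUntar are hypothetical names, and whiteout files, ownership mapping, symlinks, and the chroot used by Docker are all ignored.

```go
package main

import (
	"archive/tar"
	"bytes"
	"fmt"
	"io"
	"os"
	"path/filepath"
	"strings"
)

// untar extracts directories and regular files from an uncompressed tar
// stream into dest and returns the number of file bytes written.
func untar(r io.Reader, dest string) (int64, error) {
	var size int64
	dest = filepath.Clean(dest)
	tr := tar.NewReader(r)
	for {
		hdr, err := tr.Next()
		if err == io.EOF {
			return size, nil
		}
		if err != nil {
			return size, err
		}
		target := filepath.Join(dest, hdr.Name)
		// Refuse entries that would land outside dest.
		if target != dest && !strings.HasPrefix(target, dest+string(os.PathSeparator)) {
			return size, fmt.Errorf("entry %q escapes %s", hdr.Name, dest)
		}
		switch hdr.Typeflag {
		case tar.TypeDir:
			if err := os.MkdirAll(target, 0755); err != nil {
				return size, err
			}
		case tar.TypeReg:
			if err := os.MkdirAll(filepath.Dir(target), 0755); err != nil {
				return size, err
			}
			f, err := os.OpenFile(target, os.O_CREATE|os.O_WRONLY|os.O_TRUNC, hdr.FileInfo().Mode().Perm())
			if err != nil {
				return size, err
			}
			n, err := io.Copy(f, tr)
			f.Close()
			if err != nil {
				return size, err
			}
			size += n
		}
	}
}

// buildTar returns an in-memory tar archive holding one regular file.
func buildTar(name, body string) ([]byte, error) {
	var buf bytes.Buffer
	tw := tar.NewWriter(&buf)
	hdr := &tar.Header{Name: name, Mode: 0644, Size: int64(len(body)), Typeflag: tar.TypeReg}
	if err := tw.WriteHeader(hdr); err != nil {
		return nil, err
	}
	if _, err := tw.Write([]byte(body)); err != nil {
		return nil, err
	}
	if err := tw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// demoUntar unpacks a one-file archive into a scratch "root" directory.
func demoUntar() (int64, string, error) {
	data, err := buildTar("etc/hostname", "layer-host\n")
	if err != nil {
		return 0, "", err
	}
	root, err := os.MkdirTemp("", "layer-root")
	if err != nil {
		return 0, "", err
	}
	defer os.RemoveAll(root)
	size, err := untar(bytes.NewReader(data), root)
	if err != nil {
		return 0, "", err
	}
	content, err := os.ReadFile(filepath.Join(root, "etc", "hostname"))
	return size, string(content), err
}

func main() {
	size, content, err := demoUntar()
	fmt.Println(size, content, err)
}
```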

This path simply creates a root directory in the layer's directory and unpacks the tar archive into it.

As shown above, Docker's overlay driver merges image layers by hard-linking the contents of the lower layer into the child layer. What happens when the lower layers contain many files and the image has many layers? The metadata area of a file system partition has a limited size. Every new layer hard-links all of the files beneath it. A hard link is a directory entry, and directory entries live in directories, which are themselves special files with their own inodes stored in the metadata area. A hard link to a regular file does not consume a new inode, but directories cannot be hard-linked, so every layer recreates the full directory tree with new inodes. As a result, the metadata area can fill up with inodes long before the data area is used up. This is the inode exhaustion problem.
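
One way to watch for this condition is to ask the file system how many inodes it has left, which is the same information `df -i` prints. The sketch below does this with syscall.Statfs. It assumes a Linux host (the syscall package is platform-specific), inodeUsage is a hypothetical helper name, and "/" is only a placeholder for the file system that holds Docker's data directory.

```go
package main

import (
	"fmt"
	"syscall"
)

// inodeUsage returns the total and free inode counts of the file system
// that contains path. Some file systems report zero for both.
func inodeUsage(path string) (total, free uint64, err error) {
	var st syscall.Statfs_t
	if err = syscall.Statfs(path, &st); err != nil {
		return 0, 0, err
	}
	return uint64(st.Files), uint64(st.Ffree), nil
}

func main() {
	// Placeholder path: point this at the file system holding Docker's data.
	total, free, err := inodeUsage("/")
	if err != nil {
		fmt.Println("statfs failed:", err)
		return
	}
	if total == 0 {
		fmt.Println("this file system does not report inode counts")
		return
	}
	used := total - free
	fmt.Printf("inodes: %d total, %d used (%.1f%%), %d free\n",
		total, used, 100*float64(used)/float64(total), free)
}
```

If the used percentage climbs toward 100 while the disk still has free blocks, new files can no longer be created even though `df` reports free space.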
