Editor's note: A while ago we reposted "5 minutes to understand Docker!", which proved very popular: in a short 1,500 words it gave everyone a quick understanding of Docker. Today I saw that the author had written a new piece and reposted it right away. This code reading is called a fantastic tour because the author, Liu Mengxin (@oilbeater), found several interesting things while reading the Docker source. From the code's point of view, Docker did not develop a new mechanism of its own; it makes full use of existing, tested isolation and security mechanisms, including cgroups, capabilities, namespaces, AppArmor, and SELinux (these are also mentioned in the article "Security of Container vs virtualization", which CSDN recently translated). From the reader's point of view, the quality of the Docker source is hard to praise, and some of the code style and logic are hard to follow.
About the author: Liu Mengxin works across development, testing, and operations, digs for treasure at Alibaba, and is lighting up the DBA skill tree along the way. Focuses on Docker, virtualization, and cloud computing.
The following is the original text:
I have long been curious about how the containers Docker provides achieve isolation and security. Docker used to rely on LXC for its container functionality, but I was a little afraid of kernel-related code and never dared to look. Later I heard that, for cross-platform compatibility, Docker had implemented its own native container library, libcontainer. Since it is a new project, the amount of code and its complexity should not be too high. With that thought, I started reading the libcontainer code.
Preparation Work
First, download the code you want to read. I suggest getting the full Docker source rather than just the libcontainer source; otherwise, like me, you will hit a pit while reading, fall in, and spend half a day climbing out.
Next you need a code reader. Go is a relatively new language and its tooling is not yet mature, but LiteIDE, a lightweight Golang IDE, can stand in (you may need your own proxy to download it).
After opening the project, you can see the overall Docker directory structure.
So where is the libcontainer we care about? It is buried pretty deep, in \vendor\src\github.com\docker\libcontainer\. Once inside, you will find a conspicuous container.go waving at you. Well, the first pit is coming soon.
Container
The code looks pretty straightforward at first. Abridged, it looks like this:
    type Container interface {
        ID() string
        RunState() (*RunState, Error)
        Config() *Config
        Start(config *ProcessConfig) (pid int, exitChan chan int, err Error)
        Destroy() Error
        Processes() ([]int, Error)
        Stats() (*ContainerStats, Error)
        Pause() Error
        Resume() Error
    }
You can see that this code only defines an interface; any object that implements these methods becomes a container as far as Docker is concerned. One of the more critical methods is Start, which starts a process inside the container. It takes a configuration describing the process to start and returns the process PID together with a channel that delivers the exit status.
The next step is to find the implementation of the interface and see how it works, and here comes a pit. Unlike Java, Go does not require a type to declare explicitly which interfaces it implements; as long as it quietly implements the required methods, it automatically counts as a value of that interface type. So there is no direct way to find out which types implement this interface, and flipping through the files in the libcontainer folder turned up nothing that looked right. With a sense of foreboding I installed Cygwin to grep for the Start function and, unexpectedly, found nothing. I then grepped the entire Docker directory and still found nothing.
I was puzzled. Docker is said to support native containers from version 1.2 on; if it does not even implement libcontainer's Container interface, how does it call native containers? Since searching from the bottom up led nowhere, the only option left was to work from the top down and see what is going on.
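The implicit interface satisfaction described above can be illustrated with a minimal sketch. The Pauser interface and the fakeContainer type are hypothetical and do not come from the Docker source; the blank-identifier assignment is a common Go idiom that makes the relationship explicit and gives grep something to find.

```go
package main

import "fmt"

// Pauser is a hypothetical two-method slice of a container-style interface.
type Pauser interface {
	Pause() error
	Resume() error
}

// fakeContainer never mentions Pauser, yet it satisfies the interface
// simply because it has both methods with matching signatures.
type fakeContainer struct {
	paused bool
}

func (c *fakeContainer) Pause() error {
	c.paused = true
	return nil
}

func (c *fakeContainer) Resume() error {
	c.paused = false
	return nil
}

// Compile-time assertion: the build fails if *fakeContainer ever stops
// satisfying Pauser.
var _ Pauser = (*fakeContainer)(nil)

func main() {
	c := &fakeContainer{}
	var p Pauser = c // no "implements" declaration needed
	if err := p.Pause(); err != nil {
		fmt.Println("pause failed:", err)
		return
	}
	fmt.Println("paused:", c.paused)
}
```

Without such an assertion line, the only way to find implementers is to search for the method names, which is exactly the grep exercise described above.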
Driver
Docker supports two container implementations, LXC and native, each provided as an implementation of the Driver interface. Under \daemon\execdriver you can see the lxc and native folders, which hold the relevant code. In the \daemon\ directory there is also a container.go with a Container object inside, but it does not implement the libcontainer interface either. Is that libcontainer interface just for show?
Let's take a look at the driver interface.
    type Driver interface {
        Run(c *Command, pipes *Pipes, startCallback StartCallback) (int, error) // Run executes the process and blocks until the process exits and returns the exit code
        // Exec executes the process in a running container, blocks until the process exits and returns the exit code
        Exec(c *Command, processConfig *ProcessConfig, pipes *Pipes, startCallback StartCallback) (int, error)
        Kill(c *Command, sig int) error
        Pause(c *Command) error
        Unpause(c *Command) error
        Name() string                                 // Driver name
        Info(id string) Info                          // "temporary" hack (until we move state from core to plugins)
        GetPidsForContainer(id string) ([]int, error) // Returns a list of pids for the given container.
        Terminate(c *Command) error                   // kill it with fire
        Clean(id string) error                        // clean all traces of container exec
    }
Do the names feel familiar? Although they differ from the Container interface above, the meanings are similar: Resume became Unpause, Destroy became Terminate, Processes became GetPidsForContainer, and Start became the two functions Run and Exec. Seeing this, I have to say that the consistency and readability of Docker's code are rather poor, and code review needs to be stricter.
Then open driver.go under native to see the concrete implementation. At the top of the file is a long list of imports, several of which stand out:
    import (
        ...
        "github.com/docker/libcontainer"
        "github.com/docker/libcontainer/apparmor"
        "github.com/docker/libcontainer/cgroups/fs"
        "github.com/docker/libcontainer/cgroups/systemd"
        consolepkg "github.com/docker/libcontainer/console"
        "github.com/docker/libcontainer/namespaces"
        _ "github.com/docker/libcontainer/namespaces/nsenter"
        "github.com/docker/libcontainer/system"
    )
Now there seems to be a clue. The purpose of libcontainer is to provide a platform-independent native container, which requires a set of common components such as resource isolation and permission control. libcontainer provides these generic components, which is why it is called a "lib". Each platform can borrow these components to build its own container and is free to use only some of them. Docker uses components such as apparmor, cgroups, and namespaces, skips libcontainer's Container interface and some other components, and writes the rest itself to complete the so-called native container.
Let's go back to the Run function.
    func (d *driver) Run(c *execdriver.Command, pipes *execdriver.Pipes, startCallback execdriver.StartCallback) (int, error)
Here execdriver.Pipes is a structure that defines where standard input, standard output, and standard error point, and startCallback is a callback function invoked once the container process has been started. The most important structure is execdriver.Command, which defines the environment and constraints under which the program runs inside the container. Its definition can be found in driver.go under \daemon\execdriver.
Command
    type Command struct {
        ID                 string            `json:"id"`
        Rootfs             string            `json:"rootfs"`   // root fs of the container
        InitPath           string            `json:"initpath"` // dockerinit
        WorkingDir         string            `json:"working_dir"`
        ConfigPath         string            `json:"config_path"` // this should be able to be removed when the lxc template is moved into the driver
        Network            *Network          `json:"network"`
        Resources          *Resources        `json:"resources"`
        Mounts             []Mount           `json:"mounts"`
        AllowedDevices     []*devices.Device `json:"allowed_devices"`
        AutoCreatedDevices []*devices.Device `json:"autocreated_devices"`
        CapAdd             []string          `json:"cap_add"`
        CapDrop            []string          `json:"cap_drop"`
        ContainerPid       int               `json:"container_pid"`  // the pid for the process inside a container
        ProcessConfig      ProcessConfig     `json:"process_config"` // Describes the init process of the container.
        ProcessLabel       string            `json:"process_label"`
        MountLabel         string            `json:"mount_label"`
        LxcConfig          []string          `json:"lxc_config"`
        AppArmorProfile    string            `json:"apparmor_profile"`
    }
Several fields relate to process isolation. Resources carries the CPU and memory allocations that cgroups will later enforce. CapAdd and CapDrop relate to Linux capabilities and control which of root's privileged system call permissions are withheld from programs in the container. ProcessLabel attaches a label to the process inside the container so that SELinux can later act on that label. AppArmorProfile points to the path of Docker's default AppArmor profile, typically /etc/apparmor.d/docker, which controls the program's access to the file system.
As you can see, Docker's isolation strategy for containers is not to develop an isolation mechanism of its own but to put every existing isolation mechanism to use. Even AppArmor and SELinux, two similar mechanisms that still compete with each other, are both thrown in without a second thought, very much a take-whatever-works style. This way, a malicious program that breaks through one layer of protection is blocked by another, and because the mechanisms also protect one another, breaking through all of them is hard.
The program we actually want to run in the container is the Entrypoint in the ProcessConfig structure. This shows that a so-called container is just a program wearing several isolation coats, which let it live in its own little world, oblivious to everything outside.
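How a pair of add and drop lists might turn into a final capability set can be sketched as follows. This is only an illustration: the default list is an invented subset rather than Docker's real defaults, effectiveCaps is a hypothetical helper, and the order "drop first, then add" is an assumption rather than something the source confirms.

```go
package main

import (
	"fmt"
	"sort"
)

// defaultCaps is an illustrative subset, NOT Docker's real default list.
var defaultCaps = []string{"CHOWN", "KILL", "NET_BIND_SERVICE", "SETUID"}

// effectiveCaps is a hypothetical helper: start from the defaults,
// remove everything in drop, then add everything in add.
func effectiveCaps(defaults, add, drop []string) []string {
	set := make(map[string]bool, len(defaults))
	for _, c := range defaults {
		set[c] = true
	}
	for _, c := range drop {
		delete(set, c)
	}
	for _, c := range add {
		set[c] = true
	}
	out := make([]string, 0, len(set))
	for c := range set {
		out = append(out, c)
	}
	sort.Strings(out) // deterministic order for printing and comparison
	return out
}

func main() {
	caps := effectiveCaps(defaultCaps, []string{"NET_ADMIN"}, []string{"KILL", "SETUID"})
	fmt.Println(caps)
}
```

The point of the sketch is only that CapAdd and CapDrop are plain string lists adjusting a baseline; the kernel-level enforcement happens elsewhere.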
Exec
Back in Run, let's see how the program is actually run. After a series of initialization steps and error checks we finally reach the code that really runs things. It is a single statement, and it looks like this:
    return namespaces.Exec(container, c.ProcessConfig.Stdin, c.ProcessConfig.Stdout, c.ProcessConfig.Stderr, c.ProcessConfig.Console, dataPath, args, func(container *libcontainer.Config, console, dataPath, init string, child *os.File, args []string) *exec.Cmd {
        c.ProcessConfig.Path = d.initPath
        c.ProcessConfig.Args = append([]string{
            DriverName,
            "-console", console,
            "-pipe", "3",
            "-root", filepath.Join(d.root, c.ID),
            "--",
        }, args...)

        // set this to nil so that when we set the clone flags anything else is reset
        c.ProcessConfig.SysProcAttr = &syscall.SysProcAttr{
            Cloneflags: uintptr(namespaces.GetNamespaceFlags(container.Namespaces)),
        }
        c.ProcessConfig.ExtraFiles = []*os.File{child}

        c.ProcessConfig.Env = container.Env
        c.ProcessConfig.Dir = container.RootFs

        return &c.ProcessConfig.Cmd
    }, func() {
        if startCallback != nil {
            c.ContainerPid = c.ProcessConfig.Process.Pid
            startCallback(&c.ProcessConfig, c.ContainerPid)
        }
    })
Seeing this, I felt terrible. I think that if the Docker project keeps going like this, something will go wrong; even if you like anonymous functions, there is no need to be this obsessive. I even suspected Docker was using some dark magic to hide its real code. So I gave up on this statement and went straight to namespaces.Exec, which lives in \vendor\src\github.com\docker\libcontainer\namespaces\exec.go.
    func Exec(container *libcontainer.Config, stdin io.Reader, stdout, stderr io.Writer, console, dataPath string, args []string, createCommand CreateCommand, startCallback func()) (int, error)
I am not sure a function with nine parameters is really a good idea, but what puzzled me more is that the main project already has the Pipes structure to bundle stdin, stdout, and stderr, so why are they passed separately here? Seven parameters is still a lot, but better than nine. Back to namespaces, which are yet another isolation mechanism. As the name suggests, what gets isolated is naming: resources whose names used to be globally visible, such as PIDs, the network, and mount points, are virtualized into multiple copies, one per namespace, and each group of processes occupies one namespace. Programs inside the container therefore cannot see other processes outside, which naturally makes attacks harder.
The most critical statement that actually executes the program is then very simple.
    if err := command.Start(); err != nil {
        child.Close()
        return -1, err
    }
Here command is an exec.Cmd object from Go's os/exec package. All of the configuration for the program was already folded into it by that single statement in the driver, and Start is where the program actually begins to run. Then I wondered: this function is in the namespaces package, so why do I not see any code that sets up namespaces? In fact, if you look back at that single statement, the namespaces are set up in there too, through the Cloneflags field. In other words, Exec in the namespaces package does nothing namespace-related itself; it just calls Start. This way of structuring the code is rather confusing for readers.
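How a list of namespaces could become the single Cloneflags value seen in the driver code can be sketched like this. The numeric values are the CLONE_NEW* constants from the Linux kernel headers, written out by hand so the sketch compiles on any platform. The namespaceFlags function and its lowercase keys are hypothetical; this is a guess at what a helper like GetNamespaceFlags does, not its actual implementation.

```go
package main

import "fmt"

// Clone flag values as defined in the Linux kernel headers
// (linux/sched.h), spelled out here for portability.
const (
	cloneNewNS  = 0x00020000 // mount points
	cloneNewUTS = 0x04000000 // hostname and domain name
	cloneNewIPC = 0x08000000 // System V IPC
	cloneNewPID = 0x20000000 // process IDs
	cloneNewNet = 0x40000000 // network stack
)

// flagByName uses hypothetical lowercase keys chosen for this sketch.
var flagByName = map[string]uintptr{
	"mount": cloneNewNS,
	"uts":   cloneNewUTS,
	"ipc":   cloneNewIPC,
	"pid":   cloneNewPID,
	"net":   cloneNewNet,
}

// namespaceFlags ORs together one clone flag per requested namespace.
func namespaceFlags(names []string) (uintptr, error) {
	var flags uintptr
	for _, n := range names {
		f, ok := flagByName[n]
		if !ok {
			return 0, fmt.Errorf("unknown namespace %q", n)
		}
		flags |= f
	}
	return flags, nil
}

func main() {
	flags, err := namespaceFlags([]string{"pid", "net"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%#x\n", flags)
}
```

On Linux, a value like this is what ends up in the Cloneflags field of syscall.SysProcAttr on the exec.Cmd before Start is called, which is why the namespace setup is invisible inside Exec itself.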
Summary
The starting point of this code reading was to understand how containers achieve isolation and security. From the code's point of view, Docker does not develop a new mechanism of its own; it makes full use of existing, tested isolation and security mechanisms, including cgroups, capabilities, namespaces, AppArmor, and SELinux. In theory this combination of punches works well: even if one mechanism has a hole, any exploit of it is likely to be restricted by the others, and finding a way around every isolation mechanism is much harder.
From the point of view of reading the code, however, the quality of the Docker source is hard to praise. Even though libcontainer comes from the same roots, its naming is inconsistent with the main project, and I wonder whether things will get more confusing later. Some of the code style and logic is really hard to read, and there is a lot of room to improve code quality. It is an open source project after all: however powerful the features are, people who find code quality problems may not dare to use it in production.
As for libcontainer, although it is developed independently of Docker, you can see that it has not been cleanly separated from the main project. The main project does not currently use libcontainer's Container abstraction; it only calls some of its mechanisms, so it seems to be in the middle of a gradual replacement. libcontainer is still some distance from being an independent, complete product. If you are interested, you can get involved. What if this is the next great project?
Original link: A Fantastic Docker libcontainer Code Reading Tour (Editor: Zhou Xiaolu)