Now that containers and virtualization are widely used, how to build a secure and clean container is a question everyone cares about. The minimization principle of "install only the applications you need", which matches what the security field demands of any system, is also a basic rule of container construction. On the one hand, a minimal application reduces the image size and saves upload and download time. On the other hand, fewer applications in the container mean fewer points of intrusion, which makes the container more secure. In this article, we summarize a set of industry best practices for building containers, with the goal of making container construction faster, safer, and more flexible. This article assumes that the reader has some familiarity with Docker and Kubernetes, but it can also serve as a standalone guide to building Docker containers.
One application per container
When people start using containers, a common misconception is to treat a container as a virtual machine. Doing so often makes certain requirements hard and painful to meet, and it also gives up the biggest advantage of containers. Many beginners ask questions such as "Why can't Docker do X?" or "How do I make Docker do Y?" when what they actually need is a virtual machine. Although modern containers can meet these needs, doing so sacrifices most of the advantages of the container model. Take the classic Apache/MySQL/PHP stack as an example. You might want to run all components in one container. However, the best practice is to use two or three separate containers: one for Apache, one for MySQL, and, if you run PHP-FPM, one for PHP.
Since the container design philosophy is that a container and the application it hosts share the same life cycle, each container should contain only one application. When the container starts, the application starts with it, and when the container stops, the application stops too.
Signal handling, PID 1, and zombie processes
Linux signals are the main way to control the life cycle of processes inside a container. In line with the previous best practice, and to tie the application's life cycle to the container's, make sure that your application handles Linux signals correctly. The most important one is SIGTERM, which asks a process to terminate. An application may also receive SIGKILL, which kills the process forcibly, or SIGINT, which is sent when the user presses Ctrl+C.
The process identifier (PID) is a unique identifier that the Linux kernel assigns to each process. PIDs are namespaced: a container has its own set of PIDs, which are mapped to PIDs on the host system. When the Linux kernel boots, the first process it starts gets PID 1; this is normally an init system, such as systemd or SysV init, which manages the other processes. Likewise, the first process started in a container gets PID 1 inside that container. Docker and Kubernetes use signals to communicate with the processes in a container, and both can send signals only to the process with PID 1 in the container.
In a container environment, PIDs and Linux signals raise two problems to consider.
How does the Linux kernel handle signals?
The Linux kernel treats the PID 1 process differently from other processes: it does not apply the default signal actions to it. Unless PID 1 registers its own handlers for SIGTERM or SIGINT, those signals have no effect on it, so the process has to be killed with SIGKILL. It then cannot shut down gracefully, which can lead to errors, interrupted writes (for data stores), and unwanted alerts in monitoring systems.
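To make "registering a signal handler" concrete, here is a minimal Python sketch of an application that handles SIGTERM and SIGINT itself, so that it can shut down cleanly even when it runs as PID 1. The class name `GracefulShutdown` is a hypothetical helper of our own; only the standard `signal` and `threading` modules are used.

```python
import signal
import threading


class GracefulShutdown:
    """Turn SIGTERM/SIGINT into a stop flag while the block is active.

    Hypothetical helper for illustration: instead of being ignored (as PID 1)
    or killing the process abruptly, the signals set `self.stop`, and the main
    loop decides when and how to exit cleanly.
    """

    SIGNALS = (signal.SIGTERM, signal.SIGINT)

    def __init__(self):
        self.stop = threading.Event()
        self._previous = {}

    def _handle(self, signum, frame):
        # Keep the handler tiny: just record that shutdown was requested.
        self.stop.set()

    def __enter__(self):
        # signal.signal() may only be called from the main thread.
        for sig in self.SIGNALS:
            self._previous[sig] = signal.signal(sig, self._handle)
        return self

    def __exit__(self, exc_type, exc, tb):
        # Restore the previous handlers. None means the previous handler was
        # not installed from Python and cannot be restored from here.
        for sig, handler in self._previous.items():
            if handler is not None:
                signal.signal(sig, handler)
        return False


def main():
    with GracefulShutdown() as shutdown:
        while not shutdown.stop.is_set():
            # ... do one unit of work here ...
            shutdown.stop.wait(1.0)  # sleeps, but wakes as soon as a signal arrives
        # ... flush buffers and close connections before returning ...


if __name__ == "__main__":
    main()
```

The handler only sets a flag; the actual cleanup happens in normal code after the loop, which avoids doing complex work inside a signal handler.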
How does a classic init system handle orphaned processes?
Classic init systems (such as systemd) are also responsible for reaping orphaned zombie processes. Orphaned processes (processes whose parent has died) are re-parented to the process with PID 1, which must reap them when they exit; until it does, they remain as zombies. In a container, this job falls to whichever process has PID 1 inside the container. If that process does not reap zombies correctly, they accumulate and may exhaust memory or other resources.
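"Reaping" means collecting the exit status of a finished child with one of the wait() calls, which lets the kernel release its process-table entry. A minimal Python sketch of what a PID 1 process has to do (the function names are hypothetical); note that orphans are re-parented to this process only if it actually is PID 1 in its namespace, otherwise it reaps just its own children.

```python
import os
import signal


def reap_zombies():
    """Reap every child that has already exited, without blocking.

    Returns a list of (pid, raw_wait_status) pairs for the children reaped.
    """
    reaped = []
    while True:
        try:
            pid, status = os.waitpid(-1, os.WNOHANG)  # -1 means "any child"
        except ChildProcessError:
            break  # there are no children at all
        if pid == 0:
            break  # children exist, but none of them has exited yet
        reaped.append((pid, status))
    return reaped


def install_reaper():
    """Reap automatically whenever the kernel reports a child state change."""
    signal.signal(signal.SIGCHLD, lambda signum, frame: reap_zombies())
```

Because this reaper collects any child, code elsewhere that waits on a specific child (for example through `subprocess`) may no longer see that child's real exit status; a dedicated init process avoids this conflict.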
There are several common solutions to these problems:
1. Run as PID 1 and register signal handlers
2. Enable process namespace sharing in Kubernetes
When you enable process namespace sharing for a Pod, Kubernetes uses a single process namespace for all containers in the Pod. The Pod's infrastructure (pause) container becomes PID 1 and automatically reaps orphaned processes.
3. Use a specialized init system
As in a classic Linux environment, you can use an init system to solve these problems. However, ordinary init systems such as systemd or SysV init are too complex and heavyweight for this purpose, so it is better to use an init system built specifically for containers, such as tini.
If you use a container-specific init system, the init process runs as PID 1 and does the following:
• Registers the correct signal handlers.
• Makes sure that signals reach your application and take effect.
• Reaps all zombie processes.
In Docker, you can use this solution through the --init option of the docker run command. In Kubernetes, you must first install the init system in the container image and use it as the container's entrypoint.
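The three responsibilities above can be sketched in a few dozen lines of Python. This is a toy illustration of what an init such as tini does, not a replacement for it: the function name `run_init` and the script name `init.py` are hypothetical, only SIGTERM, SIGINT, and SIGHUP are forwarded, and there is a brief window right after the fork before the handlers are installed.

```python
import os
import signal
import sys

FORWARDED_SIGNALS = (signal.SIGTERM, signal.SIGINT, signal.SIGHUP)


def run_init(argv):
    """Run argv as the only direct child, forward signals to it, reap zombies.

    Returns the exit code for the container: the child's own exit code, or
    128 + the signal number if the child was killed by a signal.
    """
    if not argv:
        raise ValueError("usage: init.py COMMAND [ARG...]")
    child = os.fork()
    if child == 0:
        # In the child: replace this process with the real application.
        try:
            os.execvp(argv[0], argv)
        except OSError:
            os._exit(127)  # command not found or not executable

    def forward(signum, frame):
        # Pass the signal on to the application instead of handling it here.
        try:
            os.kill(child, signum)
        except ProcessLookupError:
            pass  # the child has already exited

    previous = {sig: signal.signal(sig, forward) for sig in FORWARDED_SIGNALS}
    try:
        while True:
            try:
                pid, status = os.waitpid(-1, 0)  # block until ANY child exits
            except ChildProcessError:
                return 1  # no children left (someone else reaped ours)
            if pid != child:
                continue  # an adopted orphan: reaping it is all that is needed
            if os.WIFSIGNALED(status):
                return 128 + os.WTERMSIG(status)
            return os.WEXITSTATUS(status)
    finally:
        for sig, handler in previous.items():
            if handler is not None:
                signal.signal(sig, handler)


if __name__ == "__main__":
    sys.exit(run_init(sys.argv[1:]))
```

In an image, such a script would be the entrypoint, for example `ENTRYPOINT ["python3", "/init.py"]` with the real application given as CMD, so that it runs as PID 1 and the application runs as its child.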
Optimize Docker build cache
Remove unnecessary tools
To protect your application from attackers, reduce its attack surface by removing every tool it does not need. For example, remove utilities such as netcat, which an attacker can easily use to open a reverse shell. If netcat is not installed in the container, the attacker cannot simply use it.
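Before shipping an image, you can check its filesystem for tools you did not mean to include. A minimal Python sketch, assuming the image's filesystem has been unpacked into a local directory (for example by extracting the tar archive produced by `docker export`); the deny-list and the function name are hypothetical examples, not an authoritative policy.

```python
from pathlib import Path

# Hypothetical deny-list: tools that are handy for an attacker once inside.
RISKY_TOOLS = {"nc", "netcat", "ncat", "telnet", "wget", "curl"}

# Directories where executables usually live, relative to the image root.
BIN_DIRS = ("bin", "sbin", "usr/bin", "usr/sbin", "usr/local/bin")


def find_risky_tools(rootfs, tools=RISKY_TOOLS):
    """Return the paths (relative to rootfs) of deny-listed tools, sorted."""
    root = Path(rootfs)
    found = []
    for rel in BIN_DIRS:
        directory = root / rel
        # Skip symlinked dirs (e.g. bin -> usr/bin): this avoids duplicates and
        # never follows an absolute link out of the unpacked tree.
        if directory.is_symlink() or not directory.is_dir():
            continue
        for entry in directory.iterdir():
            if entry.name in tools:
                found.append(entry.relative_to(root).as_posix())
    return sorted(found)
```

An empty result is a reasonable gate for a CI step; a non-empty one lists exactly which files to remove from the build.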
File system content
Keep as little content as possible in the image. If your application can be compiled into a single statically linked binary, adding that binary to the empty scratch image gives you a final image that contains only your application and nothing else. The fewer tools you package in the image, the fewer things an attacker can do inside the container.
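A binary copied into a scratch image only works if it really is statically linked. A dynamically linked ELF executable names its dynamic loader (for example ld-linux-x86-64.so.2 on glibc systems) in a PT_INTERP program header, and a scratch image has no loader, so such a binary fails to start. A minimal Python sketch (the function name is hypothetical) that checks for that header with the standard `struct` module:

```python
import struct

PT_INTERP = 3  # program header type that names the dynamic loader


def has_interpreter(path):
    """Return True if the ELF file at `path` requests a dynamic loader."""
    with open(path, "rb") as f:
        data = f.read()
    if data[:4] != b"\x7fELF":
        raise ValueError(f"{path} is not an ELF file")
    is_64bit = data[4] == 2                # EI_CLASS: 1 = 32-bit, 2 = 64-bit
    endian = "<" if data[5] == 1 else ">"  # EI_DATA: 1 = little, 2 = big endian
    if is_64bit:
        (phoff,) = struct.unpack_from(endian + "Q", data, 0x20)
        phentsize, phnum = struct.unpack_from(endian + "HH", data, 0x36)
    else:
        (phoff,) = struct.unpack_from(endian + "I", data, 0x1C)
        phentsize, phnum = struct.unpack_from(endian + "HH", data, 0x2A)
    for i in range(phnum):
        # p_type is the first 32-bit field of every program header.
        (p_type,) = struct.unpack_from(endian + "I", data, phoff + i * phentsize)
        if p_type == PT_INTERP:
            return True
    return False
```

This is a quick heuristic rather than a full dependency check: `False` means the binary does not ask for a loader, which is the property a scratch image requires.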
File system security
Having no tools in the image is not enough: you must also prevent potential attackers from installing their own. Two methods can be combined here:
Minimize the image
Smaller images have advantages such as faster upload and download times, which matters especially for the cold-start time of Pods in Kubernetes: the smaller the image, the faster a node can download it. However, building a small image can be tricky, because you may unintentionally include build dependencies or unoptimized layers in the final image.
Use the smallest base image
The base image is the image referenced in the FROM instruction of the Dockerfile; every other instruction in the Dockerfile builds on top of it. The smaller the base image, the smaller the resulting image and the faster it downloads and loads. For example, the alpine:3.7 image is several dozen MB smaller than the centos:7 image.
Avoid ineffective deletions in image layers
To reduce the size of the image, strictly follow the principle of installing only what is necessary. Sometimes you may need to install some tool packages temporarily and delete them in a later step after using them. However, this approach has a problem. Each RUN, COPY, or ADD instruction in a Dockerfile creates a new image layer, so deleting files in a later step does not actually make the image smaller: the data is still stored in the earlier layer and is merely hidden by the later one.
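The arithmetic behind this can be shown with a toy model of layered images in Python. It is an illustration, not Docker's real storage format: each hypothetical `Layer` adds files or hides earlier ones, the image size is the sum of all layers, and the container sees only the stacked result. The file names and sizes are made up.

```python
from dataclasses import dataclass, field


@dataclass
class Layer:
    """Toy model of one image layer (not Docker's real on-disk format)."""
    added: dict = field(default_factory=dict)  # path -> size in bytes
    deleted: set = field(default_factory=set)  # paths hidden from lower layers

    def size(self):
        return sum(self.added.values())


def image_size(layers):
    """Bytes stored and transferred: every layer counts, even hidden content."""
    return sum(layer.size() for layer in layers)


def visible_files(layers):
    """The filesystem a container sees after stacking the layers in order."""
    files = {}
    for layer in layers:
        for path in layer.deleted:
            files.pop(path, None)
        files.update(layer.added)
    return files


MB = 1_000_000
# Hypothetical sizes: a 5 MB app plus 80 MB of tools used only during the build.
separate_steps = [
    Layer({"/app": 5 * MB}),
    Layer({"/opt/build-tools": 80 * MB}),  # one instruction installs the tools
    Layer(deleted={"/opt/build-tools"}),   # a later instruction deletes them
]
single_step = [
    Layer({"/app": 5 * MB}),
    Layer(),  # install, use, and delete in ONE instruction: nothing is left over
]
```

Both images look identical from inside the container, but the first still carries the 80 MB layer. In a Dockerfile, the usual remedy is to chain the install, use, and cleanup commands with && inside a single RUN instruction.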
Try to build images that share common layers
When you pull a Docker image, Docker first checks whether you already have some of its layers locally. Layers you already have are not downloaded again. So if images you pulled earlier use the same base image as the one you are pulling now, far less data has to be downloaded for the current image.
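Because layers are identified by their content digest, the saving is simple to compute. A minimal Python sketch; the digests and sizes are hypothetical placeholders, and the function name is our own.

```python
def download_size(images, local_layers=()):
    """Total size fetched when pulling `images` one after another.

    Each image is a list of (digest, size) pairs. A layer whose digest is
    already present locally (or was fetched by an earlier image) is skipped.
    """
    have = set(local_layers)
    total = 0
    for layers in images:
        for digest, size in layers:
            if digest not in have:
                total += size
                have.add(digest)
    return total


# Hypothetical digests and sizes (in MB): two apps built on the same base image.
BASE = ("sha256:base", 50)
app_a = [BASE, ("sha256:app-a", 10)]
app_b = [BASE, ("sha256:app-b", 5)]
```

Pulling both apps costs 65 MB instead of 115 MB, because the 50 MB base layer is downloaded only once.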
Scan images in the container registry for vulnerabilities
For servers and virtual machines, vulnerability scanning is a common security practice: a centralized scanning system inventories the packages installed on each host, checks them against vulnerability sources, and notifies administrators when a patch is needed. One example is the Flan Scan system introduced in a previous article.
Because containers are in principle immutable, patching a running container when a vulnerability is found is not recommended. The best practice is to rebuild the image with the patch included and then redeploy it. Compared with servers, containers have much shorter life cycles and far less well-defined identities. Therefore, applying a similar centralized, per-host vulnerability detection system to running containers is a poor fit; scanning the images in the container registry works better.
Tag images correctly
Docker images are usually identified by two parts: a name and a tag. For example, in the centos:8.0.1 image, centos is the name and 8.0.1 is the tag. If no tag is given in a Docker command, the latest tag is used by default. At any given time, a name and tag pair should identify exactly one image; however, a tag can be reassigned to a different image when needed. When you build an image, tag it correctly and follow a consistent tagging strategy.
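The name/tag split and the "latest" default can be sketched in Python. This is a simplified, hypothetical parser: Docker's real one also normalizes short names (such as centos to docker.io/library/centos) and validates characters, which is omitted here.

```python
def split_reference(ref):
    """Split 'name[:tag][@digest]' into (name, tag, digest).

    Simplified sketch: the tag defaults to 'latest' unless a digest pins the image.
    """
    name, _, digest = ref.partition("@")
    last_slash = name.rfind("/")
    last_colon = name.rfind(":")
    if last_colon > last_slash:
        # A colon after the final '/' separates the tag ...
        name, tag = name[:last_colon], name[last_colon + 1:]
    else:
        # ... while a colon before it belongs to a registry port (host:5000/app).
        tag = None if digest else "latest"
    return name, tag, digest or None
```

The position check matters: in localhost:5000/team/app the colon belongs to the registry address, so the tag still defaults to latest.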
Container images are a way to package and distribute software. Tagging an image lets users identify a specific version of the software to download. Therefore, the tagging scheme for container images is closely tied to your software release strategy.
Use semantic versioning for tags
A common way to release software is to "tag" a specific version of the source code (for example, with the git tag command) using a version number that follows the Semantic Versioning Specification. The specification, published at semver.org, was proposed to end the confusion caused by inconsistent version number formats with unclear meanings. Under it, a version number consists of three parts, X.Y.Z, where X is the major version (incremented for incompatible API changes), Y is the minor version (incremented for backward-compatible new functionality), and Z is the patch version (incremented for backward-compatible bug fixes).
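The X.Y.Z rules translate directly into code. A minimal Python sketch of bumping a version before tagging an image (for example as myapp:1.5.0, where myapp is a placeholder); pre-release and build-metadata suffixes, which the specification also allows, are deliberately left out.

```python
import re

# MAJOR.MINOR.PATCH without leading zeros (pre-release/build suffixes omitted).
SEMVER = re.compile(r"(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)")


def bump(version, part):
    """Return the next version after `version`; part is major, minor, or patch."""
    match = SEMVER.fullmatch(version)
    if match is None:
        raise ValueError(f"not a MAJOR.MINOR.PATCH version: {version!r}")
    major, minor, patch = (int(g) for g in match.groups())
    if part == "major":
        return f"{major + 1}.0.0"  # incompatible API change
    if part == "minor":
        return f"{major}.{minor + 1}.0"  # backward-compatible new functionality
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"  # backward-compatible bug fix
    raise ValueError(f"unknown part: {part!r}")
```

Note that bumping a higher part resets the lower parts to zero, so 1.4.2 becomes 1.5.0, not 1.5.2.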
Use the Git commit hash as the tag
If you use a continuous delivery system and release software frequently, version numbers as described in the Semantic Versioning Specification may not be practical. In this case, a common approach is to use the Git commit SHA-1 hash (or its short form) as the version number. By design, a Git commit hash is immutable and refers to one specific version of the software.
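A minimal Python sketch of deriving such a tag in a build script. It assumes git is on the PATH; the function names and the image name myapp are hypothetical. `git rev-parse HEAD` prints the full hash of the current commit.

```python
import re
import subprocess

SHA1_HEX = re.compile(r"[0-9a-f]{40}")


def current_commit(repo="."):
    """Full SHA-1 of HEAD in `repo` (requires git on PATH)."""
    result = subprocess.run(
        ["git", "rev-parse", "HEAD"],
        cwd=repo, check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


def image_ref(name, commit, length=7):
    """Build 'name:<hash prefix>' from a full commit hash."""
    if not SHA1_HEX.fullmatch(commit):
        raise ValueError(f"not a full SHA-1 commit hash: {commit!r}")
    if not 4 <= length <= 40:
        raise ValueError("length must be between 4 and 40")
    return f"{name}:{commit[:length]}"
```

A build would call `image_ref("myapp", current_commit())`. Short prefixes can become ambiguous in large repositories; passing `length=40` keeps the full hash, which still fits within Docker's 128-character tag limit.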
Weigh the use of public images
One of Docker's great advantages is the large number of publicly available images for all kinds of software. These images let you get started quickly. However, when you design a container strategy for a production environment, you may run into constraints that publicly available images cannot satisfy.
Summary
This article introduced some basic principles to follow when building containers. Following them helps ensure that the containers you build are secure, lean, small, and controllable. These principles are recommendations rather than strict rules, so follow them where you can. Some of the methods described are for reference only; you can also use solutions that suit you better, as long as you respect the basic principles.