Enhance lightweight containers with SELinux and Smack

Source: Internet
Author: User

http://www.bitscn.com/os/linux/200904/158771.html

Secure Linux Container Implementation Guide

Lightweight containers, also known as Virtual Private Servers (VPS) or jails, are tools often used to restrict untrusted applications or users. However, lightweight containers as currently constructed do not provide adequate security guarantees. By strengthening these containers with SELinux or Smack policies, you can implement a much more secure container in Linux. This article describes how to create a more secure container that is protected by a Linux Security Module. The SELinux and Smack policies are both still under development and continue to improve with the help of their communities.

The first response people often have on hearing about containers is "How can I create a secure container?" This article addresses that question by using Linux Security Modules (LSM) to improve container security. In particular, it shows how to set a security goal and achieve it with the Smack and SELinux security modules.

For background on Linux containers, read "LXC: Linux Container Tools" (developerWorks, February 2009).

Linux containers are a conceptual artifact built on several Linux technologies:

  • Resource namespaces control how processes, files, SysV IPC resources, network interfaces, and so on are looked up inside a container.
  • Control groups (cgroups) let you place limits on the resources available to a container.
  • Capability bounding sets limit the privileges available to a container.

The use of these technologies must be coordinated to produce a container that matches the concept. Two projects currently offer this feature. Libvirt is a large project that can create virtual machines using the Xen hypervisor, the QEMU emulator, and KVM, as well as lightweight containers.
Liblxc is a small collection of libraries and user-space commands designed to help kernel developers test container functionality quickly and easily. Because "LXC: Linux Container Tools" was written on top of liblxc, I continue to use liblxc here, but everything done here can be done just as easily with libvirt's container support.

Main element 1: LSM

Before you start, if you do not know much about LSM, here is a quick overview. According to the Wikipedia definition:

Linux Security Modules (LSM) is a framework that allows the Linux kernel to support a variety of computer security models while avoiding dependence on any single security implementation. The framework is licensed under the terms of the GNU General Public License and has been a standard part of the Linux kernel since Linux 2.6.

LSM was designed to provide everything needed to implement a mandatory access control module successfully, while requiring the fewest possible changes to the Linux kernel. LSM avoids the system call interposition approach used in Systrace, because that approach does not scale to multiprocessor kernels and is subject to TOCTTOU (race) attacks. Instead, LSM inserts "hooks" (upcalls to the module) at every point in the kernel where a user-level system call is about to access an important internal kernel object, such as an inode or a task control block.

The project is narrowly scoped to solve the problem of access control, so as to avoid imposing a large and complex change on the mainstream kernel. It is not intended to be a general "hook" or "upcall" mechanism, nor does it support virtualization.

LSM's access control goal is closely related to the problem of system auditing, but it is not the same. Auditing requires that every access attempt be recorded. LSM cannot deliver that, because it would need many more hooks to detect the cases where the kernel "short-circuits" a failing system call and returns an error code before getting near an important object.

System security consists of two conflicting goals.
The first goal is to achieve complete, fine-grained access control. You must be able to exert control everywhere that information could be leaked or corrupted. Control that is too coarse is the same as no control. For example, if all files must be classified as one type, and any one file is public, then all files are public.

On the other hand, configuration must be simple, otherwise the administrator will end up granting a great deal of access just to keep things manageable (which, again, is the same as no control). For example, if a program needs a large number of access rules to work, the administrator will simply add a lot of access for the program rather than test whether each rule is really necessary.

The two basic security modules in Linux take different approaches to balancing this contradiction. SELinux starts by controlling everything, and uses a powerful policy language to simplify policy management. Smack is mainly concerned with providing simple access control.

Main element 2: SELinux

SELinux is by far the best-known MAC (mandatory access control) system for Linux. Although it still has its opponents, the popular Fedora distribution has shipped with SELinux enabled for several years, which is a testament to its success.

SELinux is configured with a modular policy language, so users can easily update an installed policy. The language also provides interfaces, which allow higher-level statements to stand for a set of low-level statements.

In this article, we use a new interface to define containers. Although granting a container many access rights makes the interface itself very large, creating a new container with the interface is simple. This interface is a promising candidate for inclusion in the core distributed policy.

Main element 3: Smack

Smack stands for Simplified Mandatory Access Control Kernel. It begins by labeling all processes, files, and network traffic with a simple text label.
Newly created files carry the label of the process that creates them. A few default labels always exist, with clearly defined access rules. A process can always read and write objects that carry its own label. The privilege to bypass Smack access rules is controlled by POSIX capabilities: a task with CAP_MAC_OVERRIDE can override the rules, and a task with CAP_MAC_ADMIN can change rules and labels. "POSIX file capabilities: Parceling the power of root" (see the references) demonstrates these privileges.

SELinux-protected containers

The SELinux policy we use for the containers consists of a policy module that has been posted to the refpolicy (SELinux Reference Policy) development mailing list. Download this policy into the vs.if, vs.fc, and vs.te files in the /root/vs directory, then compile and install the new module. Each experiment uses its own copy of the virtual machine image:

    cp vm.img selinux.img
    cp vm.img smack.img

Then use lxc-debian to create the /vs1 and /vs2 containers:

    mkdir /vs1; cd /vs1
    lxc-debian create
        container name: vs1
        hostname: vs1
        address: 10.0.2.21
        gateway: 10.0.2.2
        arch: 2 (i386)
    mkdir /vs2; cd /vs2
    lxc-debian create
        container name: vs2
        hostname: vs2
        address: 10.0.2.22
        gateway: 10.0.2.2
        arch: 2 (i386)

and relabel their file systems:

    fixfiles relabel /vs1
    fixfiles relabel /vs2

When you start a container (for example, with the command lxc-start -n vs1), you are likely to see some audit messages about SELinux access denials. Do not worry: the container starts normally, its network services are enabled, and it is isolated. If you help container vs1 cheat by running mount --bind / /vs1/rootfs.vs1/mnt before starting it, you will find that even the root user inside the container is refused when running ls /mnt/root.

To understand how this works, look at the vs.if interface file. This file defines an interface called container that takes one parameter (the base name of the container to define). The vs.te file calls this function twice, once with the container name vs1 and once with vs2.
In this interface, $1 is expanded to this parameter, so when we call container(vs1), $1_t becomes vs1_t. (From here on, assume we are defining vs1.)

The lines that mention vs1_exec_t are the most important. The container runs in the vs1_t type. It enters this type when unconfined_t executes the container's /sbin/init, which has type vs1_exec_t.

The rest of the policy mostly grants the container the privileges it needs to access parts of the system: network ports, devices, and consoles. The interface is long because of the fine-grained nature of the existing SELinux reference policy. As we will see, the Smack-protected container has a much simpler policy, but it provides much less flexible protection against misbehaving system services.

There is one more thing to do. Although the container cannot overwrite its $1_exec_t file (that is, /sbin/init), it is able to run:

    mv /sbin /sbin.bak
    mkdir /sbin
    touch /sbin/init

The resulting /sbin/init has type vs1_file_t. Why would the container administrator want to do this? Because the container, including its SSH daemon, would then be launched in the unconfined_t domain, giving the administrator a privileged shell and the ability to escape the SELinux restrictions we are trying to enforce.

To avoid this, start the container through a custom script that relabels /sbin/init as vs1_exec_t before starting the container. If the container administrator does not mind, you could even copy a pristine copy of init back into the container and relabel that. Here we only relabel the existing init:

    cat >> /vs1/vs1.sh << EOF
    #!/bin/sh
    chcon -t vs1_exec_t /vs1/rootfs.vs1/sbin/init
    lxc-start -n vs1
    EOF
    chmod u+x /vs1/vs1.sh

From now on, start the container with /vs1/vs1.sh instead of running lxc-start by hand.

Smack-protected containers

Recompile the kernel with Smack enabled.
You should be able to run make menuconfig in the /root/rpmbuild/BUILD/kernel*/linux* directory, go to the Security menu, disable SELinux, and enable Smack. After that, just repeat the steps make && make modules_install && make install.

You should also stop user space from configuring SELinux. You can do this in the SELinux management GUI, or by editing /etc/selinux/config and setting SELINUX=disabled.

A few more steps are needed to install the Smack policy at boot time:

    mkdir /smack
    cd /usr/src
    wget http://schaufler-ca.com/data/080616/smack-util-0.1.tar
    tar xf smack-util-0.1.tar; cd smack-util-0.1
    make && cp smackload /bin

The actual Smack policy looks like Listing 1.

Listing 1. smackaccesses

    vs1 _ rwa
    _ vs1 rwa
    vs2 _ rwa
    _ vs2 rwa
    _ host rwax
    host _ rwax

Copy it to a file named /etc/smackaccesses. The next time you run /bin/container_setup.sh, smackload will load this file.

This policy is very simple. By default, any label can read data labeled _. We define a new label, host, for host-private data that the containers must not access, and the container_setup.sh script applies this label to the cgroups file system. Other sensitive files, such as /etc/shadow, should carry this label too.

We define vs1 and vs2 to label the containers. By default, each can access its own data. We add a rule that allows them to write to _, which allows them to send network packets. Because vs1 cannot access vs2 data (and vice versa), the containers are isolated from each other.

As mentioned earlier, the ability to define or bypass Smack rules is governed by the CAP_MAC_ADMIN and CAP_MAC_OVERRIDE capabilities, so containers should not have these capabilities. This is arranged through the helper program dropmacadmin.c (see the download section).
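To check whether a task still holds either of these two capabilities, you can decode a capability mask such as the CapBnd or CapEff value in /proc/<pid>/status. The following is a minimal sketch, not part of the original article. It assumes the bit numbers 32 (CAP_MAC_OVERRIDE) and 33 (CAP_MAC_ADMIN) from linux/capability.h, and the function names has_cap and mac_caps are hypothetical.

```shell
#!/bin/sh
# Bit numbers as defined in linux/capability.h (assumed here).
CAP_MAC_OVERRIDE=32
CAP_MAC_ADMIN=33

# has_cap MASK BIT: succeed if bit BIT is set in the hexadecimal MASK
# (given without a 0x prefix, as /proc/<pid>/status prints it).
has_cap() {
    [ $(( (0x$1 >> $2) & 1 )) -eq 1 ]
}

# mac_caps MASK: print the MAC-related capabilities present in MASK,
# or an empty line if neither is present.
mac_caps() {
    out=""
    if has_cap "$1" "$CAP_MAC_OVERRIDE"; then out="$out cap_mac_override"; fi
    if has_cap "$1" "$CAP_MAC_ADMIN"; then out="$out cap_mac_admin"; fi
    echo "${out# }"
}

# Example: inspect the bounding set of the current shell, if /proc has it.
if [ -r /proc/self/status ]; then
    mask=$(awk '/^CapBnd:/ { print $2 }' /proc/self/status)
    if [ -n "$mask" ]; then
        echo "MAC capabilities in bounding set: $(mac_caps "$mask")"
    fi
fi
```

Run inside a container whose init was started without these capabilities, the final line should list neither of them. Run on the host as root, it will normally list both.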
The dropmacadmin helper must be compiled statically, because the container runs a different release from the host:

    gcc -o dropmacadmin dropmacadmin.c -static
    cp dropmacadmin /bin/

Create a new container called vs1:

    mkdir /vs1; cd /vs1
    lxc-debian create
        container name: vs1
        hostname: vs1
        address: 10.0.2.21
        router: 10.0.2.2
        arch: 2 (i386)

Label every file in the vs1 file system with the label vs1:

    for f in `find /vs1/rootfs.vs1`; do
        attr -S -s SMACK64 -V vs1 "$f"
    done

Now create a script that starts the container safely. That means it sets its own process label to vs1 and wraps the container's /sbin/init in dropmacadmin, as follows:

    cat >> /vs1/vs1.sh << EOF
    #!/bin/sh
    cp /bin/dropmacadmin /vs1/rootfs.vs1/bin/
    attr -S -s SMACK64 -V vs1 /vs1/rootfs.vs1/bin/dropmacadmin
    echo vs1 > /proc/self/attr/current
    lxc-start -n vs1 /bin/dropmacadmin /sbin/init
    EOF
    chmod u+x /vs1/vs1.sh

One more step lets vs1 write to the tmpfs file system it is about to mount:

    sed -i 's/defaults/defaults,smackfsroot=vs1,smackfsdef=vs1/' /vs1/rootfs.vs1/etc/fstab

This causes the tmpfs file system mounted on /dev/shm to carry the vs1 label, so that vs1 can write to it. Otherwise, the vs1 init scripts cannot create the /dev/shm/network directory that they need when setting up the network. Likewise, if you want a RAM-based /tmp, use the same options.

Now let us help vs1 cheat again. Create vs2 the same way you created vs1, replacing vs1 with vs2 at every step. Then bind-mount the root file system under vs1's /mnt:

    mount --bind /vs1 /vs1
    mount --make-runbindable /vs1
    mount --rbind / /vs1/rootfs.vs1/mnt

Start the container with vs1.sh. Note that you can still see the Web pages on vs1 and vs2 from the KVM host. Also note that vs1 cannot reach vs2 over the network.
It also cannot look at vs2's files:

    vs1:~# ls /mnt/
      (directory listing)
    vs1:~# ls /mnt/vs2/rootfs.vs2
    ls: /mnt/vs2/rootfs.vs2: Permission denied
    vs1:~# mkdir /cgroup
    vs1:~# mount -t cgroup cgroup /cgroup
    vs1:~# ls /cgroup
    ls: /mnt/vs3: Permission denied
    vs1:~# mknod /dev/sda1 b 8 1
    mknod: `/dev/sda1': Operation not permitted
    vs1:~# mount /mnt/dev/sda1 /tmp
    mount: permission denied

It can look through the host file system. Anything that needs protecting can be labeled with the host label. That has been done for the cgroup file system, which is why ls /cgroup failed.

Finally, the device whitelist cgroup prevents us from creating a disk device, and from mounting one that already exists (as it does under /mnt).

Of course, our setup lets the container administrator remove /mnt/dev/sda1 or otherwise damage the host, so this bind mount should not be used outside of this demonstration!

Note that on the SELinux system, the default (and easy) route is to allow the containers to talk to each other over the network, whereas with Smack the opposite is true. At present, it is difficult to allow the containers to talk to each other. In the near future, it should become possible to establish policies for communication between containers.

The way the Smack network is set up causes another problem. The command kill -9 -1 terminates every task on the system. When a task inside a container runs it, it should terminate only the tasks in the same container. This behavior has been fixed in the upstream kernel, but the Fedora 10 kernel we use still has the problem, so signal 9 is sent to every task.

In the SELinux-protected container, SELinux prevents the signal from crossing the container boundary, so kill -9 -1 is actually safe. With Smack, however, tasks are labeled _ by default (as is the network). Because we allow the container to write to _ so that it can use the network, and because terminating a task counts as write access in Smack, the container administrator is allowed to terminate any task on the entire system.
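To see why the rule that opens the network also opens the door to kill -9 -1, it can help to evaluate Smack access checks by hand. The sketch below is an illustration only, not the kernel's implementation. It applies the built-in Smack rules as I understand them from the kernel's Smack documentation, followed by the explicit rules from Listing 1. The function name smack_check is hypothetical, and only single access letters (r, w, x, a) are handled.

```shell
#!/bin/sh
# Explicit rules in "subject object access" form, copied from Listing 1.
RULES='vs1 _ rwa
_ vs1 rwa
vs2 _ rwa
_ vs2 rwa
_ host rwax
host _ rwax'

# smack_check SUBJECT OBJECT MODE: print "allow" or "deny" for one
# access mode letter (r, w, x, or a).
smack_check() {
    subj=$1
    obj=$2
    mode=$3
    # Built-in rules, checked before the explicit rule list.
    if [ "$subj" = "*" ]; then echo deny; return; fi
    case $mode in
        r|x)
            # Read/execute by "^", or on an object labeled "_", is allowed.
            if [ "$subj" = "^" ] || [ "$obj" = "_" ]; then
                echo allow; return
            fi
            ;;
    esac
    if [ "$obj" = "*" ]; then echo allow; return; fi
    if [ "$subj" = "$obj" ]; then echo allow; return; fi
    # Explicit rules: allow if a matching rule grants the requested mode.
    while read -r s o access; do
        if [ "$s" = "$subj" ] && [ "$o" = "$obj" ]; then
            case $access in
                *"$mode"*) echo allow; return ;;
            esac
        fi
    done <<EOF
$RULES
EOF
    echo deny
}

smack_check vs1 vs2 r    # vs1 reading vs2's files
smack_check vs1 host r   # vs1 reading host-labeled data
smack_check vs1 _ w      # vs1 writing to "_": network packets, and signals
```

The first two checks print deny and the third prints allow. That third result is the same write access that lets a task labeled vs1 send a signal to a host task labeled _.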
Another drawback (one that the SELinux container shares) involves Unix98 pseudo-terminals. Open two graphical terminals. In the first, start vs1 and look at /dev/pts. You will see at least two entries (0 and 1), one belonging to each terminal. From the vs1 container, you can write to the entry that corresponds to the other terminal.

There are two solutions for the Fedora kernel. You can use the device whitelist cgroup to deny the container the ability to open these devices. However, this has to be done by hand each time the container starts in order to grant it access to its own terminal. Alternatively, the same result can be achieved with SELinux and Smack labels.

Newer kernels, starting with 2.6.29, support devpts namespaces. The container must remount /dev/pts, after which it can no longer access devpts entries that belong to the host or to other containers.

Concluding remarks

This article has described the tools needed to build an LSM-protected container, but much more remains to be done:

  • For Smack, you must choose the files that should be labeled host.
  • For SELinux, the policy should be tuned, and then a container interface should be submitted to the upstream reference policy.

While that work is under way, and until there is more experience with LSM-protected containers, you should not rely entirely on these mechanisms to contain an untrusted root user.

Although there are no established best practices for creating containers yet, a few ideas are worth considering. First, remember that you are trying to achieve two conflicting goals: minimizing duplication between containers (and the host), while still guaranteeing security isolation. One way to achieve both is to create a single minimal, complete rootfs in which no container runs, and to label it with a type that all containers can read.
Then use the lxc-sshd script to create each actual container from that prototype, setting up read-only mounts for most of the container's file system while providing a private writable location (such as /scratch) where the container can store files. Because each container has a private mount namespace, it can bind-mount any files or directories that must be private or writable from its own private directory. For example, if it needs a private /lib, it can run mount --bind /scratch/rootfs/lib /lib. Similarly, the administrator can make sure that each container runs mount --bind /scratch/shadow /etc/shadow at startup.

One obvious drawback of the approach I have demonstrated, for both SELinux and Smack, is that container administrators cannot use the LSM to control information flow inside their own containers. In addition, for simplicity, the MAC policy treats all tasks in a container identically. In another article, I will explore how to let container administrators specify their own LSM policies while still keeping them confined.
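The bind mounts described above could be collected in a small helper that a container start-up script calls. The following is a sketch under stated assumptions rather than part of the original setup: the function name print_bind_mounts and the source:target argument format are hypothetical, and the function only prints the commands so that they can be reviewed before anything is mounted.

```shell
#!/bin/sh
# print_bind_mounts SRC:DST...: print one "mount --bind" command for each
# source:target pair. Nothing is mounted; the output is a dry run.
print_bind_mounts() {
    for pair in "$@"; do
        src=${pair%%:*}   # text before the first colon
        dst=${pair#*:}    # text after the first colon
        echo "mount --bind $src $dst"
    done
}

# The two private locations used as examples in the text.
print_bind_mounts /scratch/rootfs/lib:/lib /scratch/shadow:/etc/shadow
```

Printing rather than executing keeps the sketch safe to run anywhere. Inside a container, the printed commands would have to be run as root within the container's own mount namespace.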
