Apache publishes a page on its Hadoop wiki that covers the benefits of running Hadoop in Docker and the work that still needs to be done before Hadoop can run entirely in Docker. Running Hadoop YARN in Docker, or in other containers, has many advantages:
Software dependency and configuration isolation: An application running in Docker carries its own software dependencies and configuration, which are independent of the host and of every other application running in Docker;
Security: Unless the host is explicitly configured to allow it, an application running in Docker cannot access the contents of the host file system, even if it runs as root inside the Docker image. This protects the host's file system, devices, and so on;
Performance isolation: Docker can limit the resources an application consumes, such as CPU, memory, storage, and bandwidth;
Consistency: All tasks come from the same Docker image and therefore share an identical software environment, independent of the host environment. For example, a task in an Ubuntu image can use Ubuntu's features as if it were running on a real Ubuntu system, even if the host machine runs RHEL;
Rapid deployment: Docker has strong image storage and distribution capabilities, so developers can easily pull Hadoop YARN application images from an image registry;
Programmable: With a Dockerfile, developers can easily define the file system, environment configuration, and startup scripts of a YARN application.
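The security and performance isolation points above can be made concrete with a small sketch. The Python function below only assembles a `docker run` command line; it does not start a container. It assumes a Docker version that supports the `--cpus`, `--memory`, and `--volume` flags, and the image name and script path are hypothetical placeholders, not names taken from the Apache page.

```python
from typing import List, Optional


def build_docker_run(
    image: str,
    command: List[str],
    cpus: float = 1.0,
    memory_mb: int = 1024,
    read_only_mounts: Optional[List[str]] = None,
) -> List[str]:
    """Build a `docker run` argument list for one resource-limited task."""
    args = [
        "docker", "run", "--rm",
        "--cpus", str(cpus),          # cap the CPU time available to the task
        "--memory", f"{memory_mb}m",  # cap the memory available to the task
    ]
    # Host paths are invisible to the container unless mounted explicitly;
    # here each listed path is mounted read-only at the same location.
    for host_path in read_only_mounts or []:
        args += ["--volume", f"{host_path}:{host_path}:ro"]
    return args + [image] + command


if __name__ == "__main__":
    argv = build_docker_run(
        "example/yarn-task:latest",   # hypothetical image name
        ["/opt/task/run.sh"],         # hypothetical startup script
        cpus=2.0,
        memory_mb=4096,
        read_only_mounts=["/etc/hadoop/conf"],
    )
    print(" ".join(argv))
```

Returning an argument list rather than a shell string keeps the sketch safe to pass to `subprocess.run` without shell quoting concerns.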
Although the advantages of containers are obvious, Docker and YARN in their current form do not support running Hadoop YARN tasks entirely in Docker. Apache proposes changes to both Docker and YARN and lists some of the work currently planned:
A Docker executor for YARN;
User namespaces: Docker needs to support user namespaces so that the root user in a Docker image can be mapped to a regular user on the host, which controls that user's access to the host file system;
Container network configuration: This work mainly allows the YARN master node to communicate with the other nodes. With Docker's existing NAT-based IP addressing, a task running in a container cannot reach tasks running on another physical host;
Dynamic resource limits: Docker currently does not support changing a container's resource configuration dynamically.
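The user namespace item in the list above rests on a simple idea: each user ID inside the container is offset into a range of unprivileged IDs on the host. The sketch below illustrates that arithmetic only. It assumes a single contiguous range written in the `name:start:count` form that Linux uses in `/etc/subuid`; the `yarn` entry and the function names are hypothetical and do not describe how Docker or YARN implement the mapping.

```python
from typing import NamedTuple


class IdRange(NamedTuple):
    host_start: int  # first host ID in the subordinate range
    count: int       # number of IDs in the range


def parse_subid_line(line: str) -> IdRange:
    """Parse one `name:start:count` entry (the /etc/subuid line format)."""
    _name, start, count = line.strip().split(":")
    return IdRange(int(start), int(count))


def to_host_uid(container_uid: int, id_range: IdRange) -> int:
    """Map a UID seen inside the container to the UID used on the host."""
    if not 0 <= container_uid < id_range.count:
        raise ValueError(f"UID {container_uid} is outside the mapped range")
    return id_range.host_start + container_uid


if __name__ == "__main__":
    mapping = parse_subid_line("yarn:100000:65536")  # hypothetical entry
    # Root inside the container becomes an ordinary, unprivileged host user.
    print(to_host_uid(0, mapping))
    print(to_host_uid(1000, mapping))
```

With such a mapping, a process that is root inside the image holds only the permissions of an ordinary host user when it touches the host file system.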