Apache plans to run Hadoop in Docker


Apache has published a page on its Hadoop wiki that describes the benefits of running Hadoop in Docker and what needs to be done to run Hadoop entirely in Docker. Running Hadoop YARN in Docker, or in other containers, has many advantages:

Software dependency and configuration isolation: An application running in Docker carries its own software dependencies and configuration, which are independent of the host and of every other application running in Docker;

Security: Unless explicitly configured otherwise, an application running in Docker cannot access the contents of the host file system, even if it runs as root inside the Docker image. This protects the host's file system, devices, and so on;

Performance isolation: Docker can limit the resources a container uses, such as CPU, memory, storage, and network bandwidth;

Consistency: All tasks start from the same Docker image and therefore run in an identical software environment, independent of the host environment. For example, a task in an Ubuntu image can use Ubuntu's features as if it were running on a real Ubuntu system, even if the host runs RHEL;

Rapid deployment: Docker has strong image storage and distribution capabilities, so developers can easily pull Hadoop YARN application images from an image registry;

Programmability: With a Dockerfile, developers can easily define a YARN application's file system, environment configuration, and startup scripts;
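The performance-isolation and programmability points can be made concrete with a small sketch: the image (built from a Dockerfile) supplies the file system and scripts, and the launcher only attaches resource limits. The Python below is a hypothetical illustration, not code from Hadoop or the Apache wiki. It assumes a YARN-style request of vcores and megabytes of memory and maps it onto the real `docker run` flags `--cpu-shares` and `--memory`; the names `ResourceRequest`, `build_docker_run_args`, and the 1024-shares-per-vcore ratio are assumptions chosen for the example.

```python
from dataclasses import dataclass
from typing import List

# Assumed weight per YARN vcore (hypothetical); 1024 is Docker's default --cpu-shares value.
SHARES_PER_VCORE = 1024


@dataclass
class ResourceRequest:
    """A hypothetical YARN-style resource request for one container."""
    vcores: int
    memory_mb: int


def build_docker_run_args(name: str, image: str, req: ResourceRequest,
                          command: List[str]) -> List[str]:
    """Translate a resource request into a `docker run` argument list (not executed here)."""
    if req.vcores < 1 or req.memory_mb < 1:
        raise ValueError("vcores and memory_mb must be positive")
    return [
        "docker", "run", "--rm",
        "--name", name,
        # Relative CPU weight, applied only when containers compete for CPU.
        "--cpu-shares", str(req.vcores * SHARES_PER_VCORE),
        # Hard memory limit; processes in the container may be OOM-killed above it.
        "--memory", f"{req.memory_mb}m",
        image,
        *command,
    ]
```

For example, `build_docker_run_args("task_01", "example/yarn-app:latest", ResourceRequest(2, 2048), ["python3", "task.py"])` (image name and command are placeholders) yields an argument list that could be passed to `subprocess.run`. CPU shares are a relative weight rather than a hard cap, which suits YARN's notion of vcores as a share of a node.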

Although the advantages of containers are obvious, neither Docker nor YARN currently supports running Hadoop YARN tasks entirely in Docker. Apache proposes changes to both Docker and YARN and lists some of the currently planned work:

A Docker executor for YARN;

User namespaces in Docker: the root user inside a Docker image should be mapped to a regular user on the host, so that access to the host file system can be controlled;

Container network configuration: This is mainly needed so that the YARN master can communicate with other nodes. Docker's existing NAT-based IP addresses do not allow a task running in a container to reach tasks running on another physical host;

Dynamic resource limits: Docker currently does not support changing a container's resource limits dynamically.
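The user-namespace item relies on the ID-mapping rule Linux exposes in `/proc/<pid>/uid_map`: each line gives the first ID inside the namespace, the first ID outside it (on the host), and the length of the range. The Python below is only an illustration of that rule, not Docker's or YARN's implementation, and the range `0 100000 65536` used in the example is an assumed configuration. It shows how root (UID 0) in a container ends up as an unprivileged UID on the host.

```python
from typing import List, Optional, Tuple

# One uid_map entry: (first ID inside the namespace, first ID on the host, range length).
UidRange = Tuple[int, int, int]


def parse_uid_map(text: str) -> List[UidRange]:
    """Parse the three-column format used by /proc/<pid>/uid_map."""
    ranges: List[UidRange] = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        inside, outside, length = (int(f) for f in fields)
        ranges.append((inside, outside, length))
    return ranges


def to_host_uid(container_uid: int, ranges: List[UidRange]) -> Optional[int]:
    """Return the host UID a container UID maps to, or None if it is unmapped."""
    for inside, outside, length in ranges:
        if inside <= container_uid < inside + length:
            return outside + (container_uid - inside)
    return None


# Assumed mapping: container UIDs 0..65535 map to host UIDs 100000..165535.
ranges = parse_uid_map("0 100000 65536\n")
print(to_host_uid(0, ranges))      # container root
print(to_host_uid(70000, ranges))  # outside the mapped range
```

With this mapping, container root becomes host UID 100000, an ordinary account with no special rights over the host file system, and UIDs outside the range have no host identity at all.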
