Constructing a fully distributed Hadoop cluster based on virtual Linux + Docker


This article assumes the reader has a basic understanding of Docker, is familiar with common Linux commands, and understands the general installation and basic configuration of Hadoop.

Experimental environment: Windows 10 + VMware Workstation 11 + Ubuntu 14.04 Server + Docker 1.7

Windows 10 serves as the physical machine's operating system, on the 10.41.0.0/24 network segment. The virtual machine uses NAT networking, with subnet 192.168.92.0/24 and gateway 192.168.92.2. Ubuntu 14.04 is the virtual system and acts as the container host, with IP 192.168.92.129. Based on this environment, this article builds a fully distributed Hadoop cluster inside the Linux system, with nodes master + slave1 + slave2.

First, virtual system installation

Install VMware Workstation on Windows 10 and create a Linux virtual machine, allocating disk space, CPU, and memory according to the performance of the physical machine. Set the network type to NAT (adjust to your actual network environment; tutorials on this are plentiful and not repeated here). When installing Linux, select the OpenSSH server option.

After installing the Linux virtual system, you can see in the VMware Workstation Virtual Network Editor that the virtual machine's assigned subnet is 192.168.92.0/24 with gateway 192.168.92.2. Running ifconfig in the virtual machine terminal shows that Linux automatically obtained the IP 192.168.92.129. Edit the /etc/network/interfaces file to configure a static IP address:
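A minimal sketch of the static configuration, matching the addresses above (adjust the interface name and addresses to your own environment):

# /etc/network/interfaces
auto eth0
iface eth0 inet static
    address 192.168.92.129
    netmask 255.255.255.0
    gateway 192.168.92.2
    dns-nameservers 192.168.92.2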

Execute the following command to make the network configuration take effect:

sudo /etc/init.d/networking restart  # sometimes this command has no effect; reboot instead

Use a Linux remote administration tool (such as Xshell or PuTTY) to log on to the Linux system.


Second, install Docker

Install Docker by following this guide: http://dockerpool.com/static/books/docker_practice/install/ubuntu.html
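The linked guide is authoritative; as a quick sketch, the official convenience-script route looks like this (assumes the VM has internet access):

curl -sSL https://get.docker.com/ | sh   # install Docker via the official script
sudo docker version                      # verify the installation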


Third, get the image

Download the ubuntu:14.04 image from the Docker registry (the file is small, less than 200 MB):

sudo docker pull ubuntu:14.04  # download the image
sudo docker images             # list the images in the local repository


In the output of docker images, the ubuntu:14.04 entry at the bottom is the downloaded image; the others are new images built on top of it.


Fourth, customize the container

Execute the following command to create and start a container:

sudo docker run -ti ubuntu:14.04

This switches to the container's interactive terminal, logged in as root by default. Then perform the following steps:

1. Modify the installation source. The image pulled by Docker uses overseas package sources, so the download phase of any apt-get install will be very slow or may even fail. It is recommended to switch to a domestic mirror such as cn99 or NetEase, or to copy the host's source configuration (if it already points to Ubuntu mirrors deployed domestically). To switch, edit the /etc/apt/sources.list file.
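A sketch of one way to switch sources (mirrors.aliyun.com is just one example of a domestic mirror; substitute whichever is closest to you):

cp /etc/apt/sources.list /etc/apt/sources.list.bak                 # back up the original list
sed -i 's|archive.ubuntu.com|mirrors.aliyun.com|g' /etc/apt/sources.list
apt-get update                                                     # refresh the package index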

2. Install SSH

sudo apt-get update
sudo apt-get install openssh-server
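Note that inside a container sshd is not started by an init system; a common gotcha with the stock ubuntu:14.04 image (worth verifying in your own) is that the privilege-separation directory must exist before the daemon will start:

mkdir -p /var/run/sshd   # sshd refuses to start without this directory
/usr/sbin/sshd           # start the daemon manually for testing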

3. Configure SSH passwordless login

Create the .ssh folder under the user's home directory and generate the key pair by executing the commands:

mkdir -p ~/.ssh
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cp ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys

Then modify the SSH configuration file (/etc/ssh/sshd_config), changing PermitRootLogin without-password to PermitRootLogin yes, and then set the root password.
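A sketch of those two steps (the sed pattern assumes the stock Ubuntu 14.04 sshd_config wording):

sed -i 's/PermitRootLogin without-password/PermitRootLogin yes/' /etc/ssh/sshd_config
passwd root   # set the root password when prompted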

If you do not want to allow direct root login, create a regular user and perform all subsequent steps in that user's environment.

4. Download and install the JDK

sudo apt-get install oracle-java8-installer  # alternatively, download the JDK directly from the Oracle site with wget

After installation, configure the JDK environment variables, then run the java and javac commands to test.
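A sketch of the environment setup, assuming the installer's default path of /usr/lib/jvm/java-8-oracle (note that oracle-java8-installer is typically provided by the webupd8team/java PPA, which may need to be added first):

# Append to /etc/profile or ~/.bashrc
export JAVA_HOME=/usr/lib/jvm/java-8-oracle
export PATH=$JAVA_HOME/bin:$PATH

# Verify
java -version
javac -version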

5. Configure hosts

As described in the experimental environment above, the Hadoop cluster consists of one master node and two slave nodes, which requires adding IP-to-hostname mappings in the hosts file. Running ifconfig inside a container shows that the IP Docker assigns to the container's eth0 interface is in the 172.17.0.x range (the private range may differ in other environments), and that IP changes whenever the container restarts. A Hadoop cluster is best configured with static addresses, so a later step uses a tool to virtualize a new network interface for each container and assign it a fixed address on the same network segment as the docker0 bridge that the Docker service created on the host.

Open another remote administration terminal and log on to the Linux host. Running ifconfig there shows the IP of the docker0 bridge (172.17.42.1 in this experiment), which does not change across reboots, so container IPs can be allocated from that segment as follows:

172.17.42.2 Master
172.17.42.3 slave1
172.17.42.4 slave2

Note that when a container starts, it re-initializes its hosts file and adds a mapping from the eth0 address to the hostname, which would make the hostname resolve to the eth0 interface after the Hadoop cluster starts. The file therefore needs to be rebuilt; a simple script is provided here:

#!/bin/bash
echo "# IP and hostname information" > /etc/hosts
echo "127.0.0.1 localhost" >> /etc/hosts
echo "172.17.42.2 master" >> /etc/hosts
echo "172.17.42.3 slave1" >> /etc/hosts
echo "172.17.42.4 slave2" >> /etc/hosts

Configure the script to run at startup (for example, by invoking it from /etc/rc.local or chaining it into the container's start command).

6. Download and configure Hadoop

Use wget to download Hadoop 2.6 (choose the version you want to install according to your situation):

wget http://mirrors.sonic.net/apache/hadoop/common/hadoop-2.6.0/hadoop-2.6.0.tar.gz

Unzip the archive to a directory, then edit the configuration files in the hadoop-2.6.0/etc/hadoop directory and adjust the configuration accordingly.
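A minimal sketch of the extraction and the files that typically need editing for a fully distributed Hadoop 2.6 setup (the install path and values here are assumptions; adjust to your layout):

tar -xzf hadoop-2.6.0.tar.gz -C /usr/local   # extract to an assumed install location
cd /usr/local/hadoop-2.6.0/etc/hadoop
# Files to edit:
#   hadoop-env.sh   - set JAVA_HOME
#   core-site.xml   - fs.defaultFS, e.g. hdfs://master:9000
#   hdfs-site.xml   - dfs.replication and the name/data directories
#   mapred-site.xml - mapreduce.framework.name = yarn
#   yarn-site.xml   - yarn.resourcemanager.hostname = master
#   slaves          - one slave hostname per line (slave1, slave2)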


Fifth, save the image

Execute the exit command to leave the container terminal, then run the following command to save the changes in the container to a new image:

sudo docker commit -m "description of the image" <container id> ubuntu:hadoop  # the ubuntu:hadoop tag is used when starting containers in the next section

Sixth, create and start the new containers

Execute the following commands to create and start the new Hadoop cluster containers:

sudo docker run -d -h master --name=hadoop_master ubuntu:hadoop /usr/sbin/sshd -D  # start the Hadoop master node
sudo docker run -d -h slave1 --name=hadoop_slave1 ubuntu:hadoop /usr/sbin/sshd -D  # start a Hadoop slave node
sudo docker run -d -h slave2 --name=hadoop_slave2 ubuntu:hadoop /usr/sbin/sshd -D  # start a Hadoop slave node
sudo docker ps -a  # list all created containers



Seventh, install the pipework tool and create virtual network interfaces

1. Download the pipework tool from GitHub: https://github.com/jpetazzo/pipework

The tool virtualizes a new network interface for a container and configures a static address on it, while bridging the new interface to the docker0 bridge that Docker created on the host, so that the container and the host can communicate through it.

Download and unzip the tool, then copy the pipework file from the extracted directory to /usr/local/bin; the tool is then installed.
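A sketch of those steps, assuming the master-branch archive from GitHub:

wget https://github.com/jpetazzo/pipework/archive/master.zip -O pipework.zip
unzip pipework.zip
sudo cp pipework-master/pipework /usr/local/bin/
sudo chmod +x /usr/local/bin/pipework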

2. Create the virtual network interfaces

Execute the following commands to create a virtual network interface with the specified address for each of the three containers:

sudo pipework docker0 hadoop_master 172.17.42.2/24@172.17.42.1
sudo pipework docker0 hadoop_slave1 172.17.42.3/24@172.17.42.1
sudo pipework docker0 hadoop_slave2 172.17.42.4/24@172.17.42.1

The command arguments are: <bridge name> <container name> <new interface address/prefix length@gateway address>, where the gateway is the address of the bridge. The created interface is named eth1 by default; a different name can be specified with the -i <name> option. A warning message may appear when the command executes; it has shown no ill effect so far.


Eighth, access test

Ping the containers' new interface addresses from the host: all three nodes should respond. Then use the ssh command, both from the host and between containers, to verify that each node can be logged into normally.



Ninth, start the Hadoop cluster

Log in to the master node container, execute the command to format the NameNode, and then start the cluster:

bin/hadoop namenode -format
sbin/start-dfs.sh && sbin/start-yarn.sh

You can verify the result by running jps on each node; the master should show daemons such as NameNode, SecondaryNameNode, and ResourceManager, while the slaves show DataNode and NodeManager.



You can now access the containers from the host. However, the host has no desktop environment, so the web UIs provided by the cluster can only be opened in a browser on the physical machine. At this point, though, the physical machine can neither ping the container addresses nor access the containers: the host sits on the physical machine's subnet, while the containers sit on the host's subnet, and the physical machine knows nothing about the subnet the containers live on. A route from the physical machine to the container subnet therefore needs to be added on the physical machine.

Run cmd as administrator on the physical machine and execute the following command:

route add -p 172.17.42.0 mask 255.255.255.0 192.168.92.129

The three addresses are: <destination subnet address> <subnet mask> <gateway, i.e. the container host's address>. After this, the physical machine can ping the containers normally, and the web console also opens normally; for example, the HDFS NameNode UI defaults to port 50070 in Hadoop 2.x, so http://172.17.42.2:50070 should now be reachable.


Note: every time a Docker container is restarted, the virtual interface created by the pipework tool disappears, and the address must be reassigned. A script can be written to manage this.
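A sketch of such a restart helper, reusing the container names and addresses from the setup above:

#!/bin/bash
# Restart the cluster containers and reattach their pipework interfaces
sudo docker start hadoop_master hadoop_slave1 hadoop_slave2
sudo pipework docker0 hadoop_master 172.17.42.2/24@172.17.42.1
sudo pipework docker0 hadoop_slave1 172.17.42.3/24@172.17.42.1
sudo pipework docker0 hadoop_slave2 172.17.42.4/24@172.17.42.1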
