Build a fully distributed Hadoop cluster based on virtual Linux + Docker


This article assumes that readers have a basic understanding of Docker, are familiar with basic Linux commands, and know how to install and do simple configuration of Hadoop.

Lab environment: Windows 10 + VMware Workstation 11 + Ubuntu 14.04 Server + Docker 1.7

Windows 10 is the physical machine's operating system, on the 10.41.0.0/24 network segment. The virtual machine uses NAT networking: the subnet is 192.168.92.0/24 and the gateway is 192.168.92.2. Ubuntu 14.04 is the virtual system and serves as the host for the containers, with IP 192.168.92.129. This article builds a fully distributed Hadoop cluster on top of this environment, with the nodes master + slave1 + slave2.

I. Virtual system installation

Install VMware Workstation on Windows 10 and create a Linux virtual machine, allocating disk space, CPU, and memory according to the performance of the physical machine. Use NAT as the network type (adjust to your actual network environment; tutorials on this are plentiful, so the details are not repeated here). When installing Linux, select the OpenSSH server option.

After installing the Linux virtual system, you can see in the VMware Workstation Virtual Network Editor that the virtual machine is assigned the subnet 192.168.92.0/24. Running ifconfig in the virtual machine's terminal shows that Linux automatically obtained the IP 192.168.92.129. Edit the /etc/network/interfaces file to configure a static IP address:
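A minimal sketch of such a configuration, assuming the interface is named eth0 and using the addresses from the environment above (the DNS server entry is an assumption):

auto eth0
iface eth0 inet static
    address 192.168.92.129
    netmask 255.255.255.0
    gateway 192.168.92.2
    dns-nameservers 192.168.92.2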

Execute the command to make the network configuration take effect:

sudo /etc/init.d/networking restart   # sometimes this command has no effect; if so, reboot the system

Use a remote management tool (such as Xshell or PuTTY) to log in to the Linux system.


II. Install Docker

Refer to this guide to install Docker: http://dockerpool.com/static/books/docker_practice/install/ubuntu.html
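As one possible shortcut (an assumption; the guide above covers apt-based installation in detail), Docker's convenience script also works on Ubuntu 14.04:

curl -sSL https://get.docker.com/ | sh   # installs the current Docker release
sudo docker --version                    # verify the installation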


III. Get the image

Download the ubuntu:14.04 image from the Docker registry (the image is small, less than 200 MB):

sudo docker pull ubuntu:14.04   # download the image
sudo docker images              # list the images in the local repository


In the docker images output, the ubuntu:14.04 entry at the bottom is the downloaded image; the other entry is the new image that will be created later on top of it.


IV. Customize the container

Execute the following command to create and start a container:

sudo docker run -ti ubuntu:14.04

This switches to an interactive terminal inside the container, by default as the root user. Then do the following:

1. Modify the apt sources. The image Docker pulls uses foreign package mirrors, so the download phase of an apt-get install will be very slow or even fail. It is recommended to switch to domestic mirrors such as cn99 or NetEase; you can also copy the host's source configuration (it uses Ubuntu mirrors deployed on domestic servers). Replace the contents of the /etc/apt/sources.list file accordingly; see the sketch below.
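A minimal sketch of /etc/apt/sources.list using the NetEase mirror (one of the domestic mirrors mentioned above; trusty is the Ubuntu 14.04 release name):

deb http://mirrors.163.com/ubuntu/ trusty main restricted universe multiverse
deb http://mirrors.163.com/ubuntu/ trusty-updates main restricted universe multiverse
deb http://mirrors.163.com/ubuntu/ trusty-security main restricted universe multiverse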

2. Install SSH

sudo apt-get update
sudo apt-get install openssh-server

3. Configure SSH password-free login

In the user's home directory, create the .ssh folder and execute the commands:

ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
cp ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys

Then modify the SSH configuration file (/etc/ssh/sshd_config), change PermitRootLogin without-password to PermitRootLogin yes, and change the root password.
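A sketch of the same edit done non-interactively (the sed expression assumes the Ubuntu 14.04 default contents of the file):

sed -i 's/PermitRootLogin without-password/PermitRootLogin yes/' /etc/ssh/sshd_config
passwd root   # set the root password used for SSH logins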

If you do not want to allow root to log in directly, create a regular user and perform all subsequent actions in that user's environment.

4. Download and install the JDK

sudo apt-get install oracle-java8-installer   # you can also use wget to download the JDK directly from Oracle's website

After installation, configure the JDK environment variables, then run the java and javac commands to test; see the sketch below.
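A minimal sketch of the environment-variable setup, assuming the installer placed the JDK under /usr/lib/jvm/java-8-oracle, its default location (note that oracle-java8-installer comes from the webupd8team/java PPA, which may need to be added first):

# append to /etc/profile or ~/.bashrc, then source the file
export JAVA_HOME=/usr/lib/jvm/java-8-oracle
export PATH=$JAVA_HOME/bin:$PATH

java -version    # both commands should print the JDK version
javac -version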

5. Configure the Hosts

As described in the experimental environment above, the Hadoop cluster consists of one master node and two slave nodes, so IP-to-hostname mappings must be added to the hosts file. The ifconfig command shows that the IP Docker assigns to the container's eth0 NIC is in the 172.17.0.x segment (the private segment may differ in other environments), and the IP changes after the container restarts. A Hadoop cluster is best configured with static addresses, so in the next steps a tool will be used to give each container a new virtual NIC with a fixed address on the same network segment as the docker0 bridge that the Docker service creates on the host.

Open another remote terminal and log in to the Linux host. Running the ifconfig command there shows the IP of the docker0 bridge (172.17.42.1 in this article's environment). This IP does not change when the system restarts, so IPs in that segment can be assigned to the containers. The following allocation is used here:

172.17.42.2 master
172.17.42.3 slave1
172.17.42.4 slave2

Note that the container reinitializes its hosts file on startup and adds a mapping from the eth0 NIC address to the hostname, which would make the Hadoop cluster listen on eth0 after boot. The file therefore needs to be regenerated; here is a simple script that does this:

#!/bin/bash
echo "#ip and hostname information" > /etc/hosts
echo "127.0.0.1 localhost" >> /etc/hosts
echo "172.17.42.2 master" >> /etc/hosts
echo "172.17.42.3 slave1" >> /etc/hosts
echo "172.17.42.4 slave2" >> /etc/hosts

Set the script to run at boot.
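One way to do this on Ubuntu 14.04 (an assumption; the original does not say which mechanism it used, and the script path is hypothetical):

chmod +x /root/regen_hosts.sh   # make the script above executable
# then add this line to /etc/rc.local, before the final "exit 0":
/root/regen_hosts.sh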

6. Download and configure Hadoop

Use wget to download Hadoop 2.6 (you can choose the version to install depending on your situation):

wget http://mirrors.sonic.net/apache/hadoop/common/hadoop-2.6.0/hadoop-2.6.0.tar.gz

Unzip the package to a directory, then edit the configuration files under the hadoop-2.6.0/etc/hadoop directory and make the appropriate changes; a sketch follows.
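A minimal sketch of the unpacking and the settings a fully distributed cluster cannot do without (the install path is hypothetical; the original leaves the detailed configuration to the reader):

tar -xzf hadoop-2.6.0.tar.gz -C /usr/local    # unpack, e.g. to /usr/local/hadoop-2.6.0
cd /usr/local/hadoop-2.6.0/etc/hadoop
# hadoop-env.sh: set JAVA_HOME so the daemons can find the JDK
# core-site.xml: set fs.defaultFS to hdfs://master:9000 so all nodes find the NameNode
# slaves: list the hostnames the start scripts should launch DataNodes/NodeManagers on
echo "slave1" >  slaves
echo "slave2" >> slaves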

7. Save the image

Execute the exit command to leave the container's terminal, then execute the command to save the container's changes to a new image:

sudo docker commit -m "description of the image" <container id> ubuntu:hadoop

8. Create and start a new container

Execute the commands to create and start the new Hadoop cluster containers:

sudo docker run -d -h master --name=hadoop_master ubuntu:hadoop /usr/sbin/sshd -D   # start the Hadoop master node
sudo docker run -d -h slave1 --name=hadoop_slave1 ubuntu:hadoop /usr/sbin/sshd -D   # start a Hadoop slave node
sudo docker run -d -h slave2 --name=hadoop_slave2 ubuntu:hadoop /usr/sbin/sshd -D   # start a Hadoop slave node
sudo docker ps -a                                                                   # list all created containers


9. Install the pipework tool

Download a tool called pipework from GitHub: https://github.com/jpetazzo/pipework

The tool's main job is to give the container a new virtual NIC, configure a static address on it, and bridge that NIC to the bridge Docker created on the host, so that the container and the host can communicate through it.

Download and unzip the tool, then copy the pipework file from the extracted directory to the /usr/local/bin directory; the tool is then installed.
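A minimal sketch of the install (cloning with git instead of downloading an archive; the repository keeps the pipework script at its top level):

git clone https://github.com/jpetazzo/pipework.git
sudo cp pipework/pipework /usr/local/bin/
sudo chmod +x /usr/local/bin/pipework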

10. Create a virtual network card

Execute the following commands to create a virtual NIC with the specified address for each of the three containers:

sudo pipework docker0 hadoop_master 172.17.42.2/24@172.17.42.1
sudo pipework docker0 hadoop_slave1 172.17.42.3/24@172.17.42.1
sudo pipework docker0 hadoop_slave2 172.17.42.4/24@172.17.42.1

The command parameters are: <bridge name> <container name> <new NIC address/prefix length@gateway address>. The gateway is the bridge's address. The NIC created is named eth1 by default; you can specify a different name with the -i <name> parameter. A warning message may appear when the command runs; so far it has had no observed impact.

11. Testing

From the host, ping the containers' new NIC addresses; all three nodes should respond. The ssh command on the host should also log in to the containers correctly.
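For example, a quick check using the addresses assigned above:

ping -c 3 172.17.42.2   # repeat for 172.17.42.3 and 172.17.42.4
ssh root@172.17.42.2    # log in to the master container over the new NIC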


12. Start the Hadoop cluster

After logging in to the container master node, execute the commands to format the NameNode and then start the cluster:

bin/hadoop namenode -format
sbin/start-dfs.sh && sbin/start-yarn.sh

If everything is configured correctly, the start scripts report the daemons launching on all three nodes.
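To verify (the original shows a screenshot of this step; the daemon lists below assume the master runs the NameNode and ResourceManager while the slaves run DataNodes and NodeManagers):

jps   # on master: NameNode, SecondaryNameNode, ResourceManager
jps   # on slave1/slave2: DataNode, NodeManager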



At this point, the containers can already be accessed from the host. But the host has no desktop environment, so the Web UI provided by the cluster can only be opened in a browser on the physical machine. Right now, however, the physical machine can neither ping the container addresses nor access the containers: the host is on the physical machine's subnet, the containers are on the host's subnet, and the physical machine does not know the containers' subnet exists. A route from the physical machine to the container subnet must therefore be added on the physical machine.

Run cmd in administrator mode and execute the following command:

route add -p 172.17.42.0 mask 255.255.255.0 192.168.92.129

The three addresses are: <destination subnet> <netmask> <gateway/host address>. After this, the containers can be pinged normally from the physical machine, and the Web console opens normally.
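For example (the ports are Hadoop 2.6 defaults; the original shows screenshots here), open http://172.17.42.2:50070 for the HDFS NameNode UI and http://172.17.42.2:8088 for the YARN ResourceManager UI in the physical machine's browser.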


Note: each time a Docker container is restarted, the virtual NIC created by the pipework tool disappears and has to be assigned again; a management script can be written for this.
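A hypothetical helper along those lines, restarting the containers and reattaching the NICs with the names and addresses assigned above:

#!/bin/bash
# restart the cluster containers, then re-create their fixed-address NICs
sudo docker start hadoop_master hadoop_slave1 hadoop_slave2
sudo pipework docker0 hadoop_master 172.17.42.2/24@172.17.42.1
sudo pipework docker0 hadoop_slave1 172.17.42.3/24@172.17.42.1
sudo pipework docker0 hadoop_slave2 172.17.42.4/24@172.17.42.1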

Copyright notice: this is the blogger's original article and may not be reproduced without the blogger's permission.
