I. Build the Docker image
1. mkdir Hadoop
2. Copy hadoop-2.6.2.tar.gz into the Hadoop directory
3. vim Dockerfile
FROM ubuntu
MAINTAINER tianlei <393743083@qq.com>
ADD ./hadoop-2.6.2.tar.gz /usr/local/
Execute the command to generate the image:
docker build -t "ubuntu:base" .
Run a container from the image:
docker run -d -it --name hadoop ubuntu:base
Enter the container:
docker exec -it hadoop /bin/bash
1. Install Java in the image
sudo apt-get update
sudo apt-get install openjdk-7-jre openjdk-7-jdk
Set the JAVA_HOME environment variable:
vim ~/.bashrc
Add this line:
export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-amd64
source ~/.bashrc
2. Install Hadoop in the image
Hadoop is already unpacked into /usr/local/, because the ADD instruction in the Dockerfile automatically extracts local tar archives.
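One caveat: the tarball normally unpacks into a versioned directory, /usr/local/hadoop-2.6.2, while the paths below assume /usr/local/hadoop, so you will likely need to rename (or symlink) it first; this step is an assumption, not part of the original walkthrough:
mv /usr/local/hadoop-2.6.2 /usr/local/hadoop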
vim ~/.bashrc
Add:
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_CONFIG_HOME=$HADOOP_HOME/etc/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
Make the changes take effect:
source ~/.bashrc
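As a quick optional sanity check (my addition, assuming the rename above), the hadoop binary should now resolve on the PATH:
hadoop version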
Point hadoop-env.sh at the JDK:
cd /usr/local/hadoop/etc/hadoop/
vim hadoop-env.sh
Modify:
export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-amd64
Create tmp, namenode, and datanode directories under the Hadoop directory (the exact commands are sketched below).
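The original does not show the commands; assuming the Hadoop home set above, something like:
mkdir -p /usr/local/hadoop/tmp /usr/local/hadoop/namenode /usr/local/hadoop/datanode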
These three directories are used by the configuration that follows: tmp as Hadoop's temporary directory, namenode as the namenode directory, and datanode as the datanode storage directory. Now go into the etc/hadoop directory and modify the three XML files.
1). core-site.xml configuration
<?xml version= "1.0" encoding= "UTF-8"?> <?xml-stylesheet type= "text/xsl" href= "configuration.xsl"?> <!
<!-- Put site-specific property overrides in this file. -->
<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/usr/local/hadoop/tmp</value>
        <description>A base for other temporary directories.</description>
    </property>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://master:9000</value>
        <final>true</final>
        <description>The name of the default file system. A URI whose scheme
        and authority determine the FileSystem implementation. The URI's scheme
        determines the config property (fs.SCHEME.impl) naming the FileSystem
        implementation class. The URI's authority is used to determine the
        host, port, etc. for a filesystem.</description>
    </property>
</configuration>
Note: the hadoop.tmp.dir value is the path of the tmp directory created in the previous step. fs.default.name is set to hdfs://master:9000, which points at the master node's host (that hostname is configured later during the cluster setup; it is written here in advance).
2). hdfs-site.xml configuration
<?xml version= "1.0" encoding= "UTF-8"?> <?xml-stylesheet type= "text/xsl" href= "configuration.xsl"?> <!
<!-- Put site-specific property overrides in this file. -->
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>2</value>
        <final>true</final>
        <description>Default block replication.
        The actual number of replications can be specified when the file is created.
        The default is used if replication is not specified in create time.
        </description>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/usr/local/hadoop/namenode</value>
        <final>true</final>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/usr/local/hadoop/datanode</value>
        <final>true</final>
    </property>
</configuration>
Note: when we build the cluster environment we will configure one master node and two slave nodes, so dfs.replication is set to 2. dfs.namenode.name.dir and dfs.datanode.data.dir point at the namenode and datanode directories created earlier.
3). mapred-site.xml configuration
<?xml version= "1.0"?> <?xml-stylesheet type= "text/xsl" href= "configuration.xsl"?> <!--Licensed under
<!-- Put site-specific property overrides in this file. -->
<configuration>
    <property>
        <name>mapred.job.tracker</name>
        <value>master:9001</value>
        <description>The host and port that the MapReduce job tracker runs at.
        If "local", then jobs are run in-process as a single map and reduce task.
        </description>
    </property>
</configuration>
There is only one configuration item here, mapred.job.tracker, and we point it at the master node.
Format the namenode:
hadoop namenode -format
3. Install SSH
sudo apt-get install ssh
Add to ~/.bashrc:
#autorun
/usr/sbin/sshd
Generate a key:
cd ~/
ssh-keygen -t rsa -P '' -f ~/.ssh/id_dsa
cd .ssh
cat id_dsa.pub >> authorized_keys
Note: sometimes sshd complains that /var/run/sshd cannot be found; just create that directory under /var/run (see the command below).
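For example:
mkdir -p /var/run/sshd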
Open /etc/ssh/ssh_config and add:
StrictHostKeyChecking no
UserKnownHostsFile /dev/null
4. Create an image with Hadoop installed
docker commit -m "Hadoop install" hadoop ubuntu:hadoop
II. Deploy the Hadoop distributed cluster
Start the master container:
docker run -d -ti -h master ubuntu:hadoop
Start the slave1 container:
docker run -d -ti -h slave1 ubuntu:hadoop
Start the slave2 container:
docker run -d -ti -h slave2 ubuntu:hadoop
Add to /etc/hosts in each container (the IPs are the containers' actual addresses; see the note below on how to look them up):
10.0.0.5 master
10.0.0.6 slave1
10.0.0.7 slave2
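The addresses depend on your Docker setup; assuming the default bridge network, one way to look up a container's IP is:
docker inspect --format '{{ .NetworkSettings.IPAddress }}' <container_id>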
Add to the /usr/local/hadoop/etc/hadoop/slaves file on master:
slave1
slave2
Note: because the virtual machine is short on memory, add the following to mapred-site.xml to cap the memory allocated to each map task:
<property>
    <name>mapreduce.map.memory.mb</name>
    <value>500</value>
</property>
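A typical final step, assumed here rather than spelled out in the walkthrough: start the cluster from the master container and confirm the daemons are running:
start-all.sh
jps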