Deploying Hadoop on Docker


I. Build the Docker image

1. mkdir hadoop
2. Copy hadoop-2.6.2.tar.gz into the hadoop directory
3. vim Dockerfile

FROM ubuntu
MAINTAINER docker tianlei <393743083@qq.com>
# ADD auto-extracts local tar archives into the target directory
ADD ./hadoop-2.6.2.tar.gz /usr/local/
Run the following command to build the image:

docker build -t "ubuntu:base" .
Run the image to create a container:

docker run -d -it --name hadoop ubuntu:base
Enter the container:

docker exec -it hadoop /bin/bash
1. Install Java in the image
sudo apt-get update
sudo apt-get install openjdk-7-jre openjdk-7-jdk
Set the JAVA_HOME environment variable:

vim ~/.bashrc
Add this line:

export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-amd64

source ~/.bashrc
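
A quick way to confirm the JDK installed correctly and the variable resolves:

java -version      # should report an OpenJDK 1.7 runtime
echo $JAVA_HOME    # should print /usr/lib/jvm/java-1.7.0-openjdk-amd64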

2. Install Hadoop in the image

Hadoop has already been extracted into /usr/local/ (the Dockerfile's ADD instruction unpacks the tarball automatically), so it only needs to be wired into the environment.
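
Note that the tarball typically unpacks into a versioned directory, so the /usr/local/hadoop path used below may not exist yet. A minimal fix, assuming the archive extracted to /usr/local/hadoop-2.6.2:

mv /usr/local/hadoop-2.6.2 /usr/local/hadoop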

vim ~/.bashrc
Add:

export HADOOP_HOME=/usr/local/hadoop
export HADOOP_CONFIG_HOME=$HADOOP_HOME/etc/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
Make it take effect:

source ~/.bashrc
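
With bin and sbin on the PATH, the hadoop command should now resolve; a quick check:

which hadoop       # expect /usr/local/hadoop/bin/hadoop
hadoop version     # expect Hadoop 2.6.2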
Point Hadoop at the JDK by editing hadoop-env.sh:

cd /usr/local/hadoop/etc/hadoop/
vim hadoop-env.sh
Modify the JAVA_HOME line to:
export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-amd64
Create the tmp, namenode, and datanode directories under the Hadoop directory.

Three directories are created here and referenced by the configuration below: tmp serves as Hadoop's temporary directory, namenode as the NameNode directory, and datanode as the DataNode storage directory (see the sketch below). Then go into the etc/hadoop directory and modify the three XML files.
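
For example, assuming the three directories live under the install root, matching the paths used in the XML files that follow:

cd /usr/local/hadoop
mkdir tmp namenode datanode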

1). core-site.xml configuration

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Licensed under the Apache License, Version 2.0. See accompanying LICENSE file. -->
<!-- Put site-specific property overrides in this file. -->
<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/usr/local/hadoop/tmp</value>
        <description>A base for other temporary directories.</description>
    </property>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://master:9000</value>
        <final>true</final>
        <description>The name of the default file system. A URI whose scheme
        and authority determine the FileSystem implementation. The URI's
        scheme determines the config property (fs.SCHEME.impl) naming the
        FileSystem implementation class. The URI's authority is used to
        determine the host, port, etc. for a filesystem.</description>
    </property>
</configuration>

Note: hadoop.tmp.dir is the temporary directory path created in the previous step. fs.default.name is set to hdfs://master:9000, which points at the master node's host (the master hostname is configured later, during cluster setup; it is just written here in advance).

2). hdfs-site.xml configuration

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Licensed under the Apache License, Version 2.0. See accompanying LICENSE file. -->
<!-- Put site-specific property overrides in this file. -->
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>2</value>
        <final>true</final>
        <description>Default block replication. The actual number of
        replications can be specified when the file is created. The default
        is used if replication is not specified at create time.</description>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/usr/local/hadoop/namenode</value>
        <final>true</final>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/usr/local/hadoop/datanode</value>
        <final>true</final>
    </property>
</configuration>

Note: When we build the cluster environment, we will configure one master node and two slave nodes, so dfs.replication is set to 2. dfs.namenode.name.dir and dfs.datanode.data.dir point to the namenode and datanode directories created earlier.

3). mapred-site.xml configuration

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Licensed under the Apache License, Version 2.0. See accompanying LICENSE file. -->
<!-- Put site-specific property overrides in this file. -->
<configuration>
    <property>
        <name>mapred.job.tracker</name>
        <value>master:9001</value>
        <description>The host and port that the MapReduce job tracker runs
        at. If "local", then jobs are run in-process as a single map and
        reduce task.</description>
    </property>
</configuration>
There is only one configuration item here, mapred.job.tracker; we point it at the master node.
Format the NameNode:

hadoop namenode -format


3. Install SSH

sudo apt-get install ssh
Add to ~/.bashrc:

#autorun
/usr/sbin/sshd
Generate a key pair:

cd ~/
ssh-keygen -t rsa -P '' -f ~/.ssh/id_dsa
cd .ssh
cat id_dsa.pub >> authorized_keys
Note: sshd sometimes complains that /var/run/sshd cannot be found; just create the sshd directory under /var/run, as shown below.
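
For example:

mkdir -p /var/run/sshd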

Open /etc/ssh/ssh_config and add:

StrictHostKeyChecking no
UserKnownHostsFile /dev/null
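
To check that the daemon runs and key-based login works without a prompt:

/usr/sbin/sshd
ssh localhost echo ok    # should print "ok" with no password prompt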
4. Create an image with Hadoop installed

docker commit -m "hadoop install" hadoop ubuntu:hadoop


II. Deploy the Hadoop distributed cluster

Start the master container:

docker run -d -ti -h master ubuntu:hadoop
Start the slave1 container:

docker run -d -ti -h slave1 ubuntu:hadoop

Start the slave2 container:

docker run -d -ti -h slave2 ubuntu:hadoop
Add to /etc/hosts in each container:

10.0.0.5        master
10.0.0.6        slave1
10.0.0.7        slave2
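
The addresses above are examples; Docker assigns container IPs itself, so look up the real ones, e.g. with docker inspect (the container ID placeholder is illustrative):

docker inspect -f '{{ .NetworkSettings.IPAddress }}' <container-id>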
Add to the /usr/local/hadoop/etc/hadoop/slaves file:

slave1
slave2
Note: because the virtual machine has limited memory, cap the per-map-task memory by adding the following to mapred-site.xml:

<property>
    <name>mapreduce.map.memory.mb</name>
    <value>500</value>
</property>
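
With the hosts, slaves, and memory settings in place, the cluster can be started from the master container; a sketch, assuming $HADOOP_HOME/sbin is on the PATH as configured earlier:

start-all.sh    # deprecated wrapper that starts both the HDFS and YARN daemons
jps             # on master, expect NameNode, SecondaryNameNode, ResourceManager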