I. Build the Docker image
1. mkdir Hadoop
2. Copy hadoop-2.6.2.tar.gz into the Hadoop directory
3. vim Dockerfile
FROM ubuntu
MAINTAINER tianlei <393743083@qq.com>
ADD ./hadoop-2.6.2.tar.gz /usr/local/
Execute the command to generate the image:
docker build -t "ubuntu:base" .
Run a container from the image:
docker run -d -it --name hadoop ubuntu:base
Enter the container:
docker exec -it hadoop /bin/bash
1. Install Java in the image
sudo apt-get update
sudo apt-get install openjdk-7-jre openjdk-7-jdk
Set the JAVA_HOME environment variable:
vim ~/.bashrc
Add this line:
export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-amd64
source ~/.bashrc
2. Install Hadoop in the image
Hadoop is already unpacked into /usr/local/, because the ADD instruction in the Dockerfile automatically extracts local tar archives.
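One caveat: the tarball normally unpacks into a versioned directory, /usr/local/hadoop-2.6.2, while the paths below assume /usr/local/hadoop, so you will likely need to rename (or symlink) it first; this step is an assumption, not part of the original walkthrough:
mv /usr/local/hadoop-2.6.2 /usr/local/hadoop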
vim ~/.bashrc
Add:
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_CONFIG_HOME=$HADOOP_HOME/etc/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
Make the changes take effect:
source ~/.bashrc
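As a quick optional sanity check (my addition, assuming the rename above), the hadoop binary should now resolve on the PATH:
hadoop version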
Point hadoop-env.sh at the JDK:
cd /usr/local/hadoop/etc/hadoop/
vim hadoop-env.sh
Modify:
export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-amd64
Create tmp, namenode, and datanode directories under the Hadoop directory (the exact commands are sketched below).
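The original does not show the commands; assuming the Hadoop home set above, something like:
mkdir -p /usr/local/hadoop/tmp /usr/local/hadoop/namenode /usr/local/hadoop/datanode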
These three directories are used by the configuration that follows: tmp as Hadoop's temporary directory, namenode as the namenode directory, and datanode as the datanode storage directory. Now go into the etc/hadoop directory and modify the three XML files.
1). core-site.xml configuration
<?xml version= "1.0" encoding= "UTF-8"?> <?xml-stylesheet type= "text/xsl" href= "configuration.xsl"?> <!
<!-- Put site-specific property overrides in this file. -->
<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/usr/local/hadoop/tmp</value>
        <description>A base for other temporary directories.</description>
    </property>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://master:9000</value>
        <final>true</final>
        <description>The name of the default file system. A URI whose scheme
        and authority determine the FileSystem implementation. The URI's scheme
        determines the config property (fs.SCHEME.impl) naming the FileSystem
        implementation class. The URI's authority is used to determine the
        host, port, etc. for a filesystem.</description>
    </property>
</configuration>
Note: the hadoop.tmp.dir value is the path of the tmp directory created in the previous step. fs.default.name is set to hdfs://master:9000, which points at the master node's host (that hostname is configured later during the cluster setup; it is written here in advance).
2). hdfs-site.xml configuration
<?xml version= "1.0" encoding= "UTF-8"?> <?xml-stylesheet type= "text/xsl" href= "configuration.xsl"?> <!
<!-- Put site-specific property overrides in this file. -->
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>2</value>
        <final>true</final>
        <description>Default block replication.
        The actual number of replications can be specified when the file is created.
        The default is used if replication is not specified in create time.
        </description>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/usr/local/hadoop/namenode</value>
        <final>true</final>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/usr/local/hadoop/datanode</value>
        <final>true</final>
    </property>
</configuration>
Note: when we build the cluster environment we will configure one master node and two slave nodes, so dfs.replication is set to 2. dfs.namenode.name.dir and dfs.datanode.data.dir point at the namenode and datanode directories created earlier.
3). mapred-site.xml configuration
<?xml version= "1.0"?> <?xml-stylesheet type= "text/xsl" href= "configuration.xsl"?> <!--Licensed under
<!-- Put site-specific property overrides in this file. -->
<configuration>
    <property>
        <name>mapred.job.tracker</name>
        <value>master:9001</value>
        <description>The host and port that the MapReduce job tracker runs at.
        If "local", then jobs are run in-process as a single map and reduce task.
        </description>
    </property>
</configuration>
There is only one configuration item here, mapred.job.tracker, and we point it at the master node.
Format the namenode:
hadoop namenode -format
3. Install SSH
sudo apt-get install ssh
Add to ~/.bashrc:
#autorun
/usr/sbin/sshd
Generate a key:
cd ~/
ssh-keygen -t rsa -P '' -f ~/.ssh/id_dsa
cd .ssh
cat id_dsa.pub >> authorized_keys
Note: sometimes sshd complains that /var/run/sshd cannot be found; just create that directory under /var/run (see the command below).
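For example:
mkdir -p /var/run/sshd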
Open /etc/ssh/ssh_config and add:
StrictHostKeyChecking no
UserKnownHostsFile /dev/null
4. Create an image with Hadoop installed
docker commit -m "Hadoop install" hadoop ubuntu:hadoop
II. Deploy the Hadoop distributed cluster
Start the master container:
docker run -d -ti -h master ubuntu:hadoop
Start the slave1 container:
docker run -d -ti -h slave1 ubuntu:hadoop
Start the slave2 container:
docker run -d -ti -h slave2 ubuntu:hadoop
Add to /etc/hosts in each container (the IPs are the containers' actual addresses; see the note below on how to look them up):
10.0.0.5 master
10.0.0.6 slave1
10.0.0.7 slave2
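The addresses depend on your Docker setup; assuming the default bridge network, one way to look up a container's IP is:
docker inspect --format '{{ .NetworkSettings.IPAddress }}' <container_id>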
Add to the /usr/local/hadoop/etc/hadoop/slaves file on master:
slave1
slave2
Note: because the virtual machine is short on memory, add the following to mapred-site.xml to cap the memory allocated to each map task:
<property>
    <name>mapreduce.map.memory.mb</name>
    <value>500</value>
</property>
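A typical final step, assumed here rather than spelled out in the walkthrough: start the cluster from the master container and confirm the daemons are running:
start-all.sh
jps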