Deploying Hadoop on Docker


I. Build the Docker image

1. mkdir hadoop
2. Copy hadoop-2.6.2.tar.gz into the hadoop directory
3. vim Dockerfile

FROM ubuntu
MAINTAINER Docker tianlei <393743083@qq.com>
# ADD auto-extracts the tarball into /usr/local/
ADD ./hadoop-2.6.2.tar.gz /usr/local/
Build the image:

docker build -t "ubuntu:base" .
Run the image to create a container:

docker run -d -it --name hadoop ubuntu:base
Enter the container to work inside it:

docker exec -i -t hadoop /bin/bash
1. Install Java in the image
sudo apt-get update
sudo apt-get install openjdk-7-jre openjdk-7-jdk
Update the environment variables:

vim ~/.bashrc
Add this line:

export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-amd64

source ~/.bashrc
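A quick sanity check that the JDK and the variable are in place (the exact version string will vary with the OpenJDK 7 build installed):

java -version
echo $JAVA_HOME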

2. Install Hadoop in the image

Hadoop was already extracted into /usr/local/ by the Dockerfile's ADD instruction, so only the environment variables need to be set.

vim ~/.bashrc
Add:

export HADOOP_HOME=/usr/local/hadoop
export HADOOP_CONFIG_HOME=$HADOOP_HOME/etc/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
Reload the shell configuration:

source ~/.bashrc
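Note: the tarball extracts to a versioned directory (/usr/local/hadoop-2.6.2), so HADOOP_HOME=/usr/local/hadoop will only resolve if that directory is renamed or linked first; one way, assuming the default extraction path:

ln -s /usr/local/hadoop-2.6.2 /usr/local/hadoop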
Set JAVA_HOME in Hadoop's environment script:

cd /usr/local/hadoop/etc/hadoop/
vim hadoop-env.sh
Change:
export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-amd64
Create three directories, tmp, namenode, and datanode, under the hadoop directory. They will be used by the configuration that follows: tmp serves as Hadoop's temporary directory, namenode as the NameNode storage directory, and datanode as the DataNode storage directory (see the sketch below). Then go into /usr/local/hadoop/etc/hadoop and edit three XML files.
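A minimal sketch, assuming Hadoop lives at /usr/local/hadoop:

cd /usr/local/hadoop
mkdir tmp namenode datanode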

1) core-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/usr/local/hadoop/tmp</value>
        <description>A base for other temporary directories.</description>
    </property>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://master:9000</value>
        <final>true</final>
        <description>The name of the default file system.  A URI whose
        scheme and authority determine the FileSystem implementation.  The
        uri's scheme determines the config property (fs.SCHEME.impl) naming
        the FileSystem implementation class.  The uri's authority is used to
        determine the host, port, etc. for a filesystem.</description>
    </property>
</configuration>

Note: the value of hadoop.tmp.dir is the path of the temporary directory created earlier. fs.default.name is set to hdfs://master:9000, which points to the master node's host (that node gets configured when we set up the cluster later; it is written down here in advance).

2) hdfs-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>2</value>
        <final>true</final>
        <description>Default block replication.
        The actual number of replications can be specified when the file is created.
        The default is used if replication is not specified in create time.
        </description>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/usr/local/hadoop/namenode</value>
        <final>true</final>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/usr/local/hadoop/datanode</value>
        <final>true</final>
    </property>
</configuration>

Note: the cluster we build later has one master node and two slave nodes, so dfs.replication is set to 2. dfs.namenode.name.dir and dfs.datanode.data.dir point to the NameNode and DataNode directories created earlier.

3) mapred-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <property>
        <name>mapred.job.tracker</name>
        <value>master:9001</value>
        <description>The host and port that the MapReduce job tracker runs
        at.  If "local", then jobs are run in-process as a single map
        and reduce task.
        </description>
    </property>
</configuration>
There is only one configuration item here, mapred.job.tracker, which points at the master node.
Format the NameNode:

hadoop namenode -format


3. Install SSH

sudo apt-get install ssh
Add to ~/.bashrc so that sshd starts automatically:

#autorun
/usr/sbin/sshd
Generate a key pair:

cd ~/
ssh-keygen -t rsa -P '' -f ~/.ssh/id_dsa
cd .ssh
cat id_dsa.pub >> authorized_keys
Note: sshd sometimes complains that /var/run/sshd cannot be found; creating an sshd directory under /var/run fixes it.

Open /etc/ssh/ssh_config and add:

StrictHostKeyChecking no
UserKnownHostsFile /dev/null
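To check that the daemon runs and key-based login works (assumes /var/run/sshd exists, as noted above):

/usr/sbin/sshd
ssh localhost echo ok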
4. Commit an image with Hadoop installed

docker commit -m "hadoop install" hadoop ubuntu:hadoop
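The new tag should now show up alongside the base image:

docker images ubuntu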


II. Deploy the Hadoop distributed cluster

Start the master container:

docker run -d -ti -h master ubuntu:hadoop
Start the slave1 container:
docker run -d -ti -h slave1 ubuntu:hadoop
Start the slave2 container:
docker run -d -ti -h slave2 ubuntu:hadoop
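Since these runs pass no --name, look up the container IDs with docker ps, then open a shell in each container to carry out the remaining configuration; <container-id> below is a placeholder:

docker ps
docker exec -it <container-id> /bin/bash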
Add to /etc/hosts in each container:

10.0.0.5        master
10.0.0.6        slave1
10.0.0.7        slave2
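The addresses above are the ones from this walkthrough; your containers' actual IPs can be read with docker inspect, for example:

docker inspect -f '{{ .NetworkSettings.IPAddress }}' <container-id>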
Add to /usr/local/hadoop/etc/hadoop/slaves:

slave1
slave2
Note: if the virtual machine is short on memory, add the following to mapred-site.xml to limit the memory given to map tasks:

<property>
    <name>mapreduce.map.memory.mb</name>
    <value>500</value>
</property>
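With the configuration in place on all three containers, one common way to bring the cluster up from the master container is the sketch below (start-all.sh sits in $HADOOP_HOME/sbin, which is already on the PATH; it is deprecated in Hadoop 2.x but still shipped, and simply calls start-dfs.sh and start-yarn.sh):

start-all.sh
jps

jps should then report NameNode, SecondaryNameNode, and ResourceManager on master, and DataNode and NodeManager on each slave.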