CentOS 6 + Hadoop 2.6.0 Distributed Cluster Installation
1. Role Assignment
| IP | Role | Hostname |
| --- | --- | --- |
| 192.168.18.37 | Master (NameNode / SecondaryNameNode / ResourceManager) | HDP1 |
| 192.168.18.35 | Slave (DataNode / NodeManager) | HDP2 |
| 192.168.18.36 | Slave (DataNode / NodeManager) | HDP3 |
2. Install the JDK on each machine

```bash
mkdir -p /usr/local/setup

# Install the JDK
cd /usr/lib
tar -xvzf /usr/local/setup/jdk-7u75-linux-x64.tar.gz
# Rename it to jdk7 -- purely personal preference
mv jdk1.7.0_75 jdk7

# Add the Java environment variables: vi /etc/profile and append the following
# lines at the end (JAVA_HOME must match the directory the JDK was extracted into above)
export JAVA_HOME=/usr/lib/jdk7
export CLASSPATH=.:$JAVA_HOME/lib:$CLASSPATH
export PATH=$PATH:$JAVA_HOME/bin

# Fix the ownership and permissions of jdk7
chown -R root:root jdk7
chmod -R 755 jdk7

# Re-source the modified profile
source /etc/profile

# Test the Java installation
java -version
```

Expected output:

```
java version "1.7.0_75"
Java(TM) SE Runtime Environment (build 1.7.0_75-b13)
Java HotSpot(TM) 64-Bit Server VM (build 24.75-b04, mixed mode)
```
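Not part of the original steps, but a quick way to confirm on each node that the profile changes took effect; a minimal sketch assuming the JAVA_HOME path used above:

```bash
# Sanity check after `source /etc/profile` (assumes JAVA_HOME=/usr/lib/jdk7 as above)
echo "$JAVA_HOME"                 # should print /usr/lib/jdk7
"$JAVA_HOME/bin/java" -version    # should report 1.7.0_75
which java                        # should resolve under $JAVA_HOME/bin
```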
3. On each machine, edit /etc/sysconfig/network and /etc/hosts
/etc/hosts maps IP addresses to hostnames; /etc/sysconfig/network sets the machine's own hostname.

/etc/hosts (same on all three machines):

```
127.0.0.1     localhost localhost4 localhost4.localdomain4
192.168.18.37 HDP1
192.168.18.35 HDP2
192.168.18.36 HDP3
```

/etc/sysconfig/network (on each machine):

```
HOSTNAME=<that machine's own hostname>
```
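Before moving on to SSH and Hadoop, it is worth confirming that every node resolves all three hostnames; a small optional check of my own:

```bash
# Confirm local name resolution on each node
for h in HDP1 HDP2 HDP3; do
  getent hosts "$h" || echo "missing /etc/hosts entry for $h"
done
hostname   # should print this machine's own name, matching HOSTNAME= in /etc/sysconfig/network
```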
4. Configure passwordless SSH from HDP1 to HDP2 and HDP3
4.1 Configure passwordless SSH on HDP1 itself

```bash
# On HDP1, switch to the hdp user and generate a key pair
ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys

# Edit sshd_config
sudo vi /etc/ssh/sshd_config
# Remove the leading # so that the following three lines take effect:
#   RSAAuthentication yes
#   PubkeyAuthentication yes
#   AuthorizedKeysFile .ssh/authorized_keys

# Set the permissions and restart the sshd service
cd ~/.ssh
chmod 600 authorized_keys
cd ..
chmod 700 .ssh   # non-recursive, so authorized_keys keeps its 600 mode
sudo service sshd restart
```
4.2 Configure passwordless SSH from HDP1 to HDP2 and HDP3

```bash
# Copy HDP1's authorized_keys to HDP2 and HDP3
scp .ssh/authorized_keys hdp2:~/.ssh/authorized_keys_hdp1
scp .ssh/authorized_keys hdp3:~/.ssh/authorized_keys_hdp1

# On HDP2 and HDP3, append authorized_keys_hdp1 to the local authorized_keys
cat ~/.ssh/authorized_keys_hdp1 >> ~/.ssh/authorized_keys

# Test
ssh localhost
ssh hdp2
ssh hdp3
```

A successful passwordless login prints something like:

```
Last login: Thu Apr 2 15:22:03 2015 from hdp1
```
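As an alternative to the scp-and-append approach above, ssh-copy-id (shipped with OpenSSH on CentOS 6) does the same key distribution in one step; a hedged sketch, assuming password login to the hdp account is still enabled on the Slaves:

```bash
# Alternative: distribute HDP1's public key with ssh-copy-id
for h in hdp2 hdp3; do
  ssh-copy-id -i ~/.ssh/id_dsa.pub "$h"   # prompts for the password once per host
done
ssh hdp2 hostname   # should print HDP2 without asking for a password
ssh hdp3 hostname
```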
5. Configure the Hadoop files on all three machines
Configure on the Master (HDP1) first, then copy the configuration files to the Slaves and overwrite theirs. Any directories referenced by the configuration must also be created on the Slaves. Alternatively, once the configuration is done, copy the whole Hadoop installation directory to the Slaves and use it as their installation directory.
Create the following directories under the Hadoop installation directory:

```bash
mkdir dfs dfs/name dfs/data tmp
```

dfs: directory for HDFS
dfs/name: HDFS NameNode directory
dfs/data: HDFS DataNode directory
tmp: directory for HDFS temporary files
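A one-shot version of the same layout that can also be run on the Slaves later; a sketch assuming the install directory is /usr/local/hadoop and the daemons run as the hdp user:

```bash
# Create the HDFS directories in one pass (run on HDP1 now, and on HDP2/HDP3 before starting HDFS)
cd /usr/local/hadoop
mkdir -p dfs/name dfs/data tmp
sudo chown -R hdp:hdp dfs tmp   # only needed if /usr/local/hadoop is owned by root
```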
/etc/profile

```bash
export HADOOP_PREFIX=/usr/local/hadoop
```

Environment variable pointing at the Hadoop installation directory.
etc/hadoop/hadoop-env.sh

```bash
export JAVA_HOME=${JAVA_HOME}
export HADOOP_PREFIX=/usr/local/hadoop
export HADOOP_LOG_DIR=/var/log/hadoop
```

Environment variables for the Hadoop daemons themselves. If the daemons cannot find Java when started over SSH (where /etc/profile may not be sourced), set JAVA_HOME here to the explicit path instead of ${JAVA_HOME}.
etc/hadoop/yarn-env.sh

```bash
export JAVA_HOME=${JAVA_HOME}
```

Environment variables specific to YARN.
etc/hadoop/slaves, listing the Slave hostnames:

```
HDP2
HDP3
```
etc/hadoop/core-site.xml (fs.default.name is the older, deprecated name for fs.defaultFS; Hadoop 2.6 still accepts it):

```xml
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/hadoop/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://hdp1:9000</value>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>4096</value>
  </property>
</configuration>
```
etc/hadoop/hdfs-site.xml

```xml
<configuration>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>HDP1:9001</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/usr/local/hadoop/dfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/usr/local/hadoop/dfs/data</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.nameservices</name>
    <value>hadoop-cluster1</value>
  </property>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
</configuration>
```
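Once the XML files are in place, hdfs getconf can read values back, which catches typos early; an optional check of my own, run from the Hadoop installation directory:

```bash
# Read a couple of values back to catch XML mistakes (run from /usr/local/hadoop)
bin/hdfs getconf -confKey dfs.replication          # expect 2
bin/hdfs getconf -confKey dfs.namenode.name.dir    # expect file:/usr/local/hadoop/dfs/name
```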
etc/hadoop/mapred-site.xml

```xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>HDP1:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>HDP1:19888</value>
  </property>
</configuration>
```
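The JobHistory addresses above only matter if the history server is actually running, and start-yarn.sh does not start it; a sketch of starting it manually on HDP1 once the cluster is up (the script ships in hadoop/sbin):

```bash
# Start the MapReduce JobHistory server on HDP1 (not covered by start-yarn.sh)
mr-jobhistory-daemon.sh start historyserver
jps | grep JobHistoryServer   # should appear after a few seconds
# the web UI should then answer on http://HDP1:19888
```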
etc/hadoop/yarn-site.xml (note that the shuffle class key uses mapreduce_shuffle, matching the aux-service name):

```xml
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>HDP1:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>HDP1:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>HDP1:8031</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>HDP1:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>HDP1:8088</value>
  </property>
</configuration>
```

Copy the finished configuration files to the Slaves. I chose to copy all of them: copy them into the user's home directory first, then overwrite the Hadoop installation directory from there, so the file permissions are not changed along the way.

```bash
sudo scp -r /usr/local/hadoop/etc/hadoop [email protected]:~/
sudo scp -r /usr/local/hadoop/etc/hadoop [email protected]:~/

# SSH to each Slave and overwrite etc/hadoop; I delete the old files first, then move the new ones in
rm -rf /usr/local/hadoop/etc/hadoop/*
mv ~/hadoop/* /usr/local/hadoop/etc/hadoop/
```

6. Add the Hadoop environment variables
This makes the commands and scripts in hadoop/bin and hadoop/sbin callable without typing their absolute paths every time.

```bash
vi /etc/profile
# Append:
export PATH=$PATH:/usr/local/hadoop/bin:/usr/local/hadoop/sbin

# Re-source it
source /etc/profile
```
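A quick way to confirm the new PATH entries are live before moving on to step 7 (my addition, assuming the paths above):

```bash
# Confirm the bin/ and sbin/ additions resolve after `source /etc/profile`
which hdfs start-dfs.sh     # both should point under /usr/local/hadoop
hadoop version              # first line should report Hadoop 2.6.0
```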
7. Start and verify

```bash
# Format the NameNode
hdfs namenode -format

# Start HDFS
start-dfs.sh
```

After startup, HDP1 has NameNode and SecondaryNameNode processes:

```
[[email protected] root]$ jps
2991 NameNode
3172 SecondaryNameNode
8730 Jps
```

The Slaves each have a DataNode process:

```
[[email protected] root]$ jps
2131 DataNode
4651 Jps
```

```bash
# Start YARN
start-yarn.sh
```

After startup, HDP1 gains a ResourceManager process and each Slave gains a NodeManager process; verify with jps in the same way.
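Beyond running jps on every node, the master can confirm in one place that both Slaves registered; a hedged check, assuming the daemons started cleanly:

```bash
# Cluster-wide view from HDP1
hdfs dfsadmin -report | grep -E 'Live datanodes|^Name:'   # expect "Live datanodes (2)" with HDP2 and HDP3 listed
yarn node -list                                           # expect two NodeManagers in RUNNING state
```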
8. Run the built-in WordCount example
```bash
# Create a txt file to analyze
vi /usr/local/hadoop/wc.txt
```

wc.txt contents:

```
this is a wordcount app
is a wordcount app
a wordcount app
wordcount app
app
```

```bash
# Create the input directory in HDFS and upload wc.txt
# (run from /usr/local/hadoop so the relative paths resolve)
hdfs dfs -mkdir -p /wc/input
hdfs dfs -put wc.txt /wc/input/

# Run it
hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar wordcount /wc/input/wc.txt /wc/output

# Check the results
hdfs dfs -ls /wc/output
hdfs dfs -cat /wc/output/part-r-00000
```
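For reference, the counts the example should produce for the wc.txt above, and the cleanup needed before a re-run, since MapReduce refuses to overwrite an existing output directory:

```bash
# Expected contents of part-r-00000 for the sample input: a 3, app 5, is 2, this 1, wordcount 4
hdfs dfs -cat /wc/output/part-r-00000
# Remove the output directory before running the job again
hdfs dfs -rm -r /wc/output
```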