The Hadoop and HBase cluster has one NameNode and seven DataNodes.
1. /etc/hostname file
NameNode:
Node1
DataNode 1:
Node2
DataNode 2:
Node3
.......
DataNode 7:
Node8
2. /etc/hosts file
NameNode:
127.0.0.1       localhost
#127.0.1.1      node1
#-------edit by HY(2014-05-04)--------
#127.0.1.1      node1
125.216.241.113 node1
125.216.241.112 node2
125.216.241.96  node3
125.216.241.111 node4
125.216.241.114 node5
125.216.241.115 node6
125.216.241.116 node7
125.216.241.117 node8
#-------end edit--------
# The following lines are desirable for IPv6 capable hosts
::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
DataNode 1:
127.0.0.1       localhost
#127.0.0.1      node2
#127.0.1.1      node2
#--------edit by HY(2014-05-04)--------
125.216.241.113 node1
125.216.241.112 node2
125.216.241.96  node3
125.216.241.111 node4
125.216.241.114 node5
125.216.241.115 node6
125.216.241.116 node7
125.216.241.117 node8
#-------end edit---------
# The following lines are desirable for IPv6 capable hosts
::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
The other DataNodes are configured the same way. The hostname in /etc/hostname must match the name mapped in /etc/hosts; if they differ, hard-to-diagnose problems may occur when running tasks on the cluster.
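Since a hostname/hosts mismatch causes such confusing failures, it is worth checking up front that each node's hostname resolves to a real (non-loopback) address in /etc/hosts. A minimal sketch of such a check (the helper name and sample hosts text are illustrative, mirroring the files above):

```python
def hostname_in_hosts(hostname, hosts_text):
    """Return True if `hostname` is mapped to a non-loopback,
    non-commented address in /etc/hosts content."""
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        fields = line.split()
        addr, names = fields[0], fields[1:]
        # A mapping to 127.x.x.x hides the node from the cluster
        if hostname in names and not addr.startswith("127."):
            return True
    return False

hosts = """\
127.0.0.1       localhost
#127.0.1.1      node1
125.216.241.113 node1
125.216.241.112 node2
"""
print(hostname_in_hosts("node1", hosts))  # → True
print(hostname_in_hosts("node9", hosts))  # → False
```

Running this on each node with its own hostname and /etc/hosts before starting the cluster catches the mismatch described above.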
3. hadoop-env.sh
Below the commented-out default line
# export JAVA_HOME=/usr/lib/j2sdk1.5-sun
add:
export JAVA_HOME=/usr/lib/jvm/java-6-sun
4. core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://node1:49000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/newdata/hadoop-1.2.1/tmp</value>
  </property>
  <property>
    <name>io.compression.codecs</name>
    <value>org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.BZip2Codec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec</value>
  </property>
  <property>
    <name>io.compression.codec.lzo.class</name>
    <value>com.hadoop.compression.lzo.LzoCodec</value>
  </property>
  <property>
    <name>dfs.datanode.socket.write.timeout</name>
    <value>3000000</value>
  </property>
  <property>
    <name>dfs.socket.timeout</name>
    <value>3000000</value>
  </property>
</configuration>
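After hand-editing several *-site.xml files across eight machines, it is easy to introduce a typo. A small sketch (the parser function is illustrative, not part of Hadoop) that reads a site file into a dict so values can be spot-checked:

```python
import xml.etree.ElementTree as ET

def parse_site_xml(xml_text):
    """Parse Hadoop *-site.xml content into a {name: value} dict."""
    root = ET.fromstring(xml_text)
    return {
        prop.findtext("name").strip(): (prop.findtext("value") or "").strip()
        for prop in root.iter("property")
    }

# Trimmed sample mirroring the core-site.xml above
core_site = """<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://node1:49000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/newdata/hadoop-1.2.1/tmp</value>
  </property>
</configuration>"""

conf = parse_site_xml(core_site)
print(conf["fs.default.name"])  # → hdfs://node1:49000
```

The same helper works unchanged for hdfs-site.xml and mapred-site.xml, since they share the `<configuration>/<property>` layout.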
5. hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/home/hadoop/newdata/hadoop-1.2.1/name1,/home/hadoop/newdata/hadoop-1.2.1/name2</value>
    <description>Metadata storage locations</description>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/home/hadoop/newdata/hadoop-1.2.1/data1,/home/hadoop/newdata/hadoop-1.2.1/data2</value>
    <description>Data block storage locations</description>
  </property>
  <property>
    <name>dfs.replication</name>
    <!-- Keep two replicas of each block -->
    <value>2</value>
  </property>
</configuration>
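Note that dfs.name.dir and dfs.data.dir take comma-separated lists of directories, and dfs.replication cannot usefully exceed the number of DataNodes (here 2 ≤ 7). A sketch of both sanity checks (helper names are illustrative):

```python
def split_dirs(value):
    """Split a comma-separated Hadoop directory list into paths."""
    return [p.strip() for p in value.split(",") if p.strip()]

def replication_ok(replication, num_datanodes):
    """HDFS cannot place more replicas than there are DataNodes."""
    return 1 <= replication <= num_datanodes

data_dirs = split_dirs(
    "/home/hadoop/newdata/hadoop-1.2.1/data1,"
    "/home/hadoop/newdata/hadoop-1.2.1/data2")
print(len(data_dirs))          # → 2
print(replication_ok(2, 7))    # → True
print(replication_ok(8, 7))    # → False
```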
6. mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>node1:49001</value>
  </property>
  <property>
    <name>mapred.local.dir</name>
    <value>/home/hadoop/newdata/hadoop-1.2.1/tmp</value>
  </property>
  <property>
    <name>mapred.compress.map.output</name>
    <value>true</value>
    <!-- Compress intermediate map output -->
  </property>
  <property>
    <name>mapred.map.output.compression.codec</name>
    <value>com.hadoop.compression.lzo.LzoCodec</value>
    <!-- Use the LZO library as the compression codec -->
  </property>
</configuration>
7. masters
node1
8. slaves
node2
node3
node4
node5
node6
node7
node8
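Both the slaves file here and the regionservers file in step 11 list the same seven workers, one hostname per line. With sequentially named nodes, the content can be generated rather than typed (a sketch; the helper name is illustrative):

```python
def worker_list(first=2, last=8, prefix="node"):
    """Return slaves/regionservers file content: one host per line."""
    return "\n".join(f"{prefix}{i}" for i in range(first, last + 1)) + "\n"

print(worker_list(), end="")
# node2 through node8, one per line
```

Writing the same output to both files avoids the two lists drifting apart.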
9. hbase-env.sh
Add:
export JAVA_HOME=/usr/lib/jvm/java-6-sun
Enable:
export HBASE_MANAGES_ZK=true
true means HBase manages its own ZooKeeper. To use an independent ZooKeeper, set this to false and install ZooKeeper separately.
10. hbase-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
/**
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements.  See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership.  The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
-->
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://node1:49000/hbase</value>
    <description>The directory shared by RegionServers.</description>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
    <description>The mode the cluster will be in. Possible values are
      false: standalone and pseudo-distributed setups with managed ZooKeeper
      true: fully-distributed with unmanaged ZooKeeper Quorum (see hbase-env.sh)
    </description>
  </property>
  <property>
    <name>hbase.master</name>
    <value>node1:60000</value>
  </property>
  <property>
    <name>hbase.tmp.dir</name>
    <value>/home/hadoop/newdata/hbase/tmp</value>
    <description>Temporary directory on the local filesystem. Change this
      setting to point to a location more permanent than '/tmp', the usual
      resolve for java.io.tmpdir, as the '/tmp' directory is cleared on
      machine restart.
      Default: ${java.io.tmpdir}/hbase-${user.name}
    </description>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>node2,node3,node4,node5,node6,node7,node8</value>
    <description>Comma-separated list of servers in the ZooKeeper ensemble
      (this config should have been named hbase.zookeeper.ensemble). For
      example, "host1.mydomain.com,host2.mydomain.com,host3.mydomain.com".
      By default this is set to localhost for local and pseudo-distributed
      modes of operation. For a fully-distributed setup, this should be set
      to a full list of ZooKeeper ensemble servers. If HBASE_MANAGES_ZK is
      set in hbase-env.sh this is the list of servers which HBase will
      start/stop ZooKeeper on as part of cluster start/stop. Client-side,
      we will take this list of ensemble members and put it together with
      the hbase.zookeeper.clientPort config and pass it into the ZooKeeper
      constructor as the connectString parameter.
      Default: localhost
    </description>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/home/hadoop/newdata/zookeeper</value>
    <description>Property from ZooKeeper's config zoo.cfg. The directory
      where the snapshot is stored.
      Default: ${hbase.tmp.dir}/zookeeper
    </description>
  </property>
</configuration>
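As the hbase.zookeeper.quorum description notes, clients join the quorum list with hbase.zookeeper.clientPort (2181 by default) to form the ZooKeeper connectString. A sketch of that combination, useful for verifying what clients will actually connect to (the helper name is illustrative):

```python
def zk_connect_string(quorum, client_port=2181):
    """Build a ZooKeeper connectString from an hbase.zookeeper.quorum
    value (comma-separated hosts) and a client port."""
    hosts = [h.strip() for h in quorum.split(",") if h.strip()]
    return ",".join(f"{h}:{client_port}" for h in hosts)

print(zk_connect_string("node2,node3,node4"))
# → node2:2181,node3:2181,node4:2181
```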
11. regionservers
node2
node3
node4
node5
node6
node7
node8
These configuration files must be identical on every machine in the cluster.