Node distribution status
Hostname | IP | Zookeeper | Namenode | Datanode | Journalnode | ResourceManager
Node1 | 192.168.139.137 |   | Y |   | Y | Y
Node2 | 192.168.139.138 |   | Y |   | Y | Y
Node3 | 192.168.139.139 |   |   | Y | Y |
Node4 | 192.168.139.140 | Y |   | Y |   |
Node5 | 192.168.139.141 | Y |   | Y |   |
Node6 | 192.168.139.142 | Y |   |   |   |
Installation and deployment of ZooKeeper
1. Download and unzip ZooKeeper
2. Create zoo.cfg in ZooKeeper's conf directory: vim ./zoo.cfg
3. Add the following to zoo.cfg:
tickTime=2000
dataDir=/opt/zookeeperdata
clientPort=2181
initLimit=5
syncLimit=2
server.1=node1:2888:3888
server.2=node2:2888:3888
server.3=node3:2888:3888
4. On each node, create a myid file in the configured dataDir directory and write into it the number that follows "server." for that node
5. Configure the ZooKeeper environment variables
6. Start ZooKeeper on all three nodes: zkServer.sh start (see the sketch below)
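For example, a minimal sketch of steps 4 and 6, assuming the dataDir above and that the current node's id is 1 (use 2 and 3 on the other two nodes):
# create the data directory and write this node's id
mkdir -p /opt/zookeeperdata
echo 1 > /opt/zookeeperdata/myid
# start the server and check its role in the quorum
zkServer.sh start
zkServer.sh status   # one node should report "leader", the others "follower"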
The following is the deployment of Hadoop 2.x
Edit hdfs-site.xml and add the following configuration:
<configuration>
<property>
<name>dfs.nameservices</name>
<value>albert</value>
</property>
<property>
<name>dfs.ha.namenodes.albert</name>
<value>nn1,nn2</value>
</property>
<property>
<name>dfs.namenode.rpc-address.albert.nn1</name>
<value>node1:8020</value>
</property>
<property>
<name>dfs.namenode.rpc-address.albert.nn2</name>
<value>node2:8020</value>
</property>
<property>
<name>dfs.namenode.http-address.albert.nn1</name>
<value>node1:50070</value>
</property>
<property>
<name>dfs.namenode.http-address.albert.nn2</name>
<value>node2:50070</value>
</property>
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://node1:8485;node2:8485;node3:8485/albert</value>
</property>
<property>
<name>dfs.client.failover.proxy.provider.albert</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<!-- if the ssh fencing private key is configured here, the connection is refused, so shell(/bin/true) is used instead -->
<property>
<name>dfs.ha.fencing.methods</name>
<value>shell(/bin/true)</value>
</property>
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/opt/journalnode</value>
</property>
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<!-- do not check HDFS file permissions -->
<property>
<name>dfs.permissions.enabled</name>
<value>false</value>
</property>
</configuration>
Edit core-site.xml and set the default file system to hdfs://albert, i.e. the nameservice defined above:
<property>
<name>fs.defaultFS</name>
<value>hdfs://albert</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/opt/hadoop_tmp</value>
</property>
<property>
<name>ha.zookeeper.quorum</name>
<value>node4:2181,node5:2181,node6:2181</value>
</property>
Copy the configuration to the other nodes, for example: scp ./* root@node3:/opt/hadoop-2.5.1/etc/hadoop/ (see the sketch below)
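A minimal sketch for pushing the edited configuration directory to every other node, assuming the hostnames from the table above and root SSH access:
# run from /opt/hadoop-2.5.1/etc/hadoop on the node where the files were edited
for host in node2 node3 node4 node5 node6; do
  scp ./* root@${host}:/opt/hadoop-2.5.1/etc/hadoop/
done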
Configure the Hadoop environment variables:
export HADOOP_HOME=/opt/hadoop-2.5.1
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
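To make these variables persist across logins, one common approach (an assumption, not part of the original steps) is to append them to /etc/profile on every node and re-source it:
# persist the Hadoop environment variables
cat >> /etc/profile <<'EOF'
export HADOOP_HOME=/opt/hadoop-2.5.1
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
EOF
source /etc/profile
hadoop version   # quick check that the binaries are on the PATH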
4. Start the JournalNodes: hadoop-daemon.sh start journalnode. Be sure to execute this command (on node1, node2 and node3) before formatting the NameNode
5. Format HDFS on either NameNode node: hdfs namenode -format
6. Copy /opt/hadoop_tmp from the node that was just formatted to the other NameNode host configured in hdfs-site.xml: scp -r ./hadoop_tmp root@node2:/opt/
7. Initialize the HA state in ZooKeeper on either NameNode node: hdfs zkfc -formatZK
8. On the NameNode node that has passwordless SSH login configured, start HDFS: start-dfs.sh
Start a single node: hadoop-daemon.sh start namenode
Start the entire cluster: start-dfs.sh; stop the entire cluster: stop-all.sh (a quick verification sketch follows)
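A minimal sketch for checking that HDFS HA came up as expected, where nn1 and nn2 are the NameNode ids defined in hdfs-site.xml above:
jps                                  # each node should show the daemons from the table above
hdfs haadmin -getServiceState nn1    # expect "active" or "standby"
hdfs haadmin -getServiceState nn2    # should report the opposite state
hdfs dfs -ls /                       # basic smoke test against hdfs://albert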
II. Hadoop core component MapReduce: a distributed offline computing framework that moves the computation rather than the data.
MapReduce execution steps: split -> map (the mapping logic is the program you write; a map task is a thread that executes your Java code) -> shuffle (sort, group, merge) -> reduce (aggregation/calculation, executed by reduce tasks)
1. Configure yarn-site.xml
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.resourcemanager.ha.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.cluster-id</name>
<value>yarnablert</value>
</property>
<property>
<name>yarn.resourcemanager.ha.rm-ids</name>
<value>rm1,rm2</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm1</name>
<value>node5</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm2</name>
<value>node6</value>
</property>
<property>
<name>yarn.resourcemanager.zk-address</name>
<value>node4:2181,node5:2181,node6:2181</value>
</property>
2. Configure mapred-site.xml
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
Starting YARN: the standby ResourceManager has to be started separately. Startup command:
start-yarn.sh
Then start the other ResourceManager with: yarn-daemon.sh start resourcemanager
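A minimal sketch for verifying ResourceManager HA and running the bundled wordcount example; rm1 and rm2 are the ids from yarn-site.xml above, and the jar path assumes the Hadoop 2.5.1 layout used in this deployment:
yarn rmadmin -getServiceState rm1    # expect "active" or "standby"
yarn rmadmin -getServiceState rm2
# run the built-in wordcount job: split -> map -> shuffle -> reduce
hdfs dfs -mkdir -p /input
hdfs dfs -put /etc/profile /input/
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.1.jar wordcount /input /output
hdfs dfs -cat /output/part-r-00000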