Build a Hadoop distributed cluster (Environment: Linux virtual machine)
1. Preparations: plan the host names, IP addresses, and roles. Set up three hosts first, then add a fourth host dynamically.
(In the roles column, the namenode, secondaryNamenode, and jobTracker could also be deployed
on separate machines, depending on actual needs; the layout below is not the only option.)
Host name   IP address      Role
cloud01     192.168.1.101   namenode / secondaryNamenode / jobTracker
cloud02     192.168.1.102   datanode / taskTracker
cloud03     192.168.1.103   datanode / taskTracker
cloud04     192.168.1.104   datanode / taskTracker (added dynamically in step 5)
2. Configure the Linux environment (refer to the pseudo-distributed setup below)
2.1 Modify the host name on each machine (cloud01, cloud02, cloud03)
2.2 Modify the IP address of each machine (assign the addresses yourself)
2.3 Modify the mapping between host names and IP addresses in /etc/hosts
(Modify it only on cloud01; after modification, copy the file to the other machines with:
scp /etc/hosts cloud02:/etc/
scp /etc/hosts cloud03:/etc/)
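The mapping entries added in step 2.3 can be sketched as follows. This writes to a stand-in staging file `hosts.cluster` in the current directory (a hypothetical name for illustration); on cloud01 you would append these lines to /etc/hosts itself and then scp it out as shown above.

```shell
# Append the cluster's hostname/IP mappings to a staging file.
# (hosts.cluster is a stand-in; the real target is /etc/hosts on cloud01.)
cat >> hosts.cluster <<'EOF'
192.168.1.101 cloud01
192.168.1.102 cloud02
192.168.1.103 cloud03
EOF
cat hosts.cluster
```

After appending, every machine can resolve the others by host name, which is what the Hadoop config files below rely on.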
2.4 Disable the firewall
2.5 Restart
3. Install the JDK (refer to the pseudo-distributed setup; JDK 1.6.0_45 is used as an example)
You only need to install it on one machine and then copy it to the others (software should be managed in a unified location).
For example, on cloud01 the JDK is installed under /soft/java.
(Commands: scp -r /soft/java/ cloud02:/soft/
scp -r /soft/java/ cloud03:/soft/
You could copy the JDK now, but we will hold off for the time being; after Hadoop is installed below, copy everything together.)
4. Install the Hadoop cluster (hadoop-1.1.2 is used as an example)
4.1 Upload the Hadoop tarball to the /soft directory and extract it there (refer to the pseudo-distributed setup)
4.2 Configure Hadoop (six files need to be edited this time)
4.2.1 hadoop-env.sh
On line 9, remove the leading # to uncomment the line and set:
export JAVA_HOME=/soft/java/jdk1.6.0_45
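The hadoop-env.sh edit above can also be done non-interactively with sed. This is a minimal sketch that operates on a stand-in copy of the file in the current directory; in a real install you would run the sed command against /soft/hadoop-1.1.2/conf/hadoop-env.sh directly.

```shell
# Create a stand-in hadoop-env.sh containing the commented-out default
# (the real file ships with a line like this around line 9).
printf '# export JAVA_HOME=/usr/lib/j2sdk1.5-sun\n' > hadoop-env.sh

# Uncomment the line and point it at our JDK install path.
sed -i 's|^# export JAVA_HOME=.*|export JAVA_HOME=/soft/java/jdk1.6.0_45|' hadoop-env.sh
cat hadoop-env.sh
```

Setting JAVA_HOME here matters because the Hadoop startup scripts run over SSH, where your interactive shell's environment is not inherited.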
4.2.2 core-site.xml
<!-- Specify the address of the HDFS namenode -->
<property>
  <name>fs.default.name</name>
  <value>hdfs://cloud01:9000</value>
</property>
<!-- Specify the directory for files generated while Hadoop runs -->
<property>
  <name>hadoop.tmp.dir</name>
  <value>/soft/hadoop-1.1.2/tmp</value>
</property>
4.2.3 hdfs-site.xml
<!-- Configure the number of HDFS replicas (set according to actual requirements; the default is 3) -->
<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>
4.2.4 mapred-site.xml
<!-- Specify the jobtracker address -->
<property>
  <name>mapred.job.tracker</name>
  <value>cloud01:9001</value>
</property>
4.2.5 masters (specifies the secondarynamenode address)
cloud01
4.2.6 slaves (specifies the slave nodes)
cloud02
cloud03
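The masters and slaves files from steps 4.2.5 and 4.2.6 are plain host lists, one name per line. A minimal sketch, writing stand-in copies to the current directory (the real files live in /soft/hadoop-1.1.2/conf/):

```shell
# masters names the host that runs the secondarynamenode.
echo 'cloud01' > masters

# slaves lists the hosts that run datanode/taskTracker, one per line.
printf 'cloud02\ncloud03\n' > slaves

cat masters slaves
```

start-all.sh reads these files on the master and starts the corresponding daemons on each listed host over SSH, which is why passwordless login (step 4.4) is required.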
4.3 Copy the configured Hadoop to the other two machines.
Copy the /soft folder directly (it contains both the JDK and Hadoop), which is why unified
file management was strongly recommended above.
Commands:
scp -r /soft/ cloud02:/
scp -r /soft/ cloud03:/
4.4 Configure passwordless SSH login
The master node must be able to log in to the slave nodes without a password,
i.e., from cloud01 to cloud02 and cloud03.
Generate a key pair on cloud01:
ssh-keygen -t rsa
Then copy the public key to the other two machines:
ssh-copy-id -i cloud02
ssh-copy-id -i cloud03
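The key generation in step 4.4 can be sketched non-interactively. This writes into a scratch directory `sshdemo` (a stand-in for illustration); on cloud01 you would accept the default ~/.ssh path instead.

```shell
# Generate an RSA key pair with an empty passphrase, without prompts.
# (sshdemo/ is a scratch directory; the real keys go to ~/.ssh/.)
mkdir -p sshdemo
ssh-keygen -t rsa -N '' -f sshdemo/id_rsa -q

# ssh-copy-id then appends the public key (normally ~/.ssh/id_rsa.pub)
# to ~/.ssh/authorized_keys on cloud02 and cloud03.
ls sshdemo
```

Once the public key is in authorized_keys on the slaves, start-all.sh can launch the remote daemons without password prompts.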
4.5 Format HDFS
You only need to format on cloud01 (the master node running the namenode).
Command: hadoop namenode -format
4.6 Verification
Start the cluster with the command: start-all.sh
If a safemode-related Exception is reported during startup, run:
hadoop dfsadmin -safemode leave   (exits safe mode)
and then start Hadoop again.
Finally, run the jps command on each machine and check that the running daemons match the planned roles.
If they match the plan, the cluster is up.
5. Dynamically add a node
(This is common and practical in actual production.)
cloud04 192.168.1.104 datanode/taskTracker
5.1 Add a Linux instance by cloning (cloning cloud01 as an example; this is rarely the case in actual
production, where physical servers are used rather than virtual machines. Note that the host
being cloned must be shut down before cloning.)
5.2 Modify the host name and IP address, configure the hosts mapping, disable the firewall, and then
configure Hadoop: add cloud04 to the slaves file, set up passwordless login, and restart.
(Because the machine was cloned, the mapping file and firewall settings are already in place
and do not need to be configured again.)
5.3 After the machine restarts, start the datanode and taskTracker:
hadoop-daemon.sh start datanode
hadoop-daemon.sh start tasktracker
5.4 On cloud01 (where the namenode runs), refresh the node list:
hadoop dfsadmin -refreshNodes
5.5 Verification
http://<linux ip address>:50070 (the HDFS management interface)
Check whether one more node appears. If it does, the node has been added successfully.
6. Decommission (delete) a node
6.1 On cloud01, modify the /soft/hadoop-1.1.2/conf/hdfs-site.xml file
Add the following configuration:
<property>
  <name>dfs.hosts.exclude</name>
  <value>/soft/hadoop-1.1.2/conf/excludes</value>
</property>
6.2 Identify the machines to decommission
The file named by dfs.hosts.exclude lists the machines to decommission, one per line.
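Creating the excludes file from step 6.2 can be sketched as follows. This writes a stand-in copy to the current directory; the path that must match the dfs.hosts.exclude value above is /soft/hadoop-1.1.2/conf/excludes.

```shell
# List the hosts to decommission, one per line.
# (Stand-in path; the configured location is /soft/hadoop-1.1.2/conf/excludes.)
printf 'cloud04\n' > excludes
cat excludes
```

After writing the file, running hadoop dfsadmin -refreshNodes (step 6.3) makes the namenode begin moving cloud04's block replicas to the remaining datanodes.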
6.3 Force the configuration to reload
Command: hadoop dfsadmin -refreshNodes
6.4 Monitor the node being decommissioned
Command: hadoop dfsadmin -report
This lists the nodes currently connected to the cluster. While decommissioning is in progress, the node shows:
Decommission Status : Decommission in progress
After it completes, the node shows:
Decommission Status : Decommissioned
6.5 Edit the excludes file again
Once a machine has been decommissioned, it can be removed from the excludes file.
If you log in to the decommissioned machine, you will find that the DataNode process is gone
but the TaskTracker is still running; it must be stopped manually.