Construction and management of Hadoop environment on CentOS
Date of compilation: September 1, 2015
Experimental requirements:
Complete the Hadoop platform installation and deployment, test the capabilities and performance of the Hadoop platform, record the experiment process, and submit the lab report.
1) Master the Hadoop installation process
2) Understand how Hadoop works
3) Test the scalability of the Hadoop system
4) Test the stability of the Hadoop system
First, prerequisites
Ensure that all required software is installed on every node in the cluster: JDK, SSH, and Hadoop (2.6.0).
1) The JDK must be installed (version 1.7 or above); the Java release from Sun is recommended.
2) SSH must be installed and sshd must be running, so that the Hadoop scripts can manage the remote Hadoop daemons (a quick check is shown below).
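A quick way to confirm both prerequisites on a node before continuing (a minimal check, shown here on node1 and assuming the packages were installed through the usual CentOS packages):
[root@node1 ~]# java -version          # should report version 1.7 or above
[root@node1 ~]# service sshd status    # sshd should be reported as running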
Second, the installation and configuration of Hadoop
When the DFS and YARN services are started on the master node, the services on the slave nodes must be brought up automatically, which means the master must reach the slave machines over SSH. Because HDFS is built from multiple servers forming one distributed system, password-free access between the nodes is required. The tasks in this section are setting up SSH, creating users, and configuring the Hadoop parameters, which completes the build of the HDFS distributed environment.
Task implementation:
This task clusters four node machines, each running the CentOS-6.5-x86_64 system. The IP addresses used by the four nodes are 192.168.23.111, 192.168.23.112, 192.168.23.113 and 192.168.23.114, with the corresponding hostnames node1, node2, node3 and node4. Node1 acts as the NameNode; the others are DataNodes.
On the node1 host:
Edit /etc/hosts (vi /etc/hosts) and add the following:
192.168.23.111 node1
192.168.23.112 node2
192.168.23.113 node3
192.168.23.114 node4
Edit /etc/sysconfig/network (vi /etc/sysconfig/network) and set:
HOSTNAME=node1
Shut down the firewall:
chkconfig iptables off
service iptables stop
Perform the same operations on the other node hosts, changing the HOSTNAME value to the corresponding hostname in each case.
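A brief sanity check after these edits (the hostname in /etc/sysconfig/network takes effect after a reboot, or immediately with the hostname command):
[root@node1 ~]# hostname node1             # apply the new hostname without rebooting
[root@node1 ~]# ping -c 1 node2            # every name in /etc/hosts should resolve and answer
[root@node1 ~]# service iptables status    # should report that the firewall is not running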
Step 1
Create the Hadoop user. Create the user hadoop with uid=660 on each of the four nodes, with the passwords h1111, h2222, h3333 and h4444 respectively. Log on to the node1 machine, create the hadoop user and set its password. The commands are as follows.
[root@node1 ~]# useradd -u 660 hadoop
[root@node1 ~]# passwd hadoop
Do the same on the other node machines.
Step 2
Set up password-free SSH login from the master node to the slave node machines.
(1) On the node1 machine, log in as the hadoop user, or use su - hadoop to switch to the hadoop user. The command is as follows.
[root@node1 ~]# su - hadoop
(2) Use ssh-keygen to generate the key pair. The command is as follows.
[hadoop@node1 ~]$ ssh-keygen -t dsa
(3) Use ssh-copy-id to copy the public key to the node1, node2, node3 and node4 machines. The commands are as follows.
[hadoop@node1 ~]$ cd ~
[hadoop@node1 ~]$ ssh-copy-id -i .ssh/id_dsa.pub node1
[hadoop@node1 ~]$ ssh-copy-id -i .ssh/id_dsa.pub node2
[hadoop@node1 ~]$ ssh-copy-id -i .ssh/id_dsa.pub node3
[hadoop@node1 ~]$ ssh-copy-id -i .ssh/id_dsa.pub node4
(4) On the node1 machine, use SSH to test password-free login to node1 itself. The commands are as follows.
[hadoop@node1 ~]$ ssh node1
Last login: Mon Dec 08:42:38 from node1
[hadoop@node1 ~]$ exit
logout
Connection to node1 closed.
Output like the above indicates a successful operation.
Continue using SSH on the node1 machine to test the node2, node3 and node4 machines with the following commands.
[hadoop@node1 ~]$ ssh node2
[hadoop@node1 ~]$ ssh node3
[hadoop@node1 ~]$ ssh node4
After each test login succeeds, enter exit to return.
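As a compact alternative to logging in to each machine by hand, the following loop, run as the hadoop user on node1, should print each node's hostname without asking for a password:
[hadoop@node1 ~]$ for h in node1 node2 node3 node4; do ssh $h hostname; done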
Step 3
Upload or download the hadoop-2.6.0.tar.gz package to the root directory of the node1 machine. If the Hadoop package was compiled on node1, copy the compiled package to the root directory instead. First find the address of the required package at http://mirror.bit.edu.cn/apache/hadoop/common/, then use wget or another command to download it, for example:
wget http://mirror.bit.edu.cn/apache/hadoop/common/hadoop-2.6.0/hadoop-2.6.0.tar.gz
Step 4
Unpack and install the files. The commands are as follows.
[root@node1 ~]# cd
[root@node1 ~]# tar xvzf hadoop-2.6.0.tar.gz
[root@node1 ~]# cd hadoop-2.6.0
[root@node1 hadoop-2.6.0]# mv * /home/hadoop/
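If the move succeeded, /home/hadoop now holds the top level of the Hadoop 2.6.0 distribution; a quick listing confirms it (directory names as shipped in the Apache tarball):
[root@node1 hadoop-2.6.0]# ls /home/hadoop    # expect bin, etc, lib, libexec, sbin, share, ...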
Step 5
Modify the Hadoop configuration files: hadoop-env.sh, yarn-env.sh, slaves, core-site.xml, hdfs-site.xml, mapred-site.xml and yarn-site.xml. The configuration files are located in the /home/hadoop/etc/hadoop/ directory and can be edited there. The command is as follows.
[root@node1 hadoop-2.6.0]# cd /home/hadoop/etc/hadoop/
(1) Modify hadoop-env.sh.
If Java is not installed, install it first:
yum -y install java-1.7.0-openjdk*
If installation problems occur, the following tutorial can be consulted:
http://jingyan.baidu.com/article/4853e1e51d0c101909f72607.html
Check whether /etc/profile on each host has the JAVA_HOME variable; if not, add the following at the end:
JAVA_HOME=/usr/lib/jvm/java-1.7.0
export JAVA_HOME
PATH=$JAVA_HOME/bin:$PATH
CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH
export CLASSPATH
export HADOOP_HOME=/home/hadoop/
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
After saving and exiting, execute: source /etc/profile
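The environment can then be verified with a few quick checks (assuming the OpenJDK path used above):
[root@node1 hadoop]# echo $JAVA_HOME    # should print /usr/lib/jvm/java-1.7.0
[root@node1 hadoop]# java -version      # should report a 1.7.x runtime
[root@node1 hadoop]# which hadoop       # should resolve to /home/hadoop/bin/hadoop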
Change export JAVA_HOME=${JAVA_HOME} in the hadoop-env.sh file to:
export JAVA_HOME=/usr/lib/jvm/java-1.7.0
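Instead of editing the file by hand, the same change can be scripted with sed (a one-line sketch using the OpenJDK path assumed above):
[root@node1 hadoop]# sed -i 's|^export JAVA_HOME=.*|export JAVA_HOME=/usr/lib/jvm/java-1.7.0|' hadoop-env.sh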
(2) Modify slaves. This file registers the DataNode hostnames; add the node2, node3 and node4 hostnames here, as shown below.
[root@node1 hadoop]# vi slaves
node2
node3
node4
(3) Modify core-site.xml, replacing the <configuration></configuration> element in the file with the following content.
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://node1:9000</value>
</property>
<property>
<name>io.file.buffer.size</name>
<value>131072</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>file:/home/hadoop/tmp</value>
<description>abase for other temporary directories.</description>
</property>
<property>
<name>hadoop.proxyuser.hadoop.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hadoop.groups</name>
<value>*</value>
</property>
</configuration>
Here node1 is the NameNode (master) machine of the cluster; node1 can also be given as an IP address.
(4) Modify hdfs-site.xml, replacing the <configuration></configuration> element in the file with the following content.
<configuration>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>node1:9001</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/home/hadoop/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/home/hadoop/dfs/data</value>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
</configuration>
To simplify the teaching setup, the secondary NameNode also runs on the node1 machine. The data generated by the NameNode is stored in the /home/hadoop/dfs/name directory, the data generated by the DataNodes is stored in the /home/hadoop/dfs/data directory, and 3 replicas are kept.
(5) Rename the file mapred-site.xml.template to mapred-site.xml. The command is as follows.
[root@node1 hadoop]# mv mapred-site.xml.template mapred-site.xml
Then replace the <configuration></configuration> element in the file with the following content.
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>node1:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>node1:19888</value>
</property>
</configuration>
(6) Modify yarn-site.xml, replacing the <configuration></configuration> element in the file with the following content.
<configuration>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>192.168.23.111</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>node1:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>node1:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>node1:8031</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>node1:8033</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>node1:8088</value>
</property>
</configuration>
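Once the XML files are saved, key values can be read back with the hdfs getconf tool as a quick sanity check (a sketch; it assumes the hadoop commands are on the PATH as set in /etc/profile above and only covers the core and HDFS settings):
[root@node1 hadoop]# hdfs getconf -confKey fs.defaultFS       # expect hdfs://node1:9000
[root@node1 hadoop]# hdfs getconf -confKey dfs.replication    # expect 3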
Step 6
Modify the owner and group of the /home/hadoop/ directory as follows.
[root@node1 hadoop]# chown -R hadoop:hadoop /home/hadoop
Step 7
Copy the configured Hadoop system (now located under /home/hadoop) to the other node machines, as follows.
[root@node1 hadoop]# cd /home/hadoop
[root@node1 hadoop]# scp -r /home/hadoop/* hadoop@node2:/home/hadoop
[root@node1 hadoop]# scp -r /home/hadoop/* hadoop@node3:/home/hadoop
[root@node1 hadoop]# scp -r /home/hadoop/* hadoop@node4:/home/hadoop
Step 8
Log in to the node2, node3 and node4 machines separately and modify the owner and group of the /home/hadoop/ directory.
[root@node2 ~]# chown -R hadoop:hadoop /home/hadoop
[root@node3 ~]# chown -R hadoop:hadoop /home/hadoop
[root@node4 ~]# chown -R hadoop:hadoop /home/hadoop
At this point the entire Hadoop distributed system has been built.
Third, the management of Hadoop
1. Format a new distributed file system
First, format a new distributed file system:
$ cd /home/hadoop
$ bin/hadoop namenode -format
If successful, the system output contains a line such as:
/home/hadoop/dfs/name has been successfully formatted.
Check the output to make sure the distributed file system was formatted successfully.
After execution, the /home/hadoop/dfs/name directory can be seen on the master machine.
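A formatted NameNode directory contains a current/ subdirectory with a VERSION file and an initial fsimage; listing it is a simple way to confirm that the format took effect:
$ ls /home/hadoop/dfs/name/current    # expect VERSION and fsimage files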
2. Start the Distributed File service
sbin/start-all.sh
Or
sbin/start-dfs.sh
sbin/start-yarn.sh
Use a browser to open http://192.168.23.111:50070 on the master node machine to view the NameNode status and browse the DataNodes.
Use a browser to open http://192.168.23.111:8088 on the master node machine to see all applications.
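The running daemons can also be confirmed from the command line with jps (shipped with the JDK; install the openjdk-devel package if it is missing). Roughly, the master should list NameNode, SecondaryNameNode and ResourceManager, and each slave should list DataNode and NodeManager:
[hadoop@node1 ~]$ jps
[hadoop@node2 ~]$ jps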
3. Close the Distributed File service
sbin/stop-all.sh
4. File Management
Create the /swvtc directory in HDFS. The command is as follows.
[hadoop@node1 ~]$ hdfs dfs -mkdir /swvtc    # similar to: mkdir /swvtc
View the root directory in HDFS. The command is as follows.
[hadoop@node1 ~]$ hdfs dfs -ls /    # similar to: ls /
Found 1 items
drwxr-xr-x - hadoop supergroup 0 2014-12-23 10:07 /swvtc
Edit the file jie.txt on the local system. The command is as follows.
[hadoop@node1 ~]$ vi jie.txt
Add the following content:
hi,hadoop!
Upload the file jie.txt to the /swvtc directory in HDFS. The command is as follows.
[hadoop@node1 ~]$ hdfs dfs -put jie.txt /swvtc
Download the file from HDFS. The command is as follows.
[hadoop@node1 ~]$ hdfs dfs -get /swvtc/jie.txt
View the contents of /swvtc/jie.txt in HDFS. The command is as follows.
[hadoop@node1 ~]$ hdfs dfs -text /swvtc/jie.txt
hi,hadoop!
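With a file already in HDFS, the platform can be exercised end to end with the example jobs bundled in the distribution. A minimal sketch, assuming the default layout under /home/hadoop and an output directory /swvtc-out that does not yet exist:
[hadoop@node1 ~]$ hadoop jar /home/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar wordcount /swvtc /swvtc-out
[hadoop@node1 ~]$ hdfs dfs -cat /swvtc-out/part-r-00000    # word counts for jie.txt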
Other commonly used commands:
hadoop dfs -get in getin         # get the file 'in' from HDFS and rename it getin; like -put, it also works on directories
hadoop dfs -rmr out              # delete the specified file or directory 'out' from HDFS
hadoop dfs -cat in/*             # view the contents of the 'in' directory in HDFS
hadoop dfsadmin -report          # view the basic statistics of HDFS
hadoop dfsadmin -safemode leave  # leave safe mode
hadoop dfsadmin -safemode enter  # enter safe mode
5. Adding nodes
Scalability is an important feature of HDFS. To add a node: first install Hadoop on the newly added node; then add the NameNode hostname to the new node's masters configuration ($HADOOP_HOME/conf/master in older layouts); then, on the NameNode, add the new node's hostname to the slaves file (here /home/hadoop/etc/hadoop/slaves); and finally establish a password-free SSH connection to the new node.
Then run the start command:
./start-all.sh
You can then see the newly added DataNode by browsing http://(master node hostname):50070.
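On the new node itself, the DataNode and NodeManager daemons can also be brought up individually instead of restarting the whole cluster (a sketch using the per-daemon scripts of Hadoop 2.x; node5 is a hypothetical new node whose home directory is /home/hadoop):
[hadoop@node5 ~]$ sbin/hadoop-daemon.sh start datanode
[hadoop@node5 ~]$ sbin/yarn-daemon.sh start nodemanager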
6. Load Balancing
Run the command:
./start-balancer.sh
This rebalances the distribution of data blocks across the DataNodes according to the balancing policy.
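The balancer also accepts a threshold parameter, the allowed percentage of disk-usage deviation between DataNodes; a lower value rebalances more aggressively, for example:
./start-balancer.sh -threshold 5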