Detailed installation process of Hadoop 2.3.0 on CentOS 6.4
Preface:
Hadoop implements a distributed file system (HDFS). HDFS features high fault tolerance and is designed to be deployed on low-cost hardware. It provides high-throughput access to application data and is well suited to applications with very large data sets. HDFS relaxes some POSIX requirements so that data in the file system can be accessed as a stream.
The core of the Hadoop framework is the pair HDFS and MapReduce: HDFS provides storage for massive amounts of data, while MapReduce provides computation over that data.
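To make that division of labor concrete, here is a minimal sketch of how the two pieces are exercised once a cluster like the one below is running. The input/output paths and the bundled examples jar are illustrative assumptions ($HADOOP_HOME is assumed to point at the unpacked hadoop-2.3.0 directory), not part of this installation:
hdfs dfs -mkdir -p /user/hadoop/input                          # HDFS stores the data
hdfs dfs -put /etc/hosts /user/hadoop/input/
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.3.0.jar \
    wordcount /user/hadoop/input /user/hadoop/output           # MapReduce computes over it
hdfs dfs -cat /user/hadoop/output/part-r-00000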
1. System Architecture
Cluster roles:
Host name    IP address        Role
name01       192.168.52.128    NameNode, ResourceManager (JobTracker)
data01       192.168.52.129    DataNode, NodeManager (TaskTracker)
data02       192.168.52.130    DataNode, NodeManager (TaskTracker)
System environment:
CentOS 6.5 x64 (VMware virtual machine)
Hard disk: 30 GB
Memory: 1 GB
Hadoop version: hadoop-2.3.0
2. Prepare the environment
2.1 System settings
Disable iptables:
/sbin/service iptables stop
/sbin/chkconfig iptables off
Disable SELinux: setenforce 0
sed -i 's@^SELINUX=enforcing@SELINUX=disabled@g' /etc/sysconfig/selinux
Set the node names. Run the following on all nodes:
/bin/cat << EOF > /etc/hosts
localhost.localdomain data01   # on this node; use name01 / data02 on the others
192.168.52.128 name01
192.168.52.129 data01
192.168.52.130 data02
EOF
hostname data01   # set the local hostname; use name01 / data02 on the other nodes
sed -i 's@HOSTNAME=localhost.localdomain@HOSTNAME=data01@g' /etc/sysconfig/network   # likewise, use the node's own name
2.2 Create the Hadoop user and directories
Create a hadoop running account:
Log on to every machine as root and create the hadoop user on each of them:
useradd hadoop    # create the hadoop user (a hadoop group is created automatically)
passwd hadoop
# sudo useradd -s /bin/bash -d /home/hadoop -m hadoop -g hadoop -G admin   # alternative: create the hadoop user in the hadoop group, with admin as a supplementary group
# su hadoop    # switch to the hadoop user
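To confirm the account was created consistently on every machine, a one-line check (a sketch):
id hadoop    # expect: uid=...(hadoop) gid=...(hadoop) groups=...(hadoop)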
Create the hadoop directories:
Define the paths for data and for code/tools:
mkdir -p /home/hadoop/src
mkdir -p /home/hadoop/tools
chown -R hadoop.hadoop /home/hadoop/*
Define the paths where DataNode data will be stored; make sure the chosen directories have enough free space:
mkdir -p /data/hadoop/hdfs
mkdir -p /data/hadoop/tmp
mkdir -p /var/logs/hadoop
Set write permissions:
chmod -R 777 /data/hadoop
chown -R hadoop.hadoop /data/hadoop/*
chown -R hadoop.hadoop /var/logs/hadoop
Define the Java installation path:
mkdir -p /usr/lib/jvm/
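After creating the directories, a short check that ownership is what the later steps expect (a sketch; the paths are the ones created above):
ls -ld /home/hadoop/src /home/hadoop/tools /data/hadoop/hdfs /data/hadoop/tmp /var/logs/hadoop /usr/lib/jvm
# the hadoop data and log directories should show hadoop:hadoop ownership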
2.3 Configure passwordless SSH login
Reference article: http://blog.csdn.net/ab198604/article/details/8250461
SSH uses the RSA (or DSA) algorithm to generate a public/private key pair, and data transferred between nodes is encrypted, which keeps it secure and reliable. The public key can be given to any node on the network, while the private key stays on the local machine and is never shared, so others cannot steal the data. In short, this is an asymmetric algorithm that is very hard to crack. Nodes in a Hadoop cluster need to access one another, and the node being accessed must verify that the connecting node can be trusted; Hadoop uses SSH for this, logging in remotely with key-based verification and encrypted data transfer. If Hadoop had to ask for a password on every access, efficiency would drop sharply, so passwordless SSH is configured so that nodes can connect to one another directly, which greatly improves access efficiency.
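As an aside, on systems that ship the openssh-clients tools, the key distribution described in this section can also be done with ssh-copy-id. This is only a sketch, assuming the hadoop user and the three hosts used in this tutorial; the manual steps below achieve the same result and make each stage explicit:
ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa          # generate the key pair once per node
for h in name01 data01 data02; do
    ssh-copy-id -i ~/.ssh/id_dsa.pub hadoop@$h    # asks for the password once, then appends the key to the remote authorized_keys
done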
To let the NameNode log in to the other nodes without a password, every node must first generate its own key pair: id_dsa.pub is the public key and id_dsa is the private key. The public key is then appended to the authorized_keys file; this step is required. The process is as follows:
2.3.1 Each node generates its own key pair
# Tips:
(1): The .ssh directory needs 755 permissions and authorized_keys needs 644;
(2): If the Linux firewall is on, the ports Hadoop needs must be opened, or the firewall must be disabled;
(3): If a data node cannot connect to the master, it may be because the machine name does not resolve correctly; using the IP address is more reliable.
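To verify tip (1) quickly on a node, a small check (a sketch; run as the hadoop user):
stat -c '%a %U %n' /home/hadoop/.ssh /home/hadoop/.ssh/authorized_keys
# expect something like: 755 hadoop /home/hadoop/.ssh   and   644 hadoop /home/hadoop/.ssh/authorized_keys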
On the master node name01 (192.168.52.128):
Create the SSH login public/private key pair for the hadoop account on the NameNode master node:
mkdir -p /home/hadoop/.ssh
chown hadoop.hadoop -R /home/hadoop/.ssh
chmod 755 /home/hadoop/.ssh
su - hadoop
cd /home/hadoop/.ssh
ssh-keygen -t dsa -P '' -f id_dsa
[hadoop@name01 .ssh]$ ssh-keygen -t dsa -P '' -f id_dsa
Generating public/private dsa key pair.
open id_dsa failed: Permission denied.
Saving the key failed: id_dsa.
[hadoop@name01 .ssh]$
An error is reported. Solution: run setenforce 0 as root:
[root@name01 .ssh]# setenforce 0
su - hadoop
[hadoop@name01 .ssh]$ ssh-keygen -t dsa -P '' -f id_dsa
Generating public/private dsa key pair.
Your identification has been saved in id_dsa.
Your public key has been saved in id_dsa.pub.
The key fingerprint is:
52:69:9a:ff:07:f4:fc:28:1e:48:18:fe:93:ca:ff:1d hadoop@name01
The key's randomart image is:
(DSA 1024 randomart image omitted)
[hadoop@name01 .ssh]$ ll
total 12
-rw-------. 1 hadoop hadoop  668 Aug 20 id_dsa
-rw-r--r--. 1 hadoop hadoop  603 Aug 20 id_dsa.pub
drwxrwxr-x. 2 hadoop hadoop 4096 Aug 20 touch
[hadoop@name01 .ssh]$
id_dsa.pub is the public key and id_dsa is the private key. Next, append the public key to the authorized_keys file; this step is required:
[hadoop@name01 .ssh]$ cat id_dsa.pub >> authorized_keys
[hadoop@name01 .ssh]$ ll
total 16
-rw-r--r--. 1 hadoop hadoop  603 Aug 21 authorized_keys
-rw-------. 1 hadoop hadoop  668 Aug 20 id_dsa
-rw-r--r--. 1 hadoop hadoop  603 Aug 20 id_dsa.pub
drwxrwxr-x. 2 hadoop hadoop 4096 Aug 20 touch
[hadoop@name01 .ssh]$
Use the same method on the remaining two nodes.
2.3.2 Run the following commands on data01 (192.168.52.129):
useradd hadoop    # create the hadoop user
passwd hadoop     # set the hadoop password to hadoop
setenforce 0
su - hadoop
mkdir -p /home/hadoop/.ssh
cd /home/hadoop/.ssh
ssh-keygen -t dsa -P '' -f id_dsa
cat id_dsa.pub >> authorized_keys
2.3.3 Run the following commands on data02 (192.168.52.130):
useradd hadoop    # create the hadoop user
passwd hadoop     # set the hadoop password to hadoop
setenforce 0
su - hadoop
mkdir -p /home/hadoop/.ssh
cd /home/hadoop/.ssh
ssh-keygen -t dsa -P '' -f id_dsa
cat id_dsa.pub >> authorized_keys
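Sections 2.3.2 and 2.3.3 are identical except for the host name; if you prefer, the same steps can be driven from name01 in a loop. This is only a sketch, assuming you can ssh to the data nodes as root and as hadoop (you will be prompted for each password):
for h in data01 data02; do
    ssh root@$h 'useradd hadoop; echo "hadoop:hadoop" | chpasswd; setenforce 0'
    ssh hadoop@$h 'mkdir -p ~/.ssh && cd ~/.ssh && ssh-keygen -t dsa -P "" -f id_dsa -q && cat id_dsa.pub >> authorized_keys'
done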
2.3.4 Build a common authorized_keys file for all three nodes
On name01 (192.168.52.128):
su - hadoop
cd /home/hadoop/.ssh
scp hadoop@data01:/home/hadoop/.ssh/id_dsa.pub ./id_dsa.pub.data01
scp hadoop@data02:/home/hadoop/.ssh/id_dsa.pub ./id_dsa.pub.data02
cat id_dsa.pub.data01 >> authorized_keys
cat id_dsa.pub.data02 >> authorized_keys
As follows:
[hadoop@name01 .ssh]$ scp hadoop@data01:/home/hadoop/.ssh/id_dsa.pub ./id_dsa.pub.data01
The authenticity of host 'data01 (192.168.52.129)' can't be established.
RSA key fingerprint is 5b:22:7b:dc:0c:b8:bf:5c:92:aa:ff:93:3c:59:bd:d3.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'data01,192.168.52.129' (RSA) to the list of known hosts.
hadoop@data01's password:
Permission denied, please try again.
hadoop@data01's password:
id_dsa.pub                                    100%  603     0.6KB/s
[hadoop@name01 .ssh]$
[hadoop@name01 .ssh]$ scp hadoop@data02:/home/hadoop/.ssh/id_dsa.pub ./id_dsa.pub.data02
The authenticity of host 'data02 (192.168.52.130)' can't be established.
RSA key fingerprint is 5b:22:7b:dc:0c:b8:bf:5c:92:aa:ff:93:3c:59:bd:d3.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'data02,192.168.52.130' (RSA) to the list of known hosts.
hadoop@data02's password:
id_dsa.pub                                    100%  603     0.6KB/s
[hadoop@name01 .ssh]$
[hadoop@name01 .ssh]$ cat id_dsa.pub.data01 >> authorized_keys
[hadoop@name01 .ssh]$ cat id_dsa.pub.data02 >> authorized_keys
[hadoop@name01 .ssh]$ cat authorized_keys
ssh-dss AAAAB3NzaC1kc3... (key abbreviated) hadoop@name01
ssh-dss AAAAB3NzaC1kc3... (key abbreviated) hadoop@data01
ssh-dss AAAAB3NzaC1kc3... (key abbreviated) hadoop@data02
[hadoop@name01 .ssh]$
The authorized_keys file now contains three lines, the public keys for name01, data01, and data02 respectively. Copy this authorized_keys file to the same directory on data01 and data02;
after that, the hadoop user can ssh to name01, data01, and data02 without a password.
scp authorized_keys hadoop@data01:/home/hadoop/.ssh/
scp authorized_keys hadoop@data02:/home/hadoop/.ssh/
Then, set the permissions for the hadoop user on name01, data01, and data02 respectively:
su - hadoop
chmod -R 700 /home/hadoop/.ssh
chmod 600 /home/hadoop/.ssh/authorized_keys
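Before the manual tests below, a quick automated check from name01 (a sketch; BatchMode makes ssh fail instead of prompting, so any remaining password prompt shows up as an error):
for h in name01 data01 data02; do
    ssh -o BatchMode=yes hadoop@$h hostname    # each line should print the remote host's name
done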
Test passwordless SSH login. On the first connection you need to type yes to accept the host key; after that, ssh logs in directly without asking for a password.
[hadoop@name01 .ssh]$ ssh hadoop@data01
Last login: Thu Aug 21 01:53:24 2014 from name01
[hadoop@data01 ~]$ ssh hadoop@data02
The authenticity of host 'data02 (192.168.52.130)' can't be established.
RSA key fingerprint is 5b:22:7b:dc:0c:b8:bf:5c:92:aa:ff:93:3c:59:bd:d3.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'data02,192.168.52.130' (RSA) to the list of known hosts.
[hadoop@data02 ~]$ ssh hadoop@name01
The authenticity of host 'name01 (::1)' can't be established.
RSA key fingerprint is 5b:22:7b:dc:0c:b8:bf:5c:92:aa:ff:93:3c:59:bd:d3.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'name01' (RSA) to the list of known hosts.
Last login: Thu Aug 21 01:56:12 2014 from data01
[hadoop@data02 ~]$ ssh hadoop@name01
Last login: Thu Aug 21 01:56:22 2014 from localhost.localdomain
[hadoop@data02 ~]$
Here the problem is visible: the ssh from data01 and data02 to name01 did not really succeed; on data02, name01 resolved to ::1 (the local host), so the login looped back to the same machine. Where is the problem?
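A first thing to check, given the ::1 in the output above (a guess, not a confirmed diagnosis): on data01 and data02, /etc/hosts may map name01 to a loopback address, so ssh hadoop@name01 lands on the local machine instead of 192.168.52.128. For example:
grep -n 'name01' /etc/hosts          # should show only 192.168.52.128, not 127.0.0.1 or ::1
getent hosts name01                  # should resolve to 192.168.52.128
ssh hadoop@192.168.52.128 hostname   # logging in by IP bypasses the name-resolution issue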