Hadoop Fully Distributed Setup
Posted on Friday, November 6, 2015
- Preparatory work
- Hardware and Software Environment
- Host operating system: Windows 64-bit; processor: Intel i5 @ 3.2 GHz; memory: 8 GB
- Virtual machine software: VMware Workstation 10
- Virtual operating system: CentOS 6.5 64-bit
- JDK: 1.8.0_65 64-bit
- Hadoop: 1.2.1
- Cluster network environment
The cluster consists of 3 nodes: 1 NameNode and 2 DataNodes. All nodes can ping each other. The node IP addresses and host names are as follows:
| Ordinal | IP address    | Machine name  | Type     | User name |
|---------|---------------|---------------|----------|-----------|
| 1       | 192.168.1.127 | Master.hadoop | NameNode | hadoop    |
| 2       | 192.168.1.128 | Slave1.hadoop | DataNode | hadoop    |
| 3       | 192.168.1.129 | Slave2.hadoop | DataNode | hadoop    |
All nodes run CentOS with the firewall disabled. A hadoop user is created on every node, with home directory /home/hadoop. A directory /usr/hadoop is also created on every node and owned by the hadoop user; since Hadoop will be installed there, that user needs rwx permissions on it. (The usual practice is to create /usr/hadoop as root and then change its owner with chown -R hadoop:hadoop /usr/hadoop; otherwise you may lack the permissions needed to distribute the Hadoop files to the other machines over SSH.) A command sketch follows.
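A minimal sketch of that setup, run as root on every node (adjust names and paths to your own layout):
useradd hadoop                        # create the hadoop user (home directory /home/hadoop)
passwd hadoop                         # set its password
mkdir -p /usr/hadoop                  # directory that will hold the Hadoop installation
chown -R hadoop:hadoop /usr/hadoop    # give the hadoop user full control of it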
- Attention
Hadoop requires the same deployment directory structure on every machine (at startup, the daemons on the other nodes are started from the same path as on the primary node) and an identical user account. The usual approach in the documentation is therefore to create a hadoop user on all machines and use that account for passwordless authentication. For convenience, a hadoop user is created on all three machines here.
- Environment construction
- Operating system Installation
For the DataNode systems, you can install one system first and then clone several identical copies of it using VMware's cloning feature.
Tutorials for installing CentOS under VMware are widely available online. One point deserves attention: all virtual machines use bridged networking, and because the host is on a wireless network, the VMnet0 settings must also be configured under Edit > Virtual Network Editor.
Bridged networking connects the local physical NIC and the virtual NIC through the VMnet0 virtual switch, so the two NICs sit at the same level in the topology and effectively belong to the same network segment; the virtual switch behaves like a real network switch. The IP addresses of the two NICs should therefore be set in the same segment.
In the "Bridged to" column, select the physical network adapter that is actually in use.
- Local Environment configuration
- Network configuration
Bridged networking is used (suitable when a router or switch is available), and a static IP is configured so that each node has Internet access and can communicate within the LAN.
vim /etc/sysconfig/network-scripts/ifcfg-eth0

DEVICE=eth0                  # device name of the NIC
BOOTPROTO=static             # how the NIC obtains its IP address: static
HWADDR="00:23:54:de:01:69"   # hardware (MAC) address of the NIC
ONBOOT="yes"                 # bring this interface up at system startup
TYPE="Ethernet"
USERCTL=no
IPV6INIT=no
PEERDNS=yes
NETMASK=255.255.255.0        # network mask for this NIC
IPADDR=192.168.1.127         # required only when BOOTPROTO=static
GATEWAY=192.168.1.1          # usually the router's address
DNS1=202.112.17.33           # DNS server for this network, or 8.8.8.8 (Google's DNS)
(You can configure this file on the master first, copy it to every slave with scp, and then change only IPADDR on each slave, leaving everything else unchanged.)
Note: one more thing must then be changed, because the copied file still contains the previous (master's) hardware address. On each slave:
First, edit /etc/udev/rules.d/70-persistent-net.rules:
Delete the entry for the NIC named eth0 and rename the eth1 entry to eth0.
Second, edit /etc/sysconfig/network-scripts/ifcfg-eth0:
Change HWADDR to the MAC address of the eth1 entry you just saw.
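A brief sketch of those two fixes on a slave (run as root; the files are edited by hand):
vim /etc/udev/rules.d/70-persistent-net.rules    # delete the old eth0 entry and rename the eth1 entry to eth0
vim /etc/sysconfig/network-scripts/ifcfg-eth0    # set HWADDR to that eth1 MAC address and IPADDR to this slave's IP
service network restart                          # restart networking (or reboot) so the changes take effect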
- Configuring the Hosts File
The "/etc/hosts" file is used to configure the DNS server information that the host will use, which is the corresponding [HostName IP] for each host that is recorded in the LAN. When the user is in the network connection, first look for the file, look for the corresponding host name corresponding IP address.
In a Hadoop cluster configuration, the IP and hostname of all machines in the cluster need to be added to the "/etc/hosts" file, so that master and all slave machines can communicate not only through IP, but also through host names.
So in the "/etc/hosts" file on all the machines, add the following:
192.168.1.127 Master.hadoop
192.168.1.128 Slave1.hadoop
192.168.1.129 Slave2.hadoop
(Again, you can edit the file on the master and then copy it to every slave with the scp command.)
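To confirm that name resolution works, a quick check from any node (assuming the hosts entries above are in place):
ping -c 3 Master.hadoop
ping -c 3 Slave1.hadoop
ping -c 3 Slave2.hadoop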
- Operating system settings
The firewall and SELinux need to be turned off during Hadoop installation, or an exception will occur.
- Shutting down the firewall
- Check the firewall state with service iptables status; if rules are listed, iptables is running.
- Turn off the firewall: chkconfig iptables off (see the command sketch after this list).
- Turn off SELinux
- Use the getenforce command to check whether SELinux is currently enabled.
- Modify the /etc/selinux/config file and set SELINUX=disabled.

Note: You must restart the system after these modifications for them to take effect.
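A minimal command sketch of the two steps above (run as root on every node):
service iptables status      # show the current firewall state
service iptables stop        # stop the firewall immediately
chkconfig iptables off       # keep it disabled across reboots
getenforce                   # show the current SELinux mode (Enforcing/Permissive/Disabled)
vim /etc/selinux/config      # change the SELINUX= line to SELINUX=disabled, then reboot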
- SSH passwordless authentication configuration
Preparatory work:
1. Create a hadoop user on all three virtual machines:
adduser hadoop    # run as the root user
passwd hadoop     # enter the password twice
2. Create a .ssh folder in the hadoop user's home directory:
mkdir ~/.ssh
While Hadoop is running, the remote Hadoop daemons must be managed: after startup, the NameNode uses SSH (Secure Shell) to start and stop the various daemons on each DataNode. These commands have to run between the nodes without prompting for a password, so SSH is configured for passwordless public-key authentication. The NameNode can then log in to each DataNode over SSH without a password to start the DataNode processes, and by the same principle a DataNode can log in to the NameNode without a password.
SSH is secure because it uses public-key cryptography. The login process is roughly:
(1) The remote host receives the user's login request and sends its public key to the user.
(2) The user encrypts the login password with this public key and sends it back.
(3) The remote host decrypts the password with its private key; if the password is correct, the login is accepted.
Note: if SSH is not installed on your Linux system, install it first (a quick check-and-install sketch follows).
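A minimal check-and-install sketch for CentOS 6 (assumes the yum repositories are reachable):
rpm -qa | grep openssh                           # check whether the openssh packages are installed
yum install -y openssh-server openssh-clients    # install them if they are missing
service sshd start                               # make sure the SSH daemon is running
chkconfig sshd on                                # start sshd automatically at boot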
- Configure passwordless login from master to all slaves

- Execute the following command on the master node:
ssh-keygen -t rsa -P ''

When asked where to save the key, just press Enter to accept the default path. The generated key pair, id_rsa (private key) and id_rsa.pub (public key), is stored by default in the /home/<user name>/.ssh directory.

Check that a ".ssh" folder now exists under "/home/<user name>/" and that the two newly generated key files are inside it.
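A quick way to check (run as the hadoop user on the master):
ls -la ~/.ssh    # should now show id_rsa and id_rsa.pub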
- Then, still on the master node, append id_rsa.pub to the authorized keys file:
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

List the ".ssh" folder again to confirm that authorized_keys has been created.
Check the permissions on the .ssh directory and on authorized_keys. (Very important!)
If the permissions are not correct, set them with the following commands:

chmod 700 ~/.ssh    # Note: these two permission settings are critical; they decide success or failure
chmod 600 ~/.ssh/authorized_keys
On the master machine, run ssh localhost to test whether you can log in without a password.
- Send the public key to the slaves
On the master, send the authorized_keys file containing the public key id_rsa.pub to the same location on each slave (the /home/hadoop/.ssh folder) with scp, and then set the permissions (very important):
scp ~/.ssh/authorized_keys hadoop@192.168.1.128:~/.ssh/
scp ~/.ssh/authorized_keys hadoop@192.168.1.129:~/.ssh/

Set the permissions on each slave (as the root user):
chown -R hadoop:hadoop /home/hadoop/.ssh
chmod -R 700 /home/hadoop/.ssh
chmod 600 /home/hadoop/.ssh/authorized_keys
- Test
On the master, enter:
ssh Slave1.hadoop
If no password is requested, the configuration succeeded.

Key point: set the permissions correctly!
- Software Installation and Environment configuration
The following software is installed on the master first; once everything is installed and configured, it is copied to the slaves.
- Java installation and its environment configuration
The JDK must be installed on all machines, and the version must be the same everywhere. Install it on the master first and then copy the installed files to the slaves. Installing the JDK and configuring the environment variables must be done as root.
- First log in to "Master.hadoop" as root, create a "java" folder under "/usr", copy "jdk-8u65-linux-x64.tar.gz" into "/usr/java", and unpack it:
tar -zxvf jdk-8u65-linux-x64.tar.gz

Looking under "/usr/java" you will now find a folder named "jdk1.8.0_65", which means the JDK has been unpacked. Delete the installation package and move on to the "Configure environment variables" step.
- Configure environment variables. Edit the "/etc/profile" file and add the Java "JAVA_HOME", "CLASSPATH" and "PATH" settings as follows:
# set Java environment
export JAVA_HOME=/usr/java/jdk1.8.0_65/
export JRE_HOME=/usr/java/jdk1.8.0_65/jre
export CLASSPATH=.:$JAVA_HOME/lib:$JRE_HOME/lib:$CLASSPATH
export PATH=$JAVA_HOME/bin:$PATH
Save and exit, then run one of the following commands to make the configuration take effect immediately:
source /etc/profile    (or:  . /etc/profile)
Key note: put $JAVA_HOME/bin at the front of PATH so that the newly installed JDK is found first; otherwise the system will keep using its original JDK.
- Verify
java -version
- Hadoop installation and its environment configuration
- First, log in to the "Master.hadoop" machine as root and copy the downloaded "hadoop-1.2.1.tar.gz" to the /usr directory. Then go to "/usr", unpack "hadoop-1.2.1.tar.gz" with the commands below, rename the result to "hadoop", give the ordinary hadoop user ownership of the folder, and delete the "hadoop-1.2.1.tar.gz" package:
cd /usr
tar -xzvf hadoop-1.2.1.tar.gz
mv hadoop-1.2.1 hadoop
chown -R hadoop:hadoop hadoop
rm -rf hadoop-1.2.1.tar.gz
- Add the Hadoop installation path to "/etc/profile":
# set Hadoop path
export HADOOP_HOME=/usr/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
- Configure the hadoop-env.sh and confirm that it takes effect
The "hadoop-env.sh" file is located under the "/usr/hadoop/conf" directory.
Modify the following line in the file:
export JAVA_HOME=/usr/java/jdk1.8.0_65/
(This JAVA_HOME is the same as the one used in the Java environment configuration above.)

Then make the change take effect and check the Hadoop version:
source hadoop-env.sh
hadoop version
- Create subdirectories under the /usr/hadoop directory
cd /usr/hadoop
mkdir tmp
mkdir hdfs
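To be safe, also create the name and data subdirectories that hdfs-site.xml (configured below) will point to, and make sure the tree is owned by the hadoop user:
mkdir -p /usr/hadoop/hdfs/name /usr/hadoop/hdfs/data
chown -R hadoop:hadoop /usr/hadoop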
- Configuring the Core-site.xml File
Modify the Hadoop core configuration file core-site.xml, which sets the address and port of HDFS (that is, of the NameNode).
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/hadoop/tmp</value>
<!-- Note: create the tmp folder under the /usr/hadoop directory first -->
<description>a base for other temporary directories.</description>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://192.168.1.127:9000</value>
</property>
</configuration>
- Configuring the Hdfs-site.xml File
This file sets dfs.replication, the number of block replicas (2 here, matching the two DataNodes), and dfs.name.dir / dfs.data.dir, the local paths where the NameNode metadata and the DataNode blocks are stored.
<configuration>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>/usr/hadoop/hdfs/name</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/usr/hadoop/hdfs/data</value>
</property>
</configuration>
- Configuring the Mapred-site.xml File
Modify the MapReduce configuration file, which sets the address and port of the JobTracker.
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>http://192.168.1.127:9001</value>
</property>
</configuration>
- Configuring the Masters File
There are two options:

(1) The first option:
Change localhost to Master.hadoop.
(2) The second option:
Remove "localhost" and add the master machine's IP, 192.168.1.127.
To be safe, use the second option: if you forget to configure "/etc/hosts", LAN name resolution will fail and cause unexpected errors, whereas once the IP is set and the network is reachable, the corresponding host can always be found by IP.
vim /usr/hadoop/conf/masters
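After editing, the masters file contains a single line, the master's IP address (option 2 above):
192.168.1.127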
- Configuring the Slaves file (master host specific)
Similar to configuring the masters file:
vim /usr/hadoop/conf/slaves
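By the same logic, the slaves file lists one DataNode per line, here by IP address (host names also work once /etc/hosts is configured):
192.168.1.128
192.168.1.129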
- File installation and configuration on the slave server
After the JDK and Hadoop have been installed and configured on the master, proceed as follows:
- Copy the /usr/java folder to every slave
On the master, enter:
scp -r /usr/java root@192.168.1.128:/usr/
scp -r /usr/java root@192.168.1.129:/usr/

- Copy /etc/profile to every slave
On the master, enter:
scp /etc/profile root@192.168.1.128:/etc/
scp /etc/profile root@192.168.1.129:/etc/

- Copy the /usr/hadoop folder to every slave
On the master, enter:
scp -r /usr/hadoop root@192.168.1.128:/usr/
scp -r /usr/hadoop root@192.168.1.129:/usr/
- Change permissions
Change the owner of /usr/java and /usr/hadoop to the hadoop user and set the permissions to 755, as sketched below.
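A minimal sketch of those permission changes, run as root on each slave:
chown -R hadoop:hadoop /usr/java /usr/hadoop    # hand ownership to the hadoop user
chmod -R 755 /usr/java /usr/hadoop              # rwx for the owner, rx for group and others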
- Start-up and verification
- Start
- Format HDFs File system
Run this as the ordinary hadoop user on "Master.hadoop". (Note: format only once; on subsequent boots there is no need to format again, just run start-all.sh.)
hadoop namenode -format
- Start Hadoop
Before starting, shut down the firewall on every machine in the cluster, otherwise the DataNodes will shut themselves down. Start the cluster with the following command:
start-all.sh
- Turn off Hadoop
stop-all.sh
- Validation test
- Test with the "JPS" command
On master, use the Java-brought gadget JPS to view the process.
View the process on Slave1 with a Java-brought gadget JPS.
Note: The top two pictures show success!
?
- View the cluster status with "hadoop dfsadmin -report"
- Viewing a cluster from a Web page
Visit the JobTracker page: http://192.168.1.127:50030

Visit the NameNode page: http://192.168.1.127:50070
- The problems encountered and the solving methods
- About "Warning: $HADOOP_HOME is deprecated."
After the installation is complete, this warning appears every time a hadoop command is typed:
Warning: $HADOOP_HOME is deprecated.
Solution one: edit the "/etc/profile" file, remove the HADOOP_HOME variable setting, and re-run the hadoop fs command; the warning disappears.
Solution two: edit the "/etc/profile" file and add the following environment variable, after which the warning disappears:
export HADOOP_HOME_WARN_SUPPRESS=1
- SSH setup not successful
Most likely the permissions were set incorrectly.
- DataNode cannot connect
The firewalls on the master and slaves were probably not turned off.