Hadoop Tutorial (III) ---- Hadoop Installation
This section describes how to install Hadoop. Before that, we configure SSH password-free (key-based) login. Why is this needed? A Hadoop cluster may contain dozens or even thousands of machines, and every time Hadoop starts it has to log on to the DataNode of each machine. To avoid typing a password for every node, SSH password-free login is usually configured first.
Note: The remote connection tool used by the author is XShell, a very handy remote terminal that we recommend. You can also install the Xftp file transfer tool, which makes it easy to copy software from your computer to a virtual machine; Xftp and XShell work well together.
To configure SSH password-free login, SSH itself must be installed. It should already have been installed along with the CentOS system in the first article, so that step is skipped here. If you are not sure whether SSH is installed, you can check with ssh -V; if version information similar to the output below appears, SSH is installed.
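For example (a quick check only; the exact version string depends on your OpenSSH build):

# Print the installed SSH client version
ssh -V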
Next let's take a look at how to configure SSH password-free login.
First, enter ssh localhost to confirm that, before configuration, logging in to the local machine over SSH still asks for a password.
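A minimal check (the prompt text varies by system):

# Before the keys are set up, this should prompt for the root password
ssh localhost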
Next, go to the user's home directory (the author uses the root user, so this is the /root directory; a normal user's home directory is under /home and is named after the user). Run ls -a and you will see a hidden .ssh folder; if it does not exist, create it yourself. Then enter the following command:
ssh-keygen -t dsa -P '' -f /root/.ssh/id_dsa
A quick explanation of the command (it is case sensitive): ssh-keygen generates the key pair; -t specifies the key type; -P supplies the passphrase (empty here); -f specifies the output file. After the command runs, two files are generated in the .ssh folder, id_dsa and id_dsa.pub. These are the SSH private key and public key, which work together like a key and a lock. Next, append id_dsa.pub to the authorized keys file with the following command:
cat /root/.ssh/id_dsa.pub >> /root/.ssh/authorized_keys
At this point password-free login to the local machine is configured. Enter ssh localhost again to verify; if you are logged in without being asked for a password, the configuration succeeded.
ssh localhost
On the first login you are asked whether to continue connecting; enter yes. From the second login onward, no confirmation or password is required.
The above only covers SSH login to the local machine, so how do we reach the other three virtual machines without a password? The answer is simple: copy the local machine's SSH public key to each of the other three virtual machines, entering the corresponding machine's password when prompted.
ssh-copy-id -i /root/.ssh/id_dsa.pub root@hadoop.slave1   # prompts for the hadoop.slave1 password
ssh-copy-id -i /root/.ssh/id_dsa.pub root@hadoop.slave2   # prompts for the hadoop.slave2 password
ssh-copy-id -i /root/.ssh/id_dsa.pub root@hadoop.slave3   # prompts for the hadoop.slave3 password
To verify, go to hadoop.slave1 and enter ssh hadoop.master. You are asked whether to continue connecting; after entering yes you are asked for the hadoop.master password. Run ssh hadoop.master again and you can log in without a password. Repeat the same check on the remaining two VMs. At this point slave1, slave2, and slave3 can log in to the master without a password, but the master cannot yet log in to slave1, slave2, and slave3 without a password; to fix that, repeat the steps above on the other three virtual machines.
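As a quick sketch of the check described above (hadoop.master and hadoop.slave1 are the hostnames used in this series; substitute your own):

# Run on hadoop.slave1
ssh hadoop.master
# First time: answer yes and enter the password; on the next attempt no password is needed
exit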
After the configuration is complete, let's start learning how to install Hadoop.
Hadoop Installation
1. Download the Hadoop installation package. This series uses Hadoop 1.2.1; the download address is used in the commands below.
2. Create the /usr/local directory (if it does not exist), enter it, download the installation package, and decompress it. Decompressing produces a hadoop-1.2.1 folder; rename it to hadoop, enter the folder, and check the directory structure as shown below.
# Enter /usr/local
cd /usr/local
# Download the Hadoop installation package
wget http://apache.fayea.com/hadoop/common/hadoop-1.2.1/hadoop-1.2.1.tar.gz
# Wait until the download completes...
# Decompress the downloaded package (the tarball can be deleted afterwards, but backing it up to another directory is recommended)
tar -zxvf hadoop-1.2.1.tar.gz
# Rename the extracted folder to hadoop
mv hadoop-1.2.1 hadoop
cd hadoop
# View the directory structure
ll
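For reference, a typical top-level listing of the hadoop-1.2.1 release looks roughly like this (the exact entries depend on the tarball you downloaded):

# bin/  conf/  lib/  docs/  src/  webapps/  hadoop-core-1.2.1.jar  ...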
3. Next, configure the environment variables. Create a new hadoop directory under /etc; later the Hadoop-related configuration files will be placed there and used directly from that directory. Edit the /etc/profile file, append the following configuration, save it, and run source /etc/profile so the change takes effect immediately:
# Set the Hadoop environment
export HADOOP_HOME=/usr/local/hadoop
export PATH=$HADOOP_HOME/bin:$PATH

# Save the changes and then run
source /etc/profile
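To confirm the variables were picked up, a quick sketch (the exact output depends on your setup):

# Should print /usr/local/hadoop
echo $HADOOP_HOME
# Should print the Hadoop version, e.g. Hadoop 1.2.1
hadoop version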
4. How can we check whether the installation succeeded? Since this is still standalone mode, go straight to the /usr/local/hadoop/bin directory and run the start-all.sh script. The process asks whether to continue connecting; just enter yes.
cd /usr/local/hadoop/bin
./start-all.sh
5. Run the jps command to check whether the Hadoop processes started successfully:
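A minimal check (which daemons appear, and their process IDs, depends on what actually started on your machine):

# List the running Java processes
jps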
6. Because this is standalone mode, the NameNode and JobTracker are not started. Now use hadoop fs -ls to check whether the installation succeeded:
hadoop fs -ls
If the listing of the current directory is displayed, the installation succeeded. Repeat the preceding steps to install Hadoop on the other three VMs.
With the steps above, the Hadoop installation is complete. In the next article we will talk about how to configure a Hadoop cluster. Coming soon!