Software:
Hadoop 1.1.2; PieTTY (a terminal client used on Windows to log in to the Linux virtual machine); WinSCP (for securely copying files between the local and the remote computer); CentOS (the guest system), onto which jdk-6u24-linux-i586.bin is installed; VirtualBox (its installation is not covered here)
Installation steps:
Note: To avoid file-permission problems during installation, all of the following steps are performed as the root user.
1. Set up password-free SSH login
Hadoop needs to manage its daemons remotely, but the Linux virtual machine does not have an SSH server installed. On CentOS, install it with yum (on Debian or Ubuntu the equivalent is apt-get install ssh):
yum install -y openssh-server openssh-clients
Hadoop's control scripts start and stop the daemons over SSH (an encrypted communication protocol). So that a password does not have to be entered every time and the operations can run automatically, configure password-free login as follows:
First generate the key:
The command "ssh-keygen -t rsa" generates an RSA key pair. It prompts three times; press Enter each time to accept the defaults. Then go to the key directory (~/.ssh) and execute the command:
Password-free SSH to this same machine is needed because, even in a single-machine (pseudo-distributed) deployment, Hadoop uses SSH to start its daemons on the local host.
Verify that password-free logon is possible:
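The key generation, authorization, and check described above can be scripted roughly as follows. This is a sketch rather than the author's exact commands: it assumes root's home directory, uses ssh-keygen's -N "" (empty passphrase) and -f (output file) options in place of the three Enter presses, and adds the public key to ~/.ssh/authorized_keys, the file sshd consults by default.

```shell
# Run as root, as in the rest of this tutorial; keys go into root's ~/.ssh.
mkdir -p "$HOME/.ssh"
chmod 700 "$HOME/.ssh"
# Generate an RSA key pair with an empty passphrase, unless one already exists.
[ -f "$HOME/.ssh/id_rsa" ] || ssh-keygen -t rsa -N "" -f "$HOME/.ssh/id_rsa"
# Authorize the public key for logins to this same machine (added only once).
touch "$HOME/.ssh/authorized_keys"
grep -qxF "$(cat "$HOME/.ssh/id_rsa.pub")" "$HOME/.ssh/authorized_keys" \
  || cat "$HOME/.ssh/id_rsa.pub" >> "$HOME/.ssh/authorized_keys"
# sshd ignores authorized_keys files that are writable by other users.
chmod 600 "$HOME/.ssh/authorized_keys"
# BatchMode makes ssh fail instead of prompting, so a password prompt counts as failure;
# StrictHostKeyChecking=no accepts localhost's host key on the first connection.
if ssh -o BatchMode=yes -o StrictHostKeyChecking=no localhost true; then
  echo "password-free login to localhost works"
else
  echo "password-free login failed (is sshd running?)" >&2
fi
```

After this, running ssh localhost by hand should log in without asking for a password; type exit to return.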
2. Installing the JDK
Install the JDK into the /usr/local directory by running the installer there with "./", which unpacks the file.
For later convenience, rename the unpacked directory (jdk1.6.0_24) to jdk with the mv command.
Next, add the JDK to the environment variables by editing /etc/profile.
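A minimal version of these settings, assuming the JDK now lives in /usr/local/jdk, is the two lines below, appended to the end of /etc/profile. JAVA_HOME points at the JDK root, and $JAVA_HOME/bin is put at the front of PATH so that java and javac resolve to this JDK in every login shell.

```shell
# Appended to /etc/profile (assumes the JDK directory was renamed to /usr/local/jdk).
export JAVA_HOME=/usr/local/jdk
export PATH=$JAVA_HOME/bin:$PATH
```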
To make the new configuration take effect immediately, run:
source /etc/profile
Verify that the installation is successful:
3. Turn off the firewall
The setup command opens a text-mode tool that can turn off the firewall. The firewall is turned off here to avoid unnecessary errors; its job is to close unused ports so that the server is not attacked, so do not turn it off on production servers.
Select "Firewall configuration" and press Enter.
Press Space to clear the * (an * means the firewall is enabled), then press Tab to move to OK and exit step by step.
Verify that the firewall is off:
4. Configure the VirtualBox network
The Host-only connection mode is used here. It has the following characteristics:
The virtual machine reaches the host at 192.168.56.1, the IP address of the host's VirtualBox Host-Only Network adapter. This always works, whether or not the host's "Local Area Connection" shows a red cross (that is, whether or not it is connected).
The host reaches the virtual machine at 192.168.56.101, the IP address of the virtual machine's network card 3. This also always works, whether or not the "Local Area Connection" shows a red cross.
The virtual machine reaches the Internet through its own network card 2, provided the host is online through its wired "Local Area Connection" (this does not work through a wireless card).
Configuration steps:
In Linux, do the following:
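A minimal sketch of the Linux-side settings for the host-only adapter, assuming CentOS's network-scripts layout, that the adapter appears as eth0 (check with ifconfig -a; with several adapters it may be eth1 or eth2), and the static address 192.168.56.100 that step 5 maps to the host name:

```shell
# /etc/sysconfig/network-scripts/ifcfg-eth0 (device name eth0 is an assumption)
DEVICE=eth0
BOOTPROTO=static
ONBOOT=yes
IPADDR=192.168.56.100
NETMASK=255.255.255.0
GATEWAY=192.168.56.1
```

After saving the file, service network restart applies the change.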
Note:
The gateway is the IP address manually assigned to the VirtualBox Host-Only Network adapter (192.168.56.1); the virtual machine communicates with the host through it.
This lets the virtual machine and the host communicate with each other even when the computer is not connected to the Internet.
5. Set up host-name resolution
Host-name resolution must be set up because the machines in a Hadoop cluster address one another by host name. Edit the hosts file (/etc/hosts) with gedit.
Append to the file:
192.168.56.100 Hadoop (note the space between the IP address and the host name)
Save and close the file.
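On a machine without a desktop (and so without gedit), the same entry can be added from the shell. A minimal sketch, run as root, that appends the line only if it is not already present and then checks that the name resolves:

```shell
ENTRY="192.168.56.100 Hadoop"
# Append the mapping once; re-running the script does not duplicate it.
grep -qxF "$ENTRY" /etc/hosts || echo "$ENTRY" >> /etc/hosts
# Look the name up through the system resolver, which consults /etc/hosts by default.
getent hosts Hadoop
```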
Setting up Hadoop in pseudo-distributed mode (Part 1)