Hadoop 1.2.1 Installation--Single-Node Mode and Single-Machine Pseudo-Distributed Mode
First, Prerequisites
Before installing Hadoop on Linux, you need to install two programs first:
1) JDK 1.6 (or later). Hadoop is written in Java, and both compiling Hadoop and running MapReduce require the JDK. Therefore, you must install JDK 1.6 or later before installing Hadoop.
2) SSH (Secure Shell), with OpenSSH recommended. Hadoop uses SSH to start the daemons on each host in the slave list, so SSH must be installed even for a pseudo-distributed installation, because Hadoop makes no distinction between a real cluster and a pseudo-distributed one. In pseudo-distributed mode, Hadoop takes the same approach as in a cluster, starting processes in order on the hosts recorded in the file conf/slaves; the only difference is that the slave is localhost (i.e. the machine itself), so SSH is still required.
Second, Environment
- VMware® Workstation 10.04
- Ubuntu 14.04 32-bit
- Java JDK 1.6.0
- Hadoop 1.2.1
Third, Installation Steps
• JDK installation
(1) Download the JDK installation package
Make sure you can connect to the Internet, and download the JDK 1.6 installation package from http://www.oracle.com/technetwork/java/javase/downloads (the filename is similar to jdk-***-linux-i586.bin; installing JDK 1.7 is not recommended because not all software supports version 1.7) into the JDK installation directory (this chapter assumes the JDK installation directory is /usr/lib/jvm/jdk).
(2) Manually install the JDK
In a terminal, go to the JDK installation directory and enter the command:
sudo chmod u+x jdk-***-linux-i586.bin
After you have modified the permissions, you can install it. Enter the command in the terminal:
sudo -s ./jdk-***-linux-i586.bin
After the installation is complete, you can start configuring environment variables.
(3) Configuring environment variables
Enter the command:
sudo gedit /etc/profile
Enter the password to open the profile file.
At the bottom of the file, enter the following:
#set Java Environment
export JAVA_HOME=/usr/lib/jvm/jdk
export CLASSPATH=".:$JAVA_HOME/lib:$CLASSPATH"
export PATH="$JAVA_HOME/bin:$PATH"
The point of this step is to configure the environment variables so that the system can find the JDK.
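Note that /etc/profile is read only at login, so the new variables do not take effect in a terminal that is already open. One way to apply them immediately (a standard shell step, not specific to Hadoop) is to reload the profile:
source /etc/profile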
(4) Verify that the JDK is installed successfully
Enter the command:
java -version
The following JDK version information appears:
java version "1.6.0_22"
Java(TM) SE Runtime Environment (build 1.6.0_22-b04)
Java HotSpot(TM) Client VM (build 17.1-b03, mixed mode, sharing)
If the version information shown is not that of the JDK installed above, it means the newly installed JDK has not been set as the Ubuntu system's default JDK, and you will need to set it as the default manually.
(5) Manually set the system default JDK
In the terminal, enter the following commands in turn:
sudo update-alternatives --install /usr/bin/java java /usr/lib/jvm/jdk/bin/java 300
sudo update-alternatives --install /usr/bin/javac javac /usr/lib/jvm/jdk/bin/javac 300
sudo update-alternatives --config java
You can then enter java -version to see the version information of the JDK you installed.
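As an additional check (a general Linux technique rather than one of the book's steps), you can resolve the alternatives symlink to see which binary /usr/bin/java actually points to:
readlink -f /usr/bin/java
# should print /usr/lib/jvm/jdk/bin/java if the alternative registered above is selected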
• SSH installation
Suppose the user name is u:
1) Confirm that you are connected to the Internet, and then enter the command:
sudo apt-get install ssh
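On Ubuntu this package pulls in both the client and the server; to check that the SSH daemon is actually running, you can query the service (the service name on Ubuntu 14.04 is ssh):
sudo service ssh status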
2) Configure password-free login to this machine. First check whether there is a .ssh folder in user u's home directory (note the leading ".", which marks it as a hidden folder) by entering the command:
ls -a /home/u
In general, this hidden folder is created automatically under the current user when SSH is installed; if it is not there, you can create one manually, as shown below.
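A minimal way to create it by hand (the path assumes user u as above):
mkdir -p /home/u/.ssh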
Next, enter the following command (note that the characters after -P are two single quotes, not one double quotation mark):
ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
To explain: ssh-keygen generates a key; -t (note the case) specifies the type of key to generate; dsa means DSA key authentication, i.e. the key type; -P provides the passphrase; -f specifies the file in which to store the generated key.
In Ubuntu, ~ represents the current user's home folder, which here is /home/u.
This command creates two files, id_dsa and id_dsa.pub, under the .ssh folder. These are a pair of SSH private and public keys, similar to a key and a lock; the next step appends id_dsa.pub (the public key) to the authorized keys.
Enter the command:
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
The function of this command is to append the public key to the public key file used for authentication; here authorized_keys is that file.
At this point, password-free login has been configured.
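If SSH still prompts for a password, a common cause is file permissions: sshd ignores key files that are writable by other users. Tightening them (a standard OpenSSH requirement, not a step from the book) usually fixes this:
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys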
3) Verify that SSH is installed successfully and that you can log on to the computer without password.
Enter the command:
ssh -version
The result shows:
OpenSSH_5.8p1 Debian-7ubuntu1, OpenSSL 1.0.0e 6 Sep 2011
Bad escape character 'rsion'.
The "Bad escape character" line can be ignored; the version banner shows that SSH has been installed successfully.
Enter the command:
ssh localhost
The following will be shown:
The authenticity of host 'localhost (::1)' can't be established.
RSA key fingerprint is 8b:c3:51:a5:2a:31:b7:74:06:9d:62:04:4f:84:f8:77.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'localhost' (RSA) to the list of known hosts.
Linux master 2.6.31-14-generic #48-Ubuntu SMP Fri Oct 14:04:26 UTC i686
To access official Ubuntu documentation, please visit:
http://help.ubuntu.com/
Last login: Sat Feb 17:12:40 from master
This means the installation succeeded. The first time you log in, you will be asked whether to continue connecting; enter yes.
In fact, password-free login is not strictly required for installing Hadoop, but if you do not configure it, every time you start Hadoop you will have to enter a password to log in to each machine's DataNode. Considering that a typical Hadoop cluster has hundreds or even thousands of machines, SSH password-free login is generally configured.
• Hadoop installation
Hadoop has three modes of operation: standalone, pseudo-distributed, and fully distributed. The first two do not show the advantages of cloud computing, but they are convenient for testing and debugging programs.
You can get the official release of Hadoop from http://www.apache.org/dyn/closer.cgi/hadoop/core/.
Download hadoop-1.2.1.tar.gz and unpack it; the book extracts Hadoop into the /home/u/ directory by default.
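A minimal sketch of this download-and-unpack step (the mirror URL below is an assumption; any Apache archive mirror carrying hadoop-1.2.1 works):
cd /home/u
wget http://archive.apache.org/dist/hadoop/core/hadoop-1.2.1/hadoop-1.2.1.tar.gz
tar -xzf hadoop-1.2.1.tar.gz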
(1) Stand-alone mode configuration method
In standalone mode, Hadoop runs as a single Java process, which is convenient for debugging, and the installation needs no configuration at all.
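As a quick standalone-mode check, you can run one of the examples bundled with the release (adapted from the standard Hadoop quick start; paths assume the tarball was unpacked to /home/u as above):
cd /home/u/hadoop-1.2.1
mkdir input
cp conf/*.xml input
# run the bundled grep example against the local filesystem
bin/hadoop jar hadoop-examples-1.2.1.jar grep input output 'dfs[a-z.]+'
cat output/*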
(2) Pseudo-distributed Hadoop configuration
Pseudo-distributed Hadoop is a cluster with only one node; that node is both master and slave, both NameNode and DataNode, and both JobTracker and TaskTracker.
The pseudo-distributed configuration process is also simple and requires only a few files to be modified.
a) Enter the conf directory under the Hadoop directory and set the Java installation directory in hadoop-env.sh:
export JAVA_HOME=/usr/lib/jvm/jdk
b) Modify conf/core-site.xml as follows (fs.default.name is the URI through which clients reach the NameNode):
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
c) Modify conf/hdfs-site.xml as follows (dfs.replication is set to 1 because there is only one DataNode):
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
d) Modify conf/mapred-site.xml as follows (mapred.job.tracker is the address of the JobTracker):
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>localhost:9001</value>
</property>
</configuration>
Once the above files are configured, the installation and configuration of Hadoop are complete.
• Hadoop start-up
The first time you start Hadoop, you need to format Hadoop's file system, HDFS. Go to the Hadoop folder and enter the command:
bin/hadoop namenode -format
Next, start Hadoop with the following command:
bin/start-all.sh
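One way to confirm that the daemons actually started is the JDK's jps tool; in pseudo-distributed mode all five Hadoop 1.x daemons run on this one machine:
jps
# expected (PIDs will differ): NameNode, DataNode, SecondaryNameNode, JobTracker, TaskTracker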
Finally, verify that Hadoop is installed successfully.
Open a browser and visit each of the following URLs:
http://localhost:50030 (Web page for MapReduce)
http://localhost:50070 (HDFS Web page)
If both pages open properly, Hadoop has been installed successfully.
It is not necessary for Hadoop to start all processes; when needed, you can start only HDFS (start-dfs.sh) or only MapReduce (start-mapred.sh).
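Once the daemons are up, a short smoke test (again adapted from the standard Hadoop quick start; run from the Hadoop directory) exercises both HDFS and MapReduce in pseudo-distributed mode:
bin/hadoop fs -put conf input
bin/hadoop jar hadoop-examples-1.2.1.jar grep input output 'dfs[a-z.]+'
bin/hadoop fs -cat 'output/*'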
Reference: "Hadoop in Action (Hadoop实战), Second Edition", Lu Jiaheng, China Machine Press.