1. Copy the Hadoop software to the virtual machine
Use WinSCP to put the Hadoop installation package into the Linux downloads folder.
2. Select the installation directory
Copy the Hadoop installation package into the installation directory; here we use the /usr/local directory in CentOS.
3. Unpack the installation package
The Hadoop installation package has the suffix .tar.gz, so it can be unpacked directly with the tar command.
#tar -zxvf xxx // unpack a .tar.gz file
Unpacking produces a folder named hadoop-1.1.2. That name is long and awkward to type, so rename it:
#mv hadoop-1.1.2 hadoop
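The unpack-and-rename sequence above can be sketched end to end. This demo builds a dummy archive first so it can run anywhere; with the real package the file would be the hadoop-1.1.2 tarball placed under /usr/local:

```shell
# Demo of steps 1-3 on a dummy archive (the archive name mirrors the
# real hadoop-1.1.2 package; the temp dir stands in for /usr/local).
WORK=$(mktemp -d) && cd "$WORK"
mkdir hadoop-1.1.2 && tar -zcf hadoop-1.1.2.tar.gz hadoop-1.1.2 && rm -r hadoop-1.1.2
tar -zxf hadoop-1.1.2.tar.gz    # unpack the .tar.gz file
mv hadoop-1.1.2 hadoop          # shorten the directory name
ls -d hadoop
```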
4. Set the Hadoop environment variables
After unpacking, the Hadoop directory has a fixed structure; whatever else it contains, there is always a bin directory holding the executable commands. So add that bin directory to the PATH environment variable.
#vi /etc/profile
In the configuration file, add:
export HADOOP_HOME=/usr/local/hadoop
export PATH=.:$JAVA_HOME/bin:$HADOOP_HOME/bin:$PATH
To make the configuration file take effect:
#source /etc/profile
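Echoing the variable afterwards is a quick way to confirm the setting took effect. This sketch repeats the two lines added to /etc/profile (it assumes JAVA_HOME was already set when the JDK was installed):

```shell
# The two lines this guide adds to /etc/profile, then a sanity check.
export HADOOP_HOME=/usr/local/hadoop
export PATH=.:$JAVA_HOME/bin:$HADOOP_HOME/bin:$PATH
echo "$HADOOP_HOME"
```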
5. Modify the configuration files
For a pseudo-distributed installation of Hadoop, several configuration files must be modified. The Hadoop configuration files live in the $HADOOP_HOME/conf directory. The files to modify are:
hadoop-env.sh, core-site.xml, hdfs-site.xml, mapred-site.xml
To make editing easier, we do not need vi; the files can be edited directly from Windows with WinSCP.
Find each file in WinSCP and edit it.
(1) hadoop-env.sh content to modify:
export JAVA_HOME=/usr/local/jdk
(2) core-site.xml content to modify:
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://hadoop:9000</value>
    <description>change your own hostname</description>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/hadoop/tmp</value>
  </property>
</configuration>
Note: the value hadoop in <value> above is this machine's hostname; change it to match your own settings.
(3) hdfs-site.xml content to modify:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
</configuration>
(4) mapred-site.xml content to modify:
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>hadoop:9001</value>
    <description>change your own hostname</description>
  </property>
</configuration>
Note: the value hadoop in <value> above is this machine's hostname; change it to match your own settings.
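Since the hostname appears in more than one file, one way to avoid typos is to generate the file from a heredoc with the hostname in a single variable. A minimal sketch for core-site.xml ("hadoop" is this guide's example hostname; a temp dir stands in for $HADOOP_HOME/conf):

```shell
# Generate core-site.xml with the hostname substituted in one place.
MYHOST=hadoop              # replace with your own hostname
CONF=$(mktemp -d)          # stand-in for $HADOOP_HOME/conf
cat > "$CONF/core-site.xml" <<EOF
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://$MYHOST:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/hadoop/tmp</value>
  </property>
</configuration>
EOF
grep -c "<property>" "$CONF/core-site.xml"   # expect 2 properties
```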
6. Format Hadoop
This initializes the HDFS file system.
#hadoop namenode -format // format Hadoop's HDFS file system
7. Start Hadoop
The start script is in the $HADOOP_HOME/bin directory, so it can be run directly:
#start-all.sh
As mentioned earlier, Hadoop runs as a set of Java processes, so after starting Hadoop you can see the corresponding Java processes. To view them:
#jps // view the currently running Java processes
The jps command is not part of the operating system; it ships with the JDK and is designed specifically for viewing Java processes.
8. View Hadoop through a browser
Enter hadoop:50070 in a browser on Linux to see the NameNode page, which shows that the NameNode process is alive and that the NameNode itself acts as a web server.
Enter hadoop:50030 in a browser on Linux to see the JobTracker page, which shows that the JobTracker process is alive and that the JobTracker itself acts as a web server.
The same pages can also be viewed from the Windows host machine via <IP address>:50070 and <IP address>:50030. If you want to access Hadoop by hostname instead, bind the IP address to the Hadoop machine's hostname:
On Windows, add this line to the file C:\Windows\System32\drivers\etc\hosts:
192.168.80.100 hadoop
You can then access Hadoop from Windows via hostname:port.
Note: for this access to work, the two machines must first be able to ping each other.
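The binding above is just a line appended to the hosts file. A sketch of the edit, using a temp file as a stand-in for the real hosts file (C:\Windows\System32\drivers\etc\hosts on Windows, /etc/hosts on Linux) since editing those requires administrator rights:

```shell
# Append the IP-to-hostname binding and confirm it is resolvable-looking.
HOSTS=$(mktemp)                          # stand-in for the real hosts file
echo "192.168.80.100 hadoop" >> "$HOSTS"
grep -w "hadoop" "$HOSTS"
```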
9. Troubleshooting
<1> The NameNode process did not start successfully?
(1) HDFS was not formatted
(2) the configuration files were only copied, and the hostname was not changed
(3) the hostname is not bound to the IP address
(4) password-free SSH login was not configured successfully
<2> Is it a problem to format Hadoop multiple times?
Workaround: delete the /usr/local/hadoop/tmp folder and reformat.