1: Software environment preparation
1.1 Hadoop:
We use Hadoop release 1.2.1 (stable). Download link:
http://mirrors.ustc.edu.cn/apache/hadoop/common/hadoop-1.2.1/
Select the hadoop-1.2.1-bin.tar.gz file to download.
1.2 Java:
Java uses JDK 1.7 (JDK 1.6 also works). Download link:
http://www.oracle.com/technetwork/java/javase/downloads/jdk7-downloads-1880260.html
We select the release for Linux x64. This step is important: you must pick the JDK build that matches your machine's architecture.
2: Install in Linux
2.1 Create the directories:
First, create the following directories:
mkdir /data/installation, which stores the downloaded installation packages.
mkdir /data/software/hadoop, which stores the hadoop program files.
mkdir /data/software/Java, which stores the JDK files.
mkdir /data/software/eclipse, which stores the eclipse files.
Note: in a real installation, it is best to create a dedicated account for running the hadoop programs and grant it the necessary permissions. Here, I install everything directly as root.
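The four directories above can also be created with a single mkdir -p call, since -p creates any missing parent directories. The sketch below uses a throwaway prefix from mktemp so it is safe to try anywhere; drop the $ROOT prefix for the real install:

```shell
# ROOT is a scratch prefix for demonstration only; for the real install the
# paths would start at /data directly.
ROOT=$(mktemp -d)
# One mkdir -p call creates all four target directories and their parents.
mkdir -p "$ROOT"/data/installation \
         "$ROOT"/data/software/hadoop \
         "$ROOT"/data/software/Java \
         "$ROOT"/data/software/eclipse
# List what was created: data, installation, software, hadoop, Java, eclipse.
find "$ROOT/data" -type d | wc -l
rm -r "$ROOT"
```

The count printed should be 6 (the four leaf directories plus data and software).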
2.2 Unpack the installation packages:
Put all the downloaded files in the /data/installation/ directory.
First, extract the Java installation package:
tar -xzvf /data/installation/jdk-7u40-linux-x64.tar.gz -C /data/software/Java/
Then extract the hadoop installation package:
tar -xzvf /data/installation/hadoop-1.2.1-bin.tar.gz -C /data/software/hadoop/
Finally, extract the eclipse installation package:
tar -xzvf /data/installation/eclipse-standard-kepler-SR1-linux-gtk.tar.gz -C /data/software/eclipse/
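Before extracting, tar -tzf lists an archive's contents without unpacking it, which is useful to confirm the top-level directory name the -C extraction will produce. The sketch below demonstrates this on a small throwaway archive, since the real tarballs are large:

```shell
# Build a tiny stand-in archive with the same layout as hadoop-1.2.1-bin.tar.gz.
tmp=$(mktemp -d)
mkdir -p "$tmp/hadoop-1.2.1/bin"
touch "$tmp/hadoop-1.2.1/bin/hadoop"
tar -czf "$tmp/pkg.tar.gz" -C "$tmp" hadoop-1.2.1
# -t lists contents; the first entry shows the top-level directory that
# extraction into /data/software/hadoop/ would create.
tar -tzf "$tmp/pkg.tar.gz" | head -1
rm -r "$tmp"
```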
3: Configure hadoop
Configuring the hadoop environment is important, and you must first set up the Java runtime environment.
3.1 Configure the Java environment:
Add the JAVA_HOME and CLASSPATH environment variables:
Run vi /etc/profile to edit the profile file, and add the following lines to the end of the file:
HADOOP_INSTALL=/data/software/hadoop/hadoop-1.2.1/
JAVA_HOME=/data/software/Java/jdk1.7.0_40
PATH=$JAVA_HOME/bin:$HADOOP_INSTALL/bin:$PATH
CLASSPATH=$JAVA_HOME/lib
export JAVA_HOME PATH CLASSPATH HADOOP_INSTALL
Save and exit, then run source /etc/profile to make the changes take effect immediately.
Run java -version to check whether the configuration succeeded. If it did, output like the following appears:
java version "1.7.0_40"
Java(TM) SE Runtime Environment (build 1.7.0_40-b43)
Java HotSpot(TM) Client VM (build 24.0-b56, mixed mode)
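The profile additions can be sanity-checked before logging out and back in by applying the same exports in the current shell and confirming they land at the front of PATH. This is a sketch using the exact paths from above:

```shell
# Apply the same variables as in /etc/profile, in the current shell.
HADOOP_INSTALL=/data/software/hadoop/hadoop-1.2.1/
JAVA_HOME=/data/software/Java/jdk1.7.0_40
PATH=$JAVA_HOME/bin:$HADOOP_INSTALL/bin:$PATH
CLASSPATH=$JAVA_HOME/lib
export JAVA_HOME PATH CLASSPATH HADOOP_INSTALL
# Both bin directories should now be the first two PATH entries, so the
# hadoop and java commands resolve to these installs.
case "$PATH" in
  "$JAVA_HOME/bin:$HADOOP_INSTALL/bin:"*) echo "PATH configured" ;;
  *) echo "PATH not configured" ;;
esac
```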
3.2 Configure the SSH environment:
Run the following commands to set up a password-less SSH connection:
ssh-keygen -t dsa -P "" -f ~/.ssh/id_dsa
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
Test whether the SSH configuration succeeded:
ssh localhost
If the connection succeeds without prompting for a password, the configuration works.
There are many ways to configure password-less SSH access; the above is only one of them. Here is another:
cd ~
ssh-keygen -t rsa
cd .ssh
cp id_rsa.pub authorized_keys
ssh hostname    # test whether the connection to the hostname succeeds
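One pitfall worth noting with either method: sshd will ignore the key and fall back to asking for a password when ~/.ssh or authorized_keys is writable by group or others. Tightening the permissions fixes this. The sketch below assumes the key files from the steps above already exist (the mkdir/touch lines only guard against them being absent):

```shell
# Ensure the files exist, then restrict them to the owner only:
# 700 for the directory, 600 for the authorized_keys file.
mkdir -p ~/.ssh
touch ~/.ssh/authorized_keys
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys
# Print the resulting octal modes to verify (GNU coreutils stat).
stat -c '%a' ~/.ssh ~/.ssh/authorized_keys
```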
3.3 Configure the hadoop environment:
First go to the /data/software/hadoop/hadoop-1.2.1/conf directory. There we find the four files to configure (hadoop-env.sh, core-site.xml, mapred-site.xml, and hdfs-site.xml), as well as the slaves and masters files, which only need to be configured in fully distributed mode:
3.3.1 Configure hadoop-env.sh:
Open the file with vi hadoop-env.sh, find the line containing JAVA_HOME, remove the leading #, and fill in the actual JAVA_HOME path:
export JAVA_HOME=/data/software/Java/jdk1.7.0_40
3.3.2 Configure core-site.xml:
Open the file with vi core-site.xml and add the following inside the configuration tag:
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
<!-- fs.default.name configures the namenode and specifies the URL of the HDFS file system; through this URL we can access the contents of the file system. localhost can also be changed to the local IP address. In fully distributed mode, you must change localhost to the IP address of the actual namenode machine. If no port is given, the default port 8020 is used. -->
<property>
<name>hadoop.tmp.dir</name>
<value>/data/tmp/hadoop_tmp</value>
</property>
<!-- hadoop.tmp.dir is hadoop's temporary directory. If a newly added datanode fails to start, it is recommended to delete this tmp directory on that node; however, if you delete the directory on the namenode machine, you must re-run the namenode format command. This directory must be created manually in advance. -->
3.3.3 Configure hdfs-site.xml:
Add the following inside the configuration tag. Any directories that do not exist must be created in advance:
<property>
<name>dfs.data.dir</name>
<value>/data/appdata/hadoopdata</value>
</property>
<!-- The HDFS data storage directory, where the datanode stores its data -->
<property>
<name>dfs.name.dir</name>
<value>/data/appdata/hadoopname</value>
</property>
<!-- Stores the namenode's file system metadata, including the edit log and the file system image. If you change this path, you must format the namenode again with hadoop namenode -format -->
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<!-- Sets the number of redundant copies of each block. Because there is only one node, replication is set to 1; the default is 3 -->
3.3.4 Configure mapred-site.xml:
Add the following inside the configuration tag:
<property>
<name>mapred.job.tracker</name>
<value>localhost:9001</value>
</property>
<!-- Configures the jobtracker node. localhost can also be changed to the local IP address. In fully distributed mode, change it to the IP address of the actual jobtracker machine. -->
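In each of the three XML files, the property elements must sit inside the single configuration element that the file already contains. As an illustration, a complete minimal core-site.xml for this setup would look like the following (a sketch assembled from the fragments above):

```xml
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/data/tmp/hadoop_tmp</value>
  </property>
</configuration>
```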
4: Start hadoop
4.1: Test whether the hadoop configuration is successful:
4.2: Format the namenode:
cd /data/software/hadoop/hadoop-1.2.1/bin
./hadoop namenode -format
4.3: Start the hadoop processes by running start-all.sh:
cd /data/software/hadoop/hadoop-1.2.1/bin
./start-all.sh
We can use Java's jps command to check whether the processes started successfully. It should show that the five processes secondarynamenode, jobtracker, namenode, datanode, and tasktracker are running; these five processes are exactly what hadoop needs. If any process failed to start, the cluster is not working, and we can inspect the failure logs in the /data/software/hadoop/hadoop-1.2.1/libexec/../logs/ directory.
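This jps check can be scripted. The sketch below runs the check logic against a captured sample of jps output (with hypothetical process IDs) so the logic itself can be verified anywhere; in practice you would replace the here-document with the real `jps` output:

```shell
# Flag any of the five required hadoop daemons missing from a jps listing.
check_daemons() {
  missing=0
  for p in NameNode DataNode SecondaryNameNode JobTracker TaskTracker; do
    # Match " Name" at end of line so NameNode does not match SecondaryNameNode.
    grep -q " $p$" "$1" || { echo "missing: $p"; missing=1; }
  done
  [ "$missing" -eq 0 ] && echo "all five daemons running"
}

# Sample jps output; replace this here-document with:  jps > "$out"
out=$(mktemp)
cat > "$out" <<'EOF'
12345 NameNode
12346 DataNode
12347 SecondaryNameNode
12348 JobTracker
12349 TaskTracker
12350 Jps
EOF
check_daemons "$out"
rm "$out"
```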
4.4: View hadoop information in a browser:
We can access hadoop from a browser on the local machine or on another machine.
View jobtracker information:
http://192.168.0.107:50030/jobtracker.jsp
Only part of the page is shown here.
View namenode information:
http://192.168.0.107:50070/dfshealth.jsp
Only part of the page is shown here.
View tasktracker information:
http://192.168.0.107:50060/tasktracker.jsp
5: A hadoop example
Here we test the examples that ship with hadoop. Among them is a wordcount class, which counts how many times each word appears in a set of files. The examples jar, named hadoop-examples-1.2.1.jar, sits in the hadoop installation directory:
5.1: Go to the bin directory
First, go to the bin directory:
cd /data/software/hadoop/hadoop-1.2.1/bin
5.2: Create a folder
Then create an input folder and three files, writing some content into each:
mkdir input
echo "Hello hadoop" > input/f1.txt
echo "Hello world" > input/f2.txt
echo "Hello Java" > input/f3.txt
5.3: Create a folder in hadoop
Use the following command to create a folder in hadoop:
hadoop dfs -mkdir input
Then check whether the folder was created in hadoop:
hadoop dfs -ls /user/root
We can see that the input folder has been created successfully in hadoop.
5.4: Copy the files into hadoop
Run the following command to copy the files from Linux into hadoop:
hadoop dfs -put input/* input
Check whether the files are in hadoop:
hadoop dfs -ls input
Check whether the file contents match:
hadoop dfs -cat input/f1.txt
We can see that the files have been put into the hadoop file system successfully.
5.5: View the file contents in a browser
You can also browse the whole directory tree of the HDFS file system from a browser. Open the namenode page:
http://192.168.0.107:50070/dfshealth.jsp
It contains a "Browse the filesystem" hyperlink; click it to see the corresponding directory structure.
5.6: Run the example
Run the wordcount program with the following command:
hadoop jar ../hadoop-examples-1.2.1.jar wordcount input output
Note that the current directory is bin while the jar sits in the parent directory, so .. is used to locate it. wordcount is the class inside the jar to execute; input is the input folder; output is the output folder, which must not exist beforehand because the program creates it automatically, and an error is raised if it already exists.
We can see that the program has run successfully, and the next step is to view the running result.
5.7: View the running results
We can list the contents of the output folder to confirm that the program created it, and inspect the result by viewing the part-r-00000 file inside it:
hadoop dfs -cat output/part-r-00000
We can see that hadoop appears once, Hello three times, Java once, and world once. This matches our expectation, so the run succeeded.
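For files this small, the counts can be cross-checked without hadoop at all: wordcount is just a word frequency count, so a plain shell pipeline over the same three lines gives the same tallies. A sketch using a temporary copy of the input files:

```shell
# Recreate the three sample files in a scratch directory.
tmp=$(mktemp -d)
echo "Hello hadoop" > "$tmp/f1.txt"
echo "Hello world" > "$tmp/f2.txt"
echo "Hello Java" > "$tmp/f3.txt"
# Split each line into one word per line, then count identical words,
# mirroring what the wordcount example computes.
cat "$tmp"/*.txt | tr -s ' ' '\n' | sort | uniq -c | sort -rn
rm -r "$tmp"
```

The pipeline should report Hello 3 times and hadoop, world, and Java once each, matching the part-r-00000 output above.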
6. Stop the hadoop processes
To shut down the hadoop cluster, just run stop-all.sh:
cd /data/software/hadoop/hadoop-1.2.1/bin
./stop-all.sh
Afterwards, jps shows only the Jps process itself; the other hadoop processes have been stopped.
Tutorial on installing Hadoop 1.2.1 in pseudo-distributed mode