1: Software environment preparation
1.1 Hadoop:
We use Hadoop release 1.2.1 (stable). Download link:
http://mirrors.ustc.edu.cn/apache/hadoop/common/hadoop-1.2.1/
Select the hadoop-1.2.1-bin.tar.gz file to download.
1.2 Java:
Java uses JDK 1.7 (JDK 1.6 also works). Download link:
http://www.oracle.com/technetwork/java/javase/downloads/jdk7-downloads-1880260.html
We select the release for Linux x64. This step is important: you must pick the JDK build that matches your machine's architecture.
2: Install in Linux
2.1 Create the directories:
First, create the following directories:
mkdir /data/installation, which stores the downloaded installation packages.
mkdir /data/software/hadoop, which stores the hadoop program files.
mkdir /data/software/Java, which stores the JDK files.
mkdir /data/software/eclipse, which stores the eclipse files.
Note: in a real installation, it is best to create a dedicated account for running the hadoop programs and grant it the necessary permissions. Here, I install everything directly as root.
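The four directories above can also be created with a single mkdir -p call, since -p creates any missing parent directories. The sketch below uses a throwaway prefix from mktemp so it is safe to try anywhere; drop the $ROOT prefix for the real install:

```shell
# ROOT is a scratch prefix for demonstration only; for the real install the
# paths would start at /data directly.
ROOT=$(mktemp -d)
# One mkdir -p call creates all four target directories and their parents.
mkdir -p "$ROOT"/data/installation \
         "$ROOT"/data/software/hadoop \
         "$ROOT"/data/software/Java \
         "$ROOT"/data/software/eclipse
# List what was created: data, installation, software, hadoop, Java, eclipse.
find "$ROOT/data" -type d | wc -l
rm -r "$ROOT"
```

The count printed should be 6 (the four leaf directories plus data and software).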
2.2 Unpack the installation packages:
Put all the downloaded files in the /data/installation/ directory.
First, extract the Java installation package:
tar -xzvf /data/installation/jdk-7u40-linux-x64.tar.gz -C /data/software/Java/
Then extract the hadoop installation package:
tar -xzvf /data/installation/hadoop-1.2.1-bin.tar.gz -C /data/software/hadoop/
Finally, extract the eclipse installation package:
tar -xzvf /data/installation/eclipse-standard-kepler-SR1-linux-gtk.tar.gz -C /data/software/eclipse/
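Before extracting, tar -tzf lists an archive's contents without unpacking it, which is useful to confirm the top-level directory name the -C extraction will produce. The sketch below demonstrates this on a small throwaway archive, since the real tarballs are large:

```shell
# Build a tiny stand-in archive with the same layout as hadoop-1.2.1-bin.tar.gz.
tmp=$(mktemp -d)
mkdir -p "$tmp/hadoop-1.2.1/bin"
touch "$tmp/hadoop-1.2.1/bin/hadoop"
tar -czf "$tmp/pkg.tar.gz" -C "$tmp" hadoop-1.2.1
# -t lists contents; the first entry shows the top-level directory that
# extraction into /data/software/hadoop/ would create.
tar -tzf "$tmp/pkg.tar.gz" | head -1
rm -r "$tmp"
```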
3: Configure hadoop
Configuring the hadoop environment is important, and you must first set up the Java runtime environment.
3.1 Configure the Java environment:
Add the JAVA_HOME and CLASSPATH environment variables:
Run vi /etc/profile to edit the profile file, and add the following lines to the end of the file:
HADOOP_INSTALL=/data/software/hadoop/hadoop-1.2.1/
JAVA_HOME=/data/software/Java/jdk1.7.0_40
PATH=$JAVA_HOME/bin:$HADOOP_INSTALL/bin:$PATH
CLASSPATH=$JAVA_HOME/lib
export JAVA_HOME PATH CLASSPATH HADOOP_INSTALL
Save and exit, then run source /etc/profile to make the changes take effect immediately.
Run java -version to check whether the configuration succeeded. If it did, output like the following appears:
java version "1.7.0_40"
Java(TM) SE Runtime Environment (build 1.7.0_40-b43)
Java HotSpot(TM) Client VM (build 24.0-b56, mixed mode)
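The profile additions can be sanity-checked before logging out and back in by applying the same exports in the current shell and confirming they land at the front of PATH. This is a sketch using the exact paths from above:

```shell
# Apply the same variables as in /etc/profile, in the current shell.
HADOOP_INSTALL=/data/software/hadoop/hadoop-1.2.1/
JAVA_HOME=/data/software/Java/jdk1.7.0_40
PATH=$JAVA_HOME/bin:$HADOOP_INSTALL/bin:$PATH
CLASSPATH=$JAVA_HOME/lib
export JAVA_HOME PATH CLASSPATH HADOOP_INSTALL
# Both bin directories should now be the first two PATH entries, so the
# hadoop and java commands resolve to these installs.
case "$PATH" in
  "$JAVA_HOME/bin:$HADOOP_INSTALL/bin:"*) echo "PATH configured" ;;
  *) echo "PATH not configured" ;;
esac
```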
3.2 Configure the SSH environment:
Run the following commands to set up a password-less SSH connection:
ssh-keygen -t dsa -P "" -f ~/.ssh/id_dsa
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
Test whether the SSH configuration succeeded:
ssh localhost
If the connection succeeds without prompting for a password, the configuration works.
There are many ways to configure password-less SSH access; the above is only one of them. Here is another:
cd ~
ssh-keygen -t rsa
cd .ssh
cp id_rsa.pub authorized_keys
ssh hostname    # test whether the connection to the hostname succeeds
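One pitfall worth noting with either method: sshd will ignore the key and fall back to asking for a password when ~/.ssh or authorized_keys is writable by group or others. Tightening the permissions fixes this. The sketch below assumes the key files from the steps above already exist (the mkdir/touch lines only guard against them being absent):

```shell
# Ensure the files exist, then restrict them to the owner only:
# 700 for the directory, 600 for the authorized_keys file.
mkdir -p ~/.ssh
touch ~/.ssh/authorized_keys
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys
# Print the resulting octal modes to verify (GNU coreutils stat).
stat -c '%a' ~/.ssh ~/.ssh/authorized_keys
```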
3.3 Configure the hadoop environment:
First go to the /data/software/hadoop/hadoop-1.2.1/conf directory. There we find the four files to configure (hadoop-env.sh, core-site.xml, mapred-site.xml, and hdfs-site.xml), as well as the slaves and masters files, which only need to be configured in fully distributed mode:
3.3.1 Configure hadoop-env.sh:
Open the file with vi hadoop-env.sh, find the line containing JAVA_HOME, remove the leading #, and fill in the actual JAVA_HOME path:
export JAVA_HOME=/data/software/Java/jdk1.7.0_40
3.3.2 Configure core-site.xml:
Open the file with vi core-site.xml and add the following inside the configuration tag:
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
<!-- fs.default.name configures the namenode and specifies the URL of the HDFS file system; through this URL we can access the contents of the file system. localhost can also be changed to the local IP address. In fully distributed mode, you must change localhost to the IP address of the actual namenode machine. If no port is given, the default port 8020 is used. -->
<property>
<name>hadoop.tmp.dir</name>
<value>/data/tmp/hadoop_tmp</value>
</property>
<!-- hadoop.tmp.dir is hadoop's temporary directory. If a newly added datanode fails to start, it is recommended to delete this tmp directory on that node; however, if you delete the directory on the namenode machine, you must re-run the namenode format command. This directory must be created manually in advance. -->
3.3.3 Configure hdfs-site.xml:
Add the following inside the configuration tag. Any directories that do not exist must be created in advance:
<property>
<name>dfs.data.dir</name>
<value>/data/appdata/hadoopdata</value>
</property>
<!-- The HDFS data storage directory, where the datanode stores its data -->
<property>
<name>dfs.name.dir</name>
<value>/data/appdata/hadoopname</value>
</property>
<!-- Stores the namenode's file system metadata, including the edit log and the file system image. If you change this path, you must format the namenode again with hadoop namenode -format -->
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<!-- Sets the number of redundant copies of each block. Because there is only one node, replication is set to 1; the default is 3 -->
3.3.4 Configure mapred-site.xml:
Add the following inside the configuration tag:
<property>
<name>mapred.job.tracker</name>
<value>localhost:9001</value>
</property>
<!-- Configures the jobtracker node. localhost can also be changed to the local IP address. In fully distributed mode, change it to the IP address of the actual jobtracker machine. -->
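In each of the three XML files, the property elements must sit inside the single configuration element that the file already contains. As an illustration, a complete minimal core-site.xml for this setup would look like the following (a sketch assembled from the fragments above):

```xml
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/data/tmp/hadoop_tmp</value>
  </property>
</configuration>
```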
4: Start hadoop
4.1: Test whether the hadoop configuration is successful:
4.2: Format the namenode:
cd /data/software/hadoop/hadoop-1.2.1/bin
./hadoop namenode -format
4.3: Start the hadoop processes by running start-all.sh:
cd /data/software/hadoop/hadoop-1.2.1/bin
./start-all.sh
We can use Java's jps command to check whether the processes started successfully. It should show that the five processes secondarynamenode, jobtracker, namenode, datanode, and tasktracker are running; these five processes are exactly what hadoop needs. If any process failed to start, the cluster is not working, and we can inspect the failure logs in the /data/software/hadoop/hadoop-1.2.1/libexec/../logs/ directory.
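This jps check can be scripted. The sketch below runs the check logic against a captured sample of jps output (with hypothetical process IDs) so the logic itself can be verified anywhere; in practice you would replace the here-document with the real `jps` output:

```shell
# Flag any of the five required hadoop daemons missing from a jps listing.
check_daemons() {
  missing=0
  for p in NameNode DataNode SecondaryNameNode JobTracker TaskTracker; do
    # Match " Name" at end of line so NameNode does not match SecondaryNameNode.
    grep -q " $p$" "$1" || { echo "missing: $p"; missing=1; }
  done
  [ "$missing" -eq 0 ] && echo "all five daemons running"
}

# Sample jps output; replace this here-document with:  jps > "$out"
out=$(mktemp)
cat > "$out" <<'EOF'
12345 NameNode
12346 DataNode
12347 SecondaryNameNode
12348 JobTracker
12349 TaskTracker
12350 Jps
EOF
check_daemons "$out"
rm "$out"
```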
4.4: View hadoop information in a browser:
We can access hadoop from a browser on the local machine or on another machine.
View jobtracker information:
http://192.168.0.107:50030/jobtracker.jsp
Only part of the page is shown here.
View namenode information:
http://192.168.0.107:50070/dfshealth.jsp
Only part of the page is shown here.
View tasktracker information:
http://192.168.0.107:50060/tasktracker.jsp
5: A hadoop example
Here we test the examples that ship with hadoop. Among them is a wordcount class, which counts how many times each word appears in a set of files. The examples jar, named hadoop-examples-1.2.1.jar, sits in the hadoop installation directory:
5.1: Go to the bin directory
First, go to the bin directory:
cd /data/software/hadoop/hadoop-1.2.1/bin
5.2: Create a folder
Then create an input folder and three files, writing some content into each:
mkdir input
echo "Hello hadoop" > input/f1.txt
echo "Hello world" > input/f2.txt
echo "Hello Java" > input/f3.txt
5.3: Create a folder in hadoop
Use the following command to create a folder in hadoop:
hadoop dfs -mkdir input
Then check whether the folder was created in hadoop:
hadoop dfs -ls /user/root
We can see that the input folder has been created successfully in hadoop.
5.4: Copy the files into hadoop
Run the following command to copy the files from Linux into hadoop:
hadoop dfs -put input/* input
Check whether the files are in hadoop:
hadoop dfs -ls input
Check whether the file contents match:
hadoop dfs -cat input/f1.txt
We can see that the files have been put into the hadoop file system successfully.
5.5: View the file contents in a browser
You can also browse the whole directory tree of the HDFS file system from a browser. Open the namenode page:
http://192.168.0.107:50070/dfshealth.jsp
It contains a "Browse the filesystem" hyperlink; click it to see the corresponding directory structure.
5.6: Run the example
Run the wordcount program with the following command:
hadoop jar ../hadoop-examples-1.2.1.jar wordcount input output
Note that the current directory is bin while the jar sits in the parent directory, so .. is used to locate it. wordcount is the class inside the jar to execute; input is the input folder; output is the output folder, which must not exist beforehand because the program creates it automatically, and an error is raised if it already exists.
We can see that the program has run successfully, and the next step is to view the running result.
5.7: View the running results
We can list the contents of the output folder to confirm that the program created it, and inspect the result by viewing the part-r-00000 file inside it:
hadoop dfs -cat output/part-r-00000
We can see that hadoop appears once, Hello three times, Java once, and world once. This matches our expectation, so the run succeeded.
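For files this small, the counts can be cross-checked without hadoop at all: wordcount is just a word frequency count, so a plain shell pipeline over the same three lines gives the same tallies. A sketch using a temporary copy of the input files:

```shell
# Recreate the three sample files in a scratch directory.
tmp=$(mktemp -d)
echo "Hello hadoop" > "$tmp/f1.txt"
echo "Hello world" > "$tmp/f2.txt"
echo "Hello Java" > "$tmp/f3.txt"
# Split each line into one word per line, then count identical words,
# mirroring what the wordcount example computes.
cat "$tmp"/*.txt | tr -s ' ' '\n' | sort | uniq -c | sort -rn
rm -r "$tmp"
```

The pipeline should report Hello 3 times and hadoop, world, and Java once each, matching the part-r-00000 output above.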
6. Stop the hadoop processes
To shut down the hadoop cluster, just run stop-all.sh:
cd /data/software/hadoop/hadoop-1.2.1/bin
./stop-all.sh
Afterwards, jps shows only the Jps process itself; the other hadoop processes have been stopped.
Tutorial on installing Hadoop 1.2.1 in pseudo-distributed mode