Hadoop-1.x installation and configuration


1. Install JDK and SSH before installing Hadoop.

Hadoop is developed in Java, and both MapReduce and the Hadoop build depend on the JDK, so JDK 1.6 or later must be installed first (JDK 1.6 is generally used in production environments, because some Hadoop components do not support JDK 1.7 or later). Hadoop uses SSH to start the daemon processes on slave machines, and it handles a pseudo-distributed deployment on a single machine the same way as a cluster, so SSH must also be installed.

To install and configure JDK, follow these steps:

(1) Download the JDK 1.6 installation package from the Internet.

(2) Install JDK 1.6.

Decompress the installation package to /Library/Java/JavaVirtualMachines/.

(3) Configure Environment Variables

Add the Java environment configuration to ~/.bash_profile:

 

export JAVA_6_HOME=/Library/Java/JavaVirtualMachines/jdk1.6.0.jdk/Contents/Home
export JAVA_HOME=$JAVA_6_HOME
export PATH=$PATH:$JAVA_HOME/bin
Enter source ~/.bash_profile in the terminal to load the configuration.

 

(4) Verify that JDK is successfully installed

Enter the command in the terminal: java -version

The following information indicates that JDK is successfully installed:

 

bowen@bowen ~$ java -version
java version "1.6.0_37"
Java(TM) SE Runtime Environment (build 1.6.0_37-b06-434)
Java HotSpot(TM) 64-Bit Server VM (build 20.12-b01-434, mixed mode)

To install and configure SSH, follow these steps:

(1) Run the following command to install SSH:

$ sudo apt-get install openssh-server

(2) Set up passwordless login to the local machine.

Create an SSH key pair. The id_dsa and id_dsa.pub files are generated under the ~/.ssh/ directory; they are the SSH private and public keys.

 

$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
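Here -t dsa selects the DSA key type, -P '' sets an empty passphrase (so no password is ever prompted for), and -f names the key file. Appending the public key to authorized_keys is what authorizes key-based logins to this machine.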
(3) Verify that SSH is successfully installed

 

Enter the command: ssh -V

Display result:

 

bowen@bowen ~$ ssh -V
OpenSSH_6.2p2, OSSLShim 0.9.8r 8 Dec 2011
Log on to the local machine with ssh WuCloud (substituting your own hostname) or ssh localhost. Enter "yes" at the first prompt, then exit. Log on again; if no password is required, this step is complete.
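A minimal sketch of that verification, assuming the key pair from step (2) is in place:

$ ssh localhost
# First login: answer "yes" to the host-key prompt, then leave the session.
$ exit
$ ssh localhost
# The second login should succeed without a password prompt.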

2. Install and configure Hadoop

(1) Download the installation package from the Hadoop website and decompress it. The version used here is hadoop-1.2.1.

$ sudo tar -zxvf hadoop-1.2.1.tar.gz

(2) Hadoop Configuration

You can start a Hadoop cluster in one of the following three modes:

Standalone mode; pseudo-distributed mode; fully distributed mode.

Compared with a fully distributed deployment, a pseudo-distributed deployment does not show the advantages of cloud computing, but it is convenient for program development and testing. Given the constraints here, the pseudo-distributed configuration of Hadoop is used. Go to the conf directory of Hadoop and modify the files summarized in the sketch below.
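For orientation, a sketch of the files involved, assuming the default layout of the hadoop-1.2.1 tarball:

$ cd hadoop-1.2.1/conf
# Files edited in steps (3) through (6):
#   hadoop-env.sh   - JDK installation location
#   core-site.xml   - HDFS address and port
#   hdfs-site.xml   - HDFS replication factor
#   mapred-site.xml - JobTracker address and port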

(3) Specify the JDK installation location in hadoop-env.sh:

export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.6.0.jdk/Contents/Home

(4) Configure the address and port of HDFS in core-site.xml:

 

<configuration>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://127.0.0.1:9000</value>
    </property>
</configuration>
(5) Configure the replication factor for HDFS in hdfs-site.xml. The default value is 3; it must be set to 1 in a single-machine deployment, because there is only one DataNode available to hold replicas.

 

 

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>
(6) In mapred-site.xml, configure the address and port of the JobTracker:

 

 

<configuration>
    <property>
        <name>mapred.job.tracker</name>
        <value>localhost:9001</value>
    </property>
</configuration>
(7) When Hadoop is run for the first time, format the Hadoop file system. (Note that formatting erases any data already stored in HDFS, so this is only done on first setup.)

In the Hadoop directory, enter:

$ bin/hadoop namenode -format

(8) Start the Hadoop services:

$ bin/start-all.sh

If no error is reported, the startup is successful.
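As an extra check (an addition to the original steps), the jps tool that ships with the JDK lists the running Java processes; in pseudo-distributed mode all five Hadoop daemons should appear:

$ jps
# Expected entries (process IDs will differ):
#   NameNode, DataNode, SecondaryNameNode, JobTracker, TaskTracker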

(9) Enter the following URLs in the browser:

http://localhost:50030 (MapReduce web UI)

http://localhost:50070 (HDFS web UI)

If both pages load properly, the installation is successful.
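A terminal-based spot check, assuming curl is installed (this check is an addition, not part of the original steps):

$ curl -s -o /dev/null -w '%{http_code}\n' http://localhost:50030   # expect 200
$ curl -s -o /dev/null -w '%{http_code}\n' http://localhost:50070   # expect 200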

(10) Test: run the wordcount example to check whether Hadoop can run jobs.

In the Hadoop directory, hadoop-examples-1.2.1.jar is a test program that contains many example jobs. Create a directory, such as /home/hadoop/input/, and copy some text files into it. Because HDFS was configured above, the job reads this path from HDFS; a staging sketch follows.
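A minimal staging sketch, where /path/to/local/textfiles is a hypothetical local directory holding the text files:

$ bin/hadoop fs -mkdir /home/hadoop/input
$ bin/hadoop fs -put /path/to/local/textfiles/*.txt /home/hadoop/input/
$ bin/hadoop fs -ls /home/hadoop/input   # confirm the files arrived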

Run the following command:

 

$ bin/hadoop jar hadoop-examples-1.2.1.jar wordcount /home/hadoop/input/ /home/hadoop/output/
After the run finishes, an output directory is generated under /home/hadoop/ containing two files: part-r-00000 and _SUCCESS. The presence of _SUCCESS shows that the job completed successfully; open part-r-00000 to see the number of occurrences of each word.
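A sketch of inspecting that output from the terminal, again reading from HDFS:

$ bin/hadoop fs -ls /home/hadoop/output
$ bin/hadoop fs -cat /home/hadoop/output/part-r-00000 | head   # first few word counts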

 
