(i) Hadoop 1.2.1 Installation -- Single-Node Mode and Single-Machine Pseudo-Distributed Mode


First, Requirements

Before installing Hadoop on Linux, you need to install two programs first:

1) JDK 1.6 or later. Hadoop is written in Java; compiling Hadoop and running MapReduce jobs both require a JDK. Therefore, you must install JDK 1.6 or later before you install Hadoop.

2) SSH (Secure Shell), preferably OpenSSH. Hadoop needs SSH to start the daemons on each host in the slave list, so SSH must be installed even for a pseudo-distributed installation, because Hadoop makes no distinction between cluster mode and pseudo-distributed mode. In pseudo-distributed mode Hadoop takes the same approach as on a cluster: it starts processes on the hosts listed, in order, in the file conf/slaves, except that the slave is localhost (i.e. the machine itself). So even pseudo-distributed Hadoop requires SSH; a reference conf/slaves for this case is shown below.
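For reference (a pseudo-distributed default, not spelled out in the original text), the conf/slaves file in this case contains just one line, so the only slave Hadoop logs in to over SSH is the machine itself:

localhost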

Second, Environment

    1. VMware® Workstation 10.04
    2. Ubuntu 14.04 32-bit
    3. Java JDK 1.6.0
    4. Hadoop 1.2.1

Third, Installation Steps

JDK installation

(1) Download the JDK installation package

Make sure you can connect to the Internet, then download the JDK 1.6 installation package from the http://www.oracle.com/technetwork/java/javase/downloads page (the filename is similar to jdk-***-linux-i586.bin; installing JDK 1.7 is not recommended because not all software supports version 1.7) into the JDK installation directory (this chapter assumes the JDK installation directory is /usr/lib/jvm/jdk).
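If that directory does not exist yet, you can prepare it before running the installer (a minimal sketch; jdk-*** is the placeholder filename used above, and the extracted directory name jdk1.6.0_** is an assumption about how these .bin installers name their output):

sudo mkdir -p /usr/lib/jvm

sudo cp jdk-***-linux-i586.bin /usr/lib/jvm/

After the installer runs (next step), you can link the extracted directory to the assumed path:

sudo ln -s /usr/lib/jvm/jdk1.6.0_** /usr/lib/jvm/jdk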

(2) manual installation of JDK

In a terminal, change into the JDK installation directory and enter the command:

sudo chmod u+x jdk-***-linux-i586.bin

After you have modified the permissions, you can install it. Enter the command in the terminal:

sudo -s ./jdk-***-linux-i586.bin

After the installation is complete, you can start configuring environment variables.

(3) Configuring environment variables

Enter the command:

sudo gedit /etc/profile

Enter the password to open the profile file.

At the bottom of the file, enter the following:

#set Java environment

export JAVA_HOME=/usr/lib/jvm/jdk

export CLASSPATH=".:$JAVA_HOME/lib:$CLASSPATH"

export PATH="$JAVA_HOME/bin:$PATH"

The point of this step is to configure the environment variables so that the system can find the JDK.
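Note that /etc/profile is read only at login, so the new variables do not take effect in an already-open terminal. A quick way to apply and check them by hand (plain shell commands, nothing Hadoop-specific):

source /etc/profile

echo $JAVA_HOME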

(4) Verify that the JDK is installed successfully

Enter the command:

java -version

JDK version information like the following appears:

java version "1.6.0_22"

Java(TM) SE Runtime Environment (build 1.6.0_22-b04)

Java HotSpot(TM) Client VM (build 17.1-b03, mixed mode, sharing)

If the JDK version information above does not appear, the installed JDK has not been set as the default JDK for the Ubuntu system, and you will need to manually set the installed JDK as the system default.

(5) Manually set the system default JDK

In the terminal, enter the following commands in turn:

sudo update-alternatives --install /usr/bin/java java /usr/lib/jvm/jdk/bin/java 300

sudo update-alternatives --install /usr/bin/javac javac /usr/lib/jvm/jdk/bin/javac 300

sudo update-alternatives --config java

You can then enter java -version to see the version information for the JDK you have installed.

SSH installation

Suppose the user name is u:

1) Confirm that you are connected to the Internet, and then enter the command:

sudo apt-get install ssh

2) Configure SSH to allow password-free login to this machine. First check whether there is a .ssh folder under user u's home directory (note the "." before ssh; it is a hidden folder) by entering the command:

ls -a /home/u

In general, this hidden folder is created automatically under the current user when you install SSH; if not, you can create one manually, as sketched below.
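A minimal sketch of creating it by hand (the 700 permission is what sshd normally expects on this directory; that detail is an addition, not from the original text):

mkdir -p /home/u/.ssh

chmod 700 /home/u/.ssh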

Next, enter the following command (note that the argument after -P is two single quotes, not one double quotation mark):

ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa

To explain: ssh-keygen generates the key; -t (note: case-sensitive) specifies the type of key to generate, and dsa means DSA key authentication, i.e. the key type; -P provides the passphrase, empty here; -f specifies the file in which to store the generated key.

In Ubuntu, ~ represents the current user's home folder, which is /home/u.

This command creates two files, id_dsa and id_dsa.pub, under the .ssh folder. These are a pair of SSH keys, the private key and the public key, similar to a key and a lock. Next, append id_dsa.pub (the public key) to the authorized keys file.

Enter the command:

cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys

This command appends the public key to the public key file used for authentication, where authorized_keys is the public key file checked during authentication.

Password-free login is now configured.
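One caveat, added here as a hint rather than part of the original steps: sshd ignores an authorized_keys file that is writable by other users, so if SSH still prompts for a password after this step, it is worth tightening the permissions:

chmod 600 ~/.ssh/authorized_keys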

3) Verify that SSH is installed successfully and that you can log on to this machine without a password.

Enter the command:

ssh -version

The result is:

OpenSSH_5.8p1 Debian-7ubuntu1, OpenSSL 1.0.0e 6 Sep 2011

Bad escape character 'rsion'.

This shows that SSH has been installed successfully. (The "Bad escape character" line is harmless: ssh has no -version flag, so it interprets the option as -e rsion, but it prints the version string first anyway.)

Enter the command:

ssh localhost

The following will be shown:

The authenticity of host 'localhost (::1)' can't be established.

RSA key fingerprint is 8b:c3:51:a5:2a:31:b7:74:06:9d:62:04:4f:84:f8:77.

Are you sure you want to continue connecting (yes/no)? yes

Warning: Permanently added 'localhost' (RSA) to the list of known hosts.

Linux master 2.6.31-14-generic #48-Ubuntu SMP Fri Oct 14:04:26 UTC i686

To access official Ubuntu documentation, please visit:

http://help.ubuntu.com/

Last login: Sat Feb 17:12:40 from master

This means the installation succeeded. The first time you log in, you will be asked whether you want to continue connecting; enter yes.

In fact, password-free login is irrelevant to the installation of Hadoop itself, but if you do not configure it, every time you start Hadoop you will need to enter a password to log on to each machine's DataNode. Considering that a typical Hadoop cluster has hundreds or thousands of machines, SSH password-free login is generally configured.

Hadoop installation

Hadoop has three modes of operation: standalone mode, pseudo-distributed mode, and fully distributed mode. The first two do not reflect the advantages of cloud computing, but they make it convenient to test and debug programs.

You can get the official release of Hadoop from http://www.apache.org/dyn/closer.cgi/hadoop/core/.

Download hadoop-1.2.1.tar.gz and unzip it; this article assumes Hadoop is extracted into the /home/u/ directory.
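A minimal extraction sketch, assuming the archive was downloaded into /home/u:

cd /home/u

tar -xzf hadoop-1.2.1.tar.gz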

(1) Stand-alone mode configuration

In stand-alone mode, Hadoop runs as a single Java process, which is convenient for debugging; no configuration is needed for a single-machine installation. A quick smoke test is sketched below.
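As a quick smoke test of stand-alone mode (a sketch based on the examples jar bundled with the 1.2.1 release; the grep job copies the conf files into input, searches them for matches of the given regular expression, and writes the results to output):

cd /home/u/hadoop-1.2.1

mkdir input

cp conf/*.xml input

bin/hadoop jar hadoop-examples-1.2.1.jar grep input output 'dfs[a-z.]+'

cat output/*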

(2) pseudo-distributed Hadoop configuration

Pseudo-distributed Hadoop is a cluster with only one node. In this cluster, that node is both master and slave, both NameNode and DataNode, and both JobTracker and TaskTracker.

The pseudo-distributed configuration process is also simple and requires only a few files to be modified.

A) Enter the Conf directory under the Hadoop directory and add the Java installation directory to the hadoop-env.sh :

Export JAVA_HOME=/USR/LIB/JVM/JDK

b) Modify conf/core-site.xml as follows:
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>

c) Modify conf/hdfs-site.xml as follows:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>

d) Modify conf/mapred-site.xml to read as follows:
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>localhost:9001</value>
</property>
</configuration>

After the above files are configured, the installation configuration for Hadoop is complete.

Hadoop start-up

The first time you start Hadoop, you need to format Hadoop's file system, HDFS. Go to the Hadoop folder and enter the command:

bin/hadoop namenode -format

Next, start Hadoop with the following command:

bin/start-all.sh
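To check which daemons actually came up, you can use the jps tool that ships with the JDK (this check is an addition to the original steps). On a healthy pseudo-distributed Hadoop 1.x node you would expect to see NameNode, DataNode, SecondaryNameNode, JobTracker, and TaskTracker listed, plus Jps itself:

jps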

Finally, verify that Hadoop is installed successfully.

Open a browser and visit the following URLs:

http://localhost:50030 (the web page for MapReduce)

http://localhost:50070 (the web page for HDFS)

If both pages can be opened, Hadoop has been installed successfully.

It is not strictly necessary to start all of Hadoop's processes; if needed, you can start only HDFS (start-dfs.sh) or only MapReduce (start-mapred.sh).
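For completeness (the stop scripts follow the same naming scheme in Hadoop 1.x), everything can be shut down again with:

bin/stop-all.sh

or selectively with bin/stop-dfs.sh and bin/stop-mapred.sh.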

Reference: Hadoop in Action (Hadoop实战), Second Edition, Lu Jiaheng, China Machine Press.
