Hadoop Pseudo-Distributed Configuration and Problems


1. Example of running wordcount

After creating a new directory in HDFS, use the put command to copy the local Linux files input1.txt and input2.txt into /tmp/input/ in the Hadoop file system.

 

hadoop fs -mkdir /tmp/input
hadoop fs -mkdir /tmp/output
hadoop fs -put input1.txt /tmp/input/
hadoop fs -put input2.txt /tmp/input/

Execute the wordcount example. Note that the leading '/' in /tmp/output1 must be included, and the output directory /tmp/output1 must not be created in advance.

Execution program:

bin/hadoop jar ~/software/hadoop-0.20.2/hadoop-0.20.2-examples.jar wordcount /tmp/input /tmp/output1

View results:

(1) View HDFS in a browser.

(2) Command line: bin/hadoop fs -cat /tmp/output1/part-r-00000
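For illustration only: if the two input files together contained the line "hello hadoop" twice and "hello world" once (hypothetical contents), the output file would look something like this, each word and its count separated by a tab:

hadoop	2
hello	3
world	1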

 

2. HDFS cannot be accessed

The machine was shut down while Hadoop was still running, which caused the Hadoop data directory (/tmp/hadoop-username) to be deleted. After restarting the computer and starting Hadoop with start-all.sh, running the jps command (type jps directly in the terminal once the Hadoop services have started) shows that the NameNode is missing.

Solution: delete the directory /tmp/hadoop-root/dfs/name and reformat. Run bin/stop-all.sh first, then reformat with the bin/hadoop namenode -format command.

Note: always run stop-all.sh to shut down Hadoop cleanly before powering off the machine.
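The whole recovery described above can be run as one sequence; a sketch, assuming the default data directory under /tmp and the paths used in this article:

bin/stop-all.sh                    # stop any daemons that are still running
rm -rf /tmp/hadoop-root/dfs/name   # remove the damaged name directory
bin/hadoop namenode -format        # reformat HDFS (destroys any existing HDFS data)
bin/start-all.sh                   # restart all daemons
jps                                # NameNode should now appear in the list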

3. Pseudo-distributed configuration

1. Install JDK and configure Environment Variables

chmod +x jdk-6u24-linux-i586.bin

 

./jdk-6u24-linux-i586.bin

 

Modify the file: sudo gedit /etc/profile

# set Java environment
export JAVA_HOME="/home/user/software/jdk1.6.0_24"
export CLASSPATH="$CLASSPATH:$JAVA_HOME/lib:$JAVA_HOME/jre/lib"
export PATH="$JAVA_HOME/bin:$JAVA_HOME/jre/bin:$PATH:$HOME/bin"
umask 022
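To check that the variables took effect, one option (paths as assumed above):

source /etc/profile
java -version       # should report version 1.6.0_24
echo $JAVA_HOME     # should print /home/user/software/jdk1.6.0_24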

2. Install SSH

1) Confirm the machine is connected to the Internet, then enter the command:

sudo apt-get install ssh

2) Enable password-less login to the local machine.

First, check whether the .ssh folder exists in user u's home directory (note the "." in front of ssh; it is a hidden folder). Enter the following command:

1) ls -a /home/u

Generally, this hidden folder is created automatically under the current user during SSH installation. If it does not exist, you can create one manually. Here u is the username of the current login session.

Next, enter the command:

2) ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa

ssh-keygen generates the key. -t (case sensitive) specifies the type of key to generate, and dsa selects DSA key authentication, i.e. the key type; -P supplies the passphrase (empty here); -f specifies the file the generated key is written to. (Key cryptography is not covered in detail here; it involves some SSH internals you can look up on your own if interested.)

In Ubuntu, ~ represents the current user's home folder, which is /home/u.

This command creates two files in the .ssh folder, id_dsa and id_dsa.pub: the SSH private key and public key pair. They work like a key and a lock; the public key id_dsa.pub is appended to the authorized keys file.

Enter the following command:

3) cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys

This appends the public key to authorized_keys, the file of public keys used for authentication.

At this point, password-less login to the local machine is configured.
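If ssh localhost still prompts for a password at this point, one common cause is overly permissive modes on the key files; a hedged fix, assuming the default locations:

chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys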

4) Verify that SSH is installed successfully and that you can log on to the local machine without a password. (If a password is still required, you can set PasswordAuthentication no in the /etc/ssh/sshd_config file.)

 

Enter the following command:

1. ssh -version

Display result:

OpenSSH_5.1p1 Debian-6ubuntu2, OpenSSL 0.9.8g 19 Oct 2007

Bad escape character 'rsion'.

It indicates that SSH has been installed successfully.

Enter the following command:

2. ssh localhost

The following information is displayed:

The authenticity of host 'localhost (::1)' can't be established.
RSA key fingerprint is 8b:c3:51:a5:2a:31:b7:74:06:9d:62:04:4f:84:f8:77.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'localhost' (RSA) to the list of known hosts.
Linux master 2.6.31-14-generic #48-Ubuntu SMP Fri Oct 16 14:04:26 UTC 2009 i686

To access official Ubuntu documentation, please visit:
http://help.ubuntu.com/
Last login: Mon Oct 18 17:12:40 2010 from master
admin@hadoop:~$

This indicates that the installation is successful. The first time you log on, you are asked whether you want to continue connecting; enter yes to proceed.

Strictly speaking, password-less login is not required to install Hadoop.

Without it, however, you must enter a password to log on to each machine's DataNode every time Hadoop starts. Since Hadoop clusters usually have hundreds or thousands of machines, password-less SSH login is normally configured.

ps -e | grep ssh

If you see sshd, it indicates that the ssh-server has been started.

If not, start it with: sudo /etc/init.d/ssh start

The ssh-server configuration file is /etc/ssh/sshd_config, where you can define the SSH service port. The default port is 22; you can choose another port number, such as 222.
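For example, moving the service to port 222 would mean changing one line in /etc/ssh/sshd_config (a sketch; the rest of the file stays unchanged):

# Port 22  (the default)
Port 222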

Then restart the SSH service:

sudo /etc/init.d/ssh stop
sudo /etc/init.d/ssh start

 

3. conf/hadoop-env.sh:

Specify the JDK installation location:

export JAVA_HOME=your JDK installation path  (do not add double quotation marks)
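For example, with the JDK path used earlier in this article:

export JAVA_HOME=/home/user/software/jdk1.6.0_24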

4. conf/core-site.xml:

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

This is Hadoop's core configuration file; the HDFS address and port number are set here.

5. conf/hdfs-site.xml:

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

This is the HDFS configuration. The default replication factor is 3; for single-machine (pseudo-distributed) Hadoop, it must be changed to 1.

6. conf/mapred-site.xml:

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>

This is Hadoop's MapReduce configuration file; the JobTracker address and port are set here.

Note that versions earlier than 0.20 have only one configuration file, hadoop-site.xml.
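For those older versions, the three settings above would all go into that single conf/hadoop-site.xml; a sketch:

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>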

 

Next, format Hadoop's HDFS file system before starting Hadoop (just as on Windows, a newly partitioned volume must be formatted before use). Enter the Hadoop folder and run the following command:

1. bin/hadoop namenode -format

This formats the file system; then start Hadoop.

Enter the following command:

1. bin/start-all.sh (starts all daemons)
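If startup succeeds, running jps (as in section 2) should list the five Hadoop daemons of a pseudo-distributed setup; an illustrative output (the process IDs are made up and will differ on your machine):

4334 NameNode
4401 DataNode
4463 SecondaryNameNode
4517 JobTracker
4589 TaskTracker
4652 Jps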

Finally, verify that hadoop is successfully installed.

Open your browser and enter the URL:

 

1. http://localhost:50030 (the MapReduce web UI)

2. http://localhost:50070 (the HDFS web UI)

If both pages load, Hadoop has been installed successfully.

Installing Hadoop installs both MapReduce and HDFS, but if needed you can start just HDFS (bin/start-dfs.sh) or just MapReduce (bin/start-mapred.sh).

 

4. Problems Encountered

(1) In the hadoop/bin directory, running hadoop, start-all.sh, and other commands directly fails; from the hadoop directory, however, running them as bin/hadoop or bin/start-all.sh works.

Solution:

Method 1: add the environment variable from the command line:

export PATH="$PATH:/home/user/software/hadoop-0.20.2/bin"

Do not run export PATH="/home/user/software/hadoop-0.20.2/bin"; that would overwrite the existing entries in PATH.
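To confirm that the new PATH is active, a quick check (assuming the installation path above):

which hadoop    # should print /home/user/software/hadoop-0.20.2/bin/hadoop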

Note: environment variables set in /etc/profile are loaded automatically, so it is best to put the above path in that file,

As follows:

# set Java environment
export JAVA_HOME="/home/user/software/jdk1.6.0_24"
export CLASSPATH="$CLASSPATH:$JAVA_HOME/lib:$JAVA_HOME/jre/lib"
export PATH="$JAVA_HOME/bin:$JAVA_HOME/jre/bin:$PATH:$HOME/bin:/home/user/software/hadoop-0.20.2/bin"
umask 022

 

After the modification (run source /etc/profile or log in again for it to take effect), you can directly run hadoop, start-all.sh, stop-all.sh, hadoop-daemon.sh start namenode, and other commands.
