Hadoop Pseudo-Distributed Configuration and Problems


1. Example of running wordcount

After creating a new directory in HDFS, use the put command to copy the local Linux files input1.txt and input2.txt into /tmp/input/ in the Hadoop file system.

 

hadoop fs -mkdir /tmp/input
hadoop fs -mkdir /tmp/output
hadoop fs -put input1.txt /tmp/input/
hadoop fs -put input2.txt /tmp/input/

Execute the wordcount example. Note that the leading '/' in /tmp/output1 must be included, and the output directory /tmp/output1 must not be created in advance.

Execution program:

bin/hadoop jar ~/software/hadoop-0.20.2/hadoop-0.20.2-examples.jar wordcount /tmp/input /tmp/output1

View results:

(1) View HDFS in a browser.

(2) Command line: bin/hadoop fs -cat /tmp/output1/part-r-00000
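For illustration only: if the two input files together contained the line "hello hadoop" twice and "hello world" once (hypothetical contents), the output file would look something like this, each word and its count separated by a tab:

hadoop	2
hello	3
world	1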

 

2. HDFS cannot be accessed

The machine was shut down while Hadoop was still running, which caused the Hadoop data directory (/tmp/hadoop-username) to be deleted. After restarting the computer and starting Hadoop with start-all.sh, running the jps command (type jps directly in the terminal once the Hadoop services have started) shows that the NameNode is missing.

Solution: delete the directory /tmp/hadoop-root/dfs/name and reformat. Run bin/stop-all.sh first, then reformat with the bin/hadoop namenode -format command.

Note: always run stop-all.sh to shut down Hadoop cleanly before powering off the machine.
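The whole recovery described above can be run as one sequence; a sketch, assuming the default data directory under /tmp and the paths used in this article:

bin/stop-all.sh                    # stop any daemons that are still running
rm -rf /tmp/hadoop-root/dfs/name   # remove the damaged name directory
bin/hadoop namenode -format        # reformat HDFS (destroys any existing HDFS data)
bin/start-all.sh                   # restart all daemons
jps                                # NameNode should now appear in the list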

3. Pseudo-distributed configuration

1. Install JDK and configure Environment Variables

chmod +x jdk-6u24-linux-i586.bin

 

./jdk-6u24-linux-i586.bin

 

Modify the file: sudo gedit /etc/profile

# set Java environment
export JAVA_HOME="/home/user/software/jdk1.6.0_24"
export CLASSPATH="$CLASSPATH:$JAVA_HOME/lib:$JAVA_HOME/jre/lib"
export PATH="$JAVA_HOME/bin:$JAVA_HOME/jre/bin:$PATH:$HOME/bin"
umask 022
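To check that the variables took effect, one option (paths as assumed above):

source /etc/profile
java -version       # should report version 1.6.0_24
echo $JAVA_HOME     # should print /home/user/software/jdk1.6.0_24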

2. Install SSH

1) Confirm the machine is connected to the Internet, then enter the command:

sudo apt-get install ssh

2) Enable password-less login to the local machine.

First, check whether the .ssh folder exists in user u's home directory (note the "." in front of ssh; it is a hidden folder). Enter the following command:

1) ls -a /home/u

Generally, this hidden folder is created automatically under the current user during SSH installation. If it does not exist, you can create one manually. Here u is the username of the current login session.

Next, enter the command:

2) ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa

ssh-keygen generates the key. -t (case sensitive) specifies the type of key to generate, and dsa selects DSA key authentication, i.e. the key type; -P supplies the passphrase (empty here); -f specifies the file the generated key is written to. (Key cryptography is not covered in detail here; it involves some SSH internals you can look up on your own if interested.)

In Ubuntu, ~ represents the current user's home folder, which is /home/u.

This command creates two files in the .ssh folder, id_dsa and id_dsa.pub: the SSH private key and public key pair. They work like a key and a lock; the public key id_dsa.pub is appended to the authorized keys file.

Enter the following command:

3) cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys

This appends the public key to authorized_keys, the file of public keys used for authentication.

At this point, password-less login to the local machine is configured.
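If ssh localhost still prompts for a password at this point, one common cause is overly permissive modes on the key files; a hedged fix, assuming the default locations:

chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys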

4) Verify that SSH is installed successfully and that you can log on to the local machine without a password. (If a password is still required, you can set PasswordAuthentication no in the /etc/ssh/sshd_config file.)

 

Enter the following command:

1. ssh -version

Display result:

OpenSSH_5.1p1 Debian-6ubuntu2, OpenSSL 0.9.8g 19 Oct 2007

Bad escape character 'rsion'.

It indicates that SSH has been installed successfully.

Enter the following command:

2. ssh localhost

The following information is displayed:

The authenticity of host 'localhost (::1)' can't be established.
RSA key fingerprint is 8b:c3:51:a5:2a:31:b7:74:06:9d:62:04:4f:84:f8:77.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'localhost' (RSA) to the list of known hosts.
Linux master 2.6.31-14-generic #48-Ubuntu SMP Fri Oct 16 14:04:26 UTC 2009 i686

To access official Ubuntu documentation, please visit:
http://help.ubuntu.com/
Last login: Mon Oct 18 17:12:40 2010 from master
admin@hadoop:~$

This indicates that the installation is successful. The first time you log on, you are asked whether you want to continue connecting; enter yes to proceed.

Strictly speaking, password-less login is not required to install Hadoop.

Without it, however, you must enter a password to log on to each machine's DataNode every time Hadoop starts. Since Hadoop clusters usually have hundreds or thousands of machines, password-less SSH login is normally configured.

ps -e | grep ssh

If you see sshd, it indicates that the ssh-server has been started.

If not, start it with: sudo /etc/init.d/ssh start

The ssh-server configuration file is /etc/ssh/sshd_config, where you can define the SSH service port. The default port is 22; you can choose another port number, such as 222.
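For example, moving the service to port 222 would mean changing one line in /etc/ssh/sshd_config (a sketch; the rest of the file stays unchanged):

# Port 22  (the default)
Port 222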

Then restart the SSH service:

sudo /etc/init.d/ssh stop
sudo /etc/init.d/ssh start

 

3. conf/hadoop-env.sh:

Specify the JDK installation location:

export JAVA_HOME=your JDK installation path  (do not add double quotation marks)
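For example, with the JDK path used earlier in this article:

export JAVA_HOME=/home/user/software/jdk1.6.0_24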

4. conf/core-site.xml:

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

This is Hadoop's core configuration file; the HDFS address and port number are set here.

5. conf/hdfs-site.xml:

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

This is the HDFS configuration. The default replication factor is 3; for single-machine (pseudo-distributed) Hadoop, it must be changed to 1.

6. conf/mapred-site.xml:

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>

This is Hadoop's MapReduce configuration file; the JobTracker address and port are set here.

Note that versions earlier than 0.20 have only one configuration file, hadoop-site.xml.
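For those older versions, the three settings above would all go into that single conf/hadoop-site.xml; a sketch:

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>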

 

Next, format Hadoop's HDFS file system before starting Hadoop (just as on Windows, a newly partitioned volume must be formatted before use). Enter the Hadoop folder and run the following command:

1. bin/hadoop namenode -format

This formats the file system; then start Hadoop.

Enter the following command:

1. bin/start-all.sh (starts all daemons)
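If startup succeeds, running jps (as in section 2) should list the five Hadoop daemons of a pseudo-distributed setup; an illustrative output (the process IDs are made up and will differ on your machine):

4334 NameNode
4401 DataNode
4463 SecondaryNameNode
4517 JobTracker
4589 TaskTracker
4652 Jps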

Finally, verify that hadoop is successfully installed.

Open your browser and enter the URL:

 

1. http://localhost:50030 (the MapReduce web UI)

2. http://localhost:50070 (the HDFS web UI)

If both pages load, Hadoop has been installed successfully.

Installing Hadoop installs both MapReduce and HDFS, but if needed you can start just HDFS (bin/start-dfs.sh) or just MapReduce (bin/start-mapred.sh).

 

4. Problems Encountered

(1) In the hadoop/bin directory, running hadoop, start-all.sh, and other commands directly fails; from the hadoop directory, however, running them as bin/hadoop or bin/start-all.sh works.

Solution:

Method 1: add the environment variable from the command line:

export PATH="$PATH:/home/user/software/hadoop-0.20.2/bin"

Do not run export PATH="/home/user/software/hadoop-0.20.2/bin"; that would overwrite the existing entries in PATH.
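To confirm that the new PATH is active, a quick check (assuming the installation path above):

which hadoop    # should print /home/user/software/hadoop-0.20.2/bin/hadoop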

Note: environment variables set in /etc/profile are loaded automatically, so it is best to put the above path in that file,

As follows:

# set Java environment
export JAVA_HOME="/home/user/software/jdk1.6.0_24"
export CLASSPATH="$CLASSPATH:$JAVA_HOME/lib:$JAVA_HOME/jre/lib"
export PATH="$JAVA_HOME/bin:$JAVA_HOME/jre/bin:$PATH:$HOME/bin:/home/user/software/hadoop-0.20.2/bin"
umask 022

 

After the modification (run source /etc/profile or log in again for it to take effect), you can directly run hadoop, start-all.sh, stop-all.sh, hadoop-daemon.sh start namenode, and other commands.
