I. Hardware Environment
Hadoop build system environment: a Linux ubuntu-13.04-desktop-i386 system acting as both the NameNode and the DataNode. (The Ubuntu system runs as a virtual machine on the hardware below.)
Hadoop installation target version: Hadoop 1.2.1
JDK installation version: jdk-7u40-linux-i586
Pig installation version: pig-0.11.1
Virtual machine host hardware: IBM System x3500 M3 tower server (MT: 7380)
Eclipse installation version: eclipse-standard-kepler-SR1-linux-gtk
II. Software Environment Preparation
2.1 Hadoop
Hadoop release 1.2.1 (stable). Download link: http://mirror.nexcess.net/apache/hadoop/common/hadoop-1.2.1/; select the hadoop-1.2.1-bin.tar.gz file to download.
2.2 Java
Java JDK 1.7 (1.6 also works). Download link: http://www.oracle.com/technetwork/java/javase/downloads/jdk7-downloads-1880260.html; choose the Linux x86 jdk-7u40-linux-i586.tar.gz download, because my Linux machine is 32-bit. If your Linux machine is 64-bit, you must select the 64-bit download; different machines need the matching JDK build.
2.3 Eclipse
Eclipse: choose the Linux 32-bit download from https://www.eclipse.org/downloads/
III. Installation Steps
3.1 Adding a user specifically for Hadoop
Enter the following at the command line:
sudo addgroup hadoop
sudo adduser --ingroup hadoop hadoop
To give the hadoop user sudo permissions:
sudo vim /etc/sudoers
Below the line root ALL=(ALL:ALL) ALL, add the line hadoop ALL=(ALL:ALL) ALL
Switch to the hadoop user: su hadoop
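To double-check that the account was created as expected, a quick look with a standard command (not part of the original steps):
id hadoop
The output should list the hadoop group; if it does not, repeat the addgroup/adduser commands above.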
3.2 Create the directory and unzip the installation package
Create a Directory
sudo mkdir -p /home/hadoop/install
sudo mkdir -p /home/hadoop/software/hadoop   # this directory stores the Hadoop program files
sudo mkdir -p /home/hadoop/software/java     # this directory stores the JDK program files
sudo mkdir -p /home/hadoop/software/eclipse  # this directory stores the Eclipse program files
Unpack the installation archives
sudo tar -xzvf /home/master/download/jdk-7u40-linux-i586.tar.gz -C /home/hadoop/software/java/
sudo tar -xzvf /home/master/download/hadoop-1.2.1-bin.tar.gz -C /home/hadoop/software/hadoop/
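To confirm the archives unpacked where expected, list the target directories (assuming the paths above were used unchanged):
ls /home/hadoop/software/java/
ls /home/hadoop/software/hadoop/
The first listing should show jdk1.7.0_40 and the second hadoop-1.2.1.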
3.3 Configuring Hadoop
Configuring the Java Environment
Add the JAVA_HOME and CLASSPATH environment variables. Use the sudo vi /etc/profile command to edit the profile file and add the following at the end of the file:
HADOOP_INSTALL=/home/hadoop/software/hadoop/hadoop-1.2.1/
JAVA_HOME=/home/hadoop/software/java/jdk1.7.0_40
PATH=$JAVA_HOME/bin:$HADOOP_INSTALL/bin:$PATH
CLASSPATH=$JAVA_HOME/lib
export JAVA_HOME PATH CLASSPATH HADOOP_INSTALL
Then save and exit, and run source /etc/profile to make the changes take effect immediately.
Then run the java -version command to check whether the configuration succeeded; if it did, information like the following appears:
java version "1.7.0_40"
Java(TM) SE Runtime Environment (build 1.7.0_40-b43)
Java HotSpot(TM) Client VM (build 24.0-b56, mixed mode)
Configuring the SSH Environment
Use the following commands to set up a passwordless SSH connection:
ssh-keygen
cp ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys
ssh localhost
The last command is a test. The first run asks whether you want to continue connecting; type yes and press Enter. If you are not asked for a password, the setup succeeded.
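If ssh-keygen's interactive prompts get in the way, the same setup can be done non-interactively; this is a sketch assuming an RSA key with an empty passphrase, not the exact commands from the original:
ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
ssh localhost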
Configuring the Hadoop Environment
Run cd /home/hadoop/software/hadoop/hadoop-1.2.1/conf to enter the conf directory. There you will find hadoop-env.sh, core-site.xml, mapred-site.xml and hdfs-site.xml, the four files that need to be configured, plus the slaves and masters files, which only need to be configured in fully distributed mode.
1. Configure hadoop-env.sh: find the line containing the JAVA_HOME keyword, remove the leading # sign, and fill in the actual JAVA_HOME path:
export JAVA_HOME=/home/hadoop/software/java/jdk1.7.0_40
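A quick way to confirm the edit took effect (a simple check, assuming you are still in the conf directory):
grep JAVA_HOME hadoop-env.sh
The output should show the export line above without a leading # sign.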
2. Configure core-site.xml: open the file with vi core-site.xml and add the following inside the <configuration> tag:
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
<!-- fs.default.name: configures the NameNode and specifies the URI of the HDFS file system, through which we access the contents of the file system. localhost can be changed to this machine's IP address; in fully distributed mode you must change localhost to the actual NameNode machine's IP address. If no port is given, the default port 8020 is used. -->
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hadoop/tmp/hadoop_tmp</value>
</property>
<!-- hadoop.tmp.dir: Hadoop's default temporary path; it is best to configure it. If a new node is added, or a DataNode inexplicably fails to start for some other reason, delete the tmp directory configured here. Note that if this directory is deleted on the NameNode machine, the NameNode format command will need to be run again. The directory must be created manually beforehand. -->
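Since the comment above says the directory must be created beforehand, create it now; a one-line sketch matching the value configured above (keep it writable by the hadoop user):
mkdir -p /home/hadoop/tmp/hadoop_tmp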
3. Configure hdfs-site.xml: add the following inside the <configuration> tag; all directories referenced here must be created in advance (see the commands after the properties below):
<property>
<name>dfs.data.dir</name>
<value>/home/hadoop/appdata/hadoopdata</value>
</property>
<!-- Configures the HDFS data storage directory, where the DataNode stores its data blocks -->
<property>
<name>dfs.name.dir</name>
<value>/home/hadoop/appdata/hadoopname</value>
</property>
<!-- Stores the NameNode's file system metadata, including the edit log and the file system image. If you change this path, you must rerun the hadoop namenode -format command to reformat the NameNode. -->
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<!-- Sets the number of replicas kept by the file system. Because there is only one node, it is set to 1; the system default is 3. -->
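As noted above, these directories must exist in advance and be owned by the hadoop user (ownership here is exactly what the error notes at the end trace a failed NameNode to); a minimal sketch matching the values configured above:
mkdir -p /home/hadoop/appdata/hadoopdata
mkdir -p /home/hadoop/appdata/hadoopname
sudo chown -R hadoop:hadoop /home/hadoop/appdata/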
4. Configure mapred-site.xml: add the following inside the <configuration> tag:
<property>
<name>mapred.job.tracker</name>
<value>localhost:9001</value>
</property>
<!-- Configures the JobTracker node. localhost can also be changed to this machine's IP address; in fully distributed mode, remember to change it to the actual JobTracker machine's IP address. -->
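For reference, here is roughly what the finished mapred-site.xml looks like once the fragment above is placed inside the configuration element; core-site.xml and hdfs-site.xml follow the same structure:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>localhost:9001</value>
</property>
</configuration>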
IV. Start Hadoop
4.1 Test the Hadoop configuration for success
Run the following command; if it prints the Hadoop version, the configuration is correct.
hadoop version
4.2 Format Namenode
cd /home/hadoop/software/hadoop/hadoop-1.2.1/bin
./hadoop namenode -format
4.3 Starting the Hadoop process
cd /home/hadoop/software/hadoop/hadoop-1.2.1/bin
./start-all.sh
Use the Java jps command to check whether the processes started successfully; the startup is OK when all five processes are running: SecondaryNameNode, JobTracker, NameNode, DataNode and TaskTracker.
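For reference, a successful startup looks roughly like this in jps (the process IDs are examples and will differ):
2913 NameNode
3105 DataNode
3298 SecondaryNameNode
3391 JobTracker
3587 TaskTracker
3742 Jps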
If any process fails to start, the whole cluster will not work properly; go to the /home/hadoop/software/hadoop/hadoop-1.2.1/libexec/../logs/ directory to read the log of the process that failed.
4.4 View Hadoop information from the browser
View JobTracker information:
You can access Hadoop from a browser on this machine or another machine. Enter the following URL: http://10.1.151.168:50030/jobtracker.jsp, where 10.1.151.168 is the IP address of this machine.
View NameNode information:
http://10.1.151.168:50070/dfshealth.jsp
View TaskTracker information:
http://10.1.151.168:50060/tasktracker.jsp
V. Error Notes
Password: localhost: Permission denied, please try again
This usually means the hadoop user has not been given sudo permission, so open /etc/sudoers and add the line hadoop ALL=(ALL:ALL) ALL.
TaskTracker does not start properly
Looking up the TaskTracker error log under logs showed a WARN saying that the temp/hadoop_tmp.mapred/local/ directory under the corresponding path was not writable. The problem was resolved by changing the permissions as follows:
sudo chmod 777 /home/hadoop/temp/hadoop_tmp.mapred/local/
/etc/profile has to be sourced again after every boot, otherwise the JDK cannot be found
This problem is still unresolved because I have not found the cause. What to do? Forget it; for the moment I just run source /etc/profile again each time.
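One possible workaround, not verified in these notes: have the hadoop user's shell source the profile on every login, for example:
echo "source /etc/profile" >> ~/.bashrc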
Safe mode is ON and HDFS is unavailable, the node count shows 0, and the NameNode does not start.
Investigation showed the problem was the directory configured as the value of dfs.name.dir in hdfs-site.xml. The log says: XXX is in an inconsistent state: storage directory does not exist or is not accessible, where XXX is that directory. Endless restarting and reformatting never solved the problem, and deleting the directory did not help either. Yes, I was going crazy; can you see my crazy eyes? Finally, I suddenly remembered what chown does, so I ran the following command:
sudo chown -R hadoop:hadoop /home/hadoop/appdata/
Reformat, then run start-all.sh, and it works! In short, it was a file permissions problem.