The following installation manual was written for an early version of Hadoop and is not fully consistent with the current version.
I. Preparations
Download Hadoop:
http://hadoop.apache.org/core/releases.html
http://hadoop.apache.org/common/releases.html
http://www.apache.org/dyn/closer.cgi/hadoop/core/
http://labs.xiaonei.com/apache-mirror/hadoop/core/hadoop-0.20.1/hadoop-0.20.1.tar.gz
http://labs.xiaonei.com/apache-mirror/hadoop/
II. Hardware environment
There are three machines in total, all running CentOS; Java is jdk1.6.0.
III. Install Java 6
sudo apt-get install sun-java6-jdk
Open /etc/environment and add the following (entries are separated by a colon; on Windows the separator would be a semicolon):
CLASSPATH=.:/usr/local/java/lib
JAVA_HOME=/usr/local/java
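To verify the JDK is picked up (a quick check, assuming you have logged out and back in so /etc/environment is re-read):
java -version
echo $JAVA_HOME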
IV. Configure the host table
[root@hadoop ~]# vi /etc/hosts
127.0.0.1 localhost
192.168.13.100 namenode
192.168.13.108 datanode1
192.168.13.110 datanode2
[root@test ~]# vi /etc/hosts
127.0.0.1 localhost
192.168.13.100 namenode
192.168.13.108 datanode1
[root@test2 ~]# vi /etc/hosts
127.0.0.1 localhost
192.168.13.100 namenode
192.168.13.110 datanode2
Add users and user groups:
addgroup hadoop
adduser hadoop
usermod -a -G hadoop hadoop
passwd hadoop
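One way to confirm the account and group were created correctly (a quick sanity check):
id hadoop
The output should list the hadoop group among the user's groups.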
Configure ssh:
Server:
su hadoop
ssh-keygen -t rsa
cd /home/hadoop/.ssh
cp id_rsa.pub authorized_keys
Client
Chmod 700/home/hadoop
Chmod 755/home/hadoop/. ssh
Su hadoop
Cd/home
Mkdir. ssh
Server:
chmod 644 /home/hadoop/.ssh/authorized_keys
scp authorized_keys datanode1:/home/hadoop/.ssh/
scp authorized_keys datanode2:/home/hadoop/.ssh/
ssh datanode1
ssh datanode2
If ssh is configured correctly, the following message is displayed on the first connection:
The authenticity of host [dbrg-2] can't be established.
Key fingerprint is 1024 5f:a0:0b:65:d3:82:df:ab:44:62:6d:98:9c:fe:e9:52.
Are you sure you want to continue connecting (yes/no)?
OpenSSH is telling you that it does not recognize this host. Since this is your first login to it, simply type "yes". The host's identification is then added to the ~/.ssh/known_hosts file, and the prompt will not appear on subsequent connections.
But don't forget to test ssh to the local machine as well: ssh dbrg-1
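As a quick check that passwordless login works from the namenode to every datanode, a minimal loop (assuming the hostnames from /etc/hosts above):
for host in datanode1 datanode2; do ssh $host hostname; done
Each ssh should print the remote hostname without asking for a password.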
Next, install Hadoop:
mkdir /home/hadoop/HadoopInstall
tar -zxvf hadoop-0.20.1.tar.gz -C /home/hadoop/HadoopInstall/
cd /home/hadoop/HadoopInstall/
ln -s hadoop-0.20.1 hadoop
export JAVA_HOME=/usr/local/java
export CLASSPATH=.:/usr/local/java/lib
export HADOOP_HOME=/home/hadoop/HadoopInstall/hadoop
export HADOOP_CONF_DIR=/home/hadoop-conf
export PATH=$HADOOP_HOME/bin:$PATH
cd $HADOOP_HOME/conf/
mkdir /home/hadoop-conf
cp hadoop-env.sh core-site.xml hdfs-site.xml mapred-site.xml masters slaves /home/hadoop-conf
vi $HADOOP_CONF_DIR/hadoop-env.sh
# The java implementation to use. Required. -- change this to the JDK installation directory.
export JAVA_HOME=/usr/local/java
export HADOOP_CLASSPATH=.:/usr/local/java/lib
# The maximum amount of heap to use, in MB. Default is 1000. -- adjust according to your memory size.
export HADOOP_HEAPSIZE=200
vi /home/hadoop/.bashrc
export JAVA_HOME=/usr/local/java
export CLASSPATH=.:/usr/local/java/lib
export HADOOP_HOME=/home/hadoop/HadoopInstall/hadoop
export HADOOP_CONF_DIR=/home/hadoop-conf
export PATH=$HADOOP_HOME/bin:$PATH
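To apply the new variables in the current shell and confirm the hadoop command is on the PATH (a quick check):
source /home/hadoop/.bashrc
echo $HADOOP_HOME
which hadoop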
Configuration
On the namenode:
# vi $HADOOP_CONF_DIR/slaves
192.168.13.108
192.168.13.110
# vi $HADOOP_CONF_DIR/core-site.xml
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://192.168.13.100:9000</value>
  </property>
</configuration>
# vi $HADOOP_CONF_DIR/hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
    <description>Default block replication.
    The actual number of replications can be specified when the file is created.
    The default is used if replication is not specified in create time.
    </description>
  </property>
</configuration>
# vi $HADOOP_CONF_DIR/mapred-site.xml
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>192.168.13.100:11000</value>
  </property>
</configuration>
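The masters file was copied to /home/hadoop-conf as well; in this Hadoop version it lists the host that runs the secondary namenode. A minimal sketch, assuming it runs on the namenode machine:
# vi $HADOOP_CONF_DIR/masters
192.168.13.100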
The configuration files on the slaves are as follows (hdfs-site.xml does not need to be configured):
[root@test12 conf]# cat core-site.xml
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://namenode:9000</value>
  </property>
</configuration>
[root@test12 conf]# cat mapred-site.xml
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>namenode:11000</value>
  </property>
</configuration>
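If the same paths exist on every node, one way to distribute the configuration directory from the namenode is scp (a sketch; adjust the slave copies afterwards, since their core-site.xml and mapred-site.xml differ slightly, as shown above):
scp -r /home/hadoop-conf datanode1:/home/
scp -r /home/hadoop-conf datanode2:/home/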
Start
export PATH=$HADOOP_HOME/bin:$PATH
hadoop namenode -format
start-all.sh
To stop the cluster: stop-all.sh
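After start-all.sh, one way to confirm the daemons are running is the JDK's jps tool (assuming it is on the PATH):
jps
On the namenode you should see NameNode, SecondaryNameNode, and JobTracker; on each datanode, DataNode and TaskTracker.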
Create the danchentest folder on HDFS and upload a file to it:
$HADOOP_HOME/bin/hadoop fs -mkdir danchentest
$HADOOP_HOME/bin/hadoop fs -put $HADOOP_HOME/README.txt danchentest
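To confirm the upload succeeded (a quick check):
$HADOOP_HOME/bin/hadoop fs -ls danchentest
The listing should show README.txt.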
Run the wordcount example:
cd $HADOOP_HOME
hadoop jar hadoop-0.20.1-examples.jar wordcount /user/hadoop/danchentest/README.txt output1
09/12/21 18:31:44 INFO input.FileInputFormat: Total input paths to process: 1
09/12/21 18:31:45 INFO mapred.JobClient: Running job: job_2009122111__0002
09/12/21 18:31:46 INFO mapred.JobClient: map 0% reduce 0%
09/12/21 18:31:53 INFO mapred.JobClient: map 100% reduce 0%
09/12/21 18:32:05 INFO mapred.JobClient: map 100% reduce 100%
09/12/21 18:32:07 INFO mapred.JobClient: Job complete: job_2009122111__0002
09/12/21 18:32:07 INFO mapred.JobClient: Counters: 17
09/12/21 18:32:07 INFO mapred.JobClient: Job Counters
09/12/21 18:32:07 INFO mapred.JobClient: Launched reduce tasks=1
View the output result files on HDFS:
[root@test11 hadoop]# hadoop fs -ls output1
Found 2 items
drwxr-xr-x   - root supergroup          0 /user/root/output1/_logs
-rw-r--r--   3 root supergroup       1306 /user/root/output1/part-r-00000
[root@test11 hadoop]# hadoop fs -cat output1/part-r-00000
(BIS), 1
(ECCN) 1
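To copy the result back to the local filesystem (a sketch; the local path is just an example):
hadoop fs -get output1/part-r-00000 /tmp/wordcount-result.txt
cat /tmp/wordcount-result.txt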
To check the running status of HDFS, open http://192.168.13.100:50070/dfshealth.jsp in a browser; for map/reduce information, open http://192.168.13.100:50030/jobtracker.jsp. The same information can also be obtained from the command line.
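From the command line, the cluster state can be checked with (a standard command in this Hadoop version):
hadoop dfsadmin -report
It prints the configured capacity and the list of live datanodes.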
If you instead see output like the following, the namenode has not been formatted:
08/01/25 16:31:40 INFO ipc.Client: Retrying connect to server: foo.bar.com/1.1.1.53567. Already tried 1 time(s).
Run hadoop namenode -format and start the cluster again.