Steps for installing Hadoop on Linux


 

The following installation guide was written for an early version of Hadoop (0.20.1) and does not fully match the current version of Hadoop.

 

I. Preparations

Download Hadoop from one of the following mirrors:

http://hadoop.apache.org/core/releases.html
http://hadoop.apache.org/common/releases.html
http://www.apache.org/dyn/closer.cgi/hadoop/core/
http://labs.xiaonei.com/apache-mirror/hadoop/core/hadoop-0.20.1/hadoop-0.20.1.tar.gz
http://labs.xiaonei.com/apache-mirror/hadoop/

II. Hardware environment

There are three machines in total, all running CentOS; Java is JDK 1.6.0.

 

 

III. Install Java 6

sudo apt-get install sun-java6-jdk

 

 

/etc/environment

After opening the file, add the following (CLASSPATH entries are separated by an English colon on Linux; on Windows the separator is a semicolon):

CLASSPATH=.:/usr/local/java/lib
JAVA_HOME=/usr/local/java
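
After logging out and back in (so /etc/environment is re-read), a quick sanity check, assuming the JDK really is installed under /usr/local/java:

$JAVA_HOME/bin/java -version   # should print a 1.6.0_xx version string
echo $JAVA_HOME $CLASSPATH     # should show the paths added above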

 

 

 

 

IV. Configure the host table

[root@hadoop ~]# vi /etc/hosts

127.0.0.1 localhost

192.168.13.100 namenode

192.168.13.108 datanode1

192.168.13.110 datanode2

 

 

[root@test ~]# vi /etc/hosts

127.0.0.1 localhost

192.168.13.100 namenode

192.168.13.108 datanode1

 

 

[root@test2 ~]# vi /etc/hosts

127.0.0.1 localhost

192.168.13.100 namenode

192.168.13.110 datanode2
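
To confirm name resolution works, a quick check from the namenode (hostnames as defined in the host tables above; each datanode only needs to resolve the namenode and itself):

ping -c 1 namenode
ping -c 1 datanode1
ping -c 1 datanode2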

Add users and user groups:

addgroup hadoop
adduser hadoop
usermod -a -G hadoop hadoop
passwd hadoop
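
A quick way to verify the account (assuming the hadoop user and group were created as above):

id hadoop    # the output should list the hadoop group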

 

 

Configure SSH:

 

 

Server (namenode):

su hadoop
ssh-keygen -t rsa
cp ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys

 

 

Client (each datanode):

su hadoop
cd /home/hadoop
mkdir .ssh
chmod 700 /home/hadoop
chmod 755 /home/hadoop/.ssh

 

 

Server:

chmod 644 /home/hadoop/.ssh/authorized_keys
scp /home/hadoop/.ssh/authorized_keys datanode1:/home/hadoop/.ssh/
scp /home/hadoop/.ssh/authorized_keys datanode2:/home/hadoop/.ssh/

 

 

ssh datanode1
ssh datanode2

 

 

If SSH is configured correctly, a message like the following is displayed the first time you connect:

The authenticity of host [dbrg-2] can't be established.

Key fingerprint is 1024 5f:a0:0b:65:d3:82:df:ab:44:62:6d:98:9c:fe:e9:52.

Are you sure you want to continue connecting (yes/no)?

OpenSSH is telling you that it does not recognize this host; there is nothing to worry about. This is simply the first time you log on to this host, so type "yes".

The host's identification is then added to the ~/.ssh/known_hosts file, and this prompt will no longer appear on subsequent connections to the host.

 

 

 

 

But do not forget to also test SSH to the local machine itself (e.g., ssh namenode).
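
A short end-to-end check from the namenode (a sketch using the hostnames from the host table; each command should print the remote hostname without asking for a password):

ssh namenode hostname
ssh datanode1 hostname
ssh datanode2 hostname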

 

 

 

 

 

mkdir /home/hadoop/HadoopInstall
tar -zxvf hadoop-0.20.1.tar.gz -C /home/hadoop/HadoopInstall/
cd /home/hadoop/HadoopInstall/
ln -s hadoop-0.20.1 hadoop
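
To confirm the layout, assuming the archive was unpacked as above:

ls -l /home/hadoop/HadoopInstall    # should show hadoop-0.20.1 and the symlink hadoop -> hadoop-0.20.1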

 

 

export JAVA_HOME=/usr/local/java
export CLASSPATH=.:/usr/local/java/lib
export HADOOP_HOME=/home/hadoop/HadoopInstall/hadoop
export HADOOP_CONF_DIR=/home/hadoop-conf
export PATH=$HADOOP_HOME/bin:$PATH

 

 

cd $HADOOP_HOME/conf/
mkdir /home/hadoop-conf
cp hadoop-env.sh core-site.xml hdfs-site.xml mapred-site.xml masters slaves /home/hadoop-conf
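
To confirm the copy (paths as above):

ls /home/hadoop-conf    # should list core-site.xml, hadoop-env.sh, hdfs-site.xml, mapred-site.xml, masters, slaves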

 

 

vi $HADOOP_CONF_DIR/hadoop-env.sh

 

 

 

 

# The java implementation to use. Required. -- change this to the JDK installation directory.
export JAVA_HOME=/usr/local/java

export HADOOP_CLASSPATH=.:/usr/local/java/lib

# The maximum amount of heap to use, in MB. Default is 1000. -- adjust this according to your memory size.
export HADOOP_HEAPSIZE=200

 

 

vi /home/hadoop/.bashrc

export JAVA_HOME=/usr/local/java
export CLASSPATH=.:/usr/local/java/lib
export HADOOP_HOME=/home/hadoop/HadoopInstall/hadoop
export HADOOP_CONF_DIR=/home/hadoop-conf
export PATH=$HADOOP_HOME/bin:$PATH
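
To pick up the new variables in the current shell and confirm the hadoop command is on the PATH (a sketch, assuming the paths above are correct):

source /home/hadoop/.bashrc
which hadoop      # should print /home/hadoop/HadoopInstall/hadoop/bin/hadoop
hadoop version    # should report 0.20.1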

 

 

 

 

 

 

Configuration

 

 

On the namenode:

 

 

# vi $HADOOP_CONF_DIR/slaves
192.168.13.108
192.168.13.110
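
The masters file copied alongside slaves typically lists the host that runs the SecondaryNameNode; in this layout that would normally be the namenode itself (an assumption, since the guide does not spell this out):

# vi $HADOOP_CONF_DIR/masters
192.168.13.100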

 

 

# vi $HADOOP_CONF_DIR/core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://192.168.13.100:9000</value>
  </property>
</configuration>

 

 

# vi $HADOOP_CONF_DIR/hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
    <description>Default block replication.
    The actual number of replications can be specified when the file is created.
    The default is used if replication is not specified in create time.
    </description>
  </property>
</configuration>

 

 

 

 

# vi $HADOOP_CONF_DIR/mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>192.168.13.100:11000</value>
  </property>
</configuration>

 

 

 

 

 

 

 

 

The configuration files on the slaves are as follows (hdfs-site.xml does not need to be configured):

[root@test12 conf]# cat core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://namenode:9000</value>
  </property>
</configuration>

 

 

[root@test12 conf]# cat mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>namenode:11000</value>
  </property>
</configuration>

 

 

 

 

 

 

Start

export PATH=$HADOOP_HOME/bin:$PATH

hadoop namenode -format
start-all.sh

To stop: stop-all.sh
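
To confirm the daemons came up after start-all.sh, the JDK's jps tool can be used on each machine (a sketch; process names are those of Hadoop 0.20):

jps
# namenode machine: NameNode, SecondaryNameNode, JobTracker
# datanode machines: DataNode, TaskTracker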

 

 

Create the danchentest folder on HDFS and upload a file to this directory:

$HADOOP_HOME/bin/hadoop fs -mkdir danchentest
$HADOOP_HOME/bin/hadoop fs -put $HADOOP_HOME/README.txt danchentest
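
To confirm the upload (paths as above):

$HADOOP_HOME/bin/hadoop fs -ls danchentest    # should list README.txt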

 

 

cd $HADOOP_HOME
hadoop jar hadoop-0.20.1-examples.jar wordcount /user/hadoop/danchentest/README.txt output1

09/12/21 18:31:44 INFO input.FileInputFormat: Total input paths to process: 1
09/12/21 18:31:45 INFO mapred.JobClient: Running job: job_2009122111__0002
09/12/21 18:31:46 INFO mapred.JobClient: map 0% reduce 0%
09/12/21 18:31:53 INFO mapred.JobClient: map 100% reduce 0%
09/12/21 18:32:05 INFO mapred.JobClient: map 100% reduce 100%
09/12/21 18:32:07 INFO mapred.JobClient: Job complete: job_2009122111__0002
09/12/21 18:32:07 INFO mapred.JobClient: Counters: 17
09/12/21 18:32:07 INFO mapred.JobClient: Job Counters
09/12/21 18:32:07 INFO mapred.JobClient: Launched reduce tasks=1

 

 

View the output files on HDFS:

[root@test11 hadoop]# hadoop fs -ls output1
Found 2 items
drwxr-xr-x   - root supergroup          0 /user/root/output1/_logs
-rw-r--r--   3 root supergroup       1306 /user/root/output1/part-r-00000

 

 

[root@test11 hadoop]# hadoop fs -cat output1/part-r-00000

(BIS), 1

(ECCN) 1

 

 

To view the running status of HDFS, open http://192.168.13.100:50070/dfshealth.jsp in a web browser; to view MapReduce (JobTracker) information, open http://192.168.13.100:50030/jobtracker.jsp. The status can also be checked from the command line, as shown below.
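
For example, a command-line status report (output format varies slightly by version):

hadoop dfsadmin -report    # prints configured/used capacity and the list of live datanodes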

 

 

 

 

If HDFS commands fail with retry messages like the following, the NameNode has probably not been formatted; run hadoop namenode -format:

08/01/25 16:31:40 INFO ipc.Client: Retrying connect to server: foo.bar.com/1.1.1.53567. Already tried 1 time(s).

 

This article is from the "one party" blog
