Hadoop Learning Note III: Distributed Hadoop deployment


Preface: If you just want to use Hadoop like off-the-shelf software, Quickhadoop is recommended; with the official documentation it is almost foolproof, so it is not covered here. This article focuses on deploying distributed Hadoop yourself.

1. Modify the machine name

[root@hadoop00 root]# vi /etc/sysconfig/network

Change the HOSTNAME=*** line to an appropriate name. The author's two machines use HOSTNAME=HADOOP00 and HOSTNAME=HADOOP01.
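For reference, a minimal /etc/sysconfig/network on the NameNode might look like the following (NETWORKING=yes is the usual default; only the HOSTNAME line needs changing):

NETWORKING=yes
HOSTNAME=HADOOP00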

2. Modify IP, gateway, mask, etc.

vim /etc/sysconfig/network-scripts/ifcfg-eth0    # NETMASK: netmask; IPADDR: IP address; GATEWAY: default gateway IP address

According to the actual situation, configure the network information of the two machines.
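As an illustration only, an ifcfg-eth0 for the NameNode consistent with the addresses used later in this article might look like this (the 255.255.255.0 netmask is an assumption for a typical home /24 network; adapt DEVICE, BOOTPROTO and the addresses to your own setup):

DEVICE=eth0
BOOTPROTO=static
ONBOOT=yes
IPADDR=192.168.1.112
NETMASK=255.255.255.0
GATEWAY=192.168.1.1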

3. Modify the hosts file so machines can be found by name (it can be thought of as local DNS)

[root@hadoop00 root]# vi /etc/hosts

Add at the end:

192.168.1.112 hadoop00
192.168.1.113 hadoop01

# The IPs were changed because the author is working from home, where the gateway is 192.168.1.1.

Every machine that needs to communicate by machine name should be configured as above. That said, it is generally still better to communicate by IP, since in some special cases machine-name resolution can fail.
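A quick sanity check that the hosts entries work (run on either machine; hostnames as configured above):

ping -c 3 hadoop00    # should resolve to 192.168.1.112
ping -c 3 hadoop01    # should resolve to 192.168.1.113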

  

4. Create the hadoop group and user (and set a password for the hadoop user)

[root@hadoop00 root]# groupadd hadoop
[root@hadoop00 root]# useradd -g hadoop hadoop
[root@hadoop00 root]# passwd hadoop
Changing password for user hadoop.
New password:
BAD PASSWORD: it is based on a dictionary word
Retype new password:
passwd: all authentication tokens updated successfully.

5. Passwordless SSH login (requires the ssh and rsync services)

Key idea: generate a public/private key pair on the NameNode, then send the public key to each DataNode.
The master node therefore does the following:

[root@hadoop00 root]# su - hadoop
[hadoop@hadoop00 hadoop]$ ssh-keygen -t rsa -P ""

The key files are generated under the default path /home/hadoop/.ssh.
Append id_rsa.pub to the authorized keys file:

[hadoop@hadoop00 hadoop]$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

It is important to set the permissions on authorized_keys, and also to modify /etc/ssh/sshd_config as the root user.

[hadoop@hadoop00 hadoop]$ chmod 600 ~/.ssh/authorized_keys

Return to the root user:

[hadoop@hadoop00 hadoop]$ exit
[root@hadoop00 root]# vi /etc/ssh/sshd_config

Uncomment the following lines:

#RSAAuthentication yes
#PubkeyAuthentication yes
#AuthorizedKeysFile .ssh/authorized_keys

so that they read:

RSAAuthentication yes
PubkeyAuthentication yes
AuthorizedKeysFile .ssh/authorized_keys

Restart SSH:

[root@hadoop00 root]# service sshd restart

Verify this machine:

[root@hadoop00 root]# su - hadoop
[hadoop@hadoop00 hadoop]$ ssh localhost
The authenticity of host 'localhost (127.0.0.1)' can't be established.
RSA key fingerprint is 82:5a:c0:ab:00:be:1d:ad:92:66:29:e9:cc:81:6d:2f.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'localhost' (RSA) to the list of known hosts.
[hadoop@hadoop00 hadoop]$

OK, the local machine works. Next, send the public key to the other DataNode machines:

scp ~/.ssh/id_rsa.pub hadoop@192.168.1.113:~/

The above command copies the file id_rsa.pub to /home/hadoop/ on the server 192.168.1.113 as user hadoop.
Since passwordless login is not yet in place at this point, you still have to enter a password; just use the DataNode's hadoop password.

[hadoop@hadoop00 hadoop]$ scp ~/.ssh/id_rsa.pub hadoop@192.168.1.113:~/
hadoop@192.168.1.113's password:
id_rsa.pub                     100%  238       00:00
[hadoop@hadoop00 hadoop]$

The following configuration is done on each DataNode machine; if there are several DataNodes, repeat it on each one. 192.168.1.113 is used as the example here.

[hadoop@hadoop01 hadoop]$ chmod 700 ~/.ssh
[hadoop@hadoop01 hadoop]$ cat ~/id_rsa.pub >> ~/.ssh/authorized_keys
[hadoop@hadoop01 hadoop]$ chmod 600 ~/.ssh/authorized_keys

Still as the root user, modify /etc/ssh/sshd_config, following the earlier steps on the NameNode, including restarting sshd.

When you have finished configuring all the DataNodes, remember to delete the id_rsa.pub file:

rm -r ~/id_rsa.pub

Now the NameNode can log in to each DataNode without a password, but logging in to the NameNode from a DataNode still requires one. When you have time, repeat the steps above in the other direction so the DataNodes can also log in to the NameNode without a password. Briefly, on the DataNode:

su - hadoop
ssh-keygen -t rsa -P ""
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
scp ~/.ssh/id_rsa.pub hadoop@192.168.1.112:~/

On the NameNode:

cat ~/id_rsa.pub >> ~/.ssh/authorized_keys
rm -r ~/id_rsa.pub
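To confirm that passwordless login now works in both directions, a quick check (hostnames as configured in /etc/hosts above):

# on the NameNode, as the hadoop user
ssh hadoop01 hostname    # should print the DataNode's hostname without prompting for a password
# on the DataNode, as the hadoop user
ssh hadoop00 hostname    # should print the NameNode's hostname without prompting for a password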

6. Hadoop cluster installation

Take the NameNode as the example; the DataNodes are installed the same way.
Log in to the NameNode as root, download the hadoop-1.2.1.tar.gz package from http://mirrors.cnnic.cn/apache/hadoop/common/hadoop-1.2.1/ and place it in /home/hadoop;
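As a sketch, one way to fetch the package directly on the NameNode (assuming wget is installed and the machine has Internet access):

cd /home/hadoop
wget http://mirrors.cnnic.cn/apache/hadoop/common/hadoop-1.2.1/hadoop-1.2.1.tar.gz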

cp /home/hadoop/hadoop-1.2.1.tar.gz /usr    # copy "hadoop-1.2.1.tar.gz" to the /usr directory
cd /usr                                     # enter the /usr directory
tar -zxvf hadoop-1.2.1.tar.gz               # unpack the "hadoop-1.2.1.tar.gz" package
mv hadoop-1.2.1 hadoop                      # rename the "hadoop-1.2.1" folder to "hadoop"
chown -R hadoop:hadoop hadoop               # give the hadoop user ownership of the "hadoop" folder
rm -rf hadoop-1.2.1.tar.gz                  # delete the "hadoop-1.2.1.tar.gz" package

Add the Hadoop environment variables by modifying the /etc/profile file; append at the end:

# set hadoop path
export HADOOP_HOME=/usr/hadoop
export PATH=$PATH:$HADOOP_HOME/bin

Make it take effect: source /etc/profile (you can also use . /etc/profile, as in the earlier JDK installation).
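A quick way to confirm the variables took effect (assuming the PATH export above):

source /etc/profile
echo $HADOOP_HOME        # should print /usr/hadoop
which hadoop             # should print /usr/hadoop/bin/hadoop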

Create the /usr/hadoop/data directory (used below for dfs.data.dir) and set its permissions:

mkdir -p /usr/hadoop/data
chmod 755 /usr/hadoop/data

7. Configure Hadoop

Modify the hadoop-env.sh file in the /usr/hadoop/conf directory, adding at the end:

# set java environment
export JAVA_HOME=/usr/java/jdk1.6.0_45

If you have forgotten the JDK path, check it first with echo $JAVA_HOME.

Modify the Hadoop core configuration file core-site.xml, which configures the HDFS address and port:

<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/usr/hadoop/tmp</value>
        <description>A base for other temporary directories.</description>
    </property>
    <!-- file system properties -->
    <property>
        <name>fs.default.name</name>
        <value>hdfs://192.168.1.112:9000</value>
    </property>
</configuration>
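Note that the hadoop.tmp.dir directory must exist and be writable by the hadoop user, otherwise formatting fails later (see the troubleshooting in section 8). A minimal preparation, assuming the ownership convention used elsewhere in this article:

mkdir -p /usr/hadoop/tmp
chown -R hadoop:hadoop /usr/hadoop/tmp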

Modify the HDFS configuration (hdfs-site.xml). The replication factor defaults to 3; it is set to 1 here:

<configuration>
    <property>
        <name>dfs.data.dir</name>
        <value>/usr/hadoop/data</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>

Modify the MapReduce configuration file (mapred-site.xml), configuring the JobTracker address and port:

<configuration>
    <property>
        <name>mapred.job.tracker</name>
        <value>http://192.168.1.112:9001</value>
    </property>
</configuration>

Configure the masters file:
vi masters
Either the machine name or the IP can be used; the IP is recommended.

[root@hadoop00 conf]# vi masters
192.168.1.112

Configure the slaves file (master host only):
vi slaves
Add the DataNode machines, one per line. Either the machine name or the IP can be used; the IP is recommended.

[root@hadoop00 conf]# vi slaves
192.168.1.113

Repeat the installation steps above on every DataNode machine (the slaves file is only needed on the master). Alternatively, you can copy /usr/hadoop from the NameNode directly to the DataNode machines and then finish the configuration there. Copying is used here:

scp -r /usr/hadoop root@192.168.1.113:/usr/

Log in to the DataNode machine as root and change the ownership of the directory:

chown -R hadoop:hadoop /usr/hadoop

Add the Hadoop environment variables on each DataNode by modifying the /etc/profile file and appending at the end:

# set hadoop path
export HADOOP_HOME=/usr/hadoop
export PATH=$PATH:$HADOOP_HOME/bin

Make it take effect:
source /etc/profile (you can also use . /etc/profile, as in the earlier JDK installation)

8. Startup and verification

First, turn off the firewall on every machine:

service iptables stop
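If the firewall should also stay off after a reboot, the following is a common companion step on chkconfig-based systems (an assumption; adjust for your distribution):

chkconfig iptables off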

Log in as the hadoop user and format the HDFS file system:

hadoop namenode -format

(Note: format only once; on subsequent startups you do not need to format again, just run start-all.sh.)

If you get the warning:
Warning: $HADOOP_HOME is deprecated.
edit /etc/profile (on every machine) and add:
export HADOOP_HOME_WARN_SUPPRESS=1
then source it again to take effect.

If there is still an error:
15/01/10 14:19:52 ERROR namenode.NameNode: java.io.IOException: Cannot create directory /usr/hadoop/tmp/dfs/name/current
the cause is that tmp belongs to the root user and group; change the ownership of tmp to the hadoop user.
Start:
start-all.sh

Stop:
stop-all.sh
If the DataNode does not come up because its namespaceID does not match, edit the VERSION file on the DataNode in the directory below, changing namespaceID=1505787769 to match the NameNode:
/usr/hadoop/tmp/dfs/name/current
Or clear the NameNode's tmp data and re-format:

cd ~
rm -rf /usr/hadoop/tmp
mkdir /usr/hadoop/tmp
rm -rf /tmp/hadoop*
hadoop namenode -format
start-all.sh

If jps shows that the DataNode has not come up, check the log:

2015-01-10 14:59:49,121 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Invalid directory in dfs.data.dir: Incorrect permission for /usr/hadoop/tmp/dfs/data, expected: rwxr-xr-x, while actual: rwxrwxr-x
2015-01-10 14:59:49,121 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: All directories in dfs.data.dir are invalid.

This is a permissions issue:

chmod 755 /usr/hadoop/tmp/dfs/data

Restart Hadoop
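A rough way to confirm the cluster is healthy after the restart (daemon names assume the standard Hadoop 1.x layout used in this article):

# on the NameNode (hadoop00), as the hadoop user
jps                        # expect NameNode, SecondaryNameNode and JobTracker
# on each DataNode (hadoop01)
jps                        # expect DataNode and TaskTracker
# overall HDFS status, from the NameNode
hadoop dfsadmin -report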

