Setting up a Hadoop cluster environment on Linux servers (RedHat 5 / Ubuntu 12.04)


Steps for setting up a Hadoop cluster environment under Ubuntu 12.04

I. Preparation before setting up the environment:

My local machine (Ubuntu 12.04 32-bit) serves as the master; it is the same machine used for the stand-alone Hadoop environment: http://www.linuxidc.com/Linux/2013-01/78112.htm

I also created 4 virtual machines in KVM, named:

son-1 (Ubuntu 12.04 32-bit),
son-2 (Ubuntu 12.04 32-bit),
son-3 (CentOS 6.2 32-bit),
son-4 (RedHat 6.0 32-bit).


Modify the hosts file on this machine:

sudo gedit /etc/hosts

Add the following entries:

192.168.200.150 master

192.168.200.151 son-1

192.168.200.152 son-2

192.168.200.153 son-3

192.168.200.154 son-4

Now let's start building the cluster.

II. Create a hadoop user and user group on the master and on each child node (son-...). Note that user creation under Ubuntu differs slightly from CentOS.

Create under Ubuntu:

First create the hadoop user group:

sudo addgroup hadoop

Then create the hadoop user:

sudo adduser --ingroup hadoop hadoop

Create under CentOS and RedHat:

sudo adduser hadoop

Note: under CentOS and RedHat, creating the user directly with adduser automatically creates the corresponding user group and home directory, whereas creating the user directly with useradd under Ubuntu leaves the user without a home directory.

Grant the hadoop user sudo permissions by editing the /etc/sudoers file:

sudo gedit /etc/sudoers

Press Enter to open the /etc/sudoers file, then give the hadoop user the same permissions as the root user.

Add the line hadoop ALL=(ALL:ALL) ALL below the existing line root ALL=(ALL:ALL) ALL:

hadoop ALL=(ALL:ALL) ALL

III. Install the JDK on the master and on each child node (son-...).

Under Ubuntu:

sudo apt-get install openjdk-6-jre

For CentOS and RedHat, downloading and installing the JDK manually is recommended.


IV. Modify the machine names of the master and the child nodes (son-...)

Open the /etc/hostname file:

sudo gedit /etc/hostname

Set the hostnames to master, son-1, son-2, son-3, and son-4 respectively. This makes the machines easier to manage and remember.
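Note (an assumption based on how CentOS 6 and RedHat 6 normally handle hostnames, not something the original steps mention): on son-3 and son-4 the persistent hostname is usually set in /etc/sysconfig/network rather than /etc/hostname, roughly like this:

sudo vi /etc/sysconfig/network
# change the HOSTNAME= line, e.g. HOSTNAME=son-3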

V. Install the SSH service on the master and on each child node (son-...)

This is mainly needed for Ubuntu; CentOS and RedHat already ship with SSH installed.

Under Ubuntu:
sudo apt-get install ssh openssh-server

If SSH is already installed, you can go straight to step VI.

VI. Set up passwordless SSH login

Before doing this, we recommend switching all machines to the hadoop user to avoid permission problems.

The switch command is:
su - hadoop

SSH keys can be generated as either RSA or DSA; RSA is the default and is used here.

1. Create the SSH key; here we use RSA:
ssh-keygen -t rsa -P ""

(Note: after pressing Enter, two files are generated under ~/.ssh/: id_rsa and id_rsa.pub. These files always appear in pairs.)

2. Enter the ~/.ssh/ directory and append id_rsa.pub to the authorized_keys authorization file (there is no authorized_keys file to begin with):
cd ~/.ssh
cat id_rsa.pub >> authorized_keys
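As a quick check (a minimal test, mirroring the one used in the RedHat 5 section later in this article), SSH into the local machine; it should no longer ask for a password:

ssh localhost
exit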

VII. Install Hadoop on the master

The Hadoop version we use is hadoop-0.20.203 (http://www.apache.org/dyn/closer.cgi/hadoop/common/), because it is relatively stable.

1. Assuming hadoop-0.20.203.0rc1.tar.gz is on the desktop, copy it to the installation directory /usr/local/:
sudo cp hadoop-0.20.203.0rc1.tar.gz /usr/local/

2. Extract hadoop-0.20.203.0rc1.tar.gz:
cd /usr/local
sudo tar -zxf hadoop-0.20.203.0rc1.tar.gz

3. Rename the extracted folder to hadoop:
sudo mv hadoop-0.20.203.0 hadoop

4. Set the owner of the hadoop folder to the hadoop user:
sudo chown -R hadoop:hadoop hadoop

5. Open the hadoop/conf/hadoop-env.sh file:
sudo gedit hadoop/conf/hadoop-env.sh

6. Configure conf/hadoop-env.sh (find the line #export JAVA_HOME=..., remove the #, then set it to the path of the local JDK):
export JAVA_HOME=/usr/lib/jvm/java-6-openjdk

7. Open the conf/core-site.xml file:
sudo gedit hadoop/conf/core-site.xml

Edited as follows:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--put site-specific property overrides in this file. -->
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://master:9000</value>
</property>
</configuration>
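fs.default.name is the address all nodes and clients use to reach the NameNode. As an illustrative sanity check once the cluster is running (not part of the original steps), the following two commands should list the same HDFS root directory:

bin/hadoop fs -ls /
bin/hadoop fs -ls hdfs://master:9000/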

8. Open the conf/mapred-site.xml file:
sudo gedit hadoop/conf/mapred-site.xml

Edited as follows:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--put site-specific property overrides in this file. -->
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>master:9001</value>
</property>
</configuration>

9. Open the conf/hdfs-site.xml file:
sudo gedit hadoop/conf/hdfs-site.xml

Edited as follows:

<configuration>
<property>
<name>dfs.name.dir</name>
<value>/usr/local/hadoop/datalog1,/usr/local/hadoop/datalog2</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/usr/local/hadoop/data1,/usr/local/hadoop/data2</value>
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
</configuration>

10. Open the conf/masters file and add the hostname of the SecondaryNameNode; here, just fill in master.
sudo gedit hadoop/conf/masters

11. Open the conf/slaves file and add the slave hostnames, one per line.
sudo gedit hadoop/conf/slaves

Here it is filled in as:

son-1
son-2
son-3
son-4

VIII. Copy files from the master machine to each DataNode machine (son-1, son-2, son-3, son-4). (son-1 is used as the example below.)

1. Copy the public key:

scp ~/.ssh/id_rsa.pub hadoop@son-1:~/.ssh/

2. Copy the hosts file:

scp /etc/hosts hadoop@son-1:/etc/hosts

Note: if the file cannot be copied there directly, copy it to /home/hadoop instead, that is:

scp /etc/hosts hadoop@son-1:/home/hadoop

Then, on the DataNode machine, move it to the same path, /etc/hosts.

3. Copy the hadoop folder (this carries the configuration over as well):

scp -r /usr/local/hadoop hadoop@son-1:/usr/local

If the copy cannot be written there directly, use the same workaround as above (copy to /home/hadoop first, then move it into place).

Then adjust the ownership of the hadoop directory on all of the nodes:

sudo chown -R hadoop:hadoop hadoop

After everything has been copied, each DataNode machine appends the copied public key to its trusted key list.

Run the following on each child node:

cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
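To confirm the key was picked up (a minimal check, assuming the hadoop user is used on both ends), passwordless login from the master to each child node should now work, for example:

ssh son-1 hostname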

One more important point: on the DataNode machines, delete the data1, data2, and logs directories inside the copied hadoop folder!
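For example, on each DataNode (assuming those directories were created by earlier runs on the master and came along with the copy):

cd /usr/local/hadoop
rm -rf data1 data2 logs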

Also, the Java environment variable must be adjusted on the CentOS node (son-3) and the RedHat node (son-4): configure /usr/local/hadoop/conf/hadoop-env.sh on those nodes (find the line #export JAVA_HOME=..., remove the #, then set the path of the local JDK). The path differs from environment to environment, so configure it according to your own installation.

With that, the environment is basically set up; now let's start testing.

IX. The rest is straightforward. First enter the master's Hadoop directory:

cd /usr/local/hadoop

You could first set up load balancing; I am afraid adding it here would make things a bit messy, and leaving it out does not affect operation, so leave me a message if you want the details.
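Before the very first start, the NameNode normally has to be formatted. The walkthrough does not show that step here, but it is run explicitly in the RedHat 5 section below (step 3.10), and FAQ 1 shows the error you get when it is skipped:

bin/hadoop namenode -format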

Start the DataNode and TaskTracker:

bin/start-dfs.sh
bin/hadoop-daemon.sh start datanode
bin/hadoop-daemon.sh start tasktracker

Or start all services with a single command:

bin/start-all.sh

Check whether the DataNodes have started:

jps
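With this layout, jps on the master typically shows the NameNode, SecondaryNameNode, and JobTracker processes, while each child node shows DataNode and TaskTracker. Illustrative output on the master (the process IDs are made up):

1234 NameNode
1356 SecondaryNameNode
1478 JobTracker
1590 Jps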

If jps does not work properly, run:

source /etc/profile

Once the nodes have connected, you can view them on the NameNode:

bin/hadoop dfsadmin -report

(Figure: screenshot of the dfsadmin report output; omitted.)

You can also open the web interface directly in a browser:

master:50070

(Figures 1 and 2: screenshots of the web interface; omitted.)
Because the Java environment on the RedHat node still has a problem, it did not start; the other nodes are normal.

Keep in mind that most of the actions above should be performed as the hadoop user, otherwise you will run into many permission issues along the way.

This completes the setup of the whole environment.



Steps for setting up a Hadoop cluster environment under RedHat 5

Preparation

Two Linux virtual machines (RedHat 5, with IPs 192.168.1.210 and 192.168.1.211 respectively)
A JDK environment (this article uses JDK 1.6; there are many configuration guides online, so it is omitted here)
The Hadoop installation package (this article uses Hadoop 1.0.4)

Goal

192.168.1.210 acts as both master and node machine; 192.168.1.211 is a node machine.

Build Steps

1 Modify the hosts file

Add to /etc/hosts:

192.168.1.210 HADOOP1
192.168.1.211 HADOOP2

2 Set up passwordless SSH login

2.1 Passwordless login from the host (master) to itself

ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa

Just press Enter at the prompts. When it completes, two files are generated in ~/.ssh/: id_dsa and id_dsa.pub. They appear in pairs, like a key and a lock.

Then append id_dsa.pub to the authorization file (there is no authorized_keys file yet):

cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys

Test:

ssh localhost hostname

If it still asks for a password, this is usually caused by directory or file permission problems. Checking the system log confirmed it was indeed a permission issue: authorized_keys under .ssh must have permission 600, and its parent and grandparent directories should be 755.
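For example, assuming the current user's home directory, the permissions can be fixed like this (a sketch, not taken from the original text):

chmod 755 ~
chmod 755 ~/.ssh
chmod 600 ~/.ssh/authorized_keys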

2.2 Passwordless login to the node machine (slave)

Execute on the slave:

ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa

This generates the .ssh directory.

Copy the authorized_keys file from the master to the slave:

scp ~/.ssh/authorized_keys hadoop2:~/.ssh/

Test: execute on the master

ssh hadoop2

It should log in without a password.

3 Configure Hadoop

3.1 Copy Hadoop

Copy hadoop-1.0.4.tar.gz to the /usr/local folder and extract it.

Extraction command:

tar -zxvf hadoop-1.0.4.tar.gz

3.2 Check /etc/hosts (cat /etc/hosts):

192.168.1.210 HADOOP1
192.168.1.211 HADOOP2

3.3 Configure conf/masters and conf/slaves

conf/masters:

192.168.1.210

conf/slaves:

192.168.1.210
192.168.1.211

3.4 Configure conf/hadoop-env.sh

Add:

export JAVA_HOME=/home/elvis/soft/jdk1.7.0_17

3.5 Configure conf/core-site.xml

Add:

<property>
<name>fs.default.name</name>
<value>hdfs://192.168.1.210:9000</value>
</property>

3.6 Configure conf/hdfs-site.xml

Add:

<property>
<name>dfs.http.address</name>
<value>192.168.1.210:50070</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>/usr/local/hadoop/namenode</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/usr/local/hadoop/data</value>
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>

3.7 Configure conf/mapred-site.xml

Add:

<property>
<name>mapred.job.tracker</name>
<value>192.168.1.50:8012</value>
</property>

3.8 Create the relevant directory

/usr/local/hadoop/    (the Hadoop data and namenode directory)

Note: create only the hadoop directory; do not manually create the data and namenode directories inside it.

Create the same directory on the other node machine as well.
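A minimal sketch of this step (the chown is an assumption carried over from the Ubuntu walkthrough above; skip it if you run the cluster as a different user):

sudo mkdir -p /usr/local/hadoop
sudo chown -R hadoop:hadoop /usr/local/hadoop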

3.9 Copy Hadoop files to other node machines

Copy the Hadoop files to the other nodes remotely (so that the configuration done above is carried over to the other nodes).

Command:

scp -r hadoop-1.0.4 192.168.1.211:/usr/local/

3.10 Format the active master (192.168.1.210)

Command:

bin/hadoop namenode -format

3.11 Start the cluster: ./start-all.sh

Now the cluster is starting up; check it with the command:

bin/hadoop dfsadmin -report

It should report 2 DataNodes. Open the web interface and take a look:

Browser input: 192.168.1.210:50070

With that, the cluster installation is complete!

FAQ

1 bad connection to FS. Command aborted

You need to look at the log; mine shows:
2013-06-09 15:56:39,790 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: java.io.IOException: NameNode is not formatted.
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:330)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:100)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:388)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:362)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:276)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:496)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1279)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1288)

The NameNode is not formatted!

Solution:

The reason is that I manually created /usr/local/hadoop/data and /usr/local/hadoop/namenode; remove those two directories and reformat the NameNode.
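For example (the paths come from the hdfs-site.xml configured above):

rm -rf /usr/local/hadoop/data /usr/local/hadoop/namenode
bin/hadoop namenode -format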

2 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Invalid directory in dfs.data.dir: Incorrect permission for /usr/local/hadoop/data, expected: rwxr-xr-x, while actual: rwxrwxrwx

Solution:

The permissions on the /usr/local/hadoop/ directory are too open; change them to 755 with chmod.
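For example (the directory comes from the warning above):

chmod 755 /usr/local/hadoop
chmod 755 /usr/local/hadoop/data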

3 Eclipse Plug-in Issues

Exception 1: 2011-08-03 17:52:26,244 INFO org.apache.hadoop.ipc.Server: IPC Server handler 6 on 9800, call getListing(/home/fish/tmp20/mapred/system) from 192.168.2.101:2936: error: org.apache.hadoop.security.AccessControlException: Permission denied: user=drwho, access=READ_EXECUTE, inode="system":root:supergroup:rwx-wx-wx
org.apache.hadoop.security.AccessControlException: Permission denied: user=drwho, access=READ_EXECUTE, inode="system":root:supergroup:rwx-wx-wx
at org.apache.hadoop.hdfs.server.namenode.PermissionChecker.check(PermissionChecker.java:176)
at org.apache.hadoop.hdfs.server.namenode.PermissionChecker.checkPermission(PermissionChecker.java:111)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:4514)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPathAccess(FSNamesystem.java:4474)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getListing(FSNamesystem.java:1989)
at org.apache.hadoop.hdfs.server.namenode.NameNode.getListing(NameNode.java:556)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)

Solution: add the following to hdfs-site.xml:

<property>
<name>dfs.permissions</name>
<value>false</value>
</property>

HDFS Common Commands

Create a folder

./hadoop fs -mkdir /usr/local/hadoop/godlike

Upload files

./hadoop fs -put (or -copyFromLocal) 1.txt /usr/local/hadoop/godlike

List the files in a folder

./hadoop fs -ls /usr/local/hadoop/godlike

View file contents

./hadoop fs -cat (or -text, -tail) /usr/local/hadoop/godlike/1.txt

Delete a file

./hadoop fs -rm /usr/local/hadoop/godlike

Delete a folder

./hadoop fs -rmr /usr/local/hadoop/godlike
