Steps for setting up a Hadoop cluster environment under Ubuntu 12.04
I. Preparation before setting up the environment:
My local Ubuntu 12.04 32-bit machine acts as the master; it is the same machine used for the stand-alone Hadoop environment described at http://www.linuxidc.com/Linux/2013-01/78112.htm
I also created four virtual machines in KVM, named:
son-1 (Ubuntu 12.04 32-bit),
son-2 (Ubuntu 12.04 32-bit),
son-3 (CentOS 6.2 32-bit),
son-4 (RedHat 6.0 32-bit).
Modify the hosts file on this machine:
sudo gedit /etc/hosts
Add the following:
192.168.200.150 Master
192.168.200.151 son-1
192.168.200.152 son-2
192.168.200.153 son-3
192.168.200.154 son-4
Now let's start building.
II. Create the hadoop user and user group on the local machine (master) and on each child node (son-*). In fact, user creation differs somewhat between Ubuntu and CentOS.
Create under Ubuntu:
First create the hadoop user group:
sudo addgroup hadoop
Then create the hadoop user:
sudo adduser --ingroup hadoop hadoop
Create under CentOS and RedHat:
sudo adduser hadoop
Note: creating a user directly under CentOS and RedHat automatically generates the matching user group, home directory, and related files, whereas on Ubuntu creating a user directly (with the low-level useradd) leaves it without a home directory.
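For reference, a minimal sketch of the low-level commands behind this note (illustrative only, not a required step):
# Ubuntu: useradd alone does not create a home directory; -m forces it
sudo useradd -m -g hadoop hadoop
# CentOS/RedHat: adduser is an alias for useradd and creates the group and home
# directory automatically, but you still need to set a password yourself
sudo passwd hadoop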
Give the hadoop user additional permissions by opening the /etc/sudoers file:
sudo gedit /etc/sudoers
This opens the /etc/sudoers file; here we give the hadoop user the same permissions as the root user.
Add the line hadoop ALL=(ALL:ALL) ALL below the line root ALL=(ALL:ALL) ALL:
hadoop ALL=(ALL:ALL) ALL
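After the change, the relevant section of /etc/sudoers should look roughly like this (on Ubuntu, visudo is the safer way to edit the file):
# User privilege specification
root    ALL=(ALL:ALL) ALL
hadoop  ALL=(ALL:ALL) ALL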
III. Install the JDK environment on the local machine (master) and the child nodes (son-*)
Under Ubuntu:
sudo apt-get install openjdk-6-jre
For CentOS and RedHat, downloading the JDK and installing it manually is recommended.
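For CentOS and RedHat, a minimal sketch of such a manual install, assuming you downloaded a JDK 6 archive as a tarball (the file name below is a placeholder; some JDK 6 releases ship as a self-extracting .bin instead):
sudo mkdir -p /usr/lib/jvm
sudo tar -zxf jdk-6uXX-linux-i586.tar.gz -C /usr/lib/jvm   # placeholder archive name
# later, point JAVA_HOME in hadoop-env.sh at the extracted directory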
IV. Change the machine names of the local machine (master) and the child nodes (son-*)
Open the /etc/hostname file:
sudo gedit /etc/hostname
Set the names to Master, son-1, son-2, son-3, and son-4 respectively. This makes the machines easier to manage and remember.
V. Install the SSH service on the local machine (master) and the child nodes (son-*)
This mainly concerns the Ubuntu machines; CentOS and RedHat ship with it.
Under Ubuntu:
sudo apt-get install ssh openssh-server
If you have already installed SSH, you can proceed straight to step six.
VI. Set up the SSH password-free login environment
Before doing this step, switch all machines to the hadoop user to avoid any interference from permission problems.
The switch command is:
su - hadoop
SSH keys can be generated with either RSA or DSA; RSA is the default.
1. Create the SSH key; here we use RSA:
ssh-keygen -t rsa -P ""
(Note: after pressing Enter, two files are generated under ~/.ssh/: id_rsa and id_rsa.pub. These two files come as a pair.)
2. Enter the ~/.ssh/ directory and append id_rsa.pub to the authorized_keys authorization file (authorized_keys does not exist at first):
cd ~/.ssh
cat id_rsa.pub >> authorized_keys
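A quick way to verify the key works (the first connection may ask you to confirm the host key):
ssh localhost
# should log in without asking for a password; type exit to return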
VII. Install Hadoop on the local master
The Hadoop version we use is hadoop-0.20.203 (http://www.apache.org/dyn/closer.cgi/hadoop/common/), because this version is relatively stable.
1. Assuming hadoop-0.20.203.0rc1.tar.gz is on the desktop, copy it to the installation directory /usr/local/:
sudo cp hadoop-0.20.203.0rc1.tar.gz /usr/local/
2. Extract hadoop-0.20.203.0rc1.tar.gz:
cd /usr/local
sudo tar -zxf hadoop-0.20.203.0rc1.tar.gz
3. Rename the extracted folder to hadoop:
sudo mv hadoop-0.20.203.0 hadoop
4. Set the owner of the hadoop folder to the hadoop user:
sudo chown -R hadoop:hadoop hadoop
5. Open the hadoop/conf/hadoop-env.sh file:
sudo gedit hadoop/conf/hadoop-env.sh
6. Configure conf/hadoop-env.sh (find the line #export JAVA_HOME=..., remove the #, and set it to the path of the local JDK):
export JAVA_HOME=/usr/lib/jvm/java-6-openjdk
7. Open the conf/core-site.xml file:
sudo gedit hadoop/conf/core-site.xml
Edit it as follows:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://master:9000</value>
</property>
</configuration>
8. Open the conf/mapred-site.xml file:
sudo gedit hadoop/conf/mapred-site.xml
Edit it as follows:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>master:9001</value>
</property>
</configuration>
9. Open the conf/hdfs-site.xml file:
sudo gedit hadoop/conf/hdfs-site.xml
Edit it as follows:
<configuration>
<property>
<name>dfs.name.dir</name>
<value>/usr/local/hadoop/datalog1,/usr/local/hadoop/datalog2</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/usr/local/hadoop/data1,/usr/local/hadoop/data2</value>
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
</configuration>
10. Open the conf/masters file and add the hostname of the secondary namenode; here it is enough to fill in master.
sudo gedit hadoop/conf/masters
11. Open the conf/slaves file and add the slave hostnames, one per line.
sudo gedit hadoop/conf/slaves
Here we fill in:
son-1
son-2
son-3
son-4
VIII. Copy files one by one from the master machine to the datanode machines (son-1, son-2, son-3, son-4). Here son-1 is used as the example.
1. Copy the public key:
scp ~/.ssh/id_rsa.pub hadoop@son-1:~/.ssh/
2. Copy the hosts file:
scp /etc/hosts hadoop@son-1:/etc/hosts
Note: if this copy fails, copy the file to /home/hadoop instead, that is:
scp /etc/hosts hadoop@son-1:/home/hadoop
Then, on the datanode machine, move it to the same path, /etc/hosts.
3. Copy the hadoop folder; the configuration is copied along with it:
scp -r /usr/local/hadoop hadoop@son-1:/usr/local
If it cannot be copied directly, handle it the same way as above (copy to /home/hadoop first, then move it).
Also fix the ownership of the hadoop directory on all nodes:
sudo chown -R hadoop:hadoop hadoop
After everything has been copied, append the copied public key to the trust list on each datanode machine.
Run this on each child node itself:
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
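Once the key has been appended, a quick check from the master should log in to each node without a password (hostnames as defined in the hosts file above):
ssh son-1
# repeat for son-2, son-3 and son-4; type exit to come back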
Another important point: on the child datanode machines, delete the data1, data2, and logs directories inside the copied hadoop folder, as sketched below.
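A sketch of that cleanup on a datanode, using the paths from this article's configuration (skip any directory that does not exist yet):
rm -rf /usr/local/hadoop/data1 /usr/local/hadoop/data2 /usr/local/hadoop/logs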
The Java environment variable also needs to be adjusted on the CentOS node (son-3) and the RedHat node (son-4):
Configure /usr/local/hadoop/conf/hadoop-env.sh on the CentOS node (son-3) and the RedHat node (son-4) (find #export JAVA_HOME=..., remove the #, and add the path of that machine's JDK). The path differs from environment to environment, so set it according to your own installation.
With that, the environment is basically set up; now let's start testing.
IX. After that, the rest is straightforward. First, change into the hadoop directory on the master:
cd /usr/local/hadoop
You could also set up load balancing first, but I am afraid adding it here would make things messy; leaving it out does not affect operation. If you want to know more about it, leave me a message.
Start the datanodes and tasktrackers:
bin/start-dfs.sh
bin/hadoop-daemon.sh start datanode
bin/hadoop-daemon.sh start tasktracker
Or start all services with a single command:
bin/start-all.sh
Check whether your datanodes have started:
jps
If jps does not work properly:
source /etc/profile
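For reference, on the master in this setup jps would normally list something like the following (the leading numbers are process IDs and will differ on your machine):
4352 NameNode
4587 SecondaryNameNode
4723 JobTracker
4891 Jps
# on a datanode you would expect to see DataNode and TaskTracker instead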
Once connected, you can view the connection status on the namenode:
bin/hadoop dfsadmin -report
You can also open the web interface directly by entering the URL:
master:50070
(The original article includes screenshots of the report and of the web interface as Figure 1 and Figure 2.)
Because the Java environment on the RedHat node still has some problems, that node did not start; the others are normal.
Keep in mind that most of the steps above should be performed as the hadoop user, otherwise you will run into many permission issues along the way.
With that, the setup of the whole environment is complete.
Steps for setting up a Hadoop cluster environment under RedHat 5
Preparation
Two Linux virtual machines (RedHat 5, with IPs 192.168.1.210 and 192.168.1.211 respectively)
A JDK environment (this article uses JDK 1.6; there are plenty of configuration guides online, so it is omitted here)
The Hadoop installation package (this article uses Hadoop 1.0.4)
Goal
192.168.1.210 acts as both master and node machine; 192.168.1.211 is a node machine.
Setup Steps
1 Modify the hosts file
Add to /etc/hosts:
192.168.1.210 hadoop1
192.168.1.211 hadoop2
2 Set up SSH passwordless login
2.1 Passwordless login from the host (master) to itself
ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
Press Enter to accept the defaults; when finished, two files are generated in ~/.ssh/: id_dsa and id_dsa.pub. They come as a pair, like a lock and its key.
Then append id_dsa.pub to the authorized keys (at this point there is no authorized_keys file yet):
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
Test:
ssh localhost hostname
If it still asks for a password, the cause is usually directory or file permissions; check the system log to confirm. Under .ssh, authorized_keys must have permission 600, and its parent and grandparent directories should be 755, as shown below.
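A sketch of the corresponding permission fixes, assuming the key lives in the current user's home directory:
chmod 755 ~                      # grandparent directory (home)
chmod 755 ~/.ssh                 # parent directory
chmod 600 ~/.ssh/authorized_keys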
2.2 Passwordless login to the node machine (slave)
Execute on the slave:
ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
This generates the .ssh directory.
Copy authorized_keys from the master to the slave:
scp ~/.ssh/authorized_keys hadoop2:~/.ssh/
Test: execute on the master
ssh hadoop2
It should log in without a password.
3 Configure Hadoop
3.1 Copy Hadoop
Copy hadoop-1.0.4.tar.gz to the /usr/local folder and extract it.
Extraction command:
tar -zxvf hadoop-1.0.4.tar.gz
3.2 Check the hosts file (cat /etc/hosts)
192.168.1.210 hadoop1
192.168.1.211 hadoop2
3.3 Configure conf/masters and conf/slaves
conf/masters:
192.168.1.210
conf/slaves:
192.168.1.210
192.168.1.211
3.4 Configure conf/hadoop-env.sh
Add:
export JAVA_HOME=/home/elvis/soft/jdk1.7.0_17
3.5 Configure conf/core-site.xml
Add:
<property>
<name>fs.default.name</name>
<value>hdfs://192.168.1.210:9000</value>
</property>
3.6 Configure conf/hdfs-site.xml
Add:
<property>
<name>dfs.http.address</name>
<value>192.168.1.210:50070</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>/usr/local/hadoop/namenode</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/usr/local/hadoop/data</value>
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
3.7 Configure conf/mapred-site.xml
Add:
<property>
<name>mapred.job.tracker</name>
<value>192.168.1.50:8012</value>
</property>
3.8 Create the relevant directories
/usr/local/hadoop/ (the hadoop data and namenode directory)
Note: create only the hadoop directory; do not manually create the data and namenode directories inside it.
Create the same directory on the other node machines as well (a minimal sketch follows).
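A minimal sketch of this step, using the paths from this article (run it on the master and on each node machine):
sudo mkdir -p /usr/local/hadoop
# do NOT create data/ or namenode/ underneath by hand;
# Hadoop creates them itself (see the FAQ entry on "NameNode is not formatted" below)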
3.9 Copy the Hadoop files to the other node machines
Remotely copy the Hadoop files to the other nodes (so the configuration above is carried over to them).
Command:
scp -r hadoop-1.0.4 192.168.1.211:/usr/local/
3.10 Format the active master (192.168.1.210)
Command:
bin/hadoop namenode -format
3.11 Start the cluster: ./start-all.sh
The cluster should now be up. To check, run:
bin/hadoop dfsadmin -report
It should show 2 datanodes. You can also open the web interface and take a look.
Enter in a browser: 192.168.1.210:50070
With that, the cluster installation is complete!
FAQ
1 bad connection to FS. Command aborted
You need to look at the log; mine shows:
2013-06-09 15:56:39,790 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: java.io.IOException: NameNode is not formatted.
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:330)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:100)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:388)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:362)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:276)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:496)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1279)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1288)
The NameNode is not formatted!
Solution:
The reason is that I had manually created /usr/local/hadoop/data and /usr/local/hadoop/namenode; remove those two directories and reformat the namenode, as sketched below.
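A sketch of the fix, using the paths from this article's configuration:
rm -rf /usr/local/hadoop/data /usr/local/hadoop/namenode
bin/hadoop namenode -format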
2 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Invalid directory in dfs.data.dir: Incorrect permission for /usr/local/hadoop/data, expected: rwxr-xr-x, while actual: rwxrwxrwx
Solution:
The permissions on the data directory under /usr/local/hadoop/ are too permissive; change them to 755, for example as shown below.
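For example, matching the path reported in the warning above:
chmod 755 /usr/local/hadoop/data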
3 Eclipse Plug-in Issues
Exception 1: 2011-08-03 17:52:26,244 INFO org.apache.hadoop.ipc.Server: IPC Server handler 6 on 9800, call getListing(/home/fish/tmp20/mapred/system) from 192.168.2.101:2936: error: org.apache.hadoop.security.AccessControlException: Permission denied: user=DrWho, access=READ_EXECUTE, inode="system":root:supergroup:rwx-wx-wx
org.apache.hadoop.security.AccessControlException: Permission denied: user=DrWho, access=READ_EXECUTE, inode="system":root:supergroup:rwx-wx-wx
at org.apache.hadoop.hdfs.server.namenode.PermissionChecker.check(PermissionChecker.java:176)
at org.apache.hadoop.hdfs.server.namenode.PermissionChecker.checkPermission(PermissionChecker.java:111)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:4514)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPathAccess(FSNamesystem.java:4474)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getListing(FSNamesystem.java:1989)
at org.apache.hadoop.hdfs.server.namenode.NameNode.getListing(NameNode.java:556)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
Solution: add the following to hdfs-site.xml:
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
Common HDFS Commands
Create a folder
./hadoop fs -mkdir /usr/local/hadoop/godlike
Upload a file
./hadoop fs -put (or -copyFromLocal) 1.txt /usr/local/hadoop/godlike
List the files in a folder
./hadoop fs -ls /usr/local/hadoop/godlike
View the contents of a file
./hadoop fs -cat (or -text, -tail) /usr/local/hadoop/godlike/1.txt
Delete a file
./hadoop fs -rm /usr/local/hadoop/godlike
Delete a folder
./hadoop fs -rmr /usr/local/hadoop/godlike