Install Hadoop 2.2.0 on Ubuntu Linux 13.04 (Single-node Cluster)

This tutorial explains how to install Hadoop 2.2.0/2.3.0/2.4.0/2.4.1 on Ubuntu 13.04/13.10/14.04 (Single-node Cluster). This setup does not require an additional user for Hadoop. All files related to Hadoop will be stored inside the ~/hadoop directory.
- Install a JRE. If you want the Oracle JRE, follow this post.
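If OpenJDK works for you instead, a minimal install looks like this (assuming the openjdk-7 packages shipped with these Ubuntu releases):
sudo apt-get update
sudo apt-get install openjdk-7-jre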
- Install SSH:
sudo apt-get install openssh-server
Generate an SSH key: ssh-keygen -t rsa -P ""
Enable the SSH key: cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
(Optional) Disable SSH login from remote addresses by setting in /etc/ssh/sshd_config: ListenAddress 127.0.0.1
Test the local connection: ssh localhost
If OK, then exit: exit
Otherwise, debug your SSH setup before continuing; a few generic checks are sketched below.
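These diagnostics are standard OpenSSH/Ubuntu commands, not part of the original tutorial:
# Verbose client output shows where the connection fails
ssh -v localhost
# Check that the SSH server is actually running
sudo service ssh status
# Key-based login requires strict permissions on ~/.ssh
chmod 700 ~/.ssh && chmod 600 ~/.ssh/authorized_keys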
- Download Hadoop 2.2.0 (or newer versions)
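For example, from the Apache release archive (one possible mirror; adjust the version number as needed):
wget https://archive.apache.org/dist/hadoop/common/hadoop-2.2.0/hadoop-2.2.0.tar.gz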
- Unpack, rename and move to the home Directory:
tar xvf hadoop-2.2.0.tar.gz
mv hadoop-2.2.0 ~/hadoop
- Create the HDFS directories:
mkdir -p ~/hadoop/data/namenode
mkdir -p ~/hadoop/data/datanode
- In file ~/hadoop/etc/hadoop/hadoop-env.sh insert (after the comment "The java implementation to use."):
export JAVA_HOME="$(dirname $(readlink /etc/alternatives/java))/../"
export HADOOP_COMMON_LIB_NATIVE_DIR=~/hadoop/lib
export HADOOP_OPTS="$HADOOP_OPTS -Djava.library.path=~/hadoop/lib"
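To sanity-check that the computed JAVA_HOME points at a real Java installation (a quick verification, not part of the original tutorial):
# Should print a directory containing bin/, lib/, etc.
echo "$(dirname $(readlink /etc/alternatives/java))/../"
ls "$(dirname $(readlink /etc/alternatives/java))/../"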
- In file ~/hadoop/etc/hadoop/core-site.xml (inside <configuration> tag):
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:9000</value>
</property>
- In file ~/hadoop/etc/hadoop/hdfs-site.xml (inside <configuration> tag):
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
<property>
  <name>dfs.namenode.name.dir</name>
  <value>${user.home}/hadoop/data/namenode</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>${user.home}/hadoop/data/datanode</value>
</property>
- In file ~/hadoop/etc/hadoop/yarn-site.xml (inside <configuration> tag):
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
- Create file ~/hadoop/etc/hadoop/mapred-site.xml:
cp ~/hadoop/etc/hadoop/mapred-site.xml.template ~/hadoop/etc/hadoop/mapred-site.xml
and insert (inside <configuration> tag):
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
- Add Hadoop binaries to PATH:
echo "export PATH=$PATH:~/hadoop/bin:~/hadoop/sbin" >> ~/.bashrc
source ~/.bashrc
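To confirm the binaries are now reachable from your shell (a simple check, not in the original tutorial):
hadoop version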
- Format HDFS:
hdfs namenode -format
- Start Hadoop:
start-dfs.sh && start-yarn.sh
If you get the warning:
WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
This happens because the Hadoop native library that ships with the release is 32-bit, while you are probably running a 64-bit system. It is not a big issue. If you want to fix it (optional), check this.
- Check status:
jps
Expected output (PIDs may change!):
10969 DataNode
11745 NodeManager
11292 SecondaryNameNode
10708 NameNode
11483 ResourceManager
13096 Jps
N.B. The old JobTracker has been replaced by the ResourceManager.
- Access Web interfaces:
- Cluster status: http://localhost:8088
- HDFS status: http://localhost:50070
- Secondary NameNode status: http://localhost:50090
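If you prefer the command line, a quick reachability check (assuming curl is installed; not part of the original tutorial):
curl -s http://localhost:50070 > /dev/null && echo "HDFS web UI is up"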
- Test Hadoop:
hadoop jar ~/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.2.0-tests.jar TestDFSIO -write -nrFiles 20 -fileSize 10
Check the results and remove the files: hadoop jar ~/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.2.0-tests.jar TestDFSIO -clean
And: hadoop jar ~/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar pi 2 5
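As an extra smoke test, you can exercise HDFS directly (generic HDFS shell commands, not from the original tutorial):
# Create a home directory in HDFS and round-trip a small file
hdfs dfs -mkdir -p /user/$USER
echo "hello hadoop" > /tmp/hello.txt
hdfs dfs -put /tmp/hello.txt /user/$USER/
hdfs dfs -cat /user/$USER/hello.txt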
- Stop Hadoop:
stop-dfs.sh && stop-yarn.sh
Some of these steps are taken from this tutorial.