The Hadoop installation tutorial on Ubuntu

Source: Internet
Author: User

Install Hadoop 2.2.0 on Ubuntu Linux 13.04 (Single-Node Cluster)

This tutorial explains how to install Hadoop 2.2.0/2.3.0/2.4.0/2.4.1 on Ubuntu 13.04/13.10/14.04 as a single-node cluster. This setup does not require a dedicated Hadoop user. All files related to Hadoop are stored inside the ~/hadoop directory.

  • Install a JRE. If you want the Oracle JRE, follow this post.

  • Install SSH: sudo apt-get install openssh-server
    Generate an SSH key: ssh-keygen -t rsa -P ""
    Enable the SSH key: cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
    (Optional) Disable SSH login from remote addresses by setting, in /etc/ssh/sshd_config: ListenAddress 127.0.0.1
    Test the local connection: ssh localhost
    If it works, exit: exit. Otherwise, debug.
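The SSH steps above can be collected into one idempotent script. This is only a sketch: it assumes openssh-server is already installed and that the default key paths are used, and it skips key generation if a key already exists.

```shell
#!/bin/sh
# Sketch: set up password-less SSH to localhost (assumes openssh-server
# is installed; uses the default OpenSSH key locations).
KEY="$HOME/.ssh/id_rsa"
AUTH="$HOME/.ssh/authorized_keys"

mkdir -p "$HOME/.ssh"
chmod 700 "$HOME/.ssh"

# Generate a passwordless RSA key only if none exists yet.
[ -f "$KEY" ] || ssh-keygen -t rsa -P "" -f "$KEY"

# Authorize the key for local logins, avoiding duplicate entries.
touch "$AUTH"
if [ -f "$KEY.pub" ] && ! grep -qxF "$(cat "$KEY.pub")" "$AUTH"; then
    cat "$KEY.pub" >> "$AUTH"
fi
chmod 600 "$AUTH"
```

Running the script twice is harmless: the key is kept and the authorized_keys entry is not duplicated. Afterwards, ssh localhost should log in without a password.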

  • Download Hadoop 2.2.0 (or newer versions)

  • Unpack, rename and move to the home directory:
    tar xvf hadoop-2.2.0.tar.gz
    mv hadoop-2.2.0 ~/hadoop
  • Create the HDFS directories:
    mkdir -p ~/hadoop/data/namenode
    mkdir -p ~/hadoop/data/datanode
  • In file ~/hadoop/etc/hadoop/hadoop-env.sh, insert (after the comment "The java implementation to use."):
    export JAVA_HOME="$(dirname $(readlink /etc/alternatives/java))/../"
    export HADOOP_COMMON_LIB_NATIVE_DIR="~/hadoop/lib"
    export HADOOP_OPTS="$HADOOP_OPTS -Djava.library.path=~/hadoop/lib"
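The JAVA_HOME line above derives the JDK directory from the /etc/alternatives/java symlink. The sketch below shows how the expression expands, step by step, using a hypothetical OpenJDK path in place of what readlink would return on your machine:

```shell
#!/bin/sh
# Hypothetical target of the /etc/alternatives/java symlink:
JAVA_BIN="/usr/lib/jvm/java-7-openjdk-amd64/jre/bin/java"

# dirname strips the trailing "java", leaving .../jre/bin;
# the trailing /../ then climbs out of bin/, landing on .../jre,
# which is a valid JAVA_HOME.
JAVA_HOME="$(dirname "$JAVA_BIN")/../"

echo "$JAVA_HOME"
# -> /usr/lib/jvm/java-7-openjdk-amd64/jre/bin/../
```

Using the symlink rather than a hard-coded path means the setting keeps working when the system's default Java is updated through the alternatives mechanism.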
  • In file ~/hadoop/etc/hadoop/core-site.xml (inside the <configuration> tag):
    <property>
      <name>fs.default.name</name>
      <value>hdfs://localhost:9000</value>
    </property>
  • In file ~/hadoop/etc/hadoop/hdfs-site.xml (inside the <configuration> tag):
    <property>
      <name>dfs.replication</name>
      <value>1</value>
    </property>
    <property>
      <name>dfs.namenode.name.dir</name>
      <value>${user.home}/hadoop/data/namenode</value>
    </property>
    <property>
      <name>dfs.datanode.data.dir</name>
      <value>${user.home}/hadoop/data/datanode</value>
    </property>
  • In file ~/hadoop/etc/hadoop/yarn-site.xml (inside the <configuration> tag):
    <property>
      <name>yarn.nodemanager.aux-services</name>
      <value>mapreduce_shuffle</value>
    </property>
    <property>
      <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
      <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
  • Create file ~/hadoop/etc/hadoop/mapred-site.xml:
    cp ~/hadoop/etc/hadoop/mapred-site.xml.template ~/hadoop/etc/hadoop/mapred-site.xml
    and insert (inside the <configuration> tag):
    <property>
      <name>mapreduce.framework.name</name>
      <value>yarn</value>
    </property>
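The configuration edits above can also be applied non-interactively. Below is a minimal sketch that writes two of the four files with shell heredocs (core-site.xml and mapred-site.xml; hdfs-site.xml and yarn-site.xml follow the same pattern). It assumes the tree was unpacked to ~/hadoop as above, and it overwrites any existing files, so use it only on a fresh install.

```shell
#!/bin/sh
# Sketch: write the Hadoop config files non-interactively.
# WARNING: overwrites existing files in the config directory.
HADOOP_CONF="$HOME/hadoop/etc/hadoop"
mkdir -p "$HADOOP_CONF"

# core-site.xml: point the default filesystem at the local HDFS NameNode.
cat > "$HADOOP_CONF/core-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
EOF

# mapred-site.xml: run MapReduce jobs on YARN.
cat > "$HADOOP_CONF/mapred-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
EOF
```

The quoted heredoc delimiter ('EOF') prevents the shell from expanding anything inside the XML, so the files are written exactly as shown.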
  • Add the Hadoop binaries to PATH:
    echo 'export PATH=$PATH:~/hadoop/bin:~/hadoop/sbin' >> ~/.bashrc
    source ~/.bashrc
    (The single quotes keep $PATH from being expanded when the line is written, so ~/.bashrc appends to whatever PATH is at login time.)
  • Format HDFS: hdfs namenode -format
  • Start Hadoop: start-dfs.sh && start-yarn.sh
    If you get the warning:
    WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    it is because the Hadoop native library shipped with the release is 32-bit and you are running a 64-bit system. This is not a big issue. If you want to fix it (optional), check this.

  • Check the status: jps
    Expected output (the PIDs will differ!):
    10969 DataNode
    11745 NodeManager
    11292 SecondaryNameNode
    10708 NameNode
    11483 ResourceManager
    13096 Jps
    N.B. The old JobTracker has been replaced by the ResourceManager.
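The jps check above can be scripted so a missing daemon is spotted immediately. This sketch wraps the check in a function that takes the jps output as an argument, so the matching logic can be exercised without a running cluster; the leading space in each pattern keeps "NameNode" from matching inside "SecondaryNameNode" (jps prints one "PID Name" pair per line).

```shell
#!/bin/sh
# Sketch: confirm all five Hadoop daemons appear in `jps` output.
check_daemons() {
    jps_out="$1"
    missing=0
    for daemon in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
        case "$jps_out" in
            # Match " <daemon>" so NameNode doesn't match SecondaryNameNode.
            *" $daemon"*) echo "ok $daemon" ;;
            *)            echo "missing $daemon"; missing=1 ;;
        esac
    done
    return $missing
}

# Example with the sample output from the step above
# (prints one "ok <daemon>" line per daemon):
check_daemons "10969 DataNode
11745 NodeManager
11292 SecondaryNameNode
10708 NameNode
11483 ResourceManager
13096 Jps"
```

On a live cluster you would call it as check_daemons "$(jps)" and treat a non-zero exit status as "at least one daemon is down".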

  • Access the web interfaces:
    • Cluster status: http://localhost:8088
    • HDFS status: http://localhost:50070
    • Secondary NameNode status: http://localhost:50090

  • Test Hadoop:
    hadoop jar ~/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.2.0-tests.jar TestDFSIO -write -nrFiles 20 -fileSize 10
    Check the results, then remove the test files:
    hadoop jar ~/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.2.0-tests.jar TestDFSIO -clean
    And:
    hadoop jar ~/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar pi 2 5
  • Stop Hadoop: stop-dfs.sh && stop-yarn.sh

Some of these steps are taken from this tutorial.

