Installation and configuration of Hadoop under Ubuntu 16.04 (pseudo-distributed environment)

Note: This article draws on another tutorial, but because that one contains some errors that caused a lot of trouble in actual operation, this article was written for everyone to use.

First, preparation

1.1 Create a Hadoop user
$ sudo useradd -m hadoop -s /bin/bash   # create the hadoop user, with /bin/bash as its shell
$ sudo passwd hadoop                    # set a password for the hadoop user
$ sudo adduser hadoop sudo              # give the hadoop user administrator privileges
$ su - hadoop                           # switch to the hadoop user
$ sudo apt-get update                   # update the hadoop user's apt to make later installation easier

1.2 Install SSH and set up passwordless SSH login
$ sudo apt-get install openssh-server   # install the SSH server
$ ssh localhost                         # log in via SSH; enter yes at the prompt the first time
$ exit                                  # log out of the localhost session
$ cd ~/.ssh/                            # enter the SSH directory
$ ssh-keygen -t rsa                     # generate an SSH key pair

After entering the ssh-keygen -t rsa command, you need to press Enter three consecutive times, for example:

The first Enter stores the key in the default location, which makes the subsequent commands easier to type. The second and third confirm the passphrase (leaving it empty is fine; it is not important here). After pressing Enter for those two prompts, if output similar to the screenshot appears, the key generation succeeded.
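
Since the original screenshot is not reproduced here, the three prompts from ssh-keygen typically look like the following (the default path reflects the hadoop user's home directory); just press Enter at each one:

Enter file in which to save the key (/home/hadoop/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again: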

Then enter:

$ cat ./id_rsa.pub >> ./authorized_keys   # add the public key to the list of authorized keys
$ ssh localhost                           # localhost can now be reached without a password, as you can see

If it fails, you can search for "SSH passwordless login" to find a solution.
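
If passwordless login still does not work, one common cause (not covered in the original article) is overly permissive permissions on the .ssh directory; a quick fix worth trying:

$ chmod 700 ~/.ssh                   # the .ssh directory must only be accessible by its owner
$ chmod 600 ~/.ssh/authorized_keys   # authorized_keys must not be writable by others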

Second, install JDK 1.7

First download JDK 1.7 from the Oracle website at http://www.oracle.com/technetwork/java/javase/downloads/index.html, then install it and configure the environment variables. Choose the version that matches your system; I chose jdk-7u80-linux-x64.tar.gz.

$ mkdir /usr/lib/jvm                                        # create the jvm folder
$ sudo tar zxvf jdk-7u80-linux-x64.tar.gz -C /usr/lib/jvm   # extract to /usr/lib/jvm
$ cd /usr/lib/jvm                                           # enter the folder
$ sudo mv jdk1.7.0_80 java                                  # rename the extracted folder to java
$ vi ~/.bashrc                                              # configure the environment variables for the JDK

Note: If you do not have sufficient permissions to create the jvm folder under the relevant directory, you can use the $ sudo -i command to switch to the root account and create the folder there.
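
For example, a minimal sketch of that approach:

$ sudo -i              # switch to a root shell
# mkdir /usr/lib/jvm   # create the folder as root
# exit                 # return to the hadoop user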

It is also recommended to use vim to edit the environment variables, i.e. replace the last command above with $ vim ~/.bashrc

If you do not have vim, you can use:

$ sudo apt-get install vim

to install it.

Add the following lines to the .bashrc file:

export JAVA_HOME=/usr/lib/jvm/java
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=${JAVA_HOME}/bin:$PATH

After modifying the file, run:

$ source ~/.bashrc      # make the environment variables take effect
$ java -version         # check whether the installation succeeded and view the Java version

If output like that shown appears, the installation succeeded.
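
The original screenshot is not included here; for reference, the output should look roughly like the following (build numbers may differ):

$ java -version
java version "1.7.0_80"
Java(TM) SE Runtime Environment (build 1.7.0_80-b15)
Java HotSpot(TM) 64-Bit Server VM (build 24.80-b11, mixed mode)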

Note: If you do not want to type everything by hand, you can copy and paste, but because the default vim does not support the system clipboard, you first need to install the vim-gnome package:

$ sudo apt-get install vim-gnome

Then copy the relevant text, move the cursor to the desired location in vim, and paste with the command "+p. Note that the " is part of the command, i.e. all three characters ", + and p must be typed.

Third, install hadoop-2.6.0

First download hadoop-2.6.0.tar.gz from the following link:
http://mirrors.hust.edu.cn/apache/hadoop/common/

Then install it as follows:

$ sudo tar -zxvf hadoop-2.6.0.tar.gz -C /usr/local   # extract to /usr/local
$ cd /usr/local
$ sudo mv hadoop-2.6.0 hadoop                        # rename the folder to hadoop
$ sudo chown -R hadoop ./hadoop                      # change file ownership so the hadoop user can use it

To configure the environment variables for Hadoop, add the following lines to the .bashrc file:

export HADOOP_HOME=/usr/local/hadoop
export CLASSPATH=$($HADOOP_HOME/bin/hadoop classpath):$CLASSPATH
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

Similarly, execute source ~/.bashrc to make the settings take effect, then check whether Hadoop was installed successfully.
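
One quick check, assuming the PATH settings above have taken effect (a small addition to the original steps):

$ source ~/.bashrc   # reload the environment variables
$ hadoop version     # prints the Hadoop version if the installation and PATH are correct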

Fourth, pseudo-distributed configuration

Hadoop can run in pseudo-distributed mode on a single node: the Hadoop daemons run as separate Java processes, the node acts as both NameNode and DataNode, and the files that are read are stored in HDFS. Hadoop's configuration files are located in /usr/local/hadoop/etc/hadoop/, and pseudo-distributed mode requires modifying two of them, core-site.xml and hdfs-site.xml. The configuration files are in XML format, and each setting is declared as a property with a name and a value. First add the JDK 1.7 path (export JAVA_HOME=/usr/lib/jvm/java) to the hadoop-env.sh file.
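
For example, a minimal sketch of that step (the JAVA_HOME path matches the JDK location configured earlier):

$ cd /usr/local/hadoop
$ vim ./etc/hadoop/hadoop-env.sh
# then add or edit the following line inside hadoop-env.sh:
# export JAVA_HOME=/usr/lib/jvm/java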

Next, modify the core-site.xml file:

<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>file:/usr/local/hadoop/tmp</value>
        <description>A base for other temporary directories.</description>
    </property>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>

Then modify the configuration file hdfs-site.xml:

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/usr/local/hadoop/tmp/dfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/usr/local/hadoop/tmp/dfs/data</value>
    </property>
</configuration>

Hadoop runs according to its configuration files (they are read when Hadoop starts), so if you need to switch back from pseudo-distributed mode to non-distributed mode, you must remove the configuration items from core-site.xml. In addition, pseudo-distributed mode only needs fs.defaultFS and dfs.replication configured in order to run (see the official tutorial), but if the hadoop.tmp.dir parameter is not set, the default temporary directory /tmp/hadoop-hadoop is used. That directory may be cleaned up by the system on reboot, forcing you to re-run the format step. So we set it explicitly, and we also specify dfs.namenode.name.dir and dfs.datanode.data.dir, otherwise you might get an error in the next step.

After the configuration is complete, format the NameNode:

$ ./bin/hdfs namenode -format
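
If formatting succeeds, the log output should end with lines roughly like the following (exact paths and messages may differ between versions):

... INFO common.Storage: Storage directory /usr/local/hadoop/tmp/dfs/name has been successfully formatted.
... INFO util.ExitUtil: Exiting with status 0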

Start the NameNode and DataNode daemons and view the startup result:

$ ./sbin/start-dfs.sh
$ jps

After startup completes, the jps command can be used to determine whether it succeeded; if so, the following processes are listed: NameNode, DataNode and SecondaryNameNode.
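
Since the screenshot is not shown here, typical jps output looks roughly like this (the process IDs will differ on your machine):

$ jps
2303 NameNode
2429 DataNode
2602 SecondaryNameNode
2735 Jps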

At this point you may still be asked for the localhost password. If you are sure you entered the correct password but still cannot log in, the reason is that when no user name is specified, ssh defaults to the root user, and for security reasons the SSH service does not allow root logins by default.

Enter the code:

$ sudo vim /etc/ssh/sshd_config

Check whether PermitRootLogin is set to yes; if not, replace whatever follows PermitRootLogin on that line with yes and save the file. Then enter the following command to restart the SSH service:

$ sudo /etc/init.d/ssh restart

Then you can log in normally (for passwordless login, see Section 1.2).

After a successful start, you can visit the web interface at http://localhost:50070 to view NameNode and DataNode information and browse the files in HDFS online.
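
As a quick follow-up check (a small example not in the original article, using standard HDFS shell commands), you can create a user directory in HDFS and list it:

$ cd /usr/local/hadoop
$ ./bin/hdfs dfs -mkdir -p /user/hadoop   # create a home directory for the hadoop user in HDFS
$ ./bin/hdfs dfs -ls /user                # list it to confirm HDFS is working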

At this point, the installation of Hadoop has been completed! Enjoy it!
