Ubuntu 16.04 + Hadoop 2.7.3 Environment Setup


Reprinted from (please credit the source): http://www.cnblogs.com/lighten/p/6106891.html

I have recently started learning about big data, and the best-known open-source platform is Hadoop. This post records the process of building the current version of Hadoop on Ubuntu. While setting things up I found a very clear and comprehensive walkthrough; this article cuts some of its less important content and condenses the rest. Click here to view the original.

1. Install the JDK

Hadoop is a big data platform developed in Java, so it naturally requires a Java runtime environment. Of course, using Hadoop does not tie you to the Java language; Hadoop development supports many languages.

Installing the Java runtime environment is covered in another article and is not repeated here: Ubuntu 16.04 installing the JDK.

2. Configure SSH and passwordless login

Hadoop logs in to nodes over SSH, so SSH needs to be installed on Linux. The client is usually already installed; only the server side needs to be added:

sudo apt-get install openssh-server

Test it by logging in to this machine with ssh localhost and typing yes; you should be able to log in. Typing this every time is cumbersome, and on a cluster it would be a disaster, so configure passwordless login instead.

A total of three steps:

1. Generate the key pair: ssh-keygen -t rsa. Two files are created under the ~/.ssh folder: id_rsa (the private key) and id_rsa.pub (the public key).

2. Import the public key to the authentication file and change the permissions:

1) Import on this machine: cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

2) Import the server:

First copy the public key to the server:

      scp ~/.ssh/id_rsa.pub xxx@server:/home/xxx/id_rsa.pub

The public key is then imported into the authentication file, and this step is done on the server:

      cat ~/id_rsa.pub >> ~/.ssh/authorized_keys

Finally, change permissions on the server:

      chmod 700 ~/.ssh
      chmod 600 ~/.ssh/authorized_keys

3. Test: ssh localhost. You may need to type yes the first time; after that neither yes nor a password is required.
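For convenience, the whole passwordless-login setup can also be run as the following command sequence. This is only a recap of the steps above, assuming xxx is your user name and server stands in for the remote host:

      ssh-keygen -t rsa                                        # creates ~/.ssh/id_rsa and ~/.ssh/id_rsa.pub
      cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys          # local machine
      scp ~/.ssh/id_rsa.pub xxx@server:/home/xxx/id_rsa.pub    # copy the public key to the server
      ssh xxx@server 'cat ~/id_rsa.pub >> ~/.ssh/authorized_keys && chmod 700 ~/.ssh && chmod 600 ~/.ssh/authorized_keys'
      ssh localhost                                            # should no longer ask for a password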

3. Install Hadoop

1. Download the Hadoop installation package: click here. Downloading the binary package is enough. You can also download it with the wget command.
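If you prefer the command line, the tarball can be fetched with wget; the URL below points at the Apache release archive and is given only as an example mirror:

      wget https://archive.apache.org/dist/hadoop/common/hadoop-2.7.3/hadoop-2.7.3.tar.gz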

2. Extract it and move it to the directory where you want it to live:

tar -zxvf hadoop-2.7.3.tar.gz

mv ./hadoop-2.7.3 /opt/hadoop

3. Create the hadoop group, add your user to it, and grant execute permissions

sudo addgroup hadoop

sudo usermod -a -G hadoop xxx    # add the current user to the hadoop group

sudo gedit /etc/sudoers          # add the hadoop group to sudoers

After the line root ALL=(ALL) ALL, add hadoop ALL=(ALL) ALL.
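For reference, the relevant part of /etc/sudoers would then look roughly like this; note that a bare hadoop names a user, so if the intent is to grant the whole hadoop group, sudoers uses a % prefix:

      root    ALL=(ALL) ALL
      hadoop  ALL=(ALL) ALL     # or: %hadoop ALL=(ALL) ALL to grant the entire group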

sudo chmod -R 755 /opt/hadoop

sudo chown -R xxx:hadoop /opt/hadoop    # otherwise SSH will deny access

These are the generally required operations. The original article also carried out some other configuration; if you run into problems, you can check there to see whether they are caused by those settings: click here.

4. Modify the configuration file. As with the JDK installation, you can choose which file to modify; here /etc/profile is modified:

export HADOOP_HOME=/opt/hadoop

export PATH=.:${JAVA_HOME}/bin:${HADOOP_HOME}/bin:$PATH

source /etc/profile
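The PATH line above assumes JAVA_HOME was already exported during the JDK installation; if not, add it next to the Hadoop variables. The path below is only a placeholder and must match your actual JDK directory:

      export JAVA_HOME=/opt/jdk1.8.0_101    # hypothetical path; use wherever your JDK actually lives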

The original article also includes a lot of other configuration here; I have not configured it for now, but it can serve as a reference if you hit problems: click here.

5. Test whether the configuration is successful

hadoop version
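If the environment variables are correct, the command prints a version banner whose first line should read roughly as follows (the build details underneath vary):

      Hadoop 2.7.3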

6. Hadoop standalone configuration (non-distributed mode)

Hadoop defaults to non-distributed mode and no additional configuration is required. You can test the demo to see if it is configured correctly.

cd /opt/hadoop

mkdir input

cp README.txt input

bin/hadoop jar share/hadoop/mapreduce/sources/hadoop-mapreduce-examples-2.7.3-sources.jar org.apache.hadoop.examples.WordCount input output
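When the job finishes, the word counts are written into the output directory and can be inspected directly:

      cat ./output/*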

7. Hadoop pseudo-distributed configuration

Pseudo-distributed mode only requires changing two files. The configuration files are in etc/hadoop under the Hadoop directory.

The first is core-site.xml. Set the temporary directory location; otherwise it defaults to /tmp/hadoop-hadoop, a folder that may be wiped by the system on reboot, so the path needs to be changed.

<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>file:/opt/hadoop/tmp</value>
        <description>A base for other temporary directories.</description>
    </property>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>

Next is hdfs-site.xml. Pseudo-distributed mode has only one node, so the replication factor must be set to 1. The storage locations for the NameNode and DataNode are also configured here.

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/opt/hadoop/tmp/dfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/opt/hadoop/tmp/dfs/data</value>
    </property>
</configuration>

Then execute the format command to format the name node: ./bin/hdfs namenode -format
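If formatting succeeds, the tail of the output should contain a line roughly like the following, with the path matching the dfs.namenode.name.dir value configured above:

      ... INFO common.Storage: Storage directory /opt/hadoop/tmp/dfs/name has been successfully formatted.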

Start HDFS: ./sbin/start-dfs.sh (type yes if SSH asks for confirmation).

Run the jps command to check whether startup succeeded.
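For the pseudo-distributed HDFS setup above, jps should list roughly the following processes (the PIDs will of course differ):

      12345 NameNode
      12456 DataNode
      12567 SecondaryNameNode
      12678 Jps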

Visit http://localhost:50070 to view node information.

Stop HDFS: ./sbin/stop-dfs.sh

The above is the HDFS configuration. Next comes the MapReduce-related configuration; skipping it does not break anything, but there is then no resource scheduling. Hadoop 2.x uses YARN for task scheduling and management, which is the biggest difference from the 1.x versions.

cp ./etc/hadoop/mapred-site.xml.template ./etc/hadoop/mapred-site.xml

vim ./etc/hadoop/mapred-site.xml

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>

Modify YARN's configuration file, yarn-site.xml:

<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>

Before starting YARN, make sure HDFS is already started, then run: ./sbin/start-yarn.sh

Start the history server so that finished tasks can be viewed in the web interface: ./sbin/mr-jobhistory-daemon.sh start historyserver
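Putting the pieces together, a full start of the pseudo-distributed setup, run from the Hadoop directory, then looks like this:

      ./sbin/start-dfs.sh
      ./sbin/start-yarn.sh
      ./sbin/mr-jobhistory-daemon.sh start historyserver
      jps    # should now also show ResourceManager, NodeManager and JobHistoryServer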

Without YARN enabled, tasks run with "mapred.LocalJobRunner"; with YARN enabled, they run with "mapred.YARNRunner". One benefit of starting YARN is that you can monitor tasks through the web interface: http://localhost:8088/cluster.
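To shut everything down again, stop the daemons in reverse order with the matching stop scripts in the same sbin directory:

      ./sbin/mr-jobhistory-daemon.sh stop historyserver
      ./sbin/stop-yarn.sh
      ./sbin/stop-dfs.sh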

8. Distributed deployment: I do not have a second machine, so I have not tried it. For details see: here.

4. Afterword

Since the machine was already configured before this was written, some details are inevitably missing. If anything is wrong, please point it out.

