Hadoop in the Big Data era (i): Hadoop installation


1. Introduction to Hadoop version

In versions prior to 0.20.2 (not including that version), the configuration lived in the default.xml files (e.g. hadoop-default.xml).

The 0.20.x releases do not ship an Eclipse plug-in JAR package; because Eclipse versions differ, you need to compile the source code to generate a plug-in matching your own Eclipse installation.

In versions 0.20.2 through 0.22.x, the configuration is centralized in three files: conf/core-site.xml, conf/hdfs-site.xml, and conf/mapred-site.xml.

Version 0.23.x added the YARN technology, and the configuration is centralized in four files: conf/core-site.xml, conf/hdfs-site.xml, conf/yarn-site.xml, and conf/mapred-site.xml.

Because of the large changes in 0.23.x and the new technologies it introduced, many projects built on earlier Hadoop versions, such as Hive, HBase, and Pig, had difficulty staying compatible with it.

Apache therefore began to unify the version numbers, so that the major version number alone distinguishes Hadoop's feature lines:

0.22.x was upgraded directly to 1.0.0

0.23.x was upgraded directly to 2.0.0

This divides Hadoop into two lines, version 1 and version 2:

Version 1: mainly upgrades and develops the original technology, while keeping support for the surrounding ecosystem. At the time, if you wanted to use HBase, Hive, and similar projects, version 1 was the only choice.

Version 2: built around promoting and developing the new technology (YARN); a good choice if you are developing on Hadoop alone.

The official download page currently describes the Hadoop releases as follows:

Download
    • 1.2.X - current stable version, 1.2 release
    • 2.4.X - current stable 2.x version
    • 0.23.X - similar to 2.X.X but missing NN HA (NameNode High Availability)

2. Hadoop Installation and Mode

At the moment my lab environment uses hadoop-0.20.2, so the rest of this article is based on that version.

The configuration for each Hadoop component is under the conf folder (an abridged listing of that folder follows below). Early Hadoop used a single configuration file, hadoop-site.xml, to configure the Common, HDFS, and MapReduce components; starting with the 0.20.0 release it was divided into three files:

core-site.xml: configures the properties of the Common component.
hdfs-site.xml: configures the HDFS properties.
mapred-site.xml: configures the MapReduce properties.
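
For orientation, this is roughly what the conf directory of a 0.20.2 install contains (an abridged sketch; exact contents vary slightly by build):

$ ls $HADOOP_HOME/conf
core-site.xml  hdfs-site.xml  mapred-site.xml
hadoop-env.sh  masters        slaves
log4j.properties  ...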

2.1. Hadoop operating mode

There are three modes of operation of Hadoop:

Standalone mode (also called local mode): no daemons are required; all programs execute in a single JVM. Mainly used in the development phase. The default properties are set for this mode, so no additional configuration is required.
Pseudo-distributed mode: the Hadoop daemons run on the local machine, simulating a small-scale cluster.
Fully distributed mode: the Hadoop daemons run on a cluster of machines.

Key configuration properties for each mode:

Component   Property             Standalone mode      Pseudo-distributed mode   Fully distributed mode
---------   ------------------   ------------------   -----------------------   ----------------------
Common      fs.default.name      file:/// (default)   hdfs://localhost:9000     hdfs://namenode:9000
HDFS        dfs.replication      N/A                  1                         3 (default)
MapReduce   mapred.job.tracker   local (default)      localhost:9001            jobtracker:9001

2.2. Standalone mode installation


Because the default properties are set specifically for this mode, and no daemons need to run, standalone mode requires no additional configuration.
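
One setting that usually does need attention even here is pointing Hadoop at your JDK, which is read from conf/hadoop-env.sh. A minimal sketch (the JDK path is an assumption; use your own):

# conf/hadoop-env.sh: Hadoop reads JAVA_HOME from here
export JAVA_HOME=/usr/lib/jvm/java-6-sun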

Test:

Go to the $HADOOP_HOME directory and execute the following commands to test whether the installation was successful:

$ mkdir input
$ cp conf/*.xml input
$ bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'
$ cat output/*


Output:
1 dfsadmin

After the above steps, if no error occurs, the installation succeeds.


2.3. Pseudo-distributed mode installation procedure

Installation steps:

1. Set the environment variables (JAVA_HOME, PATH, HADOOP_HOME, CLASSPATH)
2. Modify the Hadoop configuration files (core-site.xml, hdfs-site.xml, mapred-site.xml)
3. Set up passwordless SSH login
4. Format the file system: hadoop namenode -format
5. Start the daemons: start-all.sh
6. Stop the daemons: stop-all.sh (a combined sketch of steps 1 and 3-6 follows this list; step 2 is illustrated after it)
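
A minimal sketch of steps 1 and 3 through 6, assuming a Sun JDK under /usr/lib/jvm and Hadoop unpacked in /opt/hadoop-0.20.2 (both paths are assumptions; adjust to your machine):

# Step 1: environment variables, e.g. in ~/.bashrc (paths are assumptions)
export JAVA_HOME=/usr/lib/jvm/java-6-sun
export HADOOP_HOME=/opt/hadoop-0.20.2
export PATH=$PATH:$HADOOP_HOME/bin

# Step 3: passwordless SSH to localhost
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

# Step 4: format the file system (run once)
bin/hadoop namenode -format

# Step 5: start all daemons
bin/start-all.sh

# Step 6: stop all daemons when done
bin/stop-all.sh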

An example for the second step (all three properties shown in one listing):

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
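
Strictly speaking, in 0.20.x these properties live in three separate files rather than one. A minimal sketch of the same values split across them, written as shell here-documents run from $HADOOP_HOME:

# fs.default.name goes in core-site.xml
cat > conf/core-site.xml <<'EOF'
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
EOF

# dfs.replication goes in hdfs-site.xml
cat > conf/hdfs-site.xml <<'EOF'
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
EOF

# mapred.job.tracker goes in mapred-site.xml
cat > conf/mapred-site.xml <<'EOF'
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
EOF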


After the daemons start, the NameNode and JobTracker status pages can be viewed in a browser:
NameNode - http://localhost:50070/
JobTracker - http://localhost:50030/
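
Besides the web pages, you can check that the daemons came up with the JDK's jps tool. In pseudo-distributed mode you would expect roughly the following five Hadoop processes (the process IDs here are illustrative):

$ jps
4201 NameNode
4340 DataNode
4481 SecondaryNameNode
4559 JobTracker
4701 TaskTracker
5123 Jps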


Test:


Copy files into the distributed file system:

$ bin/hadoop fs -put conf input


Run the test:

$ bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'


Fetch the results of the test program:

$ bin/hadoop fs -cat output/*

Output:

3 dfs.class
2 dfs.period
1 dfs.file
1 dfs.replication
1 dfs.servers
1 dfsadmin
1 dfsmetrics.log



After the above steps, if no error occurs, the installation succeeds.

2.4. Fully distributed mode installation procedure

Installation steps:

1. Set the environment variables (JAVA_HOME, PATH, HADOOP_HOME, CLASSPATH)
2. Modify the Hadoop configuration files (core-site.xml, hdfs-site.xml, mapred-site.xml, masters, slaves)
3. Set up passwordless SSH login
4. Format the file system: hadoop namenode -format
5. Start the daemons: start-all.sh
6. Stop the daemons: stop-all.sh (a sketch of the masters and slaves files follows this list)
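
The main difference from the pseudo-distributed setup is step 2: fs.default.name and mapred.job.tracker point at the master hosts (as in the table in section 2.1), and the masters and slaves files list the cluster members. A minimal sketch, assuming the hypothetical hostnames namenode, slave1, and slave2:

# conf/masters names the host that runs the SecondaryNameNode
echo "namenode" > conf/masters

# conf/slaves names the hosts that run the DataNode and TaskTracker daemons
cat > conf/slaves <<'EOF'
slave1
slave2
EOF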

After the daemons start, the NameNode and JobTracker status pages can be viewed in a browser:
NameNode - http://namenode:50070/
JobTracker - http://jobtracker:50030/


Note:
Hadoop must be installed in the same location on every machine, under the same user name (a sketch of one way to do this follows).
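
A minimal sketch of one way to satisfy this, assuming the hypothetical hosts slave1 and slave2 and an install under /opt/hadoop-0.20.2:

# push an identical installation to every node (run from the master, as the same user everywhere)
for host in slave1 slave2; do
  rsync -a /opt/hadoop-0.20.2/ "$host":/opt/hadoop-0.20.2/
done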


3. Eclipse Plug-in installation


The Eclipse Hadoop plug-in is designed to speed up MapReduce development. It provides:

a MapReduce Locations view for setting the MapReduce variables;

a Hadoop installation location setting under Windows -> Preferences;

a DFS Locations entry in the Project Explorer view, which lets you browse the contents of the HDFS file system and upload or download files;

a MapReduce Project type in the New Project wizard;

a Run on Hadoop launch feature.


It is important to note that the contrib/eclipse-plugin/hadoop-0.20.2-eclipse-plugin.jar that ships with Hadoop is out of date; download a newer build from the Internet (a sketch of installing it follows), otherwise Eclipse does not respond when you run a MapReduce program.
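
Installing a replacement plug-in is just a file copy. A minimal sketch, assuming $ECLIPSE_HOME points at your Eclipse installation and the downloaded JAR is in the current directory:

# drop the plug-in JAR into Eclipse's plugins directory
cp hadoop-eclipse-plugin-*.jar "$ECLIPSE_HOME"/plugins/

# restart Eclipse with -clean so the new plug-in is picked up
"$ECLIPSE_HOME"/eclipse -clean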
