Hadoop in the Big Data Era (1): Hadoop Installation


1. Hadoop Version Introduction

 

In versions earlier than 0.20.2 (exclusive), the default configuration lives in hadoop-default.xml, with site-specific overrides in hadoop-site.xml.

Versions after 0.20.x no longer ship a prebuilt Eclipse plug-in jar. Because Eclipse releases differ, you must compile the plug-in from source to match your own Eclipse installation.

In versions 0.20.2 through 0.22.x, configuration is concentrated in conf/core-site.xml, conf/hdfs-site.xml, and conf/mapred-site.xml.

Version 0.23.x adds YARN, and configuration is spread across four files: conf/core-site.xml, conf/hdfs-site.xml, conf/yarn-site.xml, and conf/mapred-site.xml.

 

Because 0.23.x changed so much and introduced new technologies, many Hadoop-based projects had trouble staying compatible: Hive, HBase, Pig, and so on were all built against releases earlier than 0.23.x.

Apache therefore unified the version numbering so that a release's capabilities can be told from its version number alone.

The 0.22.x line was renumbered directly to 1.0.0.

The 0.23.x line was renumbered directly to 2.0.0.

Hadoop is thus split into two major lines: 1.x and 2.x.

Version 1: a continuation and refinement of the original technology, with good support for the surrounding ecosystem. If you want to use HBase, Hive, and similar projects, version 1 is the one to pick.

Version 2: built around the newer technologies, notably YARN. If you are developing directly against Hadoop itself, this is a good choice.

 

The official website describes the Hadoop downloads as follows:

Download
    • 1.2.x - current stable version, 1.2 release
    • 2.4.x - current stable 2.x version
    • 0.23.x - similar to 2.x.x but missing NN HA (NameNode high availability)

 

2. Hadoop Installation and Modes

 

My lab environment currently runs hadoop-0.20.2, so everything that follows is based on that version.
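
If you still need the release itself, the sketch below fetches and unpacks it; the mirror URL and the /usr/local prefix are assumptions, so adjust them to your environment:

$ cd /usr/local
$ wget http://archive.apache.org/dist/hadoop/core/hadoop-0.20.2/hadoop-0.20.2.tar.gz    # assumed mirror path
$ tar -xzf hadoop-0.20.2.tar.gz
$ cd hadoop-0.20.2    # this directory is referred to as $HADOOP_HOME below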

Each Hadoop component is configured under the conf folder. Early Hadoop used a single configuration file, hadoop-site.xml, to configure the common, HDFS, and MapReduce components; starting with version 0.20.0 it was split into three files:

core-site.xml: configures properties for the common components.
hdfs-site.xml: configures HDFS properties.
mapred-site.xml: configures MapReduce properties.

 

2.1 Hadoop Running Modes

Hadoop runs in one of the following modes:

Standalone (local) mode: no daemons are needed; everything runs in a single JVM. It is mainly used during development. The default properties target this mode, so no extra configuration is required.
Pseudo-distributed mode: the Hadoop daemons all run on the local machine, simulating a small cluster.
Fully distributed mode: the Hadoop daemons run on a cluster of machines.

 

Key configuration properties for the different modes:

Component    Property              Standalone           Pseudo-distributed       Fully distributed
Common       fs.default.name       file:/// (default)   hdfs://localhost:9000    hdfs://namenode:9000
HDFS         dfs.replication       N/A                  1                        3 (default)
MapReduce    mapred.job.tracker    local (default)      localhost:9001           jobtracker:9001

 

2.2 Local Installation


Because the default properties already target this mode and no daemons need to run, nothing else is required (there is no need to set dfs.replication).
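
Before running the test below, a quick sanity check is to ask the hadoop script for its version:

$ cd $HADOOP_HOME
$ bin/hadoop version    # should report Hadoop 0.20.2 if the unpacked tree is intact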

 

Test:

Go to the $HADOOP_HOME directory and run the following commands to check whether the installation works:

$ mkdir input
$ cp conf/*.xml input
$ bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'
$ cat output/*


 

Output:
1 dfsadmin

 

If the above steps complete without errors, the installation succeeded.

 


2.3 Installation Steps in Pseudo-distributed Mode

Installation steps (a command sketch follows this list):

1. Set the environment variables (JAVA_HOME, PATH, HADOOP_HOME, CLASSPATH)
2. Modify the Hadoop configuration files (core-site.xml, hdfs-site.xml, mapred-site.xml)
3. Set up password-less SSH login
4. Format the file system: hadoop namenode -format
5. Start the daemons: start-all.sh
6. Stop the daemons: stop-all.sh
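
A minimal sketch of steps 1 and 3 through 6 on a single Linux box; the JDK and install paths are assumptions, so adjust them to your machine:

$ # step 1: environment variables (typically appended to ~/.bashrc)
$ export JAVA_HOME=/usr/lib/jvm/java-6-sun          # assumed JDK location
$ export HADOOP_HOME=/usr/local/hadoop-0.20.2       # assumed install location
$ export PATH=$PATH:$HADOOP_HOME/bin
$ # step 3: password-less SSH to localhost
$ ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ ssh localhost    # should log in without prompting for a password
$ # step 4: format HDFS (run once; reformatting destroys existing data)
$ bin/hadoop namenode -format
$ # step 5: start all daemons
$ bin/start-all.sh
$ # step 6: stop all daemons when done
$ bin/stop-all.sh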

 

Step 2 (each property goes into its own file under conf/, matching the three files listed above):

conf/core-site.xml:

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

conf/hdfs-site.xml:

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

conf/mapred-site.xml:

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>


 

After startup, you can check the NameNode and JobTracker status in a browser:
NameNode - http://localhost:50070/
JobTracker - http://localhost:50030/
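
You can also confirm that all five daemons are up with the JDK's jps tool:

$ jps    # expect NameNode, DataNode, SecondaryNameNode, JobTracker, TaskTracker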


Test:


Copy a file to a Distributed File System

$ bin/hadoop fs -put conf input 


Run the test

$ bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'


Obtain the execution result of the test program

$ bin/hadoop fs -cat output/*
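
Alternatively, copy the output files from HDFS to the local file system first and examine them there:

$ bin/hadoop fs -get output output
$ cat output/*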

 

Output:

3       dfs.class
2       dfs.period
1       dfs.file
1       dfs.replication
1       dfs.servers
1       dfsadmin
1       dfsmetrics.log



If the above steps complete without errors, the installation succeeded.

 

2.4 Installation Steps in Fully Distributed Mode

Installation steps (a sketch of the masters and slaves files follows this list):

1. Set the environment variables (JAVA_HOME, PATH, HADOOP_HOME, CLASSPATH)
2. Modify the Hadoop configuration files (core-site.xml, hdfs-site.xml, mapred-site.xml, masters, slaves)
3. Set up password-less SSH login from the master to every node
4. Format the file system: hadoop namenode -format
5. Start the daemons: start-all.sh
6. Stop the daemons: stop-all.sh
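
A minimal sketch of the two cluster-membership files from step 2; the hostnames (namenode, datanode1, ...) are placeholders standing in for your own machines:

$ # conf/masters names the host(s) that run the secondary namenode
$ cat conf/masters
namenode
$ # conf/slaves names the hosts that each run a datanode and a tasktracker
$ cat conf/slaves
datanode1
datanode2
datanode3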

 

After startup, you can check the NameNode and JobTracker status in a browser:
NameNode - http://namenode:50070/
JobTracker - http://jobtracker:50030/


Note:
Install Hadoop in the same location on every machine, under the same user name (one way to do this is sketched below).
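
The sketch below assumes password-less SSH is already in place and that the node names match the slaves file above:

$ # push the configured installation to every node, keeping the same path
$ for host in datanode1 datanode2 datanode3; do
>     rsync -a /usr/local/hadoop-0.20.2/ $host:/usr/local/hadoop-0.20.2/
> done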


3. Install the Eclipse plug-in


The Eclipse Hadoop plug-in provides:

a MapReduce Locations view, used to configure the MapReduce settings;

a Hadoop installation-directory setting added under Window -> Preferences;

a DFS Locations entry added in the Project Explorer view, from which you can browse the HDFS file system and upload and download files;

a MapReduce Project type added to the New Project wizard;

a Run on Hadoop launch option.

 


Note that the contrib/eclipse-plugin/hadoop-0.20.2-eclipse-plugin.jar shipped with Hadoop is out of date; download a newer build from the Internet, otherwise nothing happens when you try to run a MapReduce program from Eclipse.

 

 

 
