1. Hadoop Version Introduction
In versions earlier than 0.20.2 (exclusive), configuration lives in a single file, hadoop-site.xml, with the read-only defaults in hadoop-default.xml.
Versions after 0.20.x no longer ship a prebuilt Eclipse plug-in jar; because Eclipse releases differ, you have to compile the plug-in from source for your own Eclipse version.
From 0.20.2 through 0.22.x, configuration is concentrated in conf/core-site.xml, conf/hdfs-site.xml, and conf/mapred-site.xml.
Version 0.23.x adds YARN, and configuration is split across four files: conf/core-site.xml, conf/hdfs-site.xml, conf/yarn-site.xml, and conf/mapred-site.xml.
Because 0.23.x changed so much and introduced new technologies, many Hadoop-based projects, such as Hive, HBase, and Pig, which were built against releases earlier than 0.23.x, had trouble staying compatible.
Apache therefore unified the version numbering so that a release's capabilities can be read from its version number:
- 0.22.x was renumbered directly to 1.0.0
- 0.23.x was renumbered directly to 2.0.0
Hadoop is thus split into two major lines, 1.x and 2.x.
Version 1.x continues to develop the original architecture and keeps compatibility with the surrounding ecosystem; if you want to use HBase, Hive, and similar technologies, the 1.x line is the one to pick.
Version 2.x pushes the new technologies forward; if you are developing purely against Hadoop itself, it is a good choice.
The official website describes the downloads as follows:
Download
- 1.2.x - current stable version, 1.2 release
- 2.4.x - current stable 2.x version
- 0.23.x - similar to 2.x.x but missing NN HA
2. Hadoop Installation and Running Modes
My lab environment currently runs hadoop-0.20.2, so the rest of this article is based on that version.
Each Hadoop component is configured under the conf folder. Early Hadoop used a single file, hadoop-site.xml, to configure the Common, HDFS, and MapReduce components; starting from version 0.20.0 it is split into three files:
core-site.xml: properties for the Common component.
hdfs-site.xml: HDFS properties.
mapred-site.xml: MapReduce properties.
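On 0.20.2 you can confirm this split by listing the conf directory (output abridged; your listing will contain additional files):
$ ls $HADOOP_HOME/conf
core-site.xml  hadoop-env.sh  hdfs-site.xml  log4j.properties  mapred-site.xml  masters  slaves  ...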
2.1 Hadoop Running Modes
Hadoop runs in the following modes:
Standalone (local) mode: no daemons are needed; everything runs in a single JVM. This mode is mainly used during development. The default properties are set for it, so no extra configuration is required.
Pseudo-distributed mode: the Hadoop daemons all run on the local machine, simulating a small cluster.
Fully distributed mode: the Hadoop daemons run on a cluster of machines.
Key configuration properties for each mode:

| Component | Property | Standalone | Pseudo-distributed | Fully distributed |
| --- | --- | --- | --- | --- |
| Common | fs.default.name | file:/// (default) | hdfs://localhost:9000 | hdfs://namenode:9000 |
| HDFS | dfs.replication | N/A | 1 | 3 (default) |
| MapReduce | mapred.job.tracker | local (default) | localhost:9001 | jobtracker:9001 |
2.2 Local (Standalone) Installation
Because the default properties are already set for this mode and no daemons have to run, no additional configuration is required.
Test:
Go to the $HADOOP_HOME directory and run the following commands to check whether the installation works.
$ mkdir input
$ cp conf/*.xml input
$ bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'
$ cat output/*
Output:
1 dfsadmin
After the above steps, if there is no error, the installation is successful.
2.3 Installation Steps in Pseudo-Distributed Mode
Installation steps:
1. Set the environment variables (JAVA_HOME, PATH, HADOOP_HOME, CLASSPATH); see the sketch after this list
2. Modify the Hadoop configuration files (core-site.xml, hdfs-site.xml, mapred-site.xml)
3. Set up SSH password-less login; see the sketch after this list
4. Format the file system: hadoop namenode -format
5. Start the daemons: start-all.sh
6. Stop the daemons: stop-all.sh
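Steps 1 and 3 as shell commands, a minimal sketch; the JDK path and installation path below are assumptions, so adjust them to your machine:
$ # step 1: environment variables (example paths)
$ export JAVA_HOME=/usr/lib/jvm/java-6-sun
$ export HADOOP_HOME=/home/hadoop/hadoop-0.20.2
$ export PATH=$PATH:$HADOOP_HOME/bin
$ export CLASSPATH=$CLASSPATH:$HADOOP_HOME/hadoop-0.20.2-core.jar
$ # step 3: create an RSA key pair and authorize it for localhost
$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ chmod 600 ~/.ssh/authorized_keys
$ ssh localhost        # should now log in without asking for a password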
Step 2: in 0.20.2 the three properties go into their respective files:
conf/core-site.xml:
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

conf/hdfs-site.xml:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

conf/mapred-site.xml:
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
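Steps 4 through 6 as commands, run from the $HADOOP_HOME directory:
$ bin/hadoop namenode -format   # step 4: format a new distributed file system
$ bin/start-all.sh              # step 5: start the NameNode, DataNode, SecondaryNameNode, JobTracker, and TaskTracker daemons
$ bin/stop-all.sh               # step 6: stop all daemons when you are done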
After startup, you can check the NameNode and JobTracker status in a browser:
NameNode - http://localhost:50070/
JobTracker - http://localhost:50030/
Test:
Copy the input files into the distributed file system
$ bin/hadoop fs -put conf input
Run the test
$ bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'
Obtain the execution result of the test program
$ bin/hadoop fs -cat output/*
Output:
3 dfs.class
2 dfs.period
1 dfs.file
1 dfs.replication
1 dfs.servers
1 dfsadmin
1 dfsmetrics.log
After the above steps, if there is no error, the installation is successful.
2.4 Installation Steps in Fully Distributed Mode
Installation steps:
1. Set the environment variables (JAVA_HOME, PATH, HADOOP_HOME, CLASSPATH)
2. Modify the Hadoop configuration files (core-site.xml, hdfs-site.xml, mapred-site.xml, masters, slaves); see the sketch after this list
3. Set up SSH password-less login between the machines
4. Format the file system: hadoop namenode -format
5. Start the daemons: start-all.sh
6. Stop the daemons: stop-all.sh
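A sketch of the step-2 files for a small cluster; the hostnames namenode, jobtracker, and slave1 through slave3 are examples, so substitute your own:

conf/masters (the host that runs the SecondaryNameNode):
namenode

conf/slaves (the hosts that each run a DataNode and a TaskTracker):
slave1
slave2
slave3

In core-site.xml set fs.default.name to hdfs://namenode:9000, in mapred-site.xml set mapred.job.tracker to jobtracker:9001, and dfs.replication can be left at its default of 3 (see the table in section 2.1).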
After startup, you can check the NameNode and JobTracker status in a browser:
NameNode - http://namenode:50070/
JobTracker - http://jobtracker:50030/
Note:
Install Hadoop at the same path on every machine, under the same user name; one way to do this is sketched below.
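A minimal sketch, assuming the example hostnames and installation path used above: install and configure Hadoop on one machine, then mirror the tree to the other nodes.
$ for host in slave1 slave2 slave3; do
>     rsync -a /home/hadoop/hadoop-0.20.2/ $host:/home/hadoop/hadoop-0.20.2/
> done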
3. Install the Eclipse plug-in
The Eclipse Hadoop plug-in provides:
- a MapReduce Locations view, used to set MapReduce variables;
- a Hadoop installation location setting under Window -> Preferences;
- a DFS Locations entry in the Project Explorer view, for browsing the HDFS file system and uploading and downloading files;
- a MapReduce Project type in the New Project wizard;
- a Run on Hadoop launch option.
Note that the contrib/eclipse-plugin/hadoop-0.20.2-eclipse-plugin.jar shipped with Hadoop is out of date; download a newer build from the Internet, otherwise nothing happens when you run a MapReduce program from Eclipse.