1. Introduction to Hadoop Versions
In versions before 0.20.2 (exclusive), the configuration lived in a single default.xml file.
The 0.20.x releases do not ship an Eclipse plug-in JAR, because Eclipse versions vary; you must compile the plug-in from source to match your Eclipse installation.
In versions 0.20.2 through 0.22.x, the configuration is split across three files: conf/core-site.xml, conf/hdfs-site.xml, and conf/mapred-site.xml.
Version 0.23.x introduces YARN, and the configuration is split across four files: conf/core-site.xml, conf/hdfs-site.xml, conf/yarn-site.xml, and conf/mapred-site.xml.
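As an illustration, a minimal conf/yarn-site.xml for the 0.23.x/2.x line might look like the following. This is a sketch only: the property names come from the YARN configuration, but the values shown (and the use of yarn.resourcemanager.hostname in particular) are illustrative and should be checked against the exact release in use.

```xml
<configuration>
  <!-- Host running the ResourceManager (illustrative value) -->
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>localhost</value>
  </property>
  <!-- Auxiliary service MapReduce needs for its shuffle phase -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
```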
Because the 0.23.x line changed so much and introduced new technologies, many plug-ins built on earlier Hadoop releases, such as Hive, HBase, and Pig, had trouble staying compatible with it.
Apache therefore unified the version numbering, so that a release's capabilities can be told from its version number:
- 0.22.x was renumbered directly to 1.0.0
- 0.23.x was renumbered directly to 2.0.0
This split Hadoop into two lines, version 1 and version 2:
Version 1: mainly upgrades and develops the original technology, while keeping support for the surrounding ecosystem. If you want to use HBase, Hive, and similar tools, choose version 1.
Version 2: promotes and develops the new technology. It is a good choice if you are developing directly on Hadoop itself.
The official download page currently describes the Hadoop releases as:
Download
- 1.2.X: current stable version, 1.2 release
- 2.4.X: current stable 2.x version
- 0.23.X: similar to 2.x.x but missing NN HA (NameNode high availability)
2. Hadoop Installation and Modes
My lab environment currently runs hadoop-0.20.2, so everything that follows is based on this version.
Each Hadoop component is configured under the conf folder. Early Hadoop used a single configuration file, hadoop-site.xml, for the Common, HDFS, and MapReduce components; starting with the 0.20.0 release it was split into three files:
core-site.xml: configures the Common component's properties.
hdfs-site.xml: configures the HDFS properties.
mapred-site.xml: configures the MapReduce properties.
2.1. Hadoop operating modes
Hadoop has three modes of operation:
Standalone (local) mode: no daemons are required; all programs run in a single JVM. Mainly used during development. The default properties are set for this mode, so no extra configuration is needed.
Pseudo-distributed mode: the Hadoop daemons run on the local machine, simulating a small-scale cluster.
Fully distributed mode: the Hadoop daemons run on a cluster of machines.
Key configuration properties for each mode:

| Component | Property | Standalone mode | Pseudo-distributed mode | Fully distributed mode |
| --- | --- | --- | --- | --- |
| Common | fs.default.name | file:/// (default) | hdfs://localhost:9000 | hdfs://namenode:9000 |
| HDFS | dfs.replication | N/A | 1 | 3 (default) |
| MapReduce | mapred.job.tracker | local (default) | localhost:9001 | jobtracker:9001 |
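Reading the last column of the table, a fully distributed setup can be sketched as the following configuration fragments. This is a sketch only: the host names namenode and jobtracker are placeholders for real machines, and each property goes in its own file as shown.

```xml
<!-- conf/core-site.xml -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://namenode:9000</value>
  </property>
</configuration>

<!-- conf/hdfs-site.xml -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
</configuration>

<!-- conf/mapred-site.xml -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>jobtracker:9001</value>
  </property>
</configuration>
```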
2.2. Standalone mode installation
Because the default properties are set specifically for this mode and no daemons need to run, this mode requires no action other than setting dfs.replication to 1.
Test:
Go to the $HADOOP_HOME directory and run the following commands to check that the installation succeeded:

```shell
$ mkdir input
$ cp conf/*.xml input
$ bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+'
$ cat output/*
```

Output:

```
1 dfsadmin
```

If the steps above complete without errors, the installation succeeded.
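The example job's regular expression can be sanity-checked without Hadoop at all. The following standalone shell sketch (assuming GNU grep and the usual coreutils; the sample file content is made up for illustration) mimics what the grep job computes, extracting every match of `dfs[a-z.]+` and counting duplicates:

```shell
# Mimic the Hadoop grep example with plain Unix tools (illustrative only).
mkdir -p input
# A stand-in for conf/*.xml: two lines mentioning a dfs property.
printf '<name>dfs.replication</name>\n<name>dfs.replication</name>\n' > input/sample.xml
# Extract each match of the job's regex, then count duplicates --
# roughly what the example's map and reduce phases do.
grep -Eoh 'dfs[a-z.]+' input/*.xml | sort | uniq -c | sort -rn
# Prints a count of 2 for dfs.replication.
```

Note that inside the bracket expression the dot is literal, so the pattern matches property names like dfs.replication but stops at the closing `<` of the tag.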
2.3. Pseudo-distributed mode installation
Installation steps:
1. Set the environment variables (JAVA_HOME, PATH, HADOOP_HOME, CLASSPATH).
2. Modify the Hadoop configuration files (core-site.xml, hdfs-site.xml, mapred-site.xml).
3. Set up passwordless SSH login.
4. Format the file system: hadoop namenode -format
5. Start the daemons: start-all.sh
6. Stop the daemons: stop-all.sh
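Step 3 above (passwordless SSH to localhost) is usually done by generating a key pair with an empty passphrase and authorizing it. A minimal sketch, assuming OpenSSH is installed; the SSH_DIR variable is introduced here only so the snippet can be pointed at a scratch directory:

```shell
# Passwordless SSH setup sketch (step 3). SSH_DIR defaults to ~/.ssh.
SSH_DIR="${SSH_DIR:-$HOME/.ssh}"
mkdir -p "$SSH_DIR" && chmod 700 "$SSH_DIR"
# Generate an RSA key pair with an empty passphrase, unless one already exists.
[ -f "$SSH_DIR/id_rsa" ] || ssh-keygen -t rsa -P '' -f "$SSH_DIR/id_rsa" -q
# Authorize the public key for login to this machine.
cat "$SSH_DIR/id_rsa.pub" >> "$SSH_DIR/authorized_keys"
chmod 600 "$SSH_DIR/authorized_keys"
```

After this, `ssh localhost` should log in without prompting for a password, which is what start-all.sh relies on to launch the daemons.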
An example for step 2 (fs.default.name goes in core-site.xml, mapred.job.tracker in mapred-site.xml, and dfs.replication in hdfs-site.xml):

```xml
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
```
After startup, the NameNode and JobTracker status can be viewed in a web browser:
NameNode - http://localhost:50070/
JobTracker - http://localhost:50030/
Test:
Copy files to the distributed file system:

```shell
$ bin/hadoop fs -put conf input
```

Run the test:

```shell
$ bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+'
```

Fetch the test program's results:

```shell
$ bin/hadoop fs -cat output/*
```
Output:

```
3 dfs.class
2 dfs.period
1 dfs.file
1 dfs.replication
1 dfs.servers
1 dfsadmin
1 dfsmetrics.log
```
If the steps above complete without errors, the installation succeeded.
2.4. Fully distributed mode installation
Installation steps:
1. Set the environment variables (JAVA_HOME, PATH, HADOOP_HOME, CLASSPATH).
2. Modify the Hadoop configuration files (core-site.xml, hdfs-site.xml, mapred-site.xml, masters, slaves).
3. Set up passwordless SSH login.
4. Format the file system: hadoop namenode -format
5. Start the daemons: start-all.sh
6. Stop the daemons: stop-all.sh
After startup, the NameNode and JobTracker status can be viewed in a web browser:
NameNode - http://namenode:50070/
JobTracker - http://jobtracker:50030/
Note:
Hadoop must be installed at the same path on every machine, under the same user name.
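For step 2's masters and slaves files, the format is simply one host name per line. A sketch with placeholder host names (namenode, slave1, and slave2 are assumptions, not real machines); note that in the 0.20.x line the masters file lists where the secondary namenode runs, not the namenode itself:

```text
# conf/masters -- host(s) that run the secondary namenode
namenode

# conf/slaves -- each line is a host that runs a DataNode and a TaskTracker
slave1
slave2
```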
3. Eclipse Plug-in Installation
The Eclipse Hadoop plug-in is designed for quickly developing MapReduce programs. It provides:
- a MapReduce Locations view for setting the MapReduce variables;
- a Hadoop installation location setting under Windows -> Preferences;
- a DFS Locations item in the Project Explorer view, for browsing the HDFS file system and uploading and downloading files;
- a MapReduce Project type in the New Project wizard;
- a Run on Hadoop launch option.
Note that the contrib/eclipse-plugin/hadoop-0.20.2-eclipse-plugin.jar that ships with Hadoop is out of date; you need to download a newer one from the Internet, otherwise nothing happens when you try to run a MapReduce program.