Downloading and installing Hadoop


 

Hadoop can be downloaded from one of the Apache download mirrors. You may
also download a nightly build, or check out the code from Subversion and
build it with Ant. Select a directory to install Hadoop under (let's say
/foo/bar/hadoop-install) and untar the tarball in that directory. A
directory corresponding to the version of Hadoop downloaded will be created
under the /foo/bar/hadoop-install directory. For instance, if version 0.6.0
of Hadoop was downloaded, untarring as described above will create the
directory /foo/bar/hadoop-install/hadoop-0.6.0. The examples in this
document assume the existence of an environment variable $HADOOP_INSTALL
that represents the path to all versions of Hadoop installed. In the above
instance, HADOOP_INSTALL=/foo/bar/hadoop-install. They further assume the
existence of a symlink named hadoop in $HADOOP_INSTALL that points to the
version of Hadoop being used. For instance, if version 0.6.0 is being used,
then $HADOOP_INSTALL/hadoop -> hadoop-0.6.0. All tools used to run Hadoop
will be present in the directory $HADOOP_INSTALL/hadoop/bin. All
configuration files for Hadoop will be present in the directory
$HADOOP_INSTALL/hadoop/conf.
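
As a concrete sketch of the layout just described (the tarball name matches the 0.6.0 example above; substitute the release and install directory you actually use):

# assumes hadoop-0.6.0.tar.gz has already been fetched from an Apache mirror
% mkdir -p /foo/bar/hadoop-install
% tar xzf hadoop-0.6.0.tar.gz -C /foo/bar/hadoop-install
# the "hadoop" symlink points at the version in use
% cd /foo/bar/hadoop-install
% ln -s hadoop-0.6.0 hadoop
# environment variable assumed throughout this document
% export HADOOP_INSTALL=/foo/bar/hadoop-install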

 

Startup scripts

 

The $HADOOP_INSTALL/hadoop/bin directory contains some scripts used to launch the Hadoop DFS and Hadoop Map/Reduce daemons. These are:

  • start-all.sh
    - Starts all Hadoop daemons: the namenode, datanodes, the jobtracker and tasktrackers.

  • stop-all.sh
    - Stops all Hadoop daemons.

  • start-mapred.sh
    - Starts the Hadoop Map/Reduce daemons: the jobtracker and tasktrackers.

  • stop-mapred.sh
    - Stops the Hadoop Map/Reduce daemons.

  • start-dfs.sh
    - Starts the Hadoop DFS daemons: the namenode and datanodes.

  • stop-dfs.sh
    - Stops the Hadoop DFS daemons.
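
For example, the DFS and Map/Reduce daemons can be brought up and taken down independently of each other; the same commands reappear in the cluster sections below:

# start HDFS first, then Map/Reduce; stop in the reverse order
% $HADOOP_INSTALL/hadoop/bin/start-dfs.sh
% $HADOOP_INSTALL/hadoop/bin/start-mapred.sh
% $HADOOP_INSTALL/hadoop/bin/stop-mapred.sh
% $HADOOP_INSTALL/hadoop/bin/stop-dfs.sh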

It is also possible to run the Hadoop daemons as Windows services using the Java Service Wrapper
(downloaded separately). This still requires Cygwin to be installed,
as Hadoop requires its df command. See the following JIRA issues for
details:

  • https://issues.apache.org/jira/browse/HADOOP-1525

  • https://issues.apache.org/jira/browse/HADOOP-1526

 

Configuration Files

 

The $HADOOP_INSTALL/hadoop/conf directory contains some configuration files for Hadoop. These are:

  • hadoop-env.sh
    - This file contains some environment variable settings used by Hadoop.
    You can use these to affect some aspects of Hadoop daemon behavior,
    such as where log files are stored, the maximum amount of heap used,
    etc. The only variable you should need to change in this file is JAVA_HOME,
    which specifies the path to the Java 1.5.x installation used by Hadoop.
    (A short sketch of editing this file appears after this list.)

  • slaves
    - This file lists the hosts, one per line, where the Hadoop slave
    daemons (datanodes and tasktrackers) will run. By default this contains
    the single entry localhost.

  • hadoop-default.xml
    - This file contains generic default settings for Hadoop daemons and Map/Reduce jobs. Do not modify this file.

  • mapred-default.xml
    - This file contains site-specific settings for the Hadoop Map/Reduce
    daemons and jobs. The file is empty by default. Putting configuration
    properties in this file will override Map/Reduce settings in the
    hadoop-default.xml file. Use this file to tailor the behavior of
    Map/Reduce on your site.

  • hadoop-site.xml
    - This file contains site-specific settings for all Hadoop daemons and
    Map/Reduce jobs. This file is empty by default. Settings in this file
    override those in hadoop-default.xml and mapred-default.xml.
    This file should contain settings that must be respected by all servers
    and clients in a Hadoop installation, for instance, the location of
    the namenode and the jobtracker.
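
As a minimal sketch of the two files you usually touch first (the JAVA_HOME path below is only an assumption; point it at the Java installation you actually use):

# set JAVA_HOME in hadoop-env.sh (the path shown is illustrative)
% echo 'export JAVA_HOME=/usr/lib/jvm/java-1.5.0' >> $HADOOP_INSTALL/hadoop/conf/hadoop-env.sh
# the slaves file lists one host per line; a single-node setup keeps the default
% cat $HADOOP_INSTALL/hadoop/conf/slaves
localhost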

More details on configuration can be found on the HowToConfigure page.

 

Setting up Hadoop on a Single Node

 

This section describes how to get started by setting up a Hadoop cluster
on a single node. The setup described here is an HDFS instance with a
namenode and a single datanode, and a Map/Reduce cluster with a jobtracker
and a single tasktracker. The configuration procedures described in Basic
Configuration are just as applicable for larger clusters.

 

Basic Configuration

 

Take a pass at putting together basic configuration settings for your
cluster. Some of the settings that follow are required; others are
recommended for more straightforward and predictable operation.

  • Hadoop Environment Settings
    - Ensure that JAVA_HOME is set in hadoop-env.sh and points to the Java
    installation you intend to use. You can set other environment variables
    in hadoop-env.sh to suit your requirements. Some of the default settings
    refer to the variable HADOOP_HOME. The value of HADOOP_HOME is
    automatically inferred from the location of the startup scripts;
    HADOOP_HOME is the parent directory of the bin directory that holds the
    Hadoop scripts. In this instance it is $HADOOP_INSTALL/hadoop.

  • Jobtracker and namenode settings
    - Figure out where to run your namenode and jobtracker. Set the variable
    fs.default.name to the namenode's intended host:port. Set the variable
    mapred.job.tracker to the jobtracker's intended host:port. These settings
    should be in hadoop-site.xml. You may also want to set one or more of the
    following ports (also in hadoop-site.xml):

    • dfs.datanode.port

    • dfs.info.port

    • mapred.job.tracker.info.port

    • mapred.task.tracker.output.port

    • mapred.task.tracker.report.port

  • Data path settings
    - Figure out where your data goes. This includes settings for where the
    namenode stores the namespace checkpoint and the edits log, where the
    datanodes store filesystem blocks, storage locations for Map/Reduce
    intermediate output and temporary storage for the HDFS client.
    Default values for these paths point to various locations in /tmp.
    While this might be OK for a single-node installation, for larger
    clusters storing data in /tmp is not an option. These settings must also
    be in hadoop-site.xml. It is important for these settings to be present
    in hadoop-site.xml because they can otherwise be overridden by client
    configuration settings in Map/Reduce jobs. Set the following variables
    to appropriate values (a sketch of creating these directories follows
    the example hadoop-site.xml below):

    • dfs.name.dir

    • dfs.data.dir

    • dfs.client.buffer.dir

    • mapred.local.dir

An example hadoop-site.xml file:

 

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/tmp/hadoop-${user.name}</value>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:54310</value>
</property>
<property>
<name>mapred.job.tracker</name>
<value>hdfs://localhost:54311</value>
</property>
<property>
<name>dfs.replication</name>
<value>8</value>
</property>
<property>
<name>mapred.child.java.opts</name>
<value>-Xmx512m</value>
</property>
</configuration>
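
Once you have decided where the data should live, the directories named by the data path properties can be created up front. The paths below are purely illustrative; the dfs.name.dir directory in particular must exist before formatting, as described in the next section:

# illustrative paths only; substitute locations on your own disks
% mkdir -p /foo/bar/hdfs/name      # dfs.name.dir
% mkdir -p /foo/bar/hdfs/data      # dfs.data.dir
% mkdir -p /foo/bar/hdfs/client    # dfs.client.buffer.dir
% mkdir -p /foo/bar/mapred/local   # mapred.local.dir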

 

 

Formatting the namenode

 

The first step to starting up your Hadoop installation is formatting the
Hadoop filesystem, which is implemented on top of the local filesystems of
your cluster. You need to do this the first time you set up a Hadoop
installation. Do not format a running Hadoop filesystem; this will cause
all your data to be erased. Before formatting, ensure that the dfs.name.dir
directory exists. If you just used the default, then mkdir -p /tmp/hadoop-username/dfs/name
will create the directory. To format the filesystem (which simply
initializes the directory specified by the dfs.name.dir variable), run the command:
% $HADOOP_INSTALL/hadoop/bin/hadoop namenode -format

 

Starting a single node cluster

 

Run the command:
% $HADOOP_INSTALL/hadoop/bin/start-all.sh

This will start up a namenode, datanode, jobtracker and tasktracker on your machine.
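
To check that the daemons came up, you can list the running Java processes with the JDK's jps tool; the process names below are typical for this generation of Hadoop but may differ between releases:

% jps
# expect entries such as NameNode, DataNode, JobTracker and TaskTracker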

 

Stopping a single node cluster

 

Run the command
% $HADOOP_INSTALL/hadoop/bin/stop-all.sh

to stop all the daemons running on your machine.

 

Separating configuration from installation

 

In the example described above, the configuration files used by the Hadoop
cluster all lie in the Hadoop installation. This can become cumbersome when
upgrading to a new release, since all custom config has to be re-created in
the new installation. It is possible to separate the config from the
install. To do so, select a directory to house the Hadoop configuration
(let's say /foo/bar/hadoop-config) and copy all conf files to this
directory. You can either set the HADOOP_CONF_DIR environment variable to
refer to this directory or pass it directly to the Hadoop scripts with the
--config option. In this case, the cluster start and stop commands
specified in the above two sub-sections become
% $HADOOP_INSTALL/hadoop/bin/start-all.sh --config /foo/bar/hadoop-config
and
% $HADOOP_INSTALL/hadoop/bin/stop-all.sh --config /foo/bar/hadoop-config
respectively.
Only the absolute path to the config directory should be passed to the scripts.
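
A minimal sketch of this separation, reusing the directory name from the text above:

% mkdir -p /foo/bar/hadoop-config
% cp $HADOOP_INSTALL/hadoop/conf/* /foo/bar/hadoop-config
# either export the variable once ...
% export HADOOP_CONF_DIR=/foo/bar/hadoop-config
% $HADOOP_INSTALL/hadoop/bin/start-all.sh
# ... or pass the absolute path explicitly
% $HADOOP_INSTALL/hadoop/bin/start-all.sh --config /foo/bar/hadoop-config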

 

Starting up a larger cluster

 

  • Ensure
    that the Hadoop package is accessible from the same path on all nodes
    that are to be included in the cluster. If you have separated
    configuration from the install, then ensure that the config directory is
    also accessible the same way.
  • Populate the slaves
    file with the nodes to be included in the cluster, one node per line.

  • Follow the steps in the Basic Configuration
    section above.

  • Format the namenode.
  • Run the command % $HADOOP_INSTALL/hadoop/bin/start-dfs.sh
    on the node you want the namenode to run on. This will bring up HDFS
    with the namenode running on the machine you ran the command on and
    datanodes on the machines listed in the slaves file mentioned above.

  • Run the command % $HADOOP_INSTALL/hadoop/bin/start-mapred.sh
    on the machine you plan to run the jobtracker on. This will bring up
    the Map/Reduce cluster with the jobtracker running on the machine you
    ran the command on and tasktrackers running on machines listed in the
    slaves file.

  • The above two commands can also be executed with the --config
    option (see the consolidated sketch after this list).
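
A consolidated sketch of the steps above; namenode-host and jobtracker-host are hypothetical names for the machines you chose, and the slaves file is assumed to be populated already:

# on namenode-host: format once, then bring up HDFS
# (datanodes start on the hosts listed in the slaves file)
% $HADOOP_INSTALL/hadoop/bin/hadoop namenode -format
% $HADOOP_INSTALL/hadoop/bin/start-dfs.sh
# on jobtracker-host: bring up Map/Reduce
# (tasktrackers start on the hosts listed in the slaves file)
% $HADOOP_INSTALL/hadoop/bin/start-mapred.sh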

 

Stopping the Cluster

The cluster can be stopped by running % $HADOOP_INSTALL/hadoop/bin/stop-mapred.sh
and then % $HADOOP_INSTALL/hadoop/bin/stop-dfs.sh
on your jobtracker and namenode respectively. These commands also accept the --config option.
