Hadoop Pseudo-Distributed Installation (translated from the Hadoop 1.1.2 official documentation)


1. Platforms supported by Hadoop:
    • GNU/Linux is supported as both a development and a production platform. Hadoop has been demonstrated on GNU/Linux clusters of more than 2,000 nodes.
    • Win32 is supported as a development platform. Distributed operation has not been well tested on Win32, so it is not used as a production environment.
2. Install the required software for Hadoop:

Software required for installing Hadoop under Linux and Windows:

2.1 JDK 1.6, downloaded from the Sun website, must be installed.

2.2 SSH must be installed and sshd must be running; the Hadoop scripts use the SSH protocol to manage the remote Hadoop daemons.

2.3 In a Windows environment, one additional piece of software is required: Cygwin, the shell environment needed to run the scripts above.
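Before proceeding, it can help to check that the required tools are present. The following is only a quick sketch; the binary names (java, ssh, sshd, rsync) are the usual ones on Linux distributions but may differ on your system:

```shell
# Sketch: probe for the prerequisites listed above.
# The binary names are assumptions based on common Linux distributions.
for tool in java ssh sshd rsync; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "found: $tool"
  else
    echo "missing: $tool"
  fi
done
```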

3. Install the Software:

If your cluster does not have the necessary software, you must install it first.

The following commands are executed on Ubuntu Linux:

$ sudo apt-get install ssh
$ sudo apt-get install rsync

In a Windows environment, if you install Cygwin, you do not need to install the software above separately; just select the relevant package when installing Cygwin:

    • openssh — in the Net category

4. Download Hadoop from: http://hadoop.apache.org/releases.html

5. Prepare to start the Hadoop cluster: unzip the downloaded Hadoop package, then edit the conf/hadoop-env.sh file inside it to define JAVA_HOME.
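For example, hadoop-env.sh might end up containing a line like the one below. The JDK path shown is only an assumption; substitute the actual location of your JDK 1.6 install:

```shell
# In conf/hadoop-env.sh — the path below is an assumption; use your JDK's location.
export JAVA_HOME=/usr/lib/jvm/java-6-sun
```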

Try the command:
$ bin/hadoop
It will show you how to use Hadoop scripts.

Now you are ready to install Hadoop in one of its three supported modes:

    • Local (standalone) mode
    • Pseudo-distributed mode
    • Fully distributed mode
6. Standalone installation: By default, Hadoop is configured to run in non-distributed mode, as a single Java process. This mode is useful for debugging.

The following example, which ships with Hadoop, copies the XML files under conf/ into an input directory, then finds and displays every line that matches the regular expression given as the last parameter; the results are written to the output folder.

$ mkdir input
$ cp conf/*.xml input
$ bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+'
$ cat output/*
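What the Hadoop grep example computes can be approximated locally with ordinary Unix tools, which is handy for checking what output to expect. This is only a sketch; the sample input file below is invented for illustration and is not part of the Hadoop distribution:

```shell
# Sketch: a local analogue of the Hadoop grep example.
# The sample input file is invented for illustration.
mkdir -p input
printf '%s\n' '<name>dfs.replication</name>' '<name>dfs.name.dir</name>' > input/sample.xml
# Extract every match of the pattern and count occurrences, as the example job does.
grep -hoE 'dfs[a-z.]+' input/*.xml | sort | uniq -c | sort -rn
```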

So, the local installation is complete!

7. Pseudo-Distributed installation

Hadoop can also run on a single node in pseudo-distributed mode, where each Hadoop daemon runs as a separate Java process.

7.1 Configuration:

Use the following configuration:

conf/core-site.xml:

<configuration>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://localhost:9000</value>
        <description>localhost is the machine name</description>
    </property>
</configuration>


conf/hdfs-site.xml:

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>


conf/mapred-site.xml:

<configuration>
    <property>
        <name>mapred.job.tracker</name>
        <value>localhost:9001</value>
    </property>
</configuration>
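The three files above can also be written in one pass from the shell with here-documents. This is just a convenience sketch; the values mirror the listings above, and it assumes you run it from the Hadoop install directory (where conf/ lives):

```shell
# Sketch: generate the three pseudo-distributed config files shown above.
# Assumes the current directory is the Hadoop install directory.
mkdir -p conf

cat > conf/core-site.xml <<'EOF'
<configuration>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>
EOF

cat > conf/hdfs-site.xml <<'EOF'
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>
EOF

cat > conf/mapred-site.xml <<'EOF'
<configuration>
    <property>
        <name>mapred.job.tracker</name>
        <value>localhost:9001</value>
    </property>
</configuration>
EOF

ls conf/
```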
7.2 Set up SSH. You can now test passphraseless login to the local machine:

$ ssh localhost

If you cannot log in to localhost without a passphrase, regenerate the SSH keys by executing the following commands:
$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys

7.3 Execution:

Format a new distributed filesystem:
$ bin/hadoop namenode -format

Start the Hadoop daemons:
$ bin/start-all.sh

The Hadoop daemons write their logs to the ${HADOOP_LOG_DIR} directory (defaults to ${HADOOP_HOME}/logs).

Browse the web interfaces for the NameNode and the JobTracker; by default they are available at:

    • NameNode — http://localhost:50070/
    • JobTracker — http://localhost:50030/
7.4 Testing Hadoop:

Copy the input files into the distributed filesystem:
$ bin/hadoop fs -put conf input

Run some of the examples provided:
$ bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+'

Check the output:

Copy the output files from the distributed filesystem to the local directory and examine them:
$ bin/hadoop fs -get output output
$ cat output/*

Or

Or view the output files directly on the distributed filesystem:
$ bin/hadoop fs -cat output/*

When you are done, stop the daemons:
$ bin/stop-all.sh

At this point, the pseudo-distributed installation of Hadoop is complete.

