Hadoop Pseudo-Distributed Installation (translated from the Hadoop 1.1.2 official documentation)


1. Platforms supported by Hadoop:
    • GNU/Linux is supported as both a development and a production platform. Hadoop has been demonstrated on GNU/Linux clusters of more than 2,000 nodes.
    • Win32 is supported as a development platform. Distributed operation has not been well tested on Win32, so it is not used as a production environment.
2. Install the required software for Hadoop:

Software required for installing Hadoop under Linux and Windows:

2.1 JDK 1.6, downloaded from the Sun website, must be installed.

2.2 SSH must be installed and sshd must be running; the Hadoop scripts use the SSH protocol to manage the remote Hadoop daemons.

2.3 In a Windows environment, one additional piece of software is required: Cygwin, the shell environment needed to run the scripts above.
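Before proceeding, it can help to check that the required tools are present. The following is only a quick sketch; the binary names (java, ssh, sshd, rsync) are the usual ones on Linux distributions but may differ on your system:

```shell
# Sketch: probe for the prerequisites listed above.
# The binary names are assumptions based on common Linux distributions.
for tool in java ssh sshd rsync; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "found: $tool"
  else
    echo "missing: $tool"
  fi
done
```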

3. Install the Software:

If your cluster does not have the necessary software, you must install it first.

The following commands are executed on Ubuntu Linux:

$ sudo apt-get install ssh
$ sudo apt-get install rsync

In a Windows environment, if you install Cygwin, you do not need to install the software above separately; just select the relevant package when installing Cygwin:

    • openssh — in the Net category

4. Download Hadoop from: http://hadoop.apache.org/releases.html

5. Prepare to start the Hadoop cluster: unzip the downloaded Hadoop package, then edit the conf/hadoop-env.sh file inside it to define JAVA_HOME.
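For example, hadoop-env.sh might end up containing a line like the one below. The JDK path shown is only an assumption; substitute the actual location of your JDK 1.6 install:

```shell
# In conf/hadoop-env.sh — the path below is an assumption; use your JDK's location.
export JAVA_HOME=/usr/lib/jvm/java-6-sun
```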

Try the command:
$ bin/hadoop
It will show you how to use Hadoop scripts.

Now you are ready to install Hadoop in one of its three supported modes:

    • Local (standalone) mode
    • Pseudo-distributed mode
    • Fully distributed mode
6. Standalone installation: By default, Hadoop is configured to run in non-distributed mode, as a single Java process. This mode is useful for debugging.

The following example, which ships with Hadoop, copies the XML files under conf/ into an input directory, then finds and displays every line that matches the regular expression given as the last parameter; the results are written to the output folder.

$ mkdir input
$ cp conf/*.xml input
$ bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+'
$ cat output/*
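What the Hadoop grep example computes can be approximated locally with ordinary Unix tools, which is handy for checking what output to expect. This is only a sketch; the sample input file below is invented for illustration and is not part of the Hadoop distribution:

```shell
# Sketch: a local analogue of the Hadoop grep example.
# The sample input file is invented for illustration.
mkdir -p input
printf '%s\n' '<name>dfs.replication</name>' '<name>dfs.name.dir</name>' > input/sample.xml
# Extract every match of the pattern and count occurrences, as the example job does.
grep -hoE 'dfs[a-z.]+' input/*.xml | sort | uniq -c | sort -rn
```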

So, the local installation is complete!

7. Pseudo-Distributed installation

Hadoop can also run on a single node in pseudo-distributed mode, where each Hadoop daemon runs as a separate Java process.

7.1 Configuration:

Use the following configuration:

conf/core-site.xml:

<configuration>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://localhost:9000</value>
        <description>localhost is the machine name</description>
    </property>
</configuration>


conf/hdfs-site.xml:

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>


conf/mapred-site.xml:

<configuration>
    <property>
        <name>mapred.job.tracker</name>
        <value>localhost:9001</value>
    </property>
</configuration>
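The three files above can also be written in one pass from the shell with here-documents. This is just a convenience sketch; the values mirror the listings above, and it assumes you run it from the Hadoop install directory (where conf/ lives):

```shell
# Sketch: generate the three pseudo-distributed config files shown above.
# Assumes the current directory is the Hadoop install directory.
mkdir -p conf

cat > conf/core-site.xml <<'EOF'
<configuration>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>
EOF

cat > conf/hdfs-site.xml <<'EOF'
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>
EOF

cat > conf/mapred-site.xml <<'EOF'
<configuration>
    <property>
        <name>mapred.job.tracker</name>
        <value>localhost:9001</value>
    </property>
</configuration>
EOF

ls conf/
```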
7.2 Set up SSH. You can now test passphraseless login to the local machine:

$ ssh localhost

If you cannot log in to localhost without a passphrase, regenerate the SSH keys by executing the following commands:
$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys

7.3 Execution:

Format a new distributed filesystem:
$ bin/hadoop namenode -format

Start the Hadoop daemons:
$ bin/start-all.sh

The Hadoop daemons write their logs to the ${HADOOP_LOG_DIR} directory (defaults to ${HADOOP_HOME}/logs).

Browse the web interfaces for the NameNode and the JobTracker; by default they are available at:

    • NameNode — http://localhost:50070/
    • JobTracker — http://localhost:50030/
7.4 Testing Hadoop:

Copy the input files into the distributed filesystem:
$ bin/hadoop fs -put conf input

Run some of the examples provided:
$ bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+'

Check the output:

Copy the output files from the distributed filesystem to the local directory and examine them:
$ bin/hadoop fs -get output output
$ cat output/*

Or

Or view the output files directly on the distributed filesystem:
$ bin/hadoop fs -cat output/*

When you are done, stop the daemons:
$ bin/stop-all.sh

At this point, the pseudo-distributed installation of Hadoop is complete.

