Objective
The purpose of this document is to help you quickly set up and run Hadoop on a single machine, so that you can get a feel for the Hadoop Distributed File System (HDFS) and the MapReduce framework, for example by running sample programs or simple jobs on HDFS.
Prerequisites

Supported Platforms

GNU/Linux is supported as a development and production platform; Hadoop has been demonstrated on GNU/Linux clusters with 2000 nodes. Win32 is supported as a development platform only: because distributed operation has not been well tested on Win32, it is not supported as a production platform.

Required Software
Required software for both Linux and Windows:

Java 1.5.x must be installed; Sun's Java releases are recommended. ssh must be installed and sshd must be running, so that the Hadoop scripts can manage the remote Hadoop daemons.
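The checks above can be scripted. The helper below is a minimal sketch; the function name `check_cmds` is ours for illustration, not part of the Hadoop scripts:

```shell
# Sketch: report any required command that is missing from PATH.
# check_cmds is a hypothetical helper, not part of Hadoop.
check_cmds() {
  missing=0
  for cmd in "$@"; do
    command -v "$cmd" >/dev/null 2>&1 || { echo "missing: $cmd"; missing=1; }
  done
  return $missing
}

check_cmds java ssh || echo "install the missing tools before continuing"
```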
Additional software requirements under Windows
Cygwin - provides shell support in addition to the software listed above.

Installing the Software
If your cluster does not have the required software installed, you must first install them.
Take Ubuntu Linux for example:
$ sudo apt install ssh
$ sudo apt install rsync
On the Windows platform, if Cygwin was installed without all of the required software, start the Cygwin installer again and select the following package:

openssh - in the Net category

Download
To get a Hadoop distribution, download a recent stable release from one of the Apache download mirrors.
Running a Hadoop Cluster

Unpack the downloaded Hadoop distribution. Edit the conf/hadoop-env.sh file; at a minimum, set JAVA_HOME to the root of your Java installation.
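Appending the export from the shell is a quick alternative to editing the file by hand. This is a minimal sketch; the JDK path is an assumption you must adapt to your system:

```shell
# Sketch: append JAVA_HOME to conf/hadoop-env.sh.
# The JDK path below is an example; point it at your own installation root.
JAVA_ROOT=/usr/lib/jvm/java-1.5.0-sun
mkdir -p conf                      # no-op inside an unpacked release
echo "export JAVA_HOME=$JAVA_ROOT" >> conf/hadoop-env.sh
grep JAVA_HOME conf/hadoop-env.sh
```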
Try the following command:
$ bin/hadoop
The usage documentation for the hadoop script will be displayed.
Now you can start the Hadoop cluster in one of the following three supported modes:
Local (standalone) mode
Pseudo-distributed mode
Fully-distributed mode

Standalone Operation
By default, Hadoop is configured to run as a single Java process in non-distributed mode. This is very useful for debugging.
The following example copies the unpacked conf directory to use as input, then finds and displays every match of the given regular expression. Output is written to the specified output directory.
$ mkdir input
$ cp conf/*.xml input
$ bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'
$ cat output/*
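To see what the example job computes, the same pattern search can be approximated with ordinary grep on a sample file. This is only an illustration of the example's logic, not a substitute for running the MapReduce job; the `grep-demo` directory is a scratch name we chose here:

```shell
# Sketch: emulate the example's pattern match with plain grep.
mkdir -p grep-demo
printf '<name>dfs.replication</name>\n' > grep-demo/sample.xml
# -h suppresses filenames, -o prints only the matched text
grep -hoE 'dfs[a-z.]+' grep-demo/*.xml | sort | uniq -c
```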
Pseudo-Distributed Operation
Hadoop can also be run on a single node in so-called pseudo-distributed mode, where each Hadoop daemon runs as a separate Java process.
Configuration
Use the following conf/hadoop-site.xml:
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>localhost:9000</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

Setting Up Passphrase-Free ssh
Now check that you can ssh to localhost without a passphrase:
$ ssh localhost
If you cannot ssh to localhost without a passphrase, execute the following commands:
$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
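A safe way to re-check the result is to force batch mode, which makes ssh fail instead of prompting. This is a sketch; BatchMode and ConnectTimeout are standard OpenSSH client options:

```shell
# Sketch: BatchMode makes ssh fail rather than prompt for a passphrase.
if ssh -o BatchMode=yes -o ConnectTimeout=5 localhost true 2>/dev/null; then
  status=ok
else
  status=needs-setup
fi
echo "passwordless ssh: $status"
```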
Execution
Format a new distributed filesystem:
$ bin/hadoop namenode -format
To start the Hadoop daemon:
$ bin/start-all.sh
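One way to confirm that the daemons came up is the JDK's jps tool. This is a sketch; the process names below are the ones these daemons register under, but adjust the list if your release differs:

```shell
# Sketch: list running Hadoop daemons with jps (ships with the Sun JDK).
daemons=$(jps 2>/dev/null | grep -E 'NameNode|DataNode|JobTracker|TaskTracker' || true)
if [ -n "$daemons" ]; then
  echo "$daemons"
else
  echo "no Hadoop daemons found; check the logs"
fi
```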
The logs of the Hadoop daemons are written to the ${HADOOP_LOG_DIR} directory (defaults to ${HADOOP_HOME}/logs).
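If a daemon fails to start, these logs are the first place to look. A sketch of scanning them for error lines, using the same default directory stated above:

```shell
# Sketch: list any daemon log files that contain an error line.
LOG_DIR="${HADOOP_LOG_DIR:-${HADOOP_HOME:-.}/logs}"
result=$(grep -il 'error' "$LOG_DIR"/*.log 2>/dev/null || echo "no errors found")
echo "$result"
```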
Browse the web interfaces for the NameNode and the JobTracker; by default they are available at:

NameNode - http://localhost:50070/
JobTracker - http://localhost:50030/
Copy the input files into the distributed filesystem:
$ bin/hadoop fs -put conf input
To run the sample program provided by the release:
$ bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'
To view the output files:
Copy the output files from the distributed filesystem to the local filesystem and examine them:
$ bin/hadoop fs -get output output
$ cat output/*
or

View the output files directly on the distributed filesystem:
$ bin/hadoop fs -cat output/*
When you are done, stop the daemons:
$ bin/stop-all.sh
Fully-Distributed Operation

For information on setting up fully-distributed, non-trivial clusters, see the Hadoop cluster setup documentation.
Java and JNI are trademarks or registered trademarks of Sun Microsystems, Inc. in the United States and other countries.