Objective 
The purpose of this document is to help you quickly install and run Hadoop on a single machine, so that you can get a feel for the Hadoop Distributed File System (HDFS) and the MapReduce framework, for example by running sample programs or simple jobs on HDFS.
Prerequisites

Supported Platforms

GNU/Linux is supported as a development and production platform. Hadoop has been validated on GNU/Linux clusters of up to 2000 nodes. Win32 is supported as a development platform. Because distributed operation has not been well tested on Win32, it is not supported as a production platform.

Required Software
Required software for both Linux and Windows:

JavaTM 1.5.x must be installed; Sun's Java release is recommended.
ssh must be installed, and sshd must be running, so that the Hadoop scripts can manage remote Hadoop daemons.
Additional requirements for Windows:

Cygwin - provides shell support in addition to the software above.

Installing the Software
If your cluster does not yet have the required software installed, you will need to install it first.
 
For example, on Ubuntu Linux:
 
$ sudo apt install ssh
$ sudo apt install rsync
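After installing, you can quickly check that everything needed is on your PATH. This is a small sketch, not part of the original guide:

```shell
# Sketch: report whether each prerequisite command is on the PATH.
for tool in java ssh sshd rsync; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "$tool: found"
  else
    echo "$tool: missing"
  fi
done
```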
 
On Windows, if you did not install all the required software when setting up Cygwin, start the Cygwin Setup Manager and install the following package:
openssh - in the "Net" category

Download
To obtain a Hadoop distribution, download a recent stable release from one of the Apache download mirrors.
Preparing to Run the Hadoop Cluster
Unpack the downloaded Hadoop release. In the conf/hadoop-env.sh file, define at least JAVA_HOME to be the root of your Java installation.
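For example, assuming the Sun JDK is installed under /usr/lib/jvm/java-1.5.0-sun (a hypothetical path; substitute your own), the line in conf/hadoop-env.sh would look like:

```shell
# conf/hadoop-env.sh -- example path only; use your actual Java install root
export JAVA_HOME=/usr/lib/jvm/java-1.5.0-sun
```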
 
Try the following command:
$ bin/hadoop
This will display the usage documentation for the hadoop script.
 
Now you can start the Hadoop cluster in one of the three supported modes:

standalone (local) mode
pseudo-distributed mode
fully distributed mode

Standalone Operation
By default, Hadoop is configured to run in a non-distributed mode, as a single Java process. This is useful for debugging.
 
The following example copies the unpacked conf directory to use as input, then finds and displays every match of the given regular expression. Output is written to the specified output directory.
$ mkdir input
$ cp conf/*.xml input
$ bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'
$ cat output/*
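Conceptually, the grep example extracts every match of the regular expression from the input and counts how often each distinct match occurs. A plain-shell sketch of the same idea, without Hadoop (the sample file and its contents are made up for illustration):

```shell
# Build a tiny sample input, then count each distinct match of dfs[a-z.]+,
# which is roughly what the Hadoop grep example computes.
mkdir -p /tmp/grep-demo
printf 'dfs.replication\nfs.default.name\ndfs.replication\n' > /tmp/grep-demo/sample.xml
grep -oE 'dfs[a-z.]+' /tmp/grep-demo/sample.xml | sort | uniq -c | sort -rn
```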
Pseudo-Distributed Operation
Hadoop can also be run on a single node in so-called pseudo-distributed mode, where each Hadoop daemon runs in a separate Java process.
 Configuration 
Use the following conf/hadoop-site.xml:
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>localhost:9000</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

Passwordless SSH Setup
Now confirm that you can login localhost with ssh without entering a password:
$ ssh localhost
 
If you cannot ssh to localhost without entering a password, execute the following commands:
$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
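If key-based login still fails, note that sshd on many systems rejects keys when ~/.ssh or authorized_keys is group- or world-accessible; tightening the permissions is a common extra step (not part of the original instructions):

```shell
# sshd commonly refuses key authentication if these files are too permissive.
mkdir -p ~/.ssh                     # no-op if the directory already exists
touch ~/.ssh/authorized_keys        # no-op if the file already exists
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys
```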
Execution
Format a new distributed filesystem:
$ bin/hadoop namenode -format
 
To start the Hadoop daemon:
$ bin/start-all.sh
 
The Hadoop daemons log to the ${HADOOP_LOG_DIR} directory (defaults to ${HADOOP_HOME}/logs).
 
Browse the web interfaces for the NameNode and the JobTracker; by default they are available at:

NameNode - http://localhost:50070/
JobTracker - http://localhost:50030/
Copy the input files into the distributed filesystem:
$ bin/hadoop fs -put conf input
 
Run one of the sample programs provided with the release:
$ bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'
 
Examine the output files:
 
Copy the output files from the distributed filesystem to the local filesystem and examine them:
$ bin/hadoop fs -get output output
$ cat output/*
 
Or
 
View the output files directly on the distributed filesystem:
$ bin/hadoop fs -cat output/*
 
When you are done, stop the daemons with:
$ bin/stop-all.sh
Fully Distributed Operation
Information on setting up meaningful, fully distributed clusters can be found here.
 
Java and JNI are trademarks or registered trademarks of Sun Microsystems, Inc. in the United States and other countries.