1. Platforms supported by Hadoop:
- The GNU/Linux platform is both a development and a production platform. Hadoop has been demonstrated on GNU/Linux clusters of more than 2,000 nodes.
- Win32 is a development platform only; distributed operation is not well tested on Win32 systems, so it is not used as a production environment.
2. Install the software required by Hadoop:
Software required for installing Hadoop under Linux and Windows:
2.1 JDK 1.6, downloaded from the Sun website, must be installed.
2.2 SSH must be installed, because the Hadoop scripts use SSH to manage the remote Hadoop processes.
2.3 In a Windows environment, one additional piece of software is required: Cygwin, the shell environment needed to run the software above.
3. Install the Software:
If your cluster does not have the necessary software, you must install it first.
The following commands are for Ubuntu Linux:
$ sudo apt-get install ssh
$ sudo apt-get install rsync
In a Windows environment, if you install Cygwin, you do not need to install the software above separately; just select the relevant package during Cygwin setup:
- openssh - in the "Net" category
4. Download Hadoop
Download address: http://hadoop.apache.org/releases.html
5. Prepare to start the Hadoop cluster:
Unzip the downloaded Hadoop package, then edit the conf/hadoop-env.sh file inside it and define JAVA_HOME there.
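For example, only a single line needs to be added to hadoop-env.sh. The JDK path below is an assumption for illustration; use the directory where your JDK 1.6 is actually installed:

```shell
# conf/hadoop-env.sh (excerpt) -- JAVA_HOME must point at the JDK
# install directory; the path below is only an example, adjust it
# for your own system.
export JAVA_HOME=/usr/lib/jvm/java-6-sun
```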
Try the command:
$ bin/hadoop
It will show you how to use Hadoop scripts.
Hadoop supports three installation modes; you will now install one of them:
- Local (standalone) mode
- Pseudo-distributed mode
- Fully distributed mode
6. Single-node installation: By default, Hadoop is configured to run in non-distributed mode, as a single Java process. This mode is useful for debugging.
The following sample program ships with Hadoop. It copies the XML files under conf/ to an input directory, then finds and displays every line that matches the regular expression given as the last parameter; the results are written to the output folder:
$ mkdir input
$ cp conf/*.xml input
$ bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+'
$ cat output/*
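For intuition, what the grep job computes can be approximated with an ordinary shell pipeline over the same kind of input. This is a local sketch only, not part of Hadoop, and the sample XML below is made up rather than taken from the real conf files:

```shell
# Build a tiny input file, then extract every substring matching
# dfs[a-z.]+ and count occurrences -- roughly what the MapReduce
# grep example computes in parallel.
mkdir -p input
printf '<name>dfs.replication</name>\n<name>dfs.data.dir</name>\n' > input/sample.xml
grep -ohE 'dfs[a-z.]+' input/*.xml | sort | uniq -c | sort -rn
```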
So, the local installation is complete!
7. Pseudo-Distributed installation
Hadoop can also run in pseudo-distributed mode on a single node, where each Hadoop daemon runs in a separate Java process.
7.1 Configuration:
Use the following configuration:
conf/core-site.xml:
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
    <description>localhost is the machine name</description>
  </property>
</configuration>
conf/hdfs-site.xml:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
conf/mapred-site.xml:
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
7.2 Set up SSH. You can now test passphraseless login to the local machine:
$ ssh localhost
If you are unable to log in locally, regenerate the SSH keys by executing the following commands:
$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
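If passphraseless login still fails after the keys are in place, file permissions are a common cause: sshd ignores key files that are too widely readable. A minimal fix, not part of the original document:

```shell
# sshd rejects keys whose directory or authorized_keys file is
# group- or world-accessible; tighten the permissions.
mkdir -p ~/.ssh
chmod 700 ~/.ssh
touch ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
```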
7.3 Execution:
Format a new distributed file system:
$ bin/hadoop namenode -format
Start the Hadoop daemons:
$ bin/start-all.sh
The Hadoop daemons write their logs to the ${HADOOP_LOG_DIR} directory (defaults to ${HADOOP_HOME}/logs).
Browse the NameNode and JobTracker status pages via the web interface; by default their addresses are:
- NameNode - http://localhost:50070/
- JobTracker - http://localhost:50030/
7.4 Testing Hadoop:
Copy the input files into the distributed file system:
$ bin/hadoop fs -put conf input
Run one of the provided examples:
$ bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+'
Check the output:
Copy the files in output from the distributed file system to the local directory and examine them:
$ bin/hadoop fs -get output output
$ cat output/*
Or
View the output directory directly on the distributed file system:
$ bin/hadoop fs -cat output/*
When you are done, stop the daemons:
$ bin/stop-all.sh
At this point, the pseudo-distributed installation of Hadoop is complete.
Hadoop pseudo-distributed installation (translated from the Hadoop 1.1.2 official documentation).