Installing and configuring Apache Hadoop on a single node
Here's a quick walkthrough of Hadoop installation and configuration on a single node, so you can get a feel for Hadoop HDFS and the MapReduce framework.
- Prerequisites
Supported Platforms:
GNU/Linux: GNU/Linux is supported as a development and production platform; Hadoop has been demonstrated on GNU/Linux clusters with 2000 nodes.
Windows: Windows is also supported. The examples in this article all run on GNU/Linux; if you are running on Windows, refer to http://wiki.apache.org/hadoop/Hadoop2OnWindows.
Required Software:
Java must be installed. Hadoop 2.7 and later require Java 7, either the OpenJDK or the Oracle (HotSpot) JDK/JRE. JDK requirements for other Hadoop versions are listed at http://wiki.apache.org/hadoop/HadoopJavaVersions.
ssh must be installed and sshd must be running, so that the Hadoop scripts can manage the remote Hadoop daemons. Here is an example installation on Ubuntu:
$ sudo apt-get install ssh
$ sudo apt-get install rsync
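Beyond installing ssh, the later pseudo-distributed steps assume you can ssh to localhost without a passphrase. A minimal setup sketch, assuming the default OpenSSH key paths:

```shell
# Generate a passphrase-less RSA key pair (only if one does not
# already exist) and authorize it for logins to this machine, so
# Hadoop's control scripts can reach the local daemons over ssh
# without prompting for a password.
mkdir -p ~/.ssh
[ -f ~/.ssh/id_rsa ] || ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys
```

You can check the result with `ssh localhost`, which should log in without asking for a password.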
- Download
Download a stable release from an Apache download mirror: http://www.apache.org/dyn/closer.cgi/hadoop/common/.
- Preparing to run a Hadoop cluster
Unpack the downloaded Hadoop release, then edit the etc/hadoop/hadoop-env.sh file to define the following parameter:
Set the Java installation directory:
export JAVA_HOME=/usr/java/latest
Try the following command:
$ bin/hadoop
The usage documentation for the Hadoop script will be displayed.
Now you can start the Hadoop cluster in one of the following three supported modes:
Local (standalone) mode
Pseudo-distributed mode
Fully distributed mode
- How to run standalone mode
By default, Hadoop is configured to run in non-distributed mode, as a single Java process. This is very helpful for debugging.
The following example copies the unpacked conf directory's XML files to use as input, then finds and displays every match of the given regular expression. Output is written to the specified output directory.
$ mkdir input
$ cp etc/hadoop/*.xml input
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar grep input output 'dfs[a-z.]+'
$ cat output/*
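To see what the example job computes without involving MapReduce at all, the same extraction can be approximated with ordinary shell tools. The sample file below is a stand-in for the real Hadoop config files:

```shell
# Approximate the hadoop-mapreduce-examples "grep" job locally:
# extract every string matching dfs[a-z.]+ from the inputs and
# count how often each distinct match occurs.
mkdir -p input
printf '<name>dfs.replication</name>\n<name>dfs.permissions</name>\n' > input/sample.xml
grep -ohE 'dfs[a-z.]+' input/*.xml | sort | uniq -c | sort -rn
```

The real job does the same thing in two MapReduce passes: a grep pass that counts matches, followed by a sort pass that orders them by frequency.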
- How to run pseudo-distributed mode
Hadoop can run in so-called pseudo-distributed mode on a single node, where each Hadoop daemon runs as a standalone Java process.
Configuration
Use the following:
etc/hadoop/core-site.xml:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
etc/hadoop/hdfs-site.xml:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
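For context on the configuration above: dfs.replication is set to 1 because a single node cannot hold more than one replica of a block. With these files in place, the usual next steps from the official Hadoop docs (they require a working Hadoop installation, so they are shown here only as a preview) are to format the filesystem and start the HDFS daemons:

```
$ bin/hdfs namenode -format
$ sbin/start-dfs.sh
```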
If you are interested, continue on to the next chapter.
Apache Hadoop Getting Started Tutorial, Chapter II