1. Introduction to Hadoop Versions
In versions before 0.20.2 (exclusive), the configuration lived in a single default.xml file.
The 0.20.x releases do not ship an Eclipse plug-in JAR, because Eclipse versions vary; you must compile the plug-in from source to match your Eclipse installation.
In versions 0.20.2 through 0.22.x, the configuration is split across three files: conf/core-site.xml, conf/hdfs-site.xml, and conf/mapred-site.xml.
Version 0.23.x introduces YARN, and the configuration is split across four files: conf/core-site.xml, conf/hdfs-site.xml, conf/yarn-site.xml, and conf/mapred-site.xml.
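As an illustration, a minimal conf/yarn-site.xml for the 0.23.x/2.x line might look like the following. This is a sketch only: the property names come from the YARN configuration, but the values shown (and the use of yarn.resourcemanager.hostname in particular) are illustrative and should be checked against the exact release in use.

```xml
<configuration>
  <!-- Host running the ResourceManager (illustrative value) -->
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>localhost</value>
  </property>
  <!-- Auxiliary service MapReduce needs for its shuffle phase -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
```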
Because the 0.23.x line changed so much and introduced new technologies, many plug-ins built on earlier Hadoop releases, such as Hive, HBase, and Pig, had trouble staying compatible with it.
Apache therefore unified the version numbering, so that a release's capabilities can be told from its version number:
- 0.22.x was renumbered directly to 1.0.0
- 0.23.x was renumbered directly to 2.0.0
This split Hadoop into two lines, version 1 and version 2:
Version 1: mainly upgrades and develops the original technology, while keeping support for the surrounding ecosystem. If you want to use HBase, Hive, and similar tools, choose version 1.
Version 2: promotes and develops the new technology. It is a good choice if you are developing directly on Hadoop itself.
The official download page currently describes the Hadoop releases as:
Download
- 1.2.X: current stable version, 1.2 release
- 2.4.X: current stable 2.x version
- 0.23.X: similar to 2.x.x but missing NN HA (NameNode high availability)
2. Hadoop Installation and Modes
My lab environment currently runs hadoop-0.20.2, so everything that follows is based on this version.
Each Hadoop component is configured under the conf folder. Early Hadoop used a single configuration file, hadoop-site.xml, for the Common, HDFS, and MapReduce components; starting with the 0.20.0 release it was split into three files:
core-site.xml: configures the Common component's properties.
hdfs-site.xml: configures the HDFS properties.
mapred-site.xml: configures the MapReduce properties.
2.1. Hadoop operating modes
Hadoop has three modes of operation:
Standalone (local) mode: no daemons are required; all programs run in a single JVM. Mainly used during development. The default properties are set for this mode, so no extra configuration is needed.
Pseudo-distributed mode: the Hadoop daemons run on the local machine, simulating a small-scale cluster.
Fully distributed mode: the Hadoop daemons run on a cluster of machines.
Key configuration properties for each mode:

| Component | Property | Standalone mode | Pseudo-distributed mode | Fully distributed mode |
| --- | --- | --- | --- | --- |
| Common | fs.default.name | file:/// (default) | hdfs://localhost:9000 | hdfs://namenode:9000 |
| HDFS | dfs.replication | N/A | 1 | 3 (default) |
| MapReduce | mapred.job.tracker | local (default) | localhost:9001 | jobtracker:9001 |
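Reading the last column of the table, a fully distributed setup can be sketched as the following configuration fragments. This is a sketch only: the host names namenode and jobtracker are placeholders for real machines, and each property goes in its own file as shown.

```xml
<!-- conf/core-site.xml -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://namenode:9000</value>
  </property>
</configuration>

<!-- conf/hdfs-site.xml -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
</configuration>

<!-- conf/mapred-site.xml -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>jobtracker:9001</value>
  </property>
</configuration>
```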
2.2. Standalone mode installation
Because the default properties are set specifically for this mode and no daemons need to run, this mode requires no action other than setting dfs.replication to 1.
Test:
Go to the $HADOOP_HOME directory and run the following commands to check that the installation succeeded:

```shell
$ mkdir input
$ cp conf/*.xml input
$ bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+'
$ cat output/*
```

Output:

```
1 dfsadmin
```

If the steps above complete without errors, the installation succeeded.
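The example job's regular expression can be sanity-checked without Hadoop at all. The following standalone shell sketch (assuming GNU grep and the usual coreutils; the sample file content is made up for illustration) mimics what the grep job computes, extracting every match of `dfs[a-z.]+` and counting duplicates:

```shell
# Mimic the Hadoop grep example with plain Unix tools (illustrative only).
mkdir -p input
# A stand-in for conf/*.xml: two lines mentioning a dfs property.
printf '<name>dfs.replication</name>\n<name>dfs.replication</name>\n' > input/sample.xml
# Extract each match of the job's regex, then count duplicates --
# roughly what the example's map and reduce phases do.
grep -Eoh 'dfs[a-z.]+' input/*.xml | sort | uniq -c | sort -rn
# Prints a count of 2 for dfs.replication.
```

Note that inside the bracket expression the dot is literal, so the pattern matches property names like dfs.replication but stops at the closing `<` of the tag.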
2.3. Pseudo-distributed mode installation
Installation steps:
1. Set the environment variables (JAVA_HOME, PATH, HADOOP_HOME, CLASSPATH).
2. Modify the Hadoop configuration files (core-site.xml, hdfs-site.xml, mapred-site.xml).
3. Set up passwordless SSH login.
4. Format the file system: hadoop namenode -format
5. Start the daemons: start-all.sh
6. Stop the daemons: stop-all.sh
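Step 3 above (passwordless SSH to localhost) is usually done by generating a key pair with an empty passphrase and authorizing it. A minimal sketch, assuming OpenSSH is installed; the SSH_DIR variable is introduced here only so the snippet can be pointed at a scratch directory:

```shell
# Passwordless SSH setup sketch (step 3). SSH_DIR defaults to ~/.ssh.
SSH_DIR="${SSH_DIR:-$HOME/.ssh}"
mkdir -p "$SSH_DIR" && chmod 700 "$SSH_DIR"
# Generate an RSA key pair with an empty passphrase, unless one already exists.
[ -f "$SSH_DIR/id_rsa" ] || ssh-keygen -t rsa -P '' -f "$SSH_DIR/id_rsa" -q
# Authorize the public key for login to this machine.
cat "$SSH_DIR/id_rsa.pub" >> "$SSH_DIR/authorized_keys"
chmod 600 "$SSH_DIR/authorized_keys"
```

After this, `ssh localhost` should log in without prompting for a password, which is what start-all.sh relies on to launch the daemons.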
An example for step 2 (fs.default.name goes in core-site.xml, mapred.job.tracker in mapred-site.xml, and dfs.replication in hdfs-site.xml):

```xml
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
```
After startup, the NameNode and JobTracker status can be viewed in a web browser:
NameNode - http://localhost:50070/
JobTracker - http://localhost:50030/
Test:
Copy files to the distributed file system:

```shell
$ bin/hadoop fs -put conf input
```

Run the test:

```shell
$ bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+'
```

Fetch the test program's results:

```shell
$ bin/hadoop fs -cat output/*
```
Output:

```
3 dfs.class
2 dfs.period
1 dfs.file
1 dfs.replication
1 dfs.servers
1 dfsadmin
1 dfsmetrics.log
```
If the steps above complete without errors, the installation succeeded.
2.4. Fully distributed mode installation
Installation steps:
1. Set the environment variables (JAVA_HOME, PATH, HADOOP_HOME, CLASSPATH).
2. Modify the Hadoop configuration files (core-site.xml, hdfs-site.xml, mapred-site.xml, masters, slaves).
3. Set up passwordless SSH login.
4. Format the file system: hadoop namenode -format
5. Start the daemons: start-all.sh
6. Stop the daemons: stop-all.sh
After startup, the NameNode and JobTracker status can be viewed in a web browser:
NameNode - http://namenode:50070/
JobTracker - http://jobtracker:50030/
Note:
Hadoop must be installed at the same path on every machine, under the same user name.
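For step 2's masters and slaves files, the format is simply one host name per line. A sketch with placeholder host names (namenode, slave1, and slave2 are assumptions, not real machines); note that in the 0.20.x line the masters file lists where the secondary namenode runs, not the namenode itself:

```text
# conf/masters -- host(s) that run the secondary namenode
namenode

# conf/slaves -- each line is a host that runs a DataNode and a TaskTracker
slave1
slave2
```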
3. Eclipse Plug-in Installation
The Eclipse Hadoop plug-in is designed for quickly developing MapReduce programs. It provides:
- a MapReduce Locations view for setting the MapReduce variables;
- a Hadoop installation location setting under Windows -> Preferences;
- a DFS Locations item in the Project Explorer view, for browsing the HDFS file system and uploading and downloading files;
- a MapReduce Project type in the New Project wizard;
- a Run on Hadoop launch option.
Note that the contrib/eclipse-plugin/hadoop-0.20.2-eclipse-plugin.jar that ships with Hadoop is out of date; you need to download a newer one from the Internet, otherwise nothing happens when you try to run a MapReduce program.