1. Hadoop Version Introduction
In versions earlier than 0.20.2 (exclusive), configuration lives in a single file, hadoop-site.xml, with the read-only defaults in hadoop-default.xml.
Versions after 0.20.x no longer ship a prebuilt Eclipse plug-in jar; because Eclipse releases differ, you have to compile the plug-in from source for your own Eclipse version.
From 0.20.2 through 0.22.x, configuration is concentrated in conf/core-site.xml, conf/hdfs-site.xml, and conf/mapred-site.xml.
Version 0.23.x adds YARN, and configuration is split across four files: conf/core-site.xml, conf/hdfs-site.xml, conf/yarn-site.xml, and conf/mapred-site.xml.
Because 0.23.x changed so much and introduced new technologies, many Hadoop-based projects, such as Hive, HBase, and Pig, which were built against releases earlier than 0.23.x, had trouble staying compatible.
Apache therefore unified the version numbering so that a release's capabilities can be read from its version number:
- 0.22.x was renumbered directly to 1.0.0
- 0.23.x was renumbered directly to 2.0.0
Hadoop is thus split into two major lines, 1.x and 2.x.
Version 1.x continues to develop the original architecture and keeps compatibility with the surrounding ecosystem; if you want to use HBase, Hive, and similar technologies, the 1.x line is the one to pick.
Version 2.x pushes the new technologies forward; if you are developing purely against Hadoop itself, it is a good choice.
The official website describes the downloads as follows:
Download
- 1.2.x - current stable version, 1.2 release
- 2.4.x - current stable 2.x version
- 0.23.x - similar to 2.x.x but missing NN HA
2. Hadoop Installation and Running Modes
My lab environment currently runs hadoop-0.20.2, so the rest of this article is based on that version.
Each Hadoop component is configured under the conf folder. Early Hadoop used a single file, hadoop-site.xml, to configure the Common, HDFS, and MapReduce components; starting from version 0.20.0 it is split into three files:
core-site.xml: properties for the Common component.
hdfs-site.xml: HDFS properties.
mapred-site.xml: MapReduce properties.
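On 0.20.2 you can confirm this split by listing the conf directory (output abridged; your listing will contain additional files):
$ ls $HADOOP_HOME/conf
core-site.xml  hadoop-env.sh  hdfs-site.xml  log4j.properties  mapred-site.xml  masters  slaves  ...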
2.1 Hadoop Running Modes
Hadoop runs in the following modes:
Standalone (local) mode: no daemons are needed; everything runs in a single JVM. This mode is mainly used during development. The default properties are set for it, so no extra configuration is required.
Pseudo-distributed mode: the Hadoop daemons all run on the local machine, simulating a small cluster.
Fully distributed mode: the Hadoop daemons run on a cluster of machines.
Key configuration properties for each mode:

| Component | Property | Standalone | Pseudo-distributed | Fully distributed |
| --- | --- | --- | --- | --- |
| Common | fs.default.name | file:/// (default) | hdfs://localhost:9000 | hdfs://namenode:9000 |
| HDFS | dfs.replication | N/A | 1 | 3 (default) |
| MapReduce | mapred.job.tracker | local (default) | localhost:9001 | jobtracker:9001 |
2.2 Local (Standalone) Installation
Because the default properties are already set for this mode and no daemons have to run, no additional configuration is required.
Test:
Go to the $HADOOP_HOME directory and run the following commands to check whether the installation works.
$ mkdir input
$ cp conf/*.xml input
$ bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'
$ cat output/*
Output:
1 dfsadmin
After the above steps, if there is no error, the installation is successful.
2.3 Installation Steps in Pseudo-Distributed Mode
Installation steps:
1. Set the environment variables (JAVA_HOME, PATH, HADOOP_HOME, CLASSPATH); see the sketch after this list
2. Modify the Hadoop configuration files (core-site.xml, hdfs-site.xml, mapred-site.xml)
3. Set up SSH password-less login; see the sketch after this list
4. Format the file system: hadoop namenode -format
5. Start the daemons: start-all.sh
6. Stop the daemons: stop-all.sh
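Steps 1 and 3 as shell commands, a minimal sketch; the JDK path and installation path below are assumptions, so adjust them to your machine:
$ # step 1: environment variables (example paths)
$ export JAVA_HOME=/usr/lib/jvm/java-6-sun
$ export HADOOP_HOME=/home/hadoop/hadoop-0.20.2
$ export PATH=$PATH:$HADOOP_HOME/bin
$ export CLASSPATH=$CLASSPATH:$HADOOP_HOME/hadoop-0.20.2-core.jar
$ # step 3: create an RSA key pair and authorize it for localhost
$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ chmod 600 ~/.ssh/authorized_keys
$ ssh localhost        # should now log in without asking for a password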
Step 2: in 0.20.2 the three properties go into their respective files:
conf/core-site.xml:
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

conf/hdfs-site.xml:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

conf/mapred-site.xml:
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
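Steps 4 through 6 as commands, run from the $HADOOP_HOME directory:
$ bin/hadoop namenode -format   # step 4: format a new distributed file system
$ bin/start-all.sh              # step 5: start the NameNode, DataNode, SecondaryNameNode, JobTracker, and TaskTracker daemons
$ bin/stop-all.sh               # step 6: stop all daemons when you are done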
After startup, you can check the NameNode and JobTracker status in a browser:
NameNode - http://localhost:50070/
JobTracker - http://localhost:50030/
Test:
Copy the input files into the distributed file system
$ bin/hadoop fs -put conf input
Run the test
$ bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'
Obtain the execution result of the test program
$ bin/hadoop fs -cat output/*
Output:
3 dfs.class
2 dfs.period
1 dfs.file
1 dfs.replication
1 dfs.servers
1 dfsadmin
1 dfsmetrics.log
After the above steps, if there is no error, the installation is successful.
2.4 Installation Steps in Fully Distributed Mode
Installation steps:
1. Set the environment variables (JAVA_HOME, PATH, HADOOP_HOME, CLASSPATH)
2. Modify the Hadoop configuration files (core-site.xml, hdfs-site.xml, mapred-site.xml, masters, slaves); see the sketch after this list
3. Set up SSH password-less login between the machines
4. Format the file system: hadoop namenode -format
5. Start the daemons: start-all.sh
6. Stop the daemons: stop-all.sh
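A sketch of the step-2 files for a small cluster; the hostnames namenode, jobtracker, and slave1 through slave3 are examples, so substitute your own:

conf/masters (the host that runs the SecondaryNameNode):
namenode

conf/slaves (the hosts that each run a DataNode and a TaskTracker):
slave1
slave2
slave3

In core-site.xml set fs.default.name to hdfs://namenode:9000, in mapred-site.xml set mapred.job.tracker to jobtracker:9001, and dfs.replication can be left at its default of 3 (see the table in section 2.1).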
After startup, you can check the NameNode and JobTracker status in a browser:
NameNode - http://namenode:50070/
JobTracker - http://jobtracker:50030/
Note:
Install Hadoop at the same path on every machine, under the same user name; one way to do this is sketched below.
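A minimal sketch, assuming the example hostnames and installation path used above: install and configure Hadoop on one machine, then mirror the tree to the other nodes.
$ for host in slave1 slave2 slave3; do
>     rsync -a /home/hadoop/hadoop-0.20.2/ $host:/home/hadoop/hadoop-0.20.2/
> done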
3. Install the Eclipse plug-in
The Eclipse Hadoop plug-in provides:
- a MapReduce Locations view, used to set MapReduce variables;
- a Hadoop installation location setting under Window -> Preferences;
- a DFS Locations entry in the Project Explorer view, for browsing the HDFS file system and uploading and downloading files;
- a MapReduce Project type in the New Project wizard;
- a Run on Hadoop launch option.
Note that the contrib/eclipse-plugin/hadoop-0.20.2-eclipse-plugin.jar shipped with Hadoop is out of date; download a newer build from the Internet, otherwise nothing happens when you run a MapReduce program from Eclipse.