Here we will first look at a question:
What is the relationship between MapReduce and YARN?
A: YARN is not the next-generation MapReduce (MRv2). The next-generation MapReduce (MRv2) and the first-generation MapReduce (MRv1) are identical in their programming interfaces and data-processing engines (MapTask and ReduceTask); we can think of MRv2 as reusing these modules of MRv1. The difference lies in resource management and job management. In MRv1, resource management and job management are both implemented by the JobTracker, which combines the two functions; in MRv2, the two are separated: job management is handled by an ApplicationMaster, and resource management is done by the new system, YARN. Because YARN is general-purpose, it can also serve as the resource management system for computing frameworks other than MapReduce, such as Spark and Storm. A computing framework running on YARN is generally called "X on YARN", for example "MapReduce on YARN", "Spark on YARN", or "Storm on YARN".
Hadoop 2.0 consists of three subsystems: HDFS, YARN, and MapReduce. YARN is a brand-new resource management system, while MapReduce is just one application running on YARN; if YARN is regarded as a cloud operating system, then MapReduce can be considered an app running on that operating system.
Last time we covered the relationship between MapReduce and YARN; today we will actually build the environment.
Environment preparation: refer to steps one through six in the article "Building a Hadoop-0.20.2 Environment"
System: Ubuntu-12.04 (other versions work too)
Mode: pseudo-distributed
Build user: hadoop
Hadoop-2.2.0: http://mirrors.hust.edu.cn/apache/hadoop/common/hadoop-2.2.0/
Choose your installation package. Here we choose hadoop-2.2.0.tar.gz.
Hadoop image link: http://www.apache.org/dyn/closer.cgi/hadoop/common/
Note 1: I configured hadoop-2.2.0 under the /home/hadoop directory.
Note 2: A yarn directory is created under /home/hadoop, and the hadoop-2.2.0 directory and the Hadoop data directory are both placed under this yarn directory.
Note 3: In the steps below, replace /home/hadoop with your own directory.
Step 1: Upload hadoop-2.2.0.tar.gz to the /home/hadoop/yarn directory and decompress it there, which produces the hadoop-2.2.0 directory under the yarn directory.
sudo chown -R hadoop:hadoop hadoop-2.2.0
Create a Hadoop data directory:
mkdir -p /home/hadoop/yarn/yarn_data/hdfs/namenode
mkdir -p /home/hadoop/yarn/yarn_data/hdfs/datanode
Before editing the configuration files, let's take a quick look at the folders in the hadoop-2.2.0 directory, noting how they differ from Hadoop 1.
The outer startup scripts are in the sbin directory.
The inner scripts they call are in the bin directory.
The native .so libraries are all in the lib/native directory.
The shell configuration scripts are stored in the libexec directory.
The configuration files are all in the etc directory, corresponding to the conf directory of earlier versions.
The jar packages are all under the share/hadoop directory.
Step 2: Configure Environment Variables
I did not configure system environment variables for Hadoop here, so nothing related to hadoop-2.2.0 was added to /etc/profile.
If you do configure them, execute source /etc/profile to make them take effect.
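For reference, here is a sketch of the variables one might add; the variable values below are assumptions based on the /home/hadoop/yarn layout used in this post, not taken from the original:

```shell
# Hypothetical environment variables for the layout used in this post.
# Add these to /etc/profile (or ~/.bashrc), then run: source /etc/profile
export HADOOP_HOME=/home/hadoop/yarn/hadoop-2.2.0
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
```

With these set, the bin/ and sbin/ commands in the following steps can be run from any directory.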
Step 3: Configure core-site.xml, hdfs-site.xml, mapred-site.xml, and yarn-site.xml
These files are in the /home/hadoop/yarn/hadoop-2.2.0/etc/hadoop directory.
core-site.xml configuration:
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
    <!-- The IP address and port of the NameNode -->
  </property>
</configuration>
hdfs-site.xml configuration:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
    <!-- Number of block replicas -->
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/home/hadoop/yarn/yarn_data/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/home/hadoop/yarn/yarn_data/hdfs/datanode</value>
  </property>
</configuration>
mapred-site.xml configuration:
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>localhost:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>localhost:19888</value>
  </property>
</configuration>
yarn-site.xml configuration:
<configuration>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>localhost:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>localhost:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>localhost:8031</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>localhost:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>localhost:8088</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
</configuration>
Step 4: Configure slaves
Because this is a pseudo-distributed setup, the slaves file contains only localhost.
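The slaves file is therefore a single line. A minimal sketch, writing to a temporary path so it is self-contained; the real file lives at etc/hadoop/slaves under the hadoop-2.2.0 directory:

```shell
# Write the one-line slaves list. A temp file is used here for illustration;
# in practice the target is /home/hadoop/yarn/hadoop-2.2.0/etc/hadoop/slaves.
SLAVES_FILE=$(mktemp)
echo "localhost" > "$SLAVES_FILE"
cat "$SLAVES_FILE"   # prints: localhost
```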
Step 5: Synchronize the configured hadoop-2.2.0 distribution to each data node
This step is skipped because it is pseudo-distributed.
Step 6: Format NameNode
Run the following command:
bin/hdfs namenode -format
Or
bin/hadoop namenode -format
Step 7: START hdfs and yarn
Start hdfs:
sbin/start-dfs.sh
Start yarn:
sbin/start-yarn.sh
Or you can execute
sbin/start-all.sh
Start hdfs and yarn together.
Also start the history server; otherwise the history links on the web UI cannot be opened.
sbin/mr-jobhistory-daemon.sh start historyserver
Run the following jps command to view the startup process:
4504 ResourceManager
4066 DataNode
4761 NodeManager
5068 JobHistoryServer
4357 SecondaryNameNode
3833 NameNode
5127 Jps
Step 8: Test
HDFS test:
Create a directory in HDFS: bin/hadoop fs -mkdir /wordcount
Upload a file to HDFS: bin/hadoop fs -put /home/hadoop/file2.txt /wordcount
View the HDFS directory listing: bin/hdfs dfs -ls /
YARN test: run the WordCount example program:
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar wordcount /wordcount /output2
View the results:
bin/hadoop fs -cat /output2/*
Result:
hadoop 1
hello 2
java 4
jsp 1
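As a sanity check of what WordCount computes, the same counting can be approximated with standard shell tools. The input lines below are made up to reproduce the counts above; the actual contents of file2.txt are not shown in the original:

```shell
# Hypothetical input chosen so the counts match the result above.
printf 'hello java\nhello java jsp\nhadoop java java\n' > /tmp/file2_demo.txt
# Split into words (the "map" step), then sort and count (the "reduce" step).
tr -s ' ' '\n' < /tmp/file2_demo.txt | sort | uniq -c | awk '{print $2, $1}'
# prints:
# hadoop 1
# hello 2
# java 4
# jsp 1
```

This is only a single-machine analogy; the MapReduce job does the same tokenize-and-sum work, but distributed across map and reduce tasks.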
At this point, the hadoop-2.2.0 environment is set up. Adjust the configuration files according to your specific needs. Some settings here may not be ideal; if you spot a problem, please point it out.
Source: "Detailed process of building a YARN (hadoop-2.2.0) environment"; thanks to the original author for sharing.