Here we will first look at a question:
What is the relationship between MapReduce and YARN?
A: YARN is not the next-generation MapReduce (MRv2). The next-generation MapReduce (MRv2) and the first-generation MapReduce (MRv1) are identical in their programming interfaces and data-processing engines (MapTask and ReduceTask); we can think of MRv2 as reusing these modules of MRv1. The difference lies in resource management and job management. In MRv1, resource management and job management are both implemented by the JobTracker, which combines the two functions; in MRv2, the two are separated: job management is handled by an ApplicationMaster, and resource management is done by the new system, YARN. Because YARN is general-purpose, it can also serve as the resource management system for computing frameworks other than MapReduce, such as Spark and Storm. A computing framework running on YARN is generally called "X on YARN", for example "MapReduce on YARN", "Spark on YARN", or "Storm on YARN".
Hadoop 2.0 consists of three subsystems: HDFS, YARN, and MapReduce. YARN is a brand-new resource management system, while MapReduce is just one application running on YARN; if YARN is regarded as a cloud operating system, then MapReduce can be considered an app running on that operating system.
Last time we covered the relationship between MapReduce and YARN; today we will actually build the environment.
Environment preparation: refer to steps one through six in the article "Building a Hadoop-0.20.2 Environment"
System: Ubuntu-12.04 (other versions work too)
Mode: pseudo-distributed
Build user: hadoop
Hadoop-2.2.0: http://mirrors.hust.edu.cn/apache/hadoop/common/hadoop-2.2.0/
Choose your installation package. Here we choose hadoop-2.2.0.tar.gz.
Hadoop image link: http://www.apache.org/dyn/closer.cgi/hadoop/common/
Note 1: I configured hadoop-2.2.0 under the /home/hadoop directory.
Note 2: A yarn directory is created under /home/hadoop, and the hadoop-2.2.0 directory and the Hadoop data directory are both placed under this yarn directory.
Note 3: In the steps below, replace /home/hadoop with your own directory.
Step 1: Upload hadoop-2.2.0.tar.gz to the /home/hadoop/yarn directory and decompress it there, which produces the hadoop-2.2.0 directory under the yarn directory.
sudo chown -R hadoop:hadoop hadoop-2.2.0
Create a Hadoop data directory:
mkdir -p /home/hadoop/yarn/yarn_data/hdfs/namenode
mkdir -p /home/hadoop/yarn/yarn_data/hdfs/datanode
Before editing the configuration files, let's take a quick look at the folders in the hadoop-2.2.0 directory, noting how they differ from Hadoop 1.
The outer startup scripts are in the sbin directory.
The inner scripts they call are in the bin directory.
The native .so libraries are all in the lib/native directory.
The shell configuration scripts are stored in the libexec directory.
The configuration files are all in the etc directory, corresponding to the conf directory of earlier versions.
The jar packages are all under the share/hadoop directory.
Step 2: Configure Environment Variables
I did not configure system environment variables for Hadoop here, so nothing related to hadoop-2.2.0 was added to /etc/profile.
If you do configure them, execute source /etc/profile to make them take effect.
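For reference, here is a sketch of the variables one might add; the variable values below are assumptions based on the /home/hadoop/yarn layout used in this post, not taken from the original:

```shell
# Hypothetical environment variables for the layout used in this post.
# Add these to /etc/profile (or ~/.bashrc), then run: source /etc/profile
export HADOOP_HOME=/home/hadoop/yarn/hadoop-2.2.0
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
```

With these set, the bin/ and sbin/ commands in the following steps can be run from any directory.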
Step 3: Configure core-site.xml, hdfs-site.xml, mapred-site.xml, and yarn-site.xml
These files are in the /home/hadoop/yarn/hadoop-2.2.0/etc/hadoop directory.
core-site.xml configuration:
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
    <!-- The IP address and port of the NameNode -->
  </property>
</configuration>
hdfs-site.xml configuration:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
    <!-- Number of block replicas -->
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/home/hadoop/yarn/yarn_data/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/home/hadoop/yarn/yarn_data/hdfs/datanode</value>
  </property>
</configuration>
mapred-site.xml configuration:
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>localhost:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>localhost:19888</value>
  </property>
</configuration>
yarn-site.xml configuration:
<configuration>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>localhost:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>localhost:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>localhost:8031</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>localhost:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>localhost:8088</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
</configuration>
Step 4: Configure slaves
Because this is a pseudo-distributed setup, the slaves file contains only localhost.
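The slaves file is therefore a single line. A minimal sketch, writing to a temporary path so it is self-contained; the real file lives at etc/hadoop/slaves under the hadoop-2.2.0 directory:

```shell
# Write the one-line slaves list. A temp file is used here for illustration;
# in practice the target is /home/hadoop/yarn/hadoop-2.2.0/etc/hadoop/slaves.
SLAVES_FILE=$(mktemp)
echo "localhost" > "$SLAVES_FILE"
cat "$SLAVES_FILE"   # prints: localhost
```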
Step 5: Synchronize the configured hadoop-2.2.0 distribution to each data node
This step is skipped because it is pseudo-distributed.
Step 6: Format NameNode
Run the following command:
bin/hdfs namenode -format
Or
bin/hadoop namenode -format
Step 7: START hdfs and yarn
Start hdfs:
sbin/start-dfs.sh
Start yarn:
sbin/start-yarn.sh
Or you can execute
sbin/start-all.sh
Start hdfs and yarn together.
Also start the history server; otherwise the history links on the web UI cannot be opened.
sbin/mr-jobhistory-daemon.sh start historyserver
Run the following jps command to view the startup process:
4504 ResourceManager
4066 DataNode
4761 NodeManager
5068 JobHistoryServer
4357 SecondaryNameNode
3833 NameNode
5127 Jps
Step 8: Test
HDFS test:
Create a directory in HDFS: bin/hadoop fs -mkdir /wordcount
Upload a file to HDFS: bin/hadoop fs -put /home/hadoop/file2.txt /wordcount
View the HDFS directory listing: bin/hdfs dfs -ls /
YARN test: run the WordCount example program:
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar wordcount /wordcount /output2
View the results:
bin/hadoop fs -cat /output2/*
Result:
hadoop 1
hello 2
java 4
jsp 1
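As a sanity check of what WordCount computes, the same counting can be approximated with standard shell tools. The input lines below are made up to reproduce the counts above; the actual contents of file2.txt are not shown in the original:

```shell
# Hypothetical input chosen so the counts match the result above.
printf 'hello java\nhello java jsp\nhadoop java java\n' > /tmp/file2_demo.txt
# Split into words (the "map" step), then sort and count (the "reduce" step).
tr -s ' ' '\n' < /tmp/file2_demo.txt | sort | uniq -c | awk '{print $2, $1}'
# prints:
# hadoop 1
# hello 2
# java 4
# jsp 1
```

This is only a single-machine analogy; the MapReduce job does the same tokenize-and-sum work, but distributed across map and reduce tasks.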
At this point, the hadoop-2.2.0 environment is set up. Adjust the configuration files according to your specific needs. Some settings here may not be ideal; if you spot a problem, please point it out.
Source: "Detailed process of building a YARN (hadoop-2.2.0) environment"; thanks to the original author for sharing.