Detailed Process of Building a YARN (hadoop-2.2.0) Environment

First, let's address a common question:

What is the relationship between MapReduce and YARN?

A: YARN is not the next-generation MapReduce (MRv2). MRv2 and the first-generation MapReduce (MRv1) are identical in their programming interfaces and data-processing engines (MapTask and ReduceTask); we can regard MRv2 as reusing these modules from MRv1. The difference lies in resource management and job management. In MRv1, both were implemented by the JobTracker, which combined the two functions; in MRv2, the two are separated: job management is handled by an ApplicationMaster, while resource management is handled by the new system, YARN. Because YARN is general-purpose, it can also serve as the resource management system for computing frameworks other than MapReduce, such as Spark and Storm. A computing framework running on YARN is usually called "X on YARN", for example "MapReduce on YARN", "Spark on YARN", or "Storm on YARN".

Hadoop 2.0 consists of three subsystems: HDFS, YARN, and MapReduce. YARN is a brand-new resource management system, while MapReduce is just one application running on YARN. If YARN is regarded as a cloud operating system, MapReduce can be seen as an app running on that operating system.

The relationship between MapReduce and YARN was covered above; now let's actually build the environment.

Environment preparation: refer to steps one through six in the article "Building a Hadoop-0.20.2 Environment".

System: Ubuntu-12.04 (other versions also work)

Mode: pseudo-distributed

Build user: hadoop

Hadoop-2.2.0 download: http://mirrors.hust.edu.cn/apache/hadoop/common/hadoop-2.2.0/
Choose your installation package; here we choose hadoop-2.2.0.tar.gz.
Hadoop mirror list: http://www.apache.org/dyn/closer.cgi/hadoop/common/

Note 1: I configure hadoop-2.2.0 under /home/hadoop.
Note 2: A yarn directory is created under /home/hadoop; the hadoop-2.2.0 directory and the Hadoop data directory are created under that yarn directory.
Note 3: Replace /home/hadoop with your own directory in the steps below.

Step 1: Upload hadoop-2.2.0.tar.gz and decompress it into the /home/hadoop/yarn directory. This produces the hadoop-2.2.0 directory under the yarn directory.
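A minimal sketch of this step, assuming the tarball has already been uploaded to /home/hadoop/yarn:

cd /home/hadoop/yarn
tar -zxvf hadoop-2.2.0.tar.gz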

sudo chown -R hadoop:hadoop hadoop-2.2.0

Create a Hadoop data directory:

mkdir -p /home/hadoop/yarn/yarn_data/hdfs/namenode
mkdir -p /home/hadoop/yarn/yarn_data/hdfs/datanode

Before editing the configuration files, let's take a quick look at the folders in the hadoop-2.2.0 directory and note how they differ from Hadoop 1.

The outer startup scripts are in the sbin directory.

The inner scripts they invoke are in the bin directory.

The native .so libraries are in the lib/native directory.

The shell helper scripts called by the startup scripts are stored in libexec.

The configuration files are all in the etc directory, corresponding to the conf directory of earlier versions.

The jar packages are all under the share/hadoop directory.
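You can verify this layout with a quick listing (abridged output; the exact file list may differ slightly):

cd /home/hadoop/yarn/hadoop-2.2.0
ls
# bin  etc  include  lib  libexec  sbin  share  ...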

Step 2: Configure Environment Variables

I did not configure system environment variables here, so the commands below are run from inside the hadoop-2.2.0 directory using relative paths (bin/..., sbin/...). If you do configure them, for example in /etc/profile, execute source /etc/profile to make the changes take effect.
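For reference, a typical set of entries (an illustrative sketch, with paths based on the layout assumed above) in /etc/profile or ~/.bashrc would be:

export HADOOP_HOME=/home/hadoop/yarn/hadoop-2.2.0
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin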

Step 3: Configure core-site.xml, hdfs-site.xml, mapred-site.xml, and yarn-site.xml

These configuration files are in the /home/hadoop/yarn/hadoop-2.2.0/etc/hadoop directory.

core-site.xml:

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
    <description>Specify the IP address and port number of the NameNode</description>
  </property>
</configuration>

hdfs-site.xml:

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
    <description>Number of replicas</description>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/home/hadoop/yarn/yarn_data/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/home/hadoop/yarn/yarn_data/hdfs/datanode</value>
  </property>
</configuration>

mapred-site.xml:

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>localhost:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>localhost:19888</value>
  </property>
</configuration>

yarn-site.xml:

<configuration>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>localhost:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>localhost:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>localhost:8031</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>localhost:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>localhost:8088</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
</configuration>

Step 4: Configure slaves

Because this is a pseudo-distributed setup, the slaves file contains only localhost.
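For example, a one-line sketch that writes the file (path per the configuration directory used above):

echo localhost > /home/hadoop/yarn/hadoop-2.2.0/etc/hadoop/slaves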

Step 5: Synchronize the configured hadoop-2.2.0 distribution to each data node

This step is skipped because it is pseudo-distributed.
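For a fully distributed cluster, the synchronization could look something like this (illustrative only; datanode1 is a hypothetical hostname):

scp -r /home/hadoop/yarn/hadoop-2.2.0 hadoop@datanode1:/home/hadoop/yarn/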

Step 6: Format NameNode

Run the following command:

bin/hdfs namenode -format

Or

bin/hadoop namenode -format

Step 7: Start HDFS and YARN

Start hdfs:

sbin/start-dfs.sh

Start yarn:

sbin/start-yarn.sh

Or you can execute

sbin/start-all.sh

to start HDFS and YARN together.

Also start the JobHistory server; otherwise the history links in the web UI will not open:

sbin/mr-jobhistory-daemon.sh start historyserver

Run the jps command to check the started processes:

4504 ResourceManager
4066 DataNode
4761 NodeManager
5068 JobHistoryServer
4357 SecondaryNameNode
3833 NameNode
5127 Jps
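You can also sanity-check the daemons through their web UIs. The ResourceManager and JobHistory addresses below follow the yarn-site.xml and mapred-site.xml settings above; 50070 is the default NameNode HTTP port in Hadoop 2.2.0:

http://localhost:8088 (ResourceManager)
http://localhost:50070 (NameNode)
http://localhost:19888 (JobHistory server)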

Step 8: Test

HDFS testing:

Create a directory in HDFS:

bin/hadoop fs -mkdir /wordcount

Upload a file to HDFS:

bin/hadoop fs -put /home/hadoop/file2.txt /wordcount

View the HDFS directory listing:

bin/hdfs dfs -ls /
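Here file2.txt is assumed to be an existing local text file. As an illustration, a file that would reproduce the sample word counts shown at the end of this section could be created like this:

printf 'hello java\nhello jsp java\njava java hadoop\n' > /home/hadoop/file2.txt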

YARN test: run the WordCount example program:

bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar wordcount /wordcount /output2

View the results:

bin/hadoop fs -cat /output2/*

Result:

hadoop  1
hello   2
java    4
jsp     1

At this point, the hadoop-2.2.0 environment is set up. Adjust the configuration files to your own specific needs. Some settings here may not be ideal; corrections are welcome.

Original article: Detailed process of building a yarn (hadoop-2.2.0) environment. Thanks to the original author for sharing.
