Hadoop 2 Source Code Analysis: A First Look at Hadoop V2

Source: Internet
Author: User

1. Overview

With the preparation for analyzing the Hadoop 2 source code complete, we now move on to studying the source itself. This post gives a preliminary introduction to Hadoop V2. Its contents are as follows:

    • The origin of Hadoop
    • Hadoop V2 partial project diagram
    • Introduction to the functions of each package

The analysis in this article is based on the Hadoop 2.6.0 source code; it can also serve as a reference for other versions of Hadoop.

2. The origin of Hadoop

In its early years, Google's core competitiveness was its computing platform. Google published papers on the following:

    • Google Cluster
    • Chubby
    • GFS
    • BigTable
    • MapReduce

MapReduce is not a feature unique to Hadoop. The Apache Foundation has corresponding projects, all affiliated with the Hadoop project, namely:

    • ZooKeeper (Chubby)
    • HDFS (GFS)
    • HBase (BigTable)
    • MapReduce (Hadoop is the collective name for HDFS and MapReduce)

There are many open-source projects like these. For example, Yahoo handles huge volumes of data with Pig, and Facebook uses Hive for user behavior analysis. Hadoop has two core components, HDFS and MapReduce. MapReduce is an offline computing framework that relies on HDFS. HDFS, as a distributed file storage system, is the foundation for all of these projects. What HDFS supports is shown in the following diagram:

[Figure: support diagram for HDFS]
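
To make the MapReduce model a little more concrete, here is a minimal single-process sketch of the map, shuffle, and reduce steps for a word count. It is only an illustration: the function names are hypothetical, nothing here uses Hadoop's actual API, and a plain Python list stands in for the file blocks that a real job would read from HDFS.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values of one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over the input lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    # Assumption: this list stands in for blocks stored in HDFS.
    blocks = ["hdfs stores data", "mapreduce reads data from hdfs"]
    print(word_count(blocks))
```

In a real cluster the map and reduce steps would run as separate tasks on different machines, with the shuffle moving data between them.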

  

3. Hadoop V2 partial project diagram

The dependencies among Hadoop's packages are fairly complex. The reason is that HDFS provides a distributed file storage system with a large API, so the low-level implementation of the distributed file system in turn relies on some higher-level functionality. These packages reference one another and form a web of dependencies. For example, the conf package, which reads the system configuration files, depends on the fs package: reading a configuration file requires the file system, and some file system functionality is abstracted in the fs package. The dependencies among the core packages of the Hadoop V2 project are shown in the following diagram:

[Figure: dependencies among the core packages of Hadoop V2]

The following posts in this series will mainly cover these parts: MapReduce, fs, hdfs, ipc, io, and yarn.
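
The idea of packages referencing each other and forming a web can be sketched with a small dependency graph and a cycle check. This is a toy model, not something extracted from the Hadoop build. The `conf -> fs` edge comes from the paragraph above and the `ipc -> io` edge from the package list in section 4; the `fs -> conf` edge is an assumption added here to illustrate a mutual reference, and `find_cycle` is a hypothetical helper.

```python
# Toy package dependency graph: each key depends on the packages in its list.
DEPENDS_ON = {
    "conf": ["fs"],   # from the text: conf needs the file system
    "fs": ["conf"],   # ASSUMPTION: added to illustrate a mutual reference
    "ipc": ["io"],    # from the package list: ipc relies on io
    "io": [],
}


def find_cycle(graph):
    """Return one dependency cycle as a list of names, or None if acyclic."""
    WHITE, GREY, BLACK = 0, 1, 2   # unvisited, on the current path, finished
    state = {node: WHITE for node in graph}
    path = []

    def visit(node):
        state[node] = GREY
        path.append(node)
        for dep in graph.get(node, []):
            if state.get(dep, WHITE) == GREY:
                # dep is already on the current path, so we closed a loop.
                return path[path.index(dep):] + [dep]
            if state.get(dep, WHITE) == WHITE:
                found = visit(dep)
                if found:
                    return found
        path.pop()
        state[node] = BLACK
        return None

    for node in graph:
        if state[node] == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


if __name__ == "__main__":
    print(find_cycle(DEPENDS_ON))
```

A graph with only one-way edges such as `ipc -> io` yields `None`, while the mutual reference between conf and fs is reported as a cycle.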

4. Description of the functions of each package

The functions of the packages mentioned above are as follows:

    • Tools: Provides command-line tools, such as distcp and archive.
    • MapReduce V2: The Map/Reduce implementation of Hadoop V2.
    • Filecache: Gives HDFS files a local cache to speed up MapReduce data access.
    • HDFS V2: The distributed file system implementation of Hadoop V2.
    • FS: A file system abstraction package that provides a uniform file access interface across multiple file system implementations.
    • IPC: Inter-process communication; relies on the encoding and decoding capabilities provided by IO.
    • IO: Encodes and decodes data for transmission over the network.
    • NET: Encapsulates network functions, such as sockets.
    • CONF: Handles the system's configuration parameters.
    • Util: Utility classes.
    • HA: Configures a highly available cluster with two NameNodes (active and standby).
    • YARN: A feature newly added in Hadoop V2 for resource scheduling and management.
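
The relationship between IO and IPC in the list above can be illustrated with a minimal sketch: one layer turns a call into bytes and back, and the other layer uses those bytes to invoke a method on the "remote" side. Everything here is hypothetical. The byte layout, the function names, and the restriction to integer arguments are assumptions made for illustration, and the network hop is replaced by a direct function call; this does not reproduce Hadoop's actual classes or wire format.

```python
import struct


def encode_call(method, args):
    """io layer: serialize a method name and integer arguments to bytes."""
    name = method.encode("utf-8")
    payload = struct.pack(">H", len(name)) + name   # length-prefixed name
    payload += struct.pack(">H", len(args))         # argument count
    for arg in args:
        payload += struct.pack(">q", arg)           # 8-byte signed integers
    return payload


def decode_call(data):
    """io layer: inverse of encode_call."""
    (name_len,) = struct.unpack_from(">H", data, 0)
    offset = 2
    method = data[offset:offset + name_len].decode("utf-8")
    offset += name_len
    (argc,) = struct.unpack_from(">H", data, offset)
    offset += 2
    args = [struct.unpack_from(">q", data, offset + 8 * i)[0]
            for i in range(argc)]
    return method, args


def handle_request(data, handlers):
    """ipc server side: decode the request, dispatch it, encode the result."""
    method, args = decode_call(data)
    result = handlers[method](*args)
    return struct.pack(">q", result)


def call(method, args, handlers):
    """ipc client side: the network hop is replaced by a function call."""
    response = handle_request(encode_call(method, args), handlers)
    return struct.unpack(">q", response)[0]


if __name__ == "__main__":
    handlers = {"add": lambda a, b: a + b}
    print(call("add", [2, 3], handlers))
```

The point of the split is that the ipc functions never touch the byte layout directly; they only call the encode and decode functions, which mirrors the dependency described in the list.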

5. Summary

Hadoop V2 differs from Hadoop V1 in its underlying design. The newly added HA resolves the single point of failure in Hadoop V1. The newly added YARN system improves cluster resource management and scheduling: it greatly reduces the resource consumption of the ResourceManager, and the programs that monitor the status of each job's subtasks (tasks) are now distributed, which is safer and more elegant. At the same time, a variety of computing frameworks can run in one cluster.
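
As a rough illustration of how an active/standby pair removes the single point of failure, the sketch below promotes the standby NameNode when the active one becomes unhealthy. The `NameNode` and `HaPair` classes and the `healthy` flag are hypothetical stand-ins; the sketch leaves out how a real cluster detects failures and coordinates the switch.

```python
class NameNode:
    """Hypothetical stand-in for one NameNode in an HA pair."""

    def __init__(self, name, state):
        self.name = name
        self.state = state      # "active" or "standby"
        self.healthy = True


class HaPair:
    """Hypothetical controller that keeps one healthy NameNode active."""

    def __init__(self, first, second):
        self.nodes = [first, second]

    def active(self):
        """Return the healthy active node, or None if there is none."""
        for node in self.nodes:
            if node.state == "active" and node.healthy:
                return node
        return None

    def failover(self):
        """Promote a healthy standby if the active node is unhealthy."""
        current = self.active()
        if current is not None:
            return current               # nothing to do
        for node in self.nodes:
            if node.state == "active":
                node.state = "standby"   # demote the failed node
        for node in self.nodes:
            if node.healthy:
                node.state = "active"    # promote the survivor
                return node
        return None                      # both nodes are down


if __name__ == "__main__":
    nn1, nn2 = NameNode("nn1", "active"), NameNode("nn2", "standby")
    pair = HaPair(nn1, nn2)
    nn1.healthy = False                  # simulate a failure of the active node
    print(pair.failover().name)
```

With a single NameNode, as in Hadoop V1, the same simulated failure would leave no node to promote, which is the single point of failure mentioned above.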

6. Concluding remarks

That is all for this post. If you have any questions during your research and study, you can join the discussion group or send me an email, and I will do my best to answer them. Let us encourage each other!
