Hadoop open source software and ecosystem

Source: Internet
Author: User
Tags: hadoop, ecosystem

Hadoop open source software and ecosystem: the two working directions are Hadoop operations and Hadoop development, where development means building to user specifications or doing secondary development on top of the open source software.

Cloud computing and big data: cloud computing in the narrow sense and in the broad sense; the three-tier model.

The origins of Hadoop: Doug Cutting; Google's core technologies.

Google vs Hadoop

Features of Hadoop: Hadoop is backed by an open source community. The backup and recovery mechanisms of the distributed file system, together with MapReduce task monitoring, ensure the reliability of distributed processing, and the framework runs on ordinary commodity PCs. Scalability, of both computation and storage, is fundamental to Hadoop's design. Efficient data exchange in the distributed file system, combined with MapReduce's data-local processing model (computation runs where the data is stored), is the basis for processing large amounts of information efficiently.
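The replication and data-locality ideas above can be illustrated with a small sketch. This is not Hadoop code: the block names, node names, the three-replica layout, and the functions `assign_tasks` and `surviving_blocks` are all hypothetical, chosen only to show why keeping several copies of a block helps both recovery and local processing.

```python
# Hypothetical block placement: block id -> nodes holding a replica.
# Three replicas per block is an assumption made for this illustration.
BLOCK_REPLICAS = {
    "blk_1": ["node-a", "node-b", "node-c"],
    "blk_2": ["node-b", "node-c", "node-d"],
    "blk_3": ["node-a", "node-c", "node-d"],
}


def assign_tasks(block_replicas, free_slots):
    """Assign one map task per block, preferring a node that stores the block.

    free_slots maps node name -> number of tasks it can still accept.
    Returns {block_id: (node, is_local)}.
    """
    slots = dict(free_slots)  # copy so the caller's dict is not mutated
    assignment = {}
    for block_id, replicas in sorted(block_replicas.items()):
        # First choice: a node that already holds a replica (no network copy).
        local = [n for n in replicas if slots.get(n, 0) > 0]
        if local:
            node, is_local = local[0], True
        else:
            # Fallback: any node with a free slot; the block is read remotely.
            remote = [n for n, free in sorted(slots.items()) if free > 0]
            if not remote:
                raise RuntimeError("no free slots left in the cluster")
            node, is_local = remote[0], False
        slots[node] -= 1
        assignment[block_id] = (node, is_local)
    return assignment


def surviving_blocks(block_replicas, failed_nodes):
    """Return the blocks that still have at least one replica after failures."""
    failed = set(failed_nodes)
    return sorted(b for b, nodes in block_replicas.items() if set(nodes) - failed)
```

With one free slot on each node, every task in this toy cluster can be placed next to its data; if only one node has free slots, some tasks fall back to remote reads. Similarly, a block is lost only when every node holding one of its replicas fails.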

Introduction to Hadoop architecture: the Hadoop kernel consists of the HDFS component, the MapReduce component, and the Common component. Common is the foundation of Hadoop and provides features such as Hadoop I/O, compression, RPC communication, and serialization. It can also use JNI to call native libraries written in C++ to accelerate data compression, data validation, and similar work. HDFS uses a streaming data access model and is suited to storing large files. An HDFS cluster has two kinds of nodes: the name node (NameNode) and the data node (DataNode). The NameNode holds the namespace of the entire file system and the mapping of files to data blocks in memory, and the DataNodes are responsible for storing and reading the file data. The MapReduce component comprises the JobTracker, the TaskTrackers, and the map and reduce tasks (MapTask, ReduceTask); the word count application is the usual example for walking through the MapReduce execution process.
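The word count application mentioned above can be sketched in a few lines. The following is a single-process simulation of the map, shuffle, and reduce steps in plain Python, not the Hadoop Java API; the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are illustrative only, and a real job would distribute these steps across map and reduce tasks.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would do."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(word, counts):
    """Reduce: sum the counts collected for one word."""
    return word, sum(counts)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence on an in-memory list of lines."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(word, counts) for word, counts in grouped.items())
```

Calling `word_count(["Hadoop stores data", "Hadoop processes data"])` counts `hadoop` and `data` twice and the other words once. The point of the split is that the map and reduce functions see only one line or one key at a time, which is what lets the framework run many of them in parallel.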

Hadoop Ecosystem:

Hadoop distributions: Cloudera CDH, Hortonworks HDP, Intel Distribution, IBM BigInsights. Distributions take care of the tedious dependencies between components, among other things.

Hadoop version selection: there are two generations, Hadoop 1.0 and Hadoop 2.0. Hadoop 1.0 covers 0.20.x, 0.21.x, and 0.22.x; 0.20.x eventually evolved into 1.0.x, while the latter two added NameNode HA and other important features. Hadoop 2.0 covers 0.23.x and 2.x. It differs from Hadoop 1.0 in that it is a new architecture built around two systems, HDFS Federation and YARN. Compared with 0.23.x, 2.x adds NameNode HA and wire compatibility.
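As a quick aid for reading version strings, the grouping described above can be written as a small lookup. This is only a sketch of the classification given in this article; `hadoop_generation` is a hypothetical helper, not part of any Hadoop tool, and versions outside the ranges named here are rejected rather than guessed.

```python
def hadoop_generation(version):
    """Map a version string such as '0.20.2' or '2.4.1' to '1.0' or '2.0'.

    The grouping mirrors the text: 0.20.x, 0.21.x, 0.22.x, and 1.x belong to
    the 1.0 generation; 0.23.x and 2.x belong to the 2.0 generation.
    """
    parts = version.split(".")
    major = int(parts[0])
    minor = int(parts[1]) if len(parts) > 1 else 0
    if major == 1 or (major == 0 and minor in (20, 21, 22)):
        return "1.0"
    if major == 2 or (major == 0 and minor == 23):
        return "2.0"
    raise ValueError(f"version {version!r} is not covered by this grouping")
```

For example, `hadoop_generation("0.20.2")` falls in the 1.0 generation, while `hadoop_generation("0.23.1")` falls in the 2.0 generation even though its number starts with zero.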
