Hadoop open source software and ecosystem: career directions are Hadoop operations and Hadoop development, the latter either building to user specifications or doing secondary development on top of the open source software.
Cloud computing and big data: cloud computing in the narrow sense and in the broad sense; the three-tier model.
The origins of Hadoop: Doug Cutting; Google's core technologies.
Google vs Hadoop
Features of Hadoop: it has the support of the open source community; the backup and recovery mechanisms of the distributed file system and the task monitoring of MapReduce ensure the reliability of distributed processing; the framework runs on ordinary PCs; scalability of both computation and storage is fundamental to Hadoop's design; and the efficient data exchange of the distributed file system, combined with MapReduce's mode of processing data locally, is the basis for efficient processing of large amounts of information.
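The replication and local-data points above can be illustrated with a toy scheduler. This is only a sketch under stated assumptions, not Hadoop's actual scheduling code: the block-to-node table, the node names, and the function `pick_node` are all hypothetical. It prefers an idle node that already holds a live replica of the block, and falls back to a remote read when none is available.

```python
# Hypothetical replica table: which nodes hold a copy of each block.
BLOCK_REPLICAS = {
    "block-1": ["node-a", "node-b", "node-c"],
    "block-2": ["node-b", "node-c", "node-d"],
}


def pick_node(block_id, idle_nodes, failed_nodes=frozenset()):
    """Pick a node to process a block, preferring one with a local replica."""
    live_replicas = [n for n in BLOCK_REPLICAS[block_id] if n not in failed_nodes]
    if not live_replicas:
        # Every copy of the block is gone; replication could not save it.
        raise RuntimeError(f"all replicas of {block_id} are lost")

    usable_idle = [n for n in idle_nodes if n not in failed_nodes]
    for node in usable_idle:
        if node in live_replicas:
            return node, "local"   # data is already on this node
    if usable_idle:
        return usable_idle[0], "remote"  # data must be read over the network
    return None, "wait"            # no idle node right now


if __name__ == "__main__":
    print(pick_node("block-1", ["node-d", "node-b"]))
    print(pick_node("block-1", ["node-a", "node-d"], failed_nodes={"node-a"}))
```

Because each block has several replicas, losing one node still leaves other nodes from which the block can be read, which is the intuition behind the reliability claim.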
Introduction to Hadoop architecture: the Hadoop core consists of the HDFS component, the MapReduce component, and the Common component. Common is the foundation of Hadoop and provides features such as Hadoop I/O, compression, RPC communication, and serialization; it can also use JNI to invoke native libraries written in C++ to accelerate data compression, data validation, and so on. HDFS uses a streaming data access mechanism and is suited to storing large files. An HDFS cluster has two kinds of nodes, the name node (NameNode) and the data node (DataNode): the NameNode keeps the mapping information for file data blocks and the namespace of the entire file system in memory, and the DataNodes are responsible for storing and reading the file data. The MapReduce component (JobTracker, TaskTracker, MapTask, ReduceTask; the word count application) and the execution process of MapReduce.
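The word count application and the MapReduce execution process mentioned above can be sketched in a few lines of plain Python. This is an in-memory illustration of the map, shuffle, and reduce steps, not the Hadoop API: the function names are made up for this example, and there is no JobTracker, TaskTracker, or HDFS involved.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group values by key, as happens between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(word, counts):
    """Reduce: sum the counts collected for one word."""
    return word, sum(counts)


def word_count(lines):
    # Each line stands in for an input split handled by one map task.
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(word, counts) for word, counts in grouped.items())


if __name__ == "__main__":
    print(word_count(["hello hadoop", "hello mapreduce"]))
```

In a real cluster the map calls run in parallel on the nodes that hold the input blocks, and the shuffle moves the grouped pairs across the network to the reduce tasks; here all three steps run in one process.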
Hadoop Ecosystem:
Hadoop distributions: Cloudera CDH, Hortonworks HDP, Intel Distribution, IBM BigInsights. They take care of the tedious dependencies between components, among other things.
Hadoop version selection: Hadoop 1.0 and 2.0. Hadoop 1.0 covers 0.20.x, 0.21.x, and 0.22.x; 0.20.x finally evolved into 1.0.x, while the latter two added NameNode HA and other important features. Hadoop 2.0 covers 0.23.x and 2.x; it differs from Hadoop 1.0 and is a new architecture that includes two systems, HDFS Federation and YARN. Compared with 0.23.x, 2.x adds NameNode HA and wire compatibility.
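The grouping of release lines described above can be written down as a small lookup. This is a sketch that only encodes the version ranges listed in this outline; the function name `hadoop_generation` is hypothetical, and release lines not mentioned here are rejected rather than guessed.

```python
def hadoop_generation(version):
    """Return 1 or 2 for the Hadoop generation a release line belongs to."""
    parts = version.split(".")
    major = int(parts[0])
    if major == 1:
        return 1  # 1.0.x, the line that 0.20.x evolved into
    if major == 2:
        return 2  # 2.x
    if major == 0 and len(parts) > 1:
        minor = int(parts[1])
        if minor in (20, 21, 22):
            return 1  # 0.20.x, 0.21.x, 0.22.x
        if minor == 23:
            return 2  # 0.23.x
    raise ValueError(f"release line not covered by this outline: {version}")


if __name__ == "__main__":
    for v in ["0.20.2", "1.0.4", "0.23.10", "2.x"]:
        print(v, "-> Hadoop", hadoop_generation(v))
```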