Big data: massive data sets
Structured data: data that can be stored in a two-dimensional table
Unstructured data: data that cannot be represented with two-dimensional table logic, such as Word documents, PPT files, and pictures
Semi-structured data: self-describing data between structured and unstructured, which stores the structure together with the data itself: XML, JSON, HTML
Google paper: "MapReduce: Simplified Data Processing on Large Clusters"
Map: splits big data into small pieces and maps them onto multiple nodes
Reduce: folding, combining each intermediate result with the next input:
i1,i2 ==> o1; o1,i3 ==> o2; o2,i4 ==> o3
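The folding above can be sketched in plain Java (a hypothetical illustration, not Hadoop code): each intermediate result is fed back in with the next input, here using addition as the combining step.

```java
import java.util.List;

public class FoldDemo {
    // Fold (reduce): combine elements left to right, feeding each
    // intermediate result back in: (i1,i2)->o1, (o1,i3)->o2, (o2,i4)->o3
    static int fold(List<Integer> inputs, int identity) {
        int acc = identity;
        for (int i : inputs) {
            acc = acc + i; // the combining step; here, addition
        }
        return acc;
    }

    public static void main(String[] args) {
        System.out.println(fold(List.of(1, 2, 3, 4), 0)); // prints 10
    }
}
```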
MapReduce: maps big data into key-value pairs
Uses: data collection, monitoring, analysis, processing
Hadoop daemons: JobTracker, TaskTracker, NameNode, DataNode
Features of Hadoop:
(1) Scale-out (horizontal expansion)
(2) Data redundancy
(3) Move the program to the data
(4) Sequential processing of data, avoiding random access
(5) Hide system-level details from programmers
(6) Smooth scalability
Problems a MapReduce framework must solve:
How to cut big data into several small pieces that can be processed, and how to combine the processing results
How to select the hosts that run the tasks, moving each task to its piece of small data
How to fetch the small data pieces produced by the split
How to keep the map processes in sync
How map transfers its results to reduce
How to ensure task integrity in the event of a software or hardware failure
MapReduce is three things:
1. A programming framework (API)
2. A running platform
3. A concrete implementation
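The programming model can be sketched as a single-process simulation (this is not the real Hadoop API; class and method names here are hypothetical): map emits (word, 1) pairs, a shuffle step groups them by key, and reduce sums each group.

```java
import java.util.*;

public class MiniMapReduce {
    // Map phase: emit a (word, 1) pair for every word in a line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) out.add(Map.entry(word, 1));
        }
        return out;
    }

    // Reduce phase: sum all the counts collected for one key.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) sum += v;
        return sum;
    }

    // Driver: map each input, shuffle (group by key), then reduce each group.
    static Map<String, Integer> run(List<String> lines) {
        Map<String, List<Integer>> groups = new TreeMap<>();
        for (String line : lines)
            for (Map.Entry<String, Integer> kv : map(line))
                groups.computeIfAbsent(kv.getKey(), k -> new ArrayList<>())
                      .add(kv.getValue());
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : groups.entrySet())
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        return result;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("big data", "big clusters")));
        // prints {big=2, clusters=1, data=1}
    }
}
```

In real Hadoop the shuffle happens across the network between nodes, and the map and reduce functions are supplied through the Java API.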
Hadoop: HDFS --> MapReduce (Java API)
HDFS: distributed cluster data storage
1) HDFS
2) Saving data to the HDFS distributed file system
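The storage step can be sketched as follows (a hypothetical illustration, assuming a tiny block size; HDFS traditionally used 64 MB blocks): a large file is cut into fixed-size blocks that can then be spread over the cluster's DataNodes.

```java
import java.util.*;

public class BlockSplitter {
    // HDFS-style splitting: cut a file into fixed-size blocks; only the
    // last block may be shorter. (Block size of 8 bytes is just for demo.)
    static List<byte[]> split(byte[] file, int blockSize) {
        List<byte[]> blocks = new ArrayList<>();
        for (int off = 0; off < file.length; off += blockSize) {
            int len = Math.min(blockSize, file.length - off);
            blocks.add(Arrays.copyOfRange(file, off, off + len));
        }
        return blocks;
    }

    public static void main(String[] args) {
        byte[] data = new byte[20];
        System.out.println(split(data, 8).size()); // 20 bytes -> 3 blocks (8+8+4)
    }
}
```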
MapReduce: cluster processing of large data files
HBase (Hadoop Database): runs on HDFS, coordinated by ZooKeeper
With ZooKeeper's coordination, HBase gives Hadoop random access to individual small records
NoSQL
Column store
Stores loose (sparse) data as column-based key-value pairs
Merges many small files into one large file
BigTable: Google's "big table" design
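The column-based key-value idea can be sketched like this (a minimal hypothetical model, not the HBase API): each column keeps its own rowKey-to-value map, so a row that lacks a column costs nothing, which is what makes sparse data cheap to store.

```java
import java.util.*;

public class ColumnStoreSketch {
    // Sparse table: each column keeps its own rowKey -> value map,
    // so absent cells take no space (unlike a fixed row schema).
    private final Map<String, Map<String, String>> columns = new HashMap<>();

    void put(String rowKey, String column, String value) {
        columns.computeIfAbsent(column, c -> new HashMap<>()).put(rowKey, value);
    }

    String get(String rowKey, String column) {
        Map<String, String> col = columns.get(column);
        return col == null ? null : col.get(rowKey);
    }

    public static void main(String[] args) {
        ColumnStoreSketch t = new ColumnStoreSketch();
        t.put("row1", "name", "hadoop");
        t.put("row2", "size", "big");   // row2 has no "name" cell at all
        System.out.println(t.get("row1", "name")); // prints hadoop
        System.out.println(t.get("row2", "name")); // prints null
    }
}
```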
ETL:
Data extraction, transformation, loading
Log collection:
Flume
Chukwa
This article is from the Linux tours blog; please keep the source link: http://openlinuxfly.blog.51cto.com/7120723/1688801
Hadoop----My understanding of Hadoop