Big data: data sets too massive for a single machine to store and process
Structured data: data that can be stored in a two-dimensional table (rows and columns)
Unstructured data: data that cannot be represented in two-dimensional logic, such as Word documents, PPT files, and pictures
Semi-structured data: self-describing data between structured and unstructured, storing the structure together with the data itself: XML, JSON, HTML
Google paper: "MapReduce: Simplified Data Processing on Large Clusters"
Map: cuts big data into small pieces and maps them to multiple nodes for processing
Reduce: folds the mapped results together:
i1,i2 ==> o1; o1,i3 ==> o2; o2,i4 ==> o3
MapReduce: maps big data into key-value pairs
  data collection, monitoring, analysis, processing
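The map-then-fold idea above can be sketched in plain Java, with no Hadoop dependency; the numbers and the squaring function are purely illustrative stand-ins for any per-record computation:

```java
import java.util.List;
import java.util.stream.Collectors;

public class MapFoldSketch {
    // Map: apply a function to each element independently
    static List<Integer> map(List<Integer> inputs) {
        return inputs.stream().map(i -> i * i).collect(Collectors.toList());
    }

    // Reduce (fold): combine results pairwise, mirroring
    // i1,i2 ==> o1; o1,i3 ==> o2; o2,i4 ==> o3
    static int fold(List<Integer> mapped) {
        return mapped.stream().reduce(0, Integer::sum);
    }

    public static void main(String[] args) {
        List<Integer> mapped = map(List.of(1, 2, 3, 4));  // [1, 4, 9, 16]
        System.out.println(mapped);
        System.out.println(fold(mapped));                 // 30
    }
}
```

Because each map call is independent, the map step can run on many nodes at once; only the fold needs to see the intermediate results.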
Hadoop: JobTracker, TaskTracker, NameNode, DataNode
Features of Hadoop:
(1) outward expansion (scale-out across many nodes)
(2) data redundancy
(3) moves the program to the data
(4) processes data sequentially, avoiding random access
(5) hides system-level details from programmers
(6) smooth expansion
How to cut big data into several small pieces that can be processed, and how to combine the processing results
How to choose the hosts that run the tasks, moving each task to one of the many small data pieces
How to fetch each small piece of the split data
How to keep the map processes in sync
How map transfers its results to reduce
How to ensure a task completes intact in the event of a software or hardware failure
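The splitting and combining questions above are easiest to see in the classic word-count flow: the input is cut into splits, each split is mapped to (word, 1) pairs, and the pairs are grouped by key and summed. A minimal single-machine sketch (the framework would distribute the splits across nodes; the sample text is made up):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class WordCountSketch {
    // Map phase: turn one input split into (word, 1) key-value pairs
    static List<Map.Entry<String, Integer>> map(String split) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : split.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) pairs.add(Map.entry(word, 1));
        }
        return pairs;
    }

    // Shuffle + reduce phase: group pairs by key and sum each group
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            counts.merge(p.getKey(), p.getValue(), Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Two "splits" standing in for blocks distributed across nodes
        String[] splits = { "big data big", "data big" };
        List<Map.Entry<String, Integer>> all = new ArrayList<>();
        for (String s : splits) all.addAll(map(s));  // map runs per split
        System.out.println(reduce(all));             // {big=3, data=2}
    }
}
```

The framework's answers to the questions above live around this code: it handles the splitting, the shuffle from map to reduce, and the re-running of failed tasks, so the programmer writes only map and reduce.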
MapReduce is three things at once:
1. a programming framework (API)
2. a running platform
3. a concrete implementation
Hadoop: HDFS --> MapReduce (API, Java)
HDFS:
HDFS stores data distributed across a cluster
1) HDFS
2) Data is saved into the HDFS distributed file system
The MapReduce cluster then processes the large files stored there
HBase runs on HDFS, with ZooKeeper coordinating the work
   Hadoop DataBase
This gives Hadoop random access to individual small files and records
NoSQL
  Column store: stores loose data as key-value pairs organized by column
  Merges many small files into one large file
BigTable: Google's "big table" design, which HBase follows
ETL:
Data extraction, transformation, loading
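The three ETL steps can be sketched in a few lines of Java; the "name,age" record format, the field names, and the validation rule are all invented for illustration:

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class EtlSketch {
    // Transform: parse raw "name,age" lines and drop malformed or invalid records
    static Map<String, Integer> transform(List<String> raw) {
        return raw.stream()
                .map(line -> line.split(","))
                .filter(f -> f.length == 2 && Integer.parseInt(f[1]) >= 0)
                .collect(Collectors.toMap(f -> f[0], f -> Integer.parseInt(f[1])));
    }

    public static void main(String[] args) {
        // Extract: in a real pipeline these lines would come from logs or a source DB
        List<String> raw = List.of("alice,30", "bob,-1", "carol,25");
        Map<String, Integer> cleaned = transform(raw);
        // Load: here we just print; a real pipeline would write to a data warehouse
        System.out.println(cleaned);
    }
}
```

The point of the middle step is that dirty records (like the negative age above) are filtered or repaired before loading, so the target store only ever sees clean data.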
Log collection:
  Flume
  Chukwa
This article is from the Linux tours blog; please keep this source when reposting: http://openlinuxfly.blog.51cto.com/7120723/1688801
Hadoop: my understanding of Hadoop