Kneel to beg each road warrior:
1, the first is a columnstore database of simple data model, it is more than the key value of the model/document model NoSQL database complex point (also stronger).
2, its distributed storage performance relies on GFs also on the single-room network has a hard indicator.
3, it also provides a relatively balanced sequential read and write operation, it is more suitable for such applications.
4, to ensure that sstable inconvenience, the structure simplifies the read-write conflict caused by the complexity of the problem. Also allows different tablets to share a sstable.
5, memtable design to reduce a large number of read and write conflicts, dual-thread + sequence allows the merge write, in view of the smaller probability of the read recovery operation, major compaction in the sorting time with the write-time retained serial number to go heavy, simplifying the write operation.
6, high-level cache for access to the same data cache service is relatively easy to think of, but the block levels of caching to solve the sequential read-write efficiency, is worth learning.
7. Because the row key is sorted according to the dictionary order, the selection of the row key at the application level is a center of gravity for the design.
8, the design of the dictionary is a challenge to cross-line updates and distributed transactions, but this system does not apply to solve similar problems.
9, provides the infinite column structure as well as column families, is it is stronger than the key value pair model/document model NoSQL database place, is equivalent to the self-built various indexes. This is called semi-structured data.
10. The tablets server is less dependent on the primary server because the primary server communicates only with a limited number of tablets servers and is only responsible for resolving their survival problems without having to resolve direct requests from the user.
11, the use of chubby distributed locking mechanism, the use of file handle collision detection to achieve the management of the distributed server, and with the main server to tablets heartbeat detection, to achieve a complete detection, in addition to the use of suicide and homicide technology, so that the whole system has been high reliability.
12, two-stage compression for similar to different time-point data storage compression can do a good compression ratio, because the content of a high degree of repetition.
Http://research.google.com/archive/bigtable-osdi06.pdf
Google's BigTable study notes (not guaranteed correctness)