Author: Liu Xuhui Raymond. Please indicate the source when reprinting.
Email: colorant at 163.com
Blog: http://blog.csdn.net/colorant/
More paper reading notes: http://blog.csdn.net/colorant/article/details/8256145
= Keywords =
Bigtable, GFS, Distributed Database
= Target Problem =
A high-performance, highly reliable, and scalable database to serve the data storage needs of Google products, represented by the search engine and Google Earth.
= Core Idea =
The core data model of Bigtable is a sparse, multi-dimensional, sorted map indexed by (row, column, timestamp). The user data stored in each indexed cell is opaque to Bigtable (an uninterpreted string of bytes). A minimal sketch of this model follows the Timestamp item below.
Row: logically, Bigtable data is stored and indexed in units of rows, sorted by row key, and the number of cells that one row can hold is unlimited. Rows are physically stored in row-key order, so a user table should be designed to exploit this by placing related data under the same or adjacent row keys.
Bigtable data is also partitioned into segments (tablets) by row range. Each tablet can be managed by a different tablet server, so data in different row ranges is served by different server nodes and overall throughput is maintained (this requires careful row-key design to spread hot data and balance load).
Column: Bigtable columns are grouped into column families and named in the form family:qualifier. Grouping makes it possible to control access permissions at the family level and, more importantly, to physically store different kinds of data in different groups for better compression. Given the sparsity of the data and the lack of a fixed schema, the number of individual columns is unlimited and they need not be declared in advance, but the column families must be defined beforehand.
Timestamp: timestamps are used to keep different versions of a cell's data. The version identifier can be the system timestamp or any user-specified sequence of values. External storage systems built on top of Bigtable use timestamps to implement MVCC-based ACID semantics.
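To make the (row, column, timestamp) -> value structure concrete, here is a minimal sketch in plain Python. It is an illustration only, not a client API: the row keys and column names are taken from the Webtable example in the Bigtable paper, and the dictionary stands in for the sorted, distributed map.

```python
# Sketch of Bigtable's data model: a sparse map of
# (row_key, "family:qualifier", timestamp) -> uninterpreted bytes.
table = {}

def put(row, column, value, timestamp):
    table[(row, column, timestamp)] = value

def get_latest(row, column):
    """Return the newest version of a cell, or None if the cell is empty."""
    versions = [(ts, v) for (r, c, ts), v in table.items()
                if r == row and c == column]
    return max(versions)[1] if versions else None

# Reversed hostnames keep pages of the same site adjacent in row-key order
# (the Webtable example from the Bigtable paper).
put("com.cnn.www", "contents:", b"<html>v1</html>", timestamp=3)
put("com.cnn.www", "contents:", b"<html>v2</html>", timestamp=5)
put("com.cnn.www", "anchor:cnnsi.com", b"CNN", timestamp=4)

print(get_latest("com.cnn.www", "contents:"))  # b'<html>v2</html>'
```

In the real system this map is kept sorted by row key and split into tablets by row range; the dictionary above only illustrates the indexing scheme.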
= Implementation =
Bigtable supports single-row transactions. The underlying layer relies on Chubby for various synchronization and coordination tasks, such as master election and storing access control lists; a small sketch of the single-row guarantee follows.
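The sketch below only illustrates what a single-row transaction guarantees: an atomic read-modify-write confined to one row. It is not Bigtable's implementation; the per-row lock is an assumption used purely for illustration.

```python
# Illustration of single-row transaction semantics: reads and writes within
# one row are applied atomically. NOT Bigtable's actual mechanism; the
# per-row lock here is only a stand-in to demonstrate the guarantee.
import threading
from collections import defaultdict

row_locks = defaultdict(threading.Lock)   # one lock per row key (assumption)
rows = defaultdict(dict)                  # {row_key: {column: value}}

def read_modify_write(row_key, column, update_fn):
    """Atomically apply update_fn to one cell of a single row."""
    with row_locks[row_key]:
        old = rows[row_key].get(column)
        rows[row_key][column] = update_fn(old)
        return rows[row_key][column]

# Example: atomically increment a counter column of one row.
read_modify_write("com.cnn.www", "stats:hits", lambda v: (v or 0) + 1)
```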
GFS is used at the bottom layer to store the tablet file contents. Because these files (SSTables) are immutable once written, every update is implemented by appending new records rather than modifying data in place. To speed up responses and reduce I/O, written data first goes into an in-memory memtable structure (sorted by row key); when the memtable exceeds a size threshold, it is written out to the file system as a new file. To avoid losing in-memory data during a failure, a commit log is maintained on disk, and each mutation is appended to the log before being inserted into the memtable. In addition, because different columns or versions of the same row may be spread across several physical files and the memtable, a read has to scan these locations and return the merged result. To bound the cost of such reads, Bigtable periodically compacts these files, merging their contents into new files (and dropping data that has been logically deleted). A simplified sketch of this write/read path follows.
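Below is a simplified, self-contained sketch of this write/read path (commit log, sorted memtable, flush to immutable files, merged reads, and a basic compaction). The flush threshold and the in-memory "SSTable" representation are assumptions for illustration; in the real system SSTables live on GFS and the commit log is a shared append-only file.

```python
# Simplified sketch of Bigtable's write/read path. All structures are
# in-process stand-ins; thresholds and names are illustrative only.
MEMTABLE_LIMIT = 4          # flush after this many entries (assumption)

commit_log = []             # append-only redo log (stand-in for the GFS log)
memtable = {}               # {(row, column): value}
sstables = []               # list of immutable dicts, newest last

def write(row, column, value):
    # 1. Append to the commit log first so the mutation survives a crash.
    commit_log.append((row, column, value))
    # 2. Then apply it to the in-memory memtable.
    memtable[(row, column)] = value
    # 3. Flush the memtable as a new immutable "SSTable" when it grows too big.
    if len(memtable) >= MEMTABLE_LIMIT:
        sstables.append(dict(sorted(memtable.items())))  # minor compaction
        memtable.clear()

def read(row, column):
    # A read merges the memtable and all SSTables, with newest data winning.
    if (row, column) in memtable:
        return memtable[(row, column)]
    for table in reversed(sstables):
        if (row, column) in table:
            return table[(row, column)]
    return None

def compact():
    # Major compaction: merge everything into one file, keeping only the
    # newest value per cell; logically deleted data would be dropped here.
    global sstables
    merged = {}
    for table in sstables:
        merged.update(table)          # later (newer) tables overwrite older ones
    merged.update(memtable)
    sstables = [dict(sorted(merged.items()))]
    memtable.clear()

# Usage: writes go to the log and memtable; reads see the merged state.
for i in range(6):
    write("com.cnn.www", f"anchor:site{i}", f"link{i}".encode())
print(read("com.cnn.www", "anchor:site5"))  # b'link5'
compact()
```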
Bigtable supports file compression, and different compression schemes can be specified per column family.
Bigtable supports in-memory locality groups: data belonging to such a group is kept in memory once loaded rather than being evicted, which accelerates access to specific hot data.
Supports Bloom filters, which quickly tell whether specific data can possibly exist in a given file, so lookups for absent data can skip the disk entirely; a small sketch follows.
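A minimal Bloom filter sketch in plain Python, showing how a cheap in-memory membership test can skip reading a file that cannot contain the requested row/column. The hash construction and sizes are assumptions for illustration, not Bigtable's actual parameters.

```python
# Minimal Bloom filter: a bit array plus k hash functions. A negative answer
# is definite ("not in this file"), so the disk read can be skipped; a
# positive answer may be a false positive. Sizes/hashes are illustrative.
import hashlib

class BloomFilter:
    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8)

    def _positions(self, key: bytes):
        for i in range(self.num_hashes):
            digest = hashlib.sha1(bytes([i]) + key).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: bytes):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: bytes) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))

# One filter per SSTable, built over its (row, column) keys.
bf = BloomFilter()
bf.add(b"com.cnn.www/anchor:cnnsi.com")
assert bf.might_contain(b"com.cnn.www/anchor:cnnsi.com")
if not bf.might_contain(b"org.example/contents:"):
    pass  # definitely absent from this file: skip the disk read
```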
= Related Research, Projects, etc. =
HBase is essentially an open-source implementation of Bigtable on Hadoop/HDFS, and most of its design ideas are consistent with Bigtable; a short client sketch follows the next item.
External store: a cross-region data storage solution built on top of Bigtable.
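For comparison, here is a short sketch of the same data model seen through HBase's Python Thrift client, happybase. It assumes an HBase cluster with the Thrift gateway running on localhost; the table, families, and rows are invented. The per-family options echo the features above: compression, in-memory serving, and Bloom filters are configured per column family.

```python
# Sketch using happybase (HBase Thrift client). Assumes an HBase cluster with
# the Thrift gateway on localhost; table and column names are invented.
import happybase

connection = happybase.Connection("localhost")

# Column families are declared up front; per-family options correspond to the
# features above (compression, in-memory locality group, Bloom filter).
connection.create_table(
    "webtable",
    {
        "contents": dict(max_versions=3, compression="SNAPPY"),
        "anchor": dict(in_memory=True, bloom_filter_type="ROW"),
    },
)

table = connection.table("webtable")

# Columns are addressed as family:qualifier; the reversed-hostname row key
# keeps a site's pages adjacent in sorted order.
table.put(b"com.cnn.www", {b"contents:": b"<html>...</html>",
                           b"anchor:cnnsi.com": b"CNN"})

row = table.row(b"com.cnn.www", columns=[b"anchor:cnnsi.com"])
print(row)  # {b'anchor:cnnsi.com': b'CNN'}
```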