Google's BigTable principle (translation)


Preface: Google's success rests not only on brilliant ideas, but also on software architecture geniuses such as Jeff Dean.


The official Google Reader blog mentions BigTable, a system developed internally at Google for handling very large volumes of data, and well suited to semi-structured data such as RSS feeds. The notes below were written by Andrew Hitchcock on October 18, 2005, based on a talk given by Google engineer Jeff Dean at the University of Washington (published under a Creative Commons license).


First, BigTable has been under development since the beginning of 2004 and has been in active use for about eight months, that is, since roughly February 2005. About 100 services use BigTable, including Print, Search History, Maps, and Orkut. Following Google's usual practice, the in-house BigTable was designed to run on inexpensive PCs. BigTable lets Google keep the cost of launching new services low and make the most of its computing power. BigTable is built on top of GFS, Scheduler, Lock Service, and MapReduce.

Each table is a multidimensional sparse map. A table consists of rows and columns, and each cell carries a timestamp; multiple versions of a cell can coexist at different timestamps, so changes to the data over time are preserved. In his example, rows are URLs and columns are named, for example, "contents:", which stores the page's data, or "language:", which stores a language code string such as "EN".
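As a rough mental model (my own sketch, not Google's implementation), the data model can be pictured as a sparse map from (row, column, timestamp) to value. The SparseTable class and the column names below are purely illustrative:

```python
# A minimal sketch of the sparse, timestamped map described above.
from collections import defaultdict
import time


class SparseTable:
    def __init__(self):
        # row -> column -> {timestamp: value}
        self._data = defaultdict(lambda: defaultdict(dict))

    def put(self, row, column, value, timestamp=None):
        ts = timestamp if timestamp is not None else time.time()
        self._data[row][column][ts] = value

    def get(self, row, column, timestamp=None):
        versions = self._data[row][column]
        if not versions:
            return None
        if timestamp is None:
            # Default to the most recent version of the cell.
            return versions[max(versions)]
        # Otherwise return the newest version at or before the timestamp.
        eligible = [ts for ts in versions if ts <= timestamp]
        return versions[max(eligible)] if eligible else None


table = SparseTable()
table.put("www.search-analysis.com", "contents:", "<html>...</html>")
table.put("www.search-analysis.com", "language:", "EN")
print(table.get("www.search-analysis.com", "language:"))  # EN
```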

To manage such a huge table, the table is split by row, and the resulting segments are collectively called tablets. Each tablet is roughly 100-200 MB, and each machine stores around 100 tablets. The underlying storage is GFS. Because GFS is a distributed file system, the tablet mechanism makes good load balancing possible: for example, a heavily loaded tablet can be moved to an idle machine and quickly rebuilt there.
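A minimal sketch of how a row key might be mapped to its tablet, assuming each tablet owns a contiguous, sorted range of row keys (the split keys below are made up for illustration):

```python
# Illustrative tablet lookup over sorted row-key ranges (not Google's code).
import bisect


class TabletIndex:
    def __init__(self, split_keys):
        # split_keys are the first row keys of each tablet after the first,
        # kept sorted; tablet i covers [split_keys[i-1], split_keys[i]).
        self.split_keys = sorted(split_keys)

    def tablet_for_row(self, row_key):
        # bisect_right counts how many split points the key has passed,
        # which is exactly the index of the owning tablet.
        return bisect.bisect_right(self.split_keys, row_key)


index = TabletIndex(["com.cnn.www", "com.google.www", "org.wikipedia.www"])
print(index.tablet_for_row("com.example.www"))  # -> 1
```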

Tablets are stored in immutable SSTables, with one log file per machine. When a machine's memory fills up, the system compacts some tablets. Since Jeff spoke very quickly on this point, I didn't have time to record everything I heard, so here is an approximate explanation:

Compaction comes in two kinds: major and minor. A minor compaction involves only a few tablets, while a major compaction runs over the whole system. Major compaction is what reclaims hard disk space. The locations of the tablets are themselves stored in a few special BigTable cells, forming what looks like a three-level hierarchy.
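A toy sketch of the minor/major compaction idea, under the assumption that a tablet buffers recent writes in memory and flushes them into immutable SSTables; the Tablet class and tombstone handling are my own illustration, not Google's code:

```python
# Minor compaction: flush the in-memory buffer to a new immutable SSTable.
# Major compaction: merge all SSTables into one, dropping deleted cells.
DELETED = object()  # tombstone marker for deleted cells


class Tablet:
    def __init__(self):
        self.memtable = {}   # recent writes: (row, column) -> value
        self.sstables = []   # list of immutable dicts, newest last

    def write(self, row, column, value):
        self.memtable[(row, column)] = value

    def minor_compaction(self):
        # Freeze the memtable as a new SSTable and start a fresh one.
        if self.memtable:
            self.sstables.append(dict(self.memtable))
            self.memtable = {}

    def major_compaction(self):
        # Merge oldest-to-newest so newer values win, then discard
        # tombstones; this is the step that reclaims disk space.
        merged = {}
        for sst in self.sstables:
            merged.update(sst)
        self.sstables = [{k: v for k, v in merged.items() if v is not DELETED}]
```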

The client holds a pointer to the META0 tablet. If the META0 tablet is used heavily, its machine will give up its other tablets to serve META0 alone. The META0 tablet keeps records of all the META1 tablets, and those META1 tablets contain the actual locations of the tablets being looked up. (To be honest, I don't fully understand this part of the translation.) There is no serious bottleneck in this system, because frequently used location data has already been fetched and cached.
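A simplified sketch of the three-level location lookup, where META0 points at META1 tablets and META1 points at user tablets; the MetaTablet class and the client-side cache are illustrative assumptions, not the actual implementation:

```python
# Each level maps row-key ranges to the level below it.
import bisect


class MetaTablet:
    """Maps the start key of each child range to a child object."""

    def __init__(self, entries):
        # entries: sorted list of (start_row_key, child) pairs
        self.starts = [k for k, _ in entries]
        self.children = [c for _, c in entries]

    def lookup(self, row_key):
        i = bisect.bisect_right(self.starts, row_key) - 1
        return self.children[max(i, 0)]


def locate_tablet(row_key, meta0, cache):
    # Frequently used locations come straight from the client-side cache.
    if row_key in cache:
        return cache[row_key]
    meta1 = meta0.lookup(row_key)    # META0 points at a META1 tablet
    tablet = meta1.lookup(row_key)   # META1 points at the user tablet
    cache[row_key] = tablet
    return tablet


meta1_a = MetaTablet([("a", "tablet-1"), ("f", "tablet-2")])
meta1_b = MetaTablet([("n", "tablet-3"), ("t", "tablet-4")])
meta0 = MetaTablet([("a", meta1_a), ("n", meta1_b)])
print(locate_tablet("google", meta0, {}))  # -> "tablet-2"
```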

Now back to the description of columns. A column name has the form family:optional_qualifier. In his example, the row www.search-analysis.com might have a column "contents:" holding the HTML of the page, a column "anchor:cnn.com/news" holding the text of the link pointing at the page from that URL, and a column "anchor:www.search-analysis.com/" holding the text of an internal link. Columns carry type information.
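Reusing the SparseTable sketch from earlier, the row in Dean's example might be populated like this (the cell values are made up for illustration):

```python
table = SparseTable()
row = "www.search-analysis.com"
table.put(row, "contents:", "<html>...page source...</html>")
table.put(row, "anchor:cnn.com/news", "search analysis article")   # link text from cnn.com/news
table.put(row, "anchor:www.search-analysis.com/", "home")          # text of an internal link
```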

(Translator's aside: some time ago I read an article about a universal database, got very excited, and contacted its author; looking back now, Google's BigTable is probably the better solution, quite apart from its distributed features, and this way of constructing table structures is very useful.)

Note that this is column information, not a column type. The information attached to a column generally takes the form of properties or rules, for example: keep n copies of the data, or keep the data for n days. When a tablet is rebuilt, these rules are used to drop records that no longer qualify. By design, creating a column itself is easy, but the functionality associated with columns, such as the type and rule information above, is genuinely complex. To optimize read speed, columns can be split into groups, and the columns in different groups are then stored in different SSTables. This improves performance, because small, frequently read columns can be stored separately, isolated from large, rarely accessed ones.
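An illustrative sketch of this column-group idea: each column family is assigned to a group, and each group would be flushed to its own SSTable, keeping small hot columns away from large cold ones. The group assignments below are assumptions made up for the example:

```python
# Split cells by column-family group before writing SSTables (illustration only).
LOCALITY_GROUPS = {
    "language:": "metadata",   # small, frequently read
    "anchor:": "metadata",
    "contents:": "content",    # large, rarely scanned
}


def split_by_group(cells):
    """cells: iterable of (row, column, value) -> dict of group -> cells."""
    groups = {}
    for row, column, value in cells:
        family = column.split(":", 1)[0] + ":"
        group = LOCALITY_GROUPS.get(family, "default")
        groups.setdefault(group, []).append((row, column, value))
    return groups  # each group would be flushed to its own SSTable
```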

All tablets on one machine share a single log; in a cluster containing 100 million tablets, a log per tablet would mean an enormous number of files being opened and written. New log chunks are created frequently, typically 64 MB in size, matching the GFS block size. When a machine goes down, the control machine redistributes its log chunks to other machines, which continue the processing. A machine rebuilding a tablet then asks the control machine where the processed results are and reads the reconstructed data directly.
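A rough sketch (my own illustration, not Google's code) of the recovery idea: the failed machine's shared log is split by tablet, so each new host replays only the mutations belonging to the tablets it picked up:

```python
def split_log_by_tablet(log_entries):
    """log_entries: iterable of (tablet_id, row, column, value) tuples
    read from the dead machine's shared log."""
    per_tablet = {}
    for tablet_id, row, column, value in log_entries:
        per_tablet.setdefault(tablet_id, []).append((row, column, value))
    return per_tablet


def replay(mutations):
    """Rebuild a tablet's in-memory state by applying mutations in order."""
    state = {}
    for row, column, value in mutations:
        state[(row, column)] = value
    return state
```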

There is a lot of redundant data in this system, so compression is used extensively.

Dean went through the compression part very quickly and I did not write it all down, so roughly: before compressing, the system first looks for similar data across rows, columns, and time.

They use their own variations of BMDiff and Zippy.

BMDiff gives them very fast write speeds: 100 MB/s to 1000 MB/s. Zippy is similar to LZW; it does not compress as tightly as LZW or gzip, but it is much faster.

Dean also gave an example of compressing web crawl data. The crawl in this example contains 2.1 billion pages, with rows named like "com.cnn.www/index.html:http". The uncompressed page data is 45.1 TB; compressed it is 4.2 TB, just 9.2% of the original. The links data compresses to 13.9% of its original size, and the anchor text data to 12.7%.

Google has also considered a number of features that have not been added yet:

1. Data manipulation expressions, so that scripts can be shipped to modify the data.
2. Support for multi-row transactions.
3. More efficient storage for large data cells.
4. Running BigTable as a service.
It seems that each service, such as Maps and Search History, currently runs BigTable on its own cluster.
They have also considered running one global BigTable system, but that would require reasonably fair partitioning of resources and computing time. Original addresses:
http://blog.csdn.net/accesine960/archive/2006/02/09/595628.aspx
http://blog.outer-court.com/archive/2005-10-23-n61.html

