GFS, MapReduce, and BigTable: How They Relate

GFS (published in 2003) uses clusters of commodity hardware to store massive amounts of data, replicating that data redundantly across nodes. MapReduce (2004) complements the GFS architecture: it puts to work the large number of CPUs provided by all the low-cost servers in a GFS cluster. Together with GFS, it forms the core of Google's massive-data processing, including building Google's search index. However, neither system provides real-time (low-latency) access to data, so on their own they are not sufficient for serving web services.
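The sketch below is a minimal, single-process illustration of the MapReduce programming model. It uses a toy inverted-index job, since the text names search-index construction as one of the system's uses. The names `map_fn`, `reduce_fn` and `run_mapreduce` are hypothetical. The network shuffle that a real cluster performs is simulated here with an in-memory dictionary.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple


def map_fn(doc_id: str, text: str) -> Iterator[Tuple[str, str]]:
    """Map: emit (word, doc_id) once for every distinct word in a document."""
    for word in set(text.lower().split()):
        yield word, doc_id


def reduce_fn(word: str, doc_ids: List[str]) -> Tuple[str, List[str]]:
    """Reduce: collapse all postings for one word into a sorted posting list."""
    return word, sorted(set(doc_ids))


def run_mapreduce(docs: Dict[str, str]) -> Dict[str, List[str]]:
    # Map phase: in a real cluster each input split would be handled by a
    # separate worker, ideally on a machine that already stores that chunk.
    intermediate: Dict[str, List[str]] = defaultdict(list)
    for doc_id, text in docs.items():
        for word, value in map_fn(doc_id, text):
            # Shuffle: group intermediate values by key (simulated in memory).
            intermediate[word].append(value)
    # Reduce phase: one call per distinct key.
    return dict(reduce_fn(w, ids) for w, ids in sorted(intermediate.items()))


docs = {
    "d1": "GFS stores large files",
    "d2": "MapReduce processes large files",
    "d3": "Bigtable stores structured data",
}
index = run_mapreduce(docs)
print(index["large"])   # ['d1', 'd2']
print(index["stores"])  # ['d1', 'd3']
```

In an actual deployment, the map calls would run in parallel on many machines. That is where the otherwise idle CPUs of a GFS cluster come in.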


Another drawback of GFS is that it is suited to storing a small number of very large files, not hundreds of thousands of small files such as the pictures on a social platform. All file metadata is ultimately held in the master's memory, so the more files there are, the greater the pressure on the master.
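A back-of-envelope calculation shows why small files hurt. This sketch assumes roughly 64 bytes of master metadata per 64 MB chunk and about 64 bytes of namespace data per file. These upper bounds come from the original GFS paper and are not stated in this article. The function name `master_metadata_bytes` is hypothetical.

```python
# Estimate GFS master memory for the same data volume stored as a few large
# files versus many small files.
# Assumptions: 64 MiB chunks, ~64 bytes of metadata per chunk, ~64 bytes per file.
CHUNK_SIZE = 64 * 2**20
BYTES_PER_CHUNK_META = 64
BYTES_PER_FILE_META = 64


def master_metadata_bytes(total_bytes: int, file_size: int) -> int:
    """Approximate master memory to track total_bytes split into files of file_size."""
    num_files = total_bytes // file_size
    # Every file occupies at least one chunk, even if it is far smaller than 64 MiB.
    chunks_per_file = -(-file_size // CHUNK_SIZE)  # ceiling division
    return num_files * (BYTES_PER_FILE_META + chunks_per_file * BYTES_PER_CHUNK_META)


TOTAL = 2**40  # 1 TiB of data

big = master_metadata_bytes(TOTAL, 2**30)           # 1 GiB files
small = master_metadata_bytes(TOTAL, 100 * 2**10)   # 100 KiB files, e.g. photos
print(f"1 GiB files:   {big / 2**10:.1f} KiB of metadata")    # 1088.0 KiB
print(f"100 KiB files: {small / 2**20:.1f} MiB of metadata")  # 1310.7 MiB
```

Under these assumptions, the same terabyte costs the master about a thousand times more memory when it is stored as photo-sized files.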


What was needed was a solution that could drive interactive applications, build on both of these infrastructures, and rely on GFS storage for data redundancy and availability. Stored data would be split into very small entries, which the system aggregates into very large files. Some kind of index would let a lookup read from as few disks as possible. Ultimately, such a system could store crawler results as they arrive and work with MapReduce to generate search indexes. The designers therefore gave up relational features in favor of a simple API for insert and delete operations, plus a scan function that iterates over a large key range or an entire table. The result was BigTable (2006), a distributed storage system for managing structured data.
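The sketch below illustrates the shape of such an interface: keyed insert and delete, point lookup, and a range scan over keys kept in sorted order. Sorted order is what makes both range scans and a compact lookup index possible. The class `SortedTable` and its method names are hypothetical, not BigTable's real client API. The sketch is purely in-memory and ignores column families, timestamps, tablets, and persistence to GFS.

```python
import bisect
from typing import Dict, Iterator, List, Optional, Tuple


class SortedTable:
    """Toy single-node table whose rows stay sorted by key (hypothetical)."""

    def __init__(self) -> None:
        self._keys: List[str] = []       # row keys in sorted order
        self._rows: Dict[str, str] = {}  # row key -> value

    def put(self, key: str, value: str) -> None:
        if key not in self._rows:
            bisect.insort(self._keys, key)  # keep keys sorted on insert
        self._rows[key] = value

    def delete(self, key: str) -> None:
        if key in self._rows:
            del self._rows[key]
            self._keys.pop(bisect.bisect_left(self._keys, key))

    def get(self, key: str) -> Optional[str]:
        return self._rows.get(key)

    def scan(self, start: str = "", end: Optional[str] = None) -> Iterator[Tuple[str, str]]:
        """Yield (key, value) for start <= key < end; end=None runs to the table's end."""
        i = bisect.bisect_left(self._keys, start)
        while i < len(self._keys) and (end is None or self._keys[i] < end):
            key = self._keys[i]
            yield key, self._rows[key]
            i += 1


t = SortedTable()
t.put("com.example/about", "<html>about</html>")
t.put("com.example/index", "<html>home</html>")
t.put("org.sample/index", "<html>other</html>")
t.delete("com.example/about")
# "0" sorts right after "/", so this range covers every key with prefix "com.example/".
site_rows = list(t.scan("com.example/", "com.example0"))
print(site_rows)  # [('com.example/index', '<html>home</html>')]
```

The row keys are reversed hostnames. This is an illustrative choice, so that all pages of one site sort next to each other and can be read with a single scan.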


It is worth mentioning the CAP theorem, which states that a distributed system can guarantee at most two of consistency, availability, and partition tolerance at the same time, never all three. Relaxing consistency is therefore a way to increase a system's availability.
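The toy model below illustrates the trade-off CAP describes once a network partition occurs. It has two replicas and a `partitioned` flag. In "CP" mode the store rejects writes it cannot replicate, giving up availability. In "AP" mode it accepts them locally, giving up consistency, so the other replica serves stale data. The class and mode names are hypothetical. This is a simplification, not a model of any real system's replication protocol.

```python
from typing import Dict, Optional, Tuple


class ReplicatedStore:
    """Two replicas of one key-value store; `partitioned` models a network split."""

    def __init__(self, mode: str) -> None:
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.a: Dict[str, str] = {}  # replica that receives writes
        self.b: Dict[str, str] = {}  # replica on the other side of a partition
        self.partitioned = False

    def write(self, key: str, value: str) -> bool:
        """Write through replica a; return False if the write is rejected."""
        if self.partitioned and self.mode == "CP":
            return False            # cannot reach b, so refuse to stay consistent
        self.a[key] = value
        if not self.partitioned:
            self.b[key] = value     # synchronous replication while b is reachable
        return True

    def read_from_b(self, key: str) -> Optional[str]:
        return self.b.get(key)


results: Dict[str, Tuple[bool, Optional[str]]] = {}
for mode in ("CP", "AP"):
    store = ReplicatedStore(mode)
    store.write("x", "1")          # both replicas reachable
    store.partitioned = True       # network split between a and b
    accepted = store.write("x", "2")
    results[mode] = (accepted, store.read_from_b("x"))
    print(mode, results[mode])
# CP (False, '1')  -> write refused: consistent but unavailable
# AP (True, '1')   -> write accepted, but replica b returns the old value
```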
