Google's three core technologies (i) Google File System-Introduction

Last Update:2018-07-26 Source: Internet

Author: User


To meet Google's rapidly growing data processing needs, we have designed and implemented the Google File System (GFS). GFS shares many design goals with earlier distributed file systems, such as performance, scalability, reliability, and availability. However, its design has also been driven by our observations of our application workloads and technical environment, both current and anticipated, and these differ markedly from the assumptions behind earlier file systems. We therefore re-examined the traditional design trade-offs and arrived at a substantially different design.

First, component failures are treated as the norm rather than the exception. A GFS cluster consists of hundreds or even thousands of storage machines built from inexpensive commodity parts, and it is accessed by a comparable number of client machines. The quantity and quality of these components mean that, at any given time, some are not working and some will never recover from their current failure. We have seen problems caused by application bugs, operating system bugs, human error, and failures of disks, memory, connectors, networking, and power supplies. Continuous monitoring, error detection, fault tolerance, and automatic recovery must therefore be built into the system.

Second, files are very large by traditional standards. Multi-gigabyte files are common. Each file typically contains many application objects, such as web documents. When we routinely work with fast-growing data sets of many terabytes made up of hundreds of millions of objects, it is unwieldy to manage hundreds of millions of files of roughly kilobyte size, even if the file system can support it. Design assumptions and parameters such as I/O operation size and block size therefore have to be revisited.
Third, most files are modified by appending new data to the end rather than by overwriting existing data. Random writes within a file are practically nonexistent. Once written, a file is only read, and usually only sequentially. A wide variety of data has these characteristics: large data sets scanned by data analysis programs, continuous data streams generated by running applications, archival data, and intermediate results produced on one machine and processed on another, either at the same time or later. Given this access pattern on huge files, caching data blocks on the client loses its appeal, and appending becomes the main focus of performance optimization and atomicity guarantees.

Finally, co-designing the applications and the file system API increases the flexibility of the whole system. For example, we relaxed the GFS consistency model, which greatly simplified the file system without placing an onerous burden on applications. We also introduced an atomic record append operation so that multiple clients can append to the same file concurrently without extra synchronization between them. These points are discussed in more detail later in this article.

Google has deployed multiple GFS clusters for different purposes. The largest clusters have more than 1,000 storage nodes and more than 300 TB of disk storage, and are heavily and continuously accessed by hundreds of clients on different machines.
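The idea behind atomic record append can be illustrated with a small in-memory sketch. This is not the GFS API: the class `AppendOnlyLog`, the method `record_append`, and the helper `run_clients` are hypothetical names chosen for illustration, and a single process with threads stands in for many clients on different machines. The point it shows is that the store, not the client, chooses the offset for each record, so clients need no coordination among themselves.

```python
import threading


class AppendOnlyLog:
    """Hypothetical in-memory stand-in for a file that supports record append."""

    def __init__(self):
        self._data = bytearray()
        # This lock lives inside the "file system"; clients never see it.
        self._lock = threading.Lock()

    def record_append(self, record: bytes) -> int:
        """Append the record as one unit and return the offset chosen for it."""
        with self._lock:
            offset = len(self._data)
            self._data += record
            return offset

    def read(self, offset: int, length: int) -> bytes:
        """Return `length` bytes starting at `offset`."""
        with self._lock:
            return bytes(self._data[offset:offset + length])


def run_clients(log, num_clients=8, records_per_client=100):
    """Run several client threads that append concurrently, with no
    synchronization between the clients themselves."""
    # Each client writes only to its own result list.
    results = [[] for _ in range(num_clients)]

    def client(client_id):
        for seq in range(records_per_client):
            record = f"client={client_id:02d} seq={seq:04d}\n".encode()
            offset = log.record_append(record)
            results[client_id].append((offset, record))

    threads = [threading.Thread(target=client, args=(c,))
               for c in range(num_clients)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Flatten the per-client lists into one list of (offset, record) pairs.
    return [pair for per_client in results for pair in per_client]


if __name__ == "__main__":
    log = AppendOnlyLog()
    placed = run_clients(log)
    # Every record can be read back intact at the offset the log reported.
    intact = all(log.read(off, len(rec)) == rec for off, rec in placed)
    print(f"{len(placed)} records appended, all intact: {intact}")
```

In this sketch each record lands exactly once. A real distributed implementation also has to deal with replicas and retries after failures, which this single-process example does not attempt to model.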

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
