Key-File Storage System Weedfs

Source: Internet
Author: User

2012-12-31

Weedfs is a key-file storage system written in Go, based on Facebook's Haystack paper.
The paper addresses Facebook's need to store a huge number of photos. This data is written once, read frequently, never modified, and rarely deleted. In this scenario, the main problem with a POSIX file system is that metadata is stored on disk, so the disk IO spent reading metadata becomes the performance bottleneck. One disk read (possibly several) translates the file name into an inode number, a second reads the inode, and a third reads the data.
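The IO counts above can be made concrete with a toy cost model. The sketch below is written in Go, the language Weedfs uses. It assumes, as a simplification, that no metadata is cached and that resolving each path component costs one disk read. `posixReads` and `inMemoryIndexReads` are hypothetical helpers for illustration and are not part of Weedfs.

```go
package main

import (
	"fmt"
	"strings"
)

// posixReads estimates the disk reads needed to fetch one file from a POSIX
// file system when no metadata is cached (a deliberate simplification):
// one read per path component to find its name in the parent directory,
// one read for the file's inode, and one read for the data itself.
func posixReads(path string) int {
	components := strings.Split(strings.Trim(path, "/"), "/")
	nameLookups := len(components)
	const inodeRead, dataRead = 1, 1
	return nameLookups + inodeRead + dataRead
}

// inMemoryIndexReads is the cost when the photo's location (offset and size)
// is already in memory: only the data read touches the disk.
func inMemoryIndexReads() int {
	const dataRead = 1
	return dataRead
}

func main() {
	for _, p := range []string{"/abc.jpg", "/photos/2012/12/abc.jpg"} {
		fmt.Printf("%-24s POSIX: %d reads, in-memory index: %d read\n",
			p, posixReads(p), inMemoryIndexReads())
	}
}
```

For a file directly under the root, the model gives the three reads described above. Each extra directory level adds one more read, while an in-memory index always needs a single read.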

Design goals:

    • High throughput and low latency: all metadata is kept in memory, avoiding multiple disk IOs per read
    • Fault tolerant
    • Simple

Facebook's original design

The browser's request is directed to a CDN. If the CDN has the image cached, it returns it directly. Otherwise, it queries the photo storage servers. The photo storage servers were built on NFS. Facebook modified the kernel to add an open_by_filehandle call and cached open file handles in memcache, which avoided repeated disk reads for metadata.

The problem is that caching is not very effective. Photo access frequency has a "long tail." The CDN serves only the popular part, while requests for the long tail of less popular photos consume much of the photo storage servers' bandwidth, and caching there has little effect. Even with a cache, every miss still pays for a POSIX read that requires several disk operations.
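To see why caching helps only partly under a long tail, one can replay a synthetic request trace through an LRU cache. The sketch below assumes Zipf-distributed photo popularity (exponent 1.1, one million photos). These parameters are arbitrary and do not come from Facebook's data. `lruHitRate` and `zipfTrace` are hypothetical helpers.

```go
package main

import (
	"container/list"
	"fmt"
	"math/rand"
)

// lruHitRate replays a trace of photo IDs through an LRU cache holding at
// most capacity photos and returns the fraction of requests that hit.
func lruHitRate(trace []uint64, capacity int) float64 {
	order := list.New()                   // front = most recently used
	pos := make(map[uint64]*list.Element) // photo ID -> its element in order
	hits := 0
	for _, id := range trace {
		if e, ok := pos[id]; ok {
			hits++
			order.MoveToFront(e)
			continue
		}
		pos[id] = order.PushFront(id)
		if order.Len() > capacity {
			last := order.Back() // evict the least recently used photo
			order.Remove(last)
			delete(pos, last.Value.(uint64))
		}
	}
	return float64(hits) / float64(len(trace))
}

// zipfTrace draws n photo IDs in [0, nPhotos-1] from a Zipf distribution,
// an assumed model of long-tail popularity.
func zipfTrace(n int, nPhotos uint64, seed int64) []uint64 {
	r := rand.New(rand.NewSource(seed))
	z := rand.NewZipf(r, 1.1, 1, nPhotos-1)
	trace := make([]uint64, n)
	for i := range trace {
		trace[i] = z.Uint64()
	}
	return trace
}

func main() {
	trace := zipfTrace(200000, 1000000, 1)
	for _, c := range []int{1000, 10000, 100000} {
		fmt.Printf("cache of %6d photos: hit rate %.3f\n", c, lruHitRate(trace, c))
	}
}
```

Whatever hit rate the run reports, the first request for each distinct photo is always a miss. With a long tail, many photos are requested only a few times, so a large share of requests still reaches storage.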

The new design addresses these problems as follows:

    • Store many photos in one large file, which reduces the number of files.
    • Reduce per-photo metadata so that all of it fits in memory. Metadata such as permissions is unnecessary in this scenario.

Storage is organized into volumes. Physical volumes on different machines are grouped into logical volumes, and the Directory maintains the logical-to-physical mapping.
The Cache plays the same role as the CDN: it caches popular photos and insulates the system against CDN failures, but it is part of the internal system.
Photo URLs have the form http://<CDN>/<Cache>/<Machine id>/<logical volume, photo>.
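In the Haystack paper, each layer looks up a photo using only the last URL segment. On a miss, the layer strips its own address from the URL and forwards the request to the next layer: the CDN forwards to the Cache, and the Cache forwards to the Store machine. The Go sketch below mimics that behavior. Encoding the last segment as `volume,photo` is an assumption for illustration, and so are the helpers `photoURL`, `stripHop`, and `photoKey`.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// photoURL builds http://<CDN>/<Cache>/<Machine id>/<logical volume>,<photo>.
// Writing the last segment as "volume,photo" is an illustrative assumption.
func photoURL(cdn, cache string, machineID int, logicalVolume, photoID uint64) string {
	return fmt.Sprintf("http://%s/%s/%d/%d,%d", cdn, cache, machineID, logicalVolume, photoID)
}

// stripHop drops the first host from the URL, as a layer does on a miss
// before forwarding the request to the next layer.
func stripHop(url string) (string, bool) {
	rest := strings.TrimPrefix(url, "http://")
	_, after, found := strings.Cut(rest, "/")
	if !found {
		return "", false
	}
	return "http://" + after, true
}

// photoKey parses the last path segment, which every layer uses as its lookup key.
func photoKey(url string) (volume, photo uint64, err error) {
	last := url[strings.LastIndex(url, "/")+1:]
	vs, ps, found := strings.Cut(last, ",")
	if !found {
		return 0, 0, fmt.Errorf("malformed key %q", last)
	}
	if volume, err = strconv.ParseUint(vs, 10, 64); err != nil {
		return 0, 0, err
	}
	if photo, err = strconv.ParseUint(ps, 10, 64); err != nil {
		return 0, 0, err
	}
	return volume, photo, nil
}

func main() {
	u := photoURL("cdn.example.com", "cache7", 42, 3, 1001)
	toCache, _ := stripHop(u)
	toStore, _ := stripHop(toCache)
	fmt.Println(u)       // http://cdn.example.com/cache7/42/3,1001
	fmt.Println(toCache) // http://cache7/42/3,1001
	fmt.Println(toStore) // http://42/3,1001
	vol, photo, _ := photoKey(u)
	fmt.Println(vol, photo) // 3 1001
}
```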

Roles of the Directory:

    • Maps logical volumes to physical volumes
    • Balances reads and writes across volumes
    • Marks a volume read-only when it is full
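The three Directory roles can be sketched in Go as follows. This is a hypothetical model, not Weedfs's actual code. Load balancing is a plain random choice, capacity is counted in bytes, and names such as `pickForWrite` and `recordWrite` are made up for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"math/rand"
)

// logicalVolume groups physical volumes on different machines that hold
// replicas of the same data. Fields are illustrative assumptions.
type logicalVolume struct {
	id       uint64
	machines []int // machine IDs holding a physical replica (must be non-empty)
	used     int64 // bytes written so far
	capacity int64
	readOnly bool
}

// directory holds the logical -> physical mapping and balances load.
type directory struct {
	volumes map[uint64]*logicalVolume
	rng     *rand.Rand
}

func newDirectory(seed int64) *directory {
	return &directory{volumes: make(map[uint64]*logicalVolume), rng: rand.New(rand.NewSource(seed))}
}

func (d *directory) addVolume(id uint64, machines []int, capacity int64) {
	d.volumes[id] = &logicalVolume{id: id, machines: machines, capacity: capacity}
}

// pickForWrite spreads uploads randomly over logical volumes that are still
// writable and have room for the photo.
func (d *directory) pickForWrite(size int64) (*logicalVolume, error) {
	var candidates []*logicalVolume
	for _, lv := range d.volumes {
		if !lv.readOnly && lv.used+size <= lv.capacity {
			candidates = append(candidates, lv)
		}
	}
	if len(candidates) == 0 {
		return nil, errors.New("no writable logical volume")
	}
	return candidates[d.rng.Intn(len(candidates))], nil
}

// recordWrite accounts for a completed write and marks the volume
// read-only once it is full.
func (d *directory) recordWrite(lv *logicalVolume, size int64) {
	lv.used += size
	if lv.used >= lv.capacity {
		lv.readOnly = true
	}
}

// pickForRead spreads reads randomly across the replicas of a logical volume.
func (d *directory) pickForRead(id uint64) (int, error) {
	lv, ok := d.volumes[id]
	if !ok {
		return 0, fmt.Errorf("unknown logical volume %d", id)
	}
	return lv.machines[d.rng.Intn(len(lv.machines))], nil
}

func main() {
	d := newDirectory(1)
	d.addVolume(1, []int{10, 11, 12}, 100)
	d.addVolume(2, []int{20, 21, 22}, 100)
	lv, err := d.pickForWrite(60)
	if err != nil {
		panic(err)
	}
	d.recordWrite(lv, 60)
	m, _ := d.pickForRead(lv.id)
	fmt.Printf("wrote to logical volume %d, read from machine %d\n", lv.id, m)
}
```

Random choice is the simplest balancing policy. A real system might instead weigh volumes and replicas by current load or free space.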

The Cache is implemented as a distributed hash table keyed by photo ID.
It caches only requests that come directly from users, not from the CDN, because a request that has already missed in the CDN is unlikely to hit in the internal cache.
It caches only photos on writable volumes, because in this scenario a photo is accessed most heavily right after it is uploaded.
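Here is a sketch of such a cache in Go, under stated assumptions. The distributed hash table is modeled as a consistent-hash ring over cache nodes. This is one common DHT design; the source does not say which one is used. Each node is just an in-memory map. `Admit` applies the two admission rules from the Haystack paper: the request came directly from a user, and the photo was fetched from a write-enabled volume. All names are hypothetical.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
	"strconv"
)

// ring maps photo IDs to cache nodes with consistent hashing.
type ring struct {
	points []uint32          // sorted hash positions of virtual nodes
	owner  map[uint32]string // hash position -> cache node
}

func hash32(s string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(s))
	return h.Sum32()
}

func newRing(nodes []string, virtualPerNode int) *ring {
	r := &ring{owner: make(map[uint32]string)}
	for _, n := range nodes {
		for i := 0; i < virtualPerNode; i++ {
			p := hash32(n + "#" + strconv.Itoa(i))
			r.points = append(r.points, p)
			r.owner[p] = n
		}
	}
	sort.Slice(r.points, func(i, j int) bool { return r.points[i] < r.points[j] })
	return r
}

// nodeFor returns the node owning the first position at or after the key's hash.
func (r *ring) nodeFor(photoID uint64) string {
	h := hash32(strconv.FormatUint(photoID, 10))
	i := sort.Search(len(r.points), func(i int) bool { return r.points[i] >= h })
	if i == len(r.points) {
		i = 0 // wrap around the ring
	}
	return r.owner[r.points[i]]
}

// photoCache stores photo bytes on the node chosen by the ring.
type photoCache struct {
	ring  *ring
	store map[string]map[uint64][]byte // node -> photo ID -> data
}

func newPhotoCache(nodes []string) *photoCache {
	c := &photoCache{ring: newRing(nodes, 64), store: make(map[string]map[uint64][]byte)}
	for _, n := range nodes {
		c.store[n] = make(map[uint64][]byte)
	}
	return c
}

func (c *photoCache) Get(photoID uint64) ([]byte, bool) {
	data, ok := c.store[c.ring.nodeFor(photoID)][photoID]
	return data, ok
}

// Admit caches a photo only if the request came straight from a user and the
// photo was read from a write-enabled volume; it reports whether it cached.
func (c *photoCache) Admit(photoID uint64, data []byte, fromCDN, volumeWritable bool) bool {
	if fromCDN || !volumeWritable {
		return false
	}
	c.store[c.ring.nodeFor(photoID)][photoID] = data
	return true
}

func main() {
	c := newPhotoCache([]string{"cache-a", "cache-b", "cache-c"})
	fmt.Println(c.Admit(1, []byte("x"), true, true))   // false: request came via the CDN
	fmt.Println(c.Admit(2, []byte("y"), false, false)) // false: volume is read-only
	fmt.Println(c.Admit(3, []byte("z"), false, true))  // true
	_, hit := c.Get(3)
	fmt.Println(c.ring.nodeFor(3), hit)
}
```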
