How Facebook stores billions of photos


Sharing photos is one of the most popular features on Facebook. To date, users have uploaded more than 15 billion photos, making Facebook the largest photo-sharing website. For each uploaded photo, Facebook generates and stores four images of different sizes, which translates to roughly 60 billion images and more than 1.5 PB of storage in total. Photos are currently being added at a rate of about 220 million per week, or roughly 25 TB of additional storage per week, and at peak the system serves about 550,000 images per second. These numbers pose a major challenge to Facebook's photo storage infrastructure.

Old NFS Photo Architecture

The architecture of the old photo system is divided into the following layers:

The upload layer receives pictures uploaded by users and stores them in the NFS storage layer.

The photo serving layer receives HTTP requests and serves photos from the NFS storage layer (a minimal sketch of this path follows the list).

The NFS storage layer is built on commercial storage appliances.
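
To make this serving path concrete, here is a minimal sketch, assuming a hypothetical NFS mount point and URL-to-path mapping (neither is specified above), of a photo serving process that answers HTTP requests by reading files from NFS-mounted storage:

    # Minimal sketch of the old serving path: HTTP request -> file on an NFS mount.
    # The mount point and URL-to-path mapping are hypothetical assumptions.
    from http.server import BaseHTTPRequestHandler, HTTPServer
    from pathlib import Path

    NFS_MOUNT = Path("/mnt/nfs/photos")          # hypothetical NFS mount point

    class PhotoHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            # e.g. GET /123456_small.jpg -> /mnt/nfs/photos/123456_small.jpg
            photo_path = NFS_MOUNT / self.path.lstrip("/")
            try:
                data = photo_path.read_bytes()   # every read also pays NFS metadata costs
            except OSError:
                self.send_error(404)
                return
            self.send_response(200)
            self.send_header("Content-Type", "image/jpeg")
            self.send_header("Content-Length", str(len(data)))
            self.end_headers()
            self.wfile.write(data)

    if __name__ == "__main__":
        HTTPServer(("", 8080), PhotoHandler).serve_forever()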

Because each photo is stored as a separate file, the huge number of directories and files generates an enormous amount of metadata on the NFS storage layer, far more than the NFS storage layer can cache. As a result, each read request triggers multiple I/O operations. This metadata overhead became the bottleneck of the entire photo architecture, and it is why Facebook relied so heavily on CDNs. To mitigate these problems, two optimizations were made:

Cachr: a caching server tier that caches the smaller Facebook photos, such as profile pictures.

NFS file handle cache: deployed on the photo serving layer to reduce the metadata overhead on the NFS storage layer (see the sketch below).
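
As an illustration of the second optimization, here is a minimal, self-contained sketch of a file handle cache in front of an NFS-like lookup. The "NFS server" is simulated with an in-memory dict, and the names (FakeNFS, lookup, read_by_handle) are illustrative stand-ins, not Facebook's or NFS's actual API; the point is only that caching the handle avoids repeating the per-component path resolution, which is where much of the metadata I/O comes from.

    # Minimal sketch of an NFS file handle cache (simulated server, illustrative names).
    from functools import lru_cache

    class FakeNFS:
        def __init__(self, files):
            self.files = files          # path -> bytes
            self.metadata_lookups = 0   # counts expensive path resolutions

        def lookup(self, path):
            # Resolving a path touches metadata for every component.
            self.metadata_lookups += len(path.strip("/").split("/"))
            return path                 # use the path itself as a fake handle

        def read_by_handle(self, handle):
            return self.files[handle]   # data read, no path resolution

    nfs = FakeNFS({"/vol1/ab/cd/123456_small.jpg": b"...jpeg bytes..."})

    @lru_cache(maxsize=1_000_000)
    def cached_handle(path):
        return nfs.lookup(path)

    def read_photo(path):
        return nfs.read_by_handle(cached_handle(path))

    for _ in range(100):                # repeated reads of a hot photo
        read_photo("/vol1/ab/cd/123456_small.jpg")
    print(nfs.metadata_lookups)         # 4 with the cache, 400 without it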

New Haystack Photo Architecture

The new photo architecture merges the serving layer and the storage layer into a single physical tier built on an HTTP-based photo server. Photos are stored in an object store called Haystack, which eliminates unnecessary metadata overhead during photo reads. In the new architecture, I/O operations are performed only on actual photo data (rather than on file system metadata). Haystack can be broken down into the following functional layers:

HTTP Server

Photo storage

Haystack Object Storage

File System

Storage space

The sections below describe each of these functional layers in detail.

Storage space

Haystack is deployed on commodity storage blade servers, typically configured as 2U machines with:

Two 4-core CPUs

16-32 GB of memory

A hardware RAID controller with 256-512 MB of NVRAM cache

12+ 1 TB SATA hard drives

Each blade server provides about 10 TB of usable storage, configured as hardware RAID-6. RAID-6 provides good performance and redundancy at low cost. Its poor write performance is masked by the RAID controller's NVRAM write-back cache, and because reads are mostly random, the NVRAM cache is reserved entirely for writes.
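
As a rough sanity check (assuming all 12 drives form a single RAID-6 array, which is not stated explicitly above), RAID-6 reserves the equivalent of two drives for parity:

    # Rough usable-capacity estimate for a 12-drive RAID-6 array (assumption:
    # one array spanning all drives; formatting and filesystem reserves ignored).
    drives = 12
    drive_size_tb = 1.0
    usable_tb = (drives - 2) * drive_size_tb   # RAID-6 keeps two drives' worth of parity
    print(usable_tb)                           # 10.0, matching the ~10 TB figure above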

File System

The Haystack object store is implemented on top of files in a single file system created on the 10 TB volume.

Photo read requests result in read() system calls at known offsets in these files, but to carry out a read, the file system must first locate the data on the actual physical volume. Each file in the file system is identified by a structure called an inode, which contains a mapping from logical file offsets to physical block offsets on disk. Depending on the file system used, this block map can be quite large for big files.

Block-based file systems maintain a mapping for every logical block. For a large file, this information typically does not fit in the cached inode and is instead stored in indirect address blocks, which must be traversed in order to read the file's data. There can be several layers of indirection, so a single read can generate multiple I/O operations depending on whether the indirect address blocks are cached.

Extent-based file systems maintain mappings only for contiguous ranges of blocks (extents). The block map of a large contiguous file can consist of a single extent, which fits in the inode itself. However, if the file is badly fragmented and its blocks are not contiguous on disk, its block map can also grow large. Fragmentation can be reduced by aggressively preallocating large chunks of space when growing the physical files.

The file system currently used is XFS, an extent-based file system that provides efficient file preallocation.
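
To illustrate the access pattern these choices optimize for, here is a minimal sketch using a hypothetical store file name: os.posix_fallocate preallocates a large contiguous chunk when growing the file, and os.pread reads data directly at a known offset without any path or directory lookups.

    # Minimal sketch of preallocation and offset reads on a haystack store file.
    # The file name, sizes, and offsets are hypothetical; this is not Facebook's code.
    import os

    STORE = "haystack_store.dat"                 # hypothetical store file
    fd = os.open(STORE, os.O_RDWR | os.O_CREAT, 0o644)

    # Grow the file in large contiguous chunks to keep the extent map small
    # (posix_fallocate is available on Linux; XFS also offers its own
    # preallocation interfaces, not shown here).
    os.posix_fallocate(fd, 0, 1 << 30)           # preallocate 1 GB

    # A photo read: the serving layer already knows (offset, size) for the
    # stored object, so a single pread() returns the data with no path lookups.
    offset, size = 8 * 1024, 64 * 1024           # hypothetical object location
    photo_bytes = os.pread(fd, size, offset)

    os.close(fd)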

Haystack Object Storage

Haystack is a simple log-structured (append-only) object store containing needles that represent the stored objects. A Haystack consists of two files: the haystack store file, which contains the needles, and an index file. The layout of the haystack store file is as follows:

The first 8 KB of the haystack store file are occupied by the superblock. Immediately following the superblock are needles, each consisting of a header, the data, and a footer.
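
Based only on that description (a superblock, then needles made of a header, data, and footer), here is a hedged sketch of how such a needle might be packed and appended. The specific header and footer fields (magic values, 64-bit key, size, CRC32 checksum) are assumptions chosen for illustration, not the actual Haystack on-disk format.

    # Illustrative packing of a needle: header + data + footer.
    # Field choices (magic values, 64-bit key, CRC32 checksum) are assumptions;
    # the text above only states that a needle has a header, data, and footer.
    import struct
    import zlib

    HEADER_MAGIC = 0xFACE0001
    FOOTER_MAGIC = 0xFACE0002

    def pack_needle(key: int, data: bytes) -> bytes:
        header = struct.pack("<IQI", HEADER_MAGIC, key, len(data))
        footer = struct.pack("<II", FOOTER_MAGIC, zlib.crc32(data))
        return header + data + footer

    def append_needle(store_path: str, key: int, data: bytes) -> int:
        """Append a needle to the store file and return its offset."""
        with open(store_path, "ab") as f:
            offset = f.tell()              # append-only: new needles go at the end
            f.write(pack_needle(key, data))
        return offset                      # the index file would record (key, offset)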
