Photo sharing is one of the most popular features on Facebook. To date, users have uploaded more than 1.5 billion photos, making Facebook the largest photo-sharing website. For each uploaded photo, Facebook generates and stores four images of different sizes, which translates into a total of 6 billion image files and more than a petabyte of storage. The collection is currently growing by about 220 million new photos per week, adding roughly 25 TB of storage weekly, and at peak the system must serve about 550,000 photos per second. These numbers pose a major challenge to Facebook's photo storage infrastructure.
Old NFS Photo Architecture
The architecture of the old photo system is divided into the following layers:
The upload layer receives pictures uploaded by users and stores them in the NFS storage layer.
The photo serving layer receives HTTP requests and serves photos from the NFS storage layer.
The NFS storage layer is built on a commercial storage system.
Because each photo is stored as an individual file, the enormous number of directories and files produces a huge amount of metadata on the NFS storage layer, far exceeding what the NFS storage layer can cache. As a result, each photo read request incurs multiple I/O operations. This metadata overhead became the bottleneck of the entire photo architecture, and it is the main reason Facebook relied so heavily on CDNs. To mitigate these problems, two optimizations were made:
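The difference between the two access patterns can be sketched as follows. This is our own illustration, not Facebook's code: the first function models the old per-photo-file design, where every read pays a path lookup and inode fetch; the second models the Haystack-style approach, where all photos sit in one large store file and the application remembers each photo's offset and size, so a read is a single positioned read.

```python
import os
import tempfile

# Old design (illustrative): one file per photo, so open() triggers a
# directory lookup and inode load -- extra metadata I/O on every read.
def read_photo_as_file(directory: str, photo_id: int) -> bytes:
    with open(os.path.join(directory, f"{photo_id}.jpg"), "rb") as f:
        return f.read()

# Haystack-style (illustrative): all photos appended to one store file;
# the application-level index maps photo id -> (offset, size), so a read
# is one positioned read with no per-photo metadata lookup.
def read_photo_at_offset(store_fd: int, offset: int, size: int) -> bytes:
    return os.pread(store_fd, size, offset)

# Tiny demo with two fake "photos" appended to one store file.
store = tempfile.NamedTemporaryFile(delete=False)
index = {}
for photo_id, blob in [(1, b"JPEG-bytes-1"), (2, b"JPEG-bytes-2")]:
    index[photo_id] = (store.tell(), len(blob))
    store.write(blob)
store.flush()

fd = os.open(store.name, os.O_RDONLY)
offset, size = index[2]
print(read_photo_at_offset(fd, offset, size))  # b'JPEG-bytes-2'
os.close(fd)
os.unlink(store.name)
```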
Cachr: a caching server for small Facebook photos, such as profile pictures.
NFS file handle cache: deployed on the photo output layer to reduce the metadata overhead of the NFS storage layer.
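The idea behind the file handle cache can be sketched with local file descriptors standing in for NFS file handles (the class name and sizing are ours, and this is a simplification of what a real NFS handle cache does): keep recently used handles open so that repeated reads of the same file skip the path resolution and inode lookup.

```python
import os
import tempfile
from collections import OrderedDict

class FDCache:
    """Illustrative file-handle cache: keep recently used descriptors
    open so repeated reads skip path lookup and inode I/O, analogous
    to caching NFS file handles on the photo serving layer."""

    def __init__(self, capacity: int = 128):
        self.capacity = capacity
        self._fds = OrderedDict()

    def get(self, path: str) -> int:
        if path in self._fds:               # cache hit: no metadata I/O
            self._fds.move_to_end(path)
            return self._fds[path]
        fd = os.open(path, os.O_RDONLY)     # miss: pay the lookup once
        self._fds[path] = fd
        if len(self._fds) > self.capacity:  # evict least-recently-used
            _, old_fd = self._fds.popitem(last=False)
            os.close(old_fd)
        return fd

# Demo: the second get() returns the cached descriptor, no new open().
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"photo")
cache = FDCache(capacity=4)
fd1 = cache.get(f.name)
fd2 = cache.get(f.name)
print(fd1 == fd2)  # True
os.close(fd1)
os.unlink(f.name)
```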
New Haystack Photo Architecture
The new photo architecture merges the serving layer and the storage layer into a single physical layer, built as an HTTP-based photo server that stores photos in an object store called Haystack. The goal is to eliminate unnecessary metadata overhead during photo reads, so that I/O operations touch only actual photo data (rather than file system metadata). Haystack can be divided into the following functional layers:
HTTP Server
Photo storage
Haystack Object Storage
File System
Storage space
The sections below describe each of these functional layers in turn.
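Before drilling into the layers, the role of the HTTP-based photo server can be sketched. This is our own minimal illustration, not Facebook's code: the handler resolves a photo id through an in-memory index and returns the bytes directly, with no separate NFS storage tier between the HTTP layer and the data.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import urlopen

# Stand-in photo store: id -> photo bytes (a real server would read
# from the Haystack store file at a known offset).
PHOTOS = {"1": b"\xff\xd8fake-jpeg-bytes"}

class PhotoHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        photo = PHOTOS.get(self.path.lstrip("/"))
        if photo is None:
            self.send_error(404)
            return
        self.send_response(200)
        self.send_header("Content-Type", "image/jpeg")
        self.send_header("Content-Length", str(len(photo)))
        self.end_headers()
        self.wfile.write(photo)

    def log_message(self, *args):  # keep the demo quiet
        pass

server = HTTPServer(("127.0.0.1", 0), PhotoHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

body = urlopen(f"http://127.0.0.1:{server.server_port}/1").read()
print(body == PHOTOS["1"])  # True
server.shutdown()
```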
Storage space
Haystack is deployed on commodity storage blade servers, typically configured as 2U machines with:
Two 4-core CPUs
16-32 GB of memory
A hardware RAID controller with 256-512 MB of NVRAM cache
12 or more 1 TB SATA hard drives
Each blade server provides about 10 TB of usable storage, configured as hardware RAID-6. RAID-6 achieves good performance and redundancy while keeping costs low. Its poor write performance is mitigated by the RAID controller's NVRAM write-back cache. Because reads are mostly random, the NVRAM cache is reserved entirely for writes.
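The usable-capacity figure follows from the RAID-6 arithmetic (our rough illustration; exact figures depend on formatting overhead):

```python
# RAID-6 dedicates two drives' worth of capacity to parity,
# so usable capacity is (n - 2) * drive_size.
drives, drive_tb = 12, 1.0
parity_drives = 2
usable_tb = (drives - parity_drives) * drive_tb
print(usable_tb)  # 10.0, matching the ~10 TB per server quoted above
```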
File System
The Haystack object store is built on top of files in a single file system created on the 10 TB volume.
Photo read requests issue read system calls at known offsets within these files, but to perform the read the file system must first locate the data on the physical volume. Each file in the file system is identified by an inode structure, and the inode holds a mapping between logical file offsets and physical block offsets on disk. For a large file, this block map can itself be quite large.
Block-based file systems store the logical-to-physical block mapping for a large file not in the inode itself but in indirect address blocks, which must be traversed when the file is read. Because there can be several levels of indirect address blocks, a single read can incur multiple I/O operations, depending on whether those indirect blocks are cached.
Extent-based file systems maintain mappings only for contiguous ranges of blocks. The block map of a large, physically contiguous file can be described by a single extent, which fits in the inode itself. However, if the file is fragmented into discontiguous blocks, its block map can grow very large. Fragmentation can be reduced by having the file system preallocate large amounts of space for large files.
The file system currently used is XFS, which provides efficient file preallocation.
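Preallocation can be requested from user space. A minimal sketch (our own, using POSIX `posix_fallocate`; a production deployment would preallocate far more than the 1 MiB reserved here) that asks the file system to reserve mostly contiguous blocks up front:

```python
import os

# Create the store file and preallocate space without writing data,
# so the file system can assign largely contiguous extents.
path = "haystack_store.data"
fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o644)
os.posix_fallocate(fd, 0, 1 << 20)  # reserve 1 MiB (illustrative size)
os.close(fd)

size = os.stat(path).st_size
os.unlink(path)
print(size)  # 1048576
```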
Haystack Object Storage
Haystack is a simple log-structured (append-only) object store containing needles that hold the stored data objects. A haystack consists of two files: the haystack store file containing the needles, plus an index file. The following figure shows the layout of the haystack store file:
The first 8 KB of the haystack store file is occupied by the superblock. Following the superblock are needles, each consisting of a header, data, and footer: