How Facebook stores billions of photos

Source: Internet
Author: User

Sharing photos is one of the most popular features on Facebook. To date, users have uploaded more than 15 billion photos, making Facebook the largest photo-sharing site. For each uploaded photo, Facebook generates and stores four images of different sizes, for a total of 60 billion images occupying over 1.5PB of storage. The current growth rate of 220 million new photos per week adds about 25TB of storage each week. At peak, the site serves 550,000 images per second. These figures pose a major challenge for Facebook's photo storage infrastructure.

Old NFS Photo Architecture

The old photo system architecture is divided into the following layers:

The upload layer receives photos uploaded by users and saves them to the NFS storage layer.

The photo service layer receives HTTP requests and serves photos from the NFS storage layer.

The NFS storage layer is built on commercial storage appliances.

Because each photo is stored as a separate file, the large number of directories and files generates an enormous amount of metadata on the NFS storage layer. This metadata far exceeds what the NFS storage layer can cache, so each photo upload or read request results in multiple I/O operations. The metadata overhead is the bottleneck of the entire photo architecture, which is one reason Facebook relies heavily on CDNs to serve photos. Two optimizations were deployed to mitigate the problem:

CACHR: a caching server layer that caches Facebook's small user profile photos.

NFS file handle cache: deployed on the photo service layer to reduce the metadata overhead of the NFS storage layer.
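The idea behind the file handle cache can be illustrated with a small sketch. This is not Facebook's implementation: the class name HandleCache, its capacity, and the use of local file descriptors in place of NFS file handles are all assumptions made for illustration. The sketch keeps recently used descriptors open so that repeat reads skip the path lookup, which is the step that touches directory and inode metadata.

```python
import os
from collections import OrderedDict


class HandleCache:
    """LRU cache of open file descriptors, keyed by path (hypothetical)."""

    def __init__(self, capacity=1024):
        self.capacity = capacity
        self._fds = OrderedDict()   # path -> open file descriptor
        self.hits = 0
        self.misses = 0

    def _get_fd(self, path):
        fd = self._fds.get(path)
        if fd is not None:
            self.hits += 1
            self._fds.move_to_end(path)        # mark as most recently used
            return fd
        self.misses += 1
        fd = os.open(path, os.O_RDONLY)        # path lookup: the metadata-heavy step
        self._fds[path] = fd
        if len(self._fds) > self.capacity:
            _, old_fd = self._fds.popitem(last=False)   # evict least recently used
            os.close(old_fd)
        return fd

    def read(self, path):
        """Return the whole file, reusing a cached descriptor when possible."""
        fd = self._get_fd(path)
        size = os.fstat(fd).st_size
        return os.pread(fd, size, 0)

    def close(self):
        """Close every cached descriptor."""
        for fd in self._fds.values():
            os.close(fd)
        self._fds.clear()
```

A cache like this only removes the lookup cost for photos that are requested repeatedly, so it reduces the metadata overhead rather than eliminating it.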

New Haystack Photo Architecture

The new photo architecture merges the photo service layer and the storage layer into a single physical layer. It is built on an HTTP-based photo server that stores photos in an object store called Haystack, which eliminates unnecessary metadata overhead from photo read operations. In the new architecture, I/O operations are spent only on actual photo data, not on file system metadata. Haystack can be divided into the following functional layers:

HTTP Server

Photo Storage

Haystack Object Storage

File system

Storage space

The following sections describe these functional layers in detail, from the bottom up.

Storage space

Haystack is deployed on commodity storage blades. A typical 2U storage blade contains:

Two quad-core CPUs

16GB–32GB of memory

A hardware RAID controller with 256MB–512MB of NVRAM cache

12 or more 1TB SATA hard drives

Each storage blade provides approximately 10TB of usable capacity, configured as a hardware RAID-6 volume. RAID-6 delivers good redundancy and read performance at low cost. Its poor write performance is partly mitigated by the RAID controller's NVRAM write-back cache. Because reads are mostly random, the NVRAM cache is reserved entirely for writes.
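The 10TB figure is consistent with how RAID-6 works: two drives' worth of space holds parity, and the rest is usable. The short calculation below assumes exactly 12 drives of 1TB each and ignores file system overhead and the difference between TB and TiB; the function name is illustrative.

```python
def raid6_usable_tb(drive_count, drive_tb):
    """Usable capacity of a RAID-6 array: two drives' worth of space holds parity."""
    if drive_count < 4:
        raise ValueError("RAID-6 needs at least 4 drives")
    return (drive_count - 2) * drive_tb


print(raid6_usable_tb(12, 1))  # 10
```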

File system

Haystack object stores are built on files in a single file system created on the 10TB volume.

A photo read request results in read() system calls at known offsets in these files, but to perform the read, the file system must first locate the data on the physical volume. Each file in the file system is represented by a structure called an inode. The inode contains a block map that maps logical file offsets to physical block offsets on disk. For large files, the block map can be quite large, depending on the type of file system in use.

Block-based file systems maintain a mapping for every logical block. For large files, this information usually does not fit in the cached inode and is stored in indirect address blocks instead, which must be traversed to read the file's data. There can be several levels of indirection, so a single read can result in several I/O operations, depending on whether the indirect address blocks are cached.
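To make the cost of indirection concrete, the sketch below counts how many indirect blocks sit between the inode and a given file offset. The article does not name a specific block-based file system, so the layout here is an assumption: the classic ext2-style scheme with 12 direct pointers in the inode followed by single, double, and triple indirect blocks, 4KB blocks, and 4-byte block pointers.

```python
def indirect_lookups(offset, block_size=4096, ptr_size=4, direct_ptrs=12):
    """Return how many indirect blocks must be read to map `offset` to a disk block."""
    ptrs_per_block = block_size // ptr_size
    block = offset // block_size          # logical block number
    if block < direct_ptrs:
        return 0                          # mapped directly from the inode
    block -= direct_ptrs
    span = ptrs_per_block                 # blocks reachable through this level
    for depth in (1, 2, 3):
        if block < span:
            return depth
        block -= span
        span *= ptrs_per_block
    raise ValueError("offset beyond the maximum file size for this layout")


print(indirect_lookups(0))        # 0
print(indirect_lookups(2**30))    # 2
```

Under these assumptions, a read at the 1GiB mark of a large file passes through two indirect blocks before reaching the data, so it can cost up to three I/O operations when none of those blocks are cached.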

Extent-based file systems maintain mappings only for contiguous ranges of blocks (extents). The block map of a contiguous large file can consist of a single extent, which fits in the inode itself. However, if the file is severely fragmented into discontiguous blocks, its block map can still grow very large. Extent-based file systems can reduce fragmentation by aggressively allocating large chunks of space whenever the physical file grows.

The file system currently in use is XFS, an extent-based file system that provides efficient file preallocation.
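The difference between a contiguous and a fragmented file can be sketched with a toy extent map. This is a simplified model, not the XFS on-disk format: the Extent fields, the function name, and the block numbers are invented for illustration.

```python
from bisect import bisect_right
from typing import List, NamedTuple


class Extent(NamedTuple):
    logical_start: int    # first logical block covered by this extent
    physical_start: int   # disk block where the extent begins
    length: int           # number of contiguous blocks


def to_physical(extents: List[Extent], logical_block: int) -> int:
    """Map a logical block to a physical block; `extents` is sorted by logical_start."""
    starts = [e.logical_start for e in extents]
    i = bisect_right(starts, logical_block) - 1
    if i < 0:
        raise KeyError(logical_block)
    ext = extents[i]
    if logical_block >= ext.logical_start + ext.length:
        raise KeyError(logical_block)     # falls outside every extent
    return ext.physical_start + (logical_block - ext.logical_start)


# A preallocated, contiguous file needs a single extent...
contiguous = [Extent(0, 50_000, 2_000_000)]
# ...while a fragmented file needs one entry per fragment.
fragmented = [Extent(0, 900, 4), Extent(4, 120, 2), Extent(6, 7000, 10)]

print(to_physical(contiguous, 1_500_000))  # 1550000
print(to_physical(fragmented, 5))          # 121
```

The single-extent map stays the same size no matter how large the file grows, which is why preallocating space in large chunks keeps the block map small enough to live in the inode.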

Haystack Object Storage

Haystack is a simple log-structured (append-only) object store containing needles, which represent the stored objects. A Haystack consists of two files: the haystack store file, which contains the needles, and an index file. The layout of the haystack store file is as follows:

The first 8KB of the haystack store file is occupied by the superblock. The superblock is followed by needles, each consisting of a header, the data, and a footer.
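The layout described above can be sketched as a toy append-only store. This is a minimal illustration, not the real Haystack format: the article does not list the header and footer fields, so the ones used here (a magic number, a key, a data size, and a CRC32 checksum) are assumptions, as are the names TinyHaystack, put, and get. The separate index file is also simplified to an in-memory dictionary that is rebuilt by scanning the store file. Each stored record corresponds to what Haystack calls a needle.

```python
import os
import struct
import zlib

SUPERBLOCK_SIZE = 8 * 1024            # the first 8KB of the store file
HEADER = struct.Struct("<IQI")        # assumed fields: magic, key, data size
FOOTER = struct.Struct("<II")         # assumed fields: magic, CRC32 of data
HEADER_MAGIC = 0x4E45444C             # arbitrary marker values for this sketch
FOOTER_MAGIC = 0x4C44454E


class TinyHaystack:
    """Append-only store: superblock, then needles of header + data + footer."""

    def __init__(self, path):
        self.index = {}               # key -> (data offset, data size), kept in memory
        new = not os.path.exists(path) or os.path.getsize(path) == 0
        self.f = open(path, "w+b" if new else "r+b")
        if new:
            self.f.write(b"\0" * SUPERBLOCK_SIZE)
            self.f.flush()
        else:
            self._rebuild_index()

    def _rebuild_index(self):
        """Scan needles sequentially to rebuild the in-memory index."""
        end = self.f.seek(0, os.SEEK_END)
        pos = SUPERBLOCK_SIZE
        while pos + HEADER.size <= end:
            self.f.seek(pos)
            magic, key, size = HEADER.unpack(self.f.read(HEADER.size))
            if magic != HEADER_MAGIC:
                break
            self.index[key] = (pos + HEADER.size, size)   # later needles win
            pos += HEADER.size + size + FOOTER.size

    def put(self, key, data):
        """Append a needle; an existing key is superseded, never overwritten."""
        pos = self.f.seek(0, os.SEEK_END)
        self.f.write(HEADER.pack(HEADER_MAGIC, key, len(data)))
        self.f.write(data)
        self.f.write(FOOTER.pack(FOOTER_MAGIC, zlib.crc32(data)))
        self.f.flush()
        self.index[key] = (pos + HEADER.size, len(data))

    def get(self, key):
        """One seek and one read of data plus footer, with no per-photo file lookup."""
        offset, size = self.index[key]
        self.f.seek(offset)
        blob = self.f.read(size + FOOTER.size)
        data = blob[:size]
        magic, crc = FOOTER.unpack(blob[size:])
        if magic != FOOTER_MAGIC or crc != zlib.crc32(data):
            raise IOError("corrupt needle for key %d" % key)
        return data

    def close(self):
        self.f.close()
```

Because the index holds the exact offset and size of every photo, a read needs no directory or inode lookup per photo, which is the metadata overhead the design sets out to remove.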
