Photo sharing is one of Facebook's most popular features. To date, Facebook users have uploaded 15 billion photos; counting thumbnails, total storage exceeds 1.5 PB. About 220 million new photos are added each week, roughly 25 TB. During peak hours, Facebook serves 550,000 photos per second, which makes managing the data a huge challenge. This article, written by Facebook engineers, describes how they manage these photos.
Old NFS photo architecture
The architecture of the old photo system is divided into the following layers:
- The upload layer receives pictures uploaded by users and stores them in the NFS storage layer.
- The photo service layer receives HTTP requests and serves photos from the NFS storage layer.
- The NFS storage layer is built on commercial storage appliances.
Because each photo is stored as a separate file, this volume of photos produces a very large amount of file system metadata, more than the NFS storage layer can cache. As a result, each photo request incurs multiple I/O operations. The metadata has become the bottleneck of the entire photo architecture, which is why Facebook relies heavily on CDNs. To mitigate these problems, they made two optimizations:
- Cachr: a cache server tier that caches Facebook's smaller profile images.
- NFS file handle cache: deployed on the photo service layer to reduce the metadata overhead of the NFS storage layer.
New Haystack photo architecture
The new photo architecture merges the photo service layer and the storage layer into a single physical layer, built on an HTTP-based photo server. Photos are stored in an object store called Haystack, which eliminates unnecessary metadata overhead when photos are read. In the new architecture, I/O operations touch only the actual photo data (instead of file system metadata). Haystack can be divided into the following functional layers:
- HTTP Server
- Photo storage
- Haystack object store
- File System
- Storage space
Storage
Haystack is deployed on commodity storage blade servers. A typical configuration is a 2U server with:
- Two quad-core CPUs
- 16-32 GB of memory
- A hardware RAID controller with 256-512 MB of NVRAM cache
- 12 or more 1 TB SATA hard drives
Each blade server provides about 10 TB of usable storage and uses hardware RAID-6, which offers good performance and redundancy at low cost. RAID-6's poor write performance is mitigated by the RAID controller's NVRAM cache, and the hard disks' own caches are disabled so that no data is lost or corrupted on power loss.
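The 10 TB figure follows from the drive count: RAID-6 reserves two drives' worth of space for parity. A small sketch of that arithmetic, assuming exactly twelve 1 TB drives in a single array (the function name is made up for illustration):

```python
def raid6_usable_tb(drive_count: int, drive_tb: float) -> float:
    """Usable capacity of one RAID-6 array: two drives' worth of space holds parity."""
    if drive_count < 4:
        raise ValueError("RAID-6 needs at least 4 drives")
    return (drive_count - 2) * drive_tb


usable = raid6_usable_tb(12, 1.0)
print(usable)  # 10.0
```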
File System
The Haystack object store is built on files in a single file system with 10 TB capacity. Each file in the file system has a block map that translates offsets within the file to physical locations on the volume. The file system currently used is XFS.
Haystack object store
Haystack is a simple log-structured (append-only) object store. Each stored object is held in a record called a needle. A haystack consists of two files: the store file, which holds the needles, and an index file.
Figure: Haystack object store structure
Figure: Needle and index file structure
Haystack write operation
A haystack write operation synchronously appends the new needle to the haystack store file. Once needles have been committed to the store file, the corresponding index records are written to the index file. To limit the recovery work after a hardware fault, the index file is periodically flushed to disk.
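A minimal sketch of this append path follows. The on-disk format is not given in the text, so the record layout, the field widths, and the name `append_needle` are all assumptions for illustration. In-memory buffers stand in for the two files, and the index record is written immediately rather than batched.

```python
import io
import struct

# Hypothetical big-endian layouts (not Facebook's real format):
NEEDLE_HEADER = struct.Struct(">QIQBI")  # key, substitute key, cookie, flags, data size
INDEX_RECORD = struct.Struct(">QIQI")    # key, substitute key, offset, data size


def append_needle(store, index, key, alt_key, cookie, data):
    """Append one record to the store file, then its index record; return the offset."""
    store.seek(0, io.SEEK_END)
    offset = store.tell()
    store.write(NEEDLE_HEADER.pack(key, alt_key, cookie, 0, len(data)))
    store.write(data)
    store.flush()  # the store write is described as synchronous
    index.seek(0, io.SEEK_END)
    index.write(INDEX_RECORD.pack(key, alt_key, offset, len(data)))
    return offset


store, index = io.BytesIO(), io.BytesIO()
first = append_needle(store, index, key=1, alt_key=0, cookie=0xC0FFEE, data=b"jpeg-bytes")
second = append_needle(store, index, key=2, alt_key=0, cookie=0xBEEF, data=b"more")
```

Because records are only ever appended, the offset returned for each one is simply the length of the store file before the write.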
Haystack read operation
The parameters passed to a haystack read operation include the needle's offset, key, substitute key, cookie, and data size. Because the offset and size are known in advance, Haystack can read the entire needle from the file in one operation.
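The read path might look like the sketch below. The header layout is hypothetical, and checking the stored key, substitute key, and cookie against the request is an assumption about why those parameters are passed; the text itself only says the record is read using the offset and data size.

```python
import io
import struct

HEADER = struct.Struct(">QIQBI")  # hypothetical: key, substitute key, cookie, flags, size


def read_needle(store, offset, key, alt_key, cookie, size):
    """Read one whole record at `offset`; return its data, or None on any mismatch."""
    store.seek(offset)
    raw = store.read(HEADER.size + size)  # a single read covers header and data
    if len(raw) < HEADER.size + size:
        return None  # truncated record
    k, a, c, _flags, n = HEADER.unpack_from(raw)
    if (k, a, c, n) != (key, alt_key, cookie, size):
        return None  # wrong record, or the caller supplied a wrong cookie
    return raw[HEADER.size:]


store = io.BytesIO()
store.write(HEADER.pack(7, 2, 0xABCD, 0, 5) + b"photo")
good = read_needle(store, 0, 7, 2, 0xABCD, 5)
bad = read_needle(store, 0, 7, 2, 0x1234, 5)
```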
Haystack deletion operation
Deletion is simple: a deleted flag is set on the needle stored in the haystack. The space occupied by deleted needles and their index records is not reclaimed.
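A sketch of an in-place delete, assuming the flag lives in a one-byte flags field of a hypothetical record header. The store keeps exactly the same size afterwards, which is why deleted space is not recovered by this operation.

```python
import io
import struct

HEADER = struct.Struct(">QIQBI")        # hypothetical: key, substitute key, cookie, flags, size
FLAGS_OFFSET = struct.calcsize(">QIQ")  # the flags byte follows the first three fields
FLAG_DELETED = 0x01                     # hypothetical bit value


def mark_deleted(store, offset):
    """Set the deleted bit in the header of the record at `offset`."""
    store.seek(offset + FLAGS_OFFSET)
    flags = store.read(1)[0]
    store.seek(offset + FLAGS_OFFSET)
    store.write(bytes([flags | FLAG_DELETED]))


def is_deleted(store, offset):
    """Report whether the record at `offset` carries the deleted bit."""
    store.seek(offset)
    flags = HEADER.unpack(store.read(HEADER.size))[3]
    return bool(flags & FLAG_DELETED)


store = io.BytesIO()
store.write(HEADER.pack(9, 1, 0xFEED, 0, 4) + b"data")
size_before = len(store.getvalue())
mark_deleted(store, 0)
```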
Photo storage server
The photo storage server accepts HTTP requests and converts them into the corresponding haystack operations. To reduce I/O operations, the server keeps an in-memory cache of the indexes of all its haystack files. When the server starts, it reads these indexes into the cache. Since each node holds millions of photos, the index must be kept small enough to fit in the server's physical memory.
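Loading an index file into memory at startup could be sketched as follows. The fixed-size index record layout is hypothetical, and letting a later record for the same key replace an earlier one is an assumption about how rewritten photos would be handled.

```python
import io
import struct

INDEX_RECORD = struct.Struct(">QIQI")  # hypothetical: key, substitute key, offset, size


def load_index(index_file):
    """Read every index record into a dict keyed by (key, substitute key)."""
    cache = {}
    index_file.seek(0)
    while True:
        raw = index_file.read(INDEX_RECORD.size)
        if len(raw) < INDEX_RECORD.size:
            break  # end of file (or a partial trailing record)
        key, alt_key, offset, size = INDEX_RECORD.unpack(raw)
        cache[(key, alt_key)] = (offset, size)  # a later record replaces an earlier one
    return cache


index_file = io.BytesIO()
index_file.write(INDEX_RECORD.pack(1, 1, 64, 100))
index_file.write(INDEX_RECORD.pack(2, 1, 200, 50))
index_file.write(INDEX_RECORD.pack(1, 1, 300, 120))  # same photo written again
cache = load_index(index_file)
```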
Each photo a user uploads is assigned a unique 64-bit ID and is scaled to four different sizes. All four images share the same random cookie and ID; the size (large, medium, small, or thumbnail) is stored in the substitute key. The upload server then asks the photo storage server to store the resulting images in the haystack.
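The keying scheme can be illustrated with a short sketch. The numeric size codes, the 64-bit cookie width, and the names `ImageKey` and `keys_for_upload` are assumptions for illustration only.

```python
import secrets
from dataclasses import dataclass

# Hypothetical numeric codes for the four sizes; the real values are not given.
SIZES = {"large": 1, "medium": 2, "small": 3, "thumbnail": 4}


@dataclass(frozen=True)
class ImageKey:
    photo_id: int  # 64-bit ID shared by all four images
    alt_key: int   # substitute key: encodes the size
    cookie: int    # random value shared by all four images


def keys_for_upload(photo_id: int) -> dict[str, ImageKey]:
    """Build the four keys for one uploaded photo."""
    cookie = secrets.randbits(64)  # the cookie width is an assumption
    return {name: ImageKey(photo_id, code, cookie) for name, code in SIZES.items()}


keys = keys_for_upload(42)
```

Since only the substitute key differs, any one size can be addressed by combining the photo ID with the size code.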
The index cache of each image contains the following data:
Haystack uses Google's open-source sparse hash data structure to keep the in-memory index cache as small as possible.
Write/modify photo storage
A write operation writes the photo data to the haystack store and updates the in-memory index. If the index already contains an entry with the same key, the write is a modification of an existing photo.
Read operations on photo storage
The parameters passed to a read include the haystack ID, photo key, size, and cookie. The server looks up the photo in the index cache to find where it is stored, then reads the actual data from the haystack.
Delete photo storage
After the haystack delete operation completes, the in-memory index cache is updated: the image's offset is set to 0, indicating that the image has been deleted.
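The three photo storage operations (write/modify, read, delete) all touch the in-memory index, so they can be sketched together. This uses a plain Python dict rather than a sparse hash, and it assumes that the field set to 0 on delete is the image's offset and that real offsets are never 0 (for example because a store file begins with a header block). The class and method names are hypothetical.

```python
class IndexCache:
    """In-memory map from (photo key, substitute key) to (offset, size)."""

    def __init__(self):
        self._entries = {}

    def put(self, key, alt_key, offset, size):
        """Record a write; return True if it replaced an entry (a modification)."""
        modified = (key, alt_key) in self._entries
        self._entries[(key, alt_key)] = (offset, size)
        return modified

    def lookup(self, key, alt_key):
        """Return (offset, size), or None if the image is unknown or deleted."""
        entry = self._entries.get((key, alt_key))
        if entry is None or entry[0] == 0:
            return None
        return entry

    def delete(self, key, alt_key):
        """Mark an image as deleted by zeroing its offset."""
        entry = self._entries.get((key, alt_key))
        if entry is not None:
            self._entries[(key, alt_key)] = (0, entry[1])


cache = IndexCache()
was_modified_1 = cache.put(1, 1, offset=64, size=1000)    # first write
was_modified_2 = cache.put(1, 1, offset=4096, size=1200)  # same key: a modification
found = cache.lookup(1, 1)
cache.delete(1, 1)
after_delete = cache.lookup(1, 1)
```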
Compaction
Compaction copies the haystack into a new one. During the copy, deleted photo data is skipped and the in-memory index cache is rebuilt.
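This copy-and-skip pass can be sketched as below, again with a hypothetical record layout and deleted bit. The function walks the old store record by record, copies only the live ones, and builds a fresh index as it goes.

```python
import io
import struct

HEADER = struct.Struct(">QIQBI")  # hypothetical: key, substitute key, cookie, flags, size
FLAG_DELETED = 0x01               # hypothetical bit value


def rebuild_store(old_store):
    """Copy live records into a new store; return (new_store, new_index)."""
    new_store, new_index = io.BytesIO(), {}
    old_store.seek(0)
    while True:
        head = old_store.read(HEADER.size)
        if len(head) < HEADER.size:
            break  # reached the end of the old store
        key, alt_key, _cookie, flags, size = HEADER.unpack(head)
        data = old_store.read(size)
        if flags & FLAG_DELETED:
            continue  # skip deleted photo data
        new_index[(key, alt_key)] = (new_store.tell(), size)
        new_store.write(head + data)
    return new_store, new_index


old = io.BytesIO()
old.write(HEADER.pack(1, 1, 11, 0, 3) + b"aaa")
old.write(HEADER.pack(2, 1, 22, FLAG_DELETED, 4) + b"bbbb")
old.write(HEADER.pack(3, 1, 33, 0, 2) + b"cc")
new_store, new_index = rebuild_store(old)
```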
HTTP Server
The HTTP framework is the simple evhttp server from the open-source libevent library. The server is multithreaded, and each thread handles one HTTP request at a time.
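As a rough illustration of serving photos over HTTP with several threads, here is a sketch using Python's standard-library `ThreadingHTTPServer`. It is a stand-in, not evhttp: it starts a thread per request, and the `PHOTOS` dictionary is a hypothetical placeholder for real haystack reads.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

PHOTOS = {"/photo/1": b"fake-jpeg-bytes"}  # hypothetical stand-in for haystack reads


class PhotoHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        data = PHOTOS.get(self.path)
        if data is None:
            self.send_error(404)
            return
        self.send_response(200)
        self.send_header("Content-Type", "image/jpeg")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def log_message(self, fmt, *args):
        pass  # keep the example quiet


def fetch_once(path="/photo/1"):
    """Start the server on a free local port, fetch one path, then shut down."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), PhotoHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1], timeout=5)
        conn.request("GET", path)
        resp = conn.getresponse()
        body = resp.read()
        conn.close()
        return resp.status, body
    finally:
        server.shutdown()
        server.server_close()
```

Calling `fetch_once()` exercises one full request and response cycle against the local server.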
Conclusion
Haystack is an HTTP-based object store that holds the photo data in needles. This architecture eliminates file system metadata overhead and keeps all indexes in memory, so photos are stored and read with a minimal number of I/O operations.
http://www.facebook.com/FacebookEngineering#/note.php?note_id=76191543919&ref=mf
Source: comsharp CMS official website