Abandoning DRAM and embracing flash: Facebook rebuilds memcached
Source: Facebook | Author: Facebook
Abstract: Social media giant Facebook has taken another big step in its "Cold Storage" strategy, and the guiding principle is still flash memory. This time the target is memcached, Facebook's long-serving workhorse. Instead of making it a few times faster, Facebook has rebuilt it as a flash-based system. Let's take a look at Facebook's new key-value cache, McDipper.
Last time, Facebook ran into difficulties because, unlike an ordinary company or organization, it could not simply burn its cold data onto discs, so it asked hardware vendors for cheap flash memory to store infrequently accessed data. Now Facebook has rewritten memcached to cut costs further and to fit its "Cold Storage" strategy. The new distributed cache is named "McDipper". Because DRAM is expensive, the new system is built on lower-cost flash memory. Before looking at McDipper, it helps to recall how hard Facebook has already pushed memcached:
Memcached is a high-performance distributed in-memory object cache used by dynamic web applications to reduce server load. By caching data and objects in memory, it cuts the number of database queries, which lowers load and raises throughput. Memcached stores key-value pairs in a hash table. It was developed by Danga Interactive and, after being open-sourced, was adopted by many companies. In Facebook's hands it was pushed especially hard: it handles 200,000 UDP requests per second with an average latency of only 173 microseconds. Total throughput once reached 500,000 UDP requests per second, but the latency at that rate was too high to be usable. Compared with the previous 50,000 UDP requests per second, this is a dramatic increase.
Facebook did not stop pushing memcached after that. On March 5, Facebook announced on its company blog that it had rebuilt memcached as the flash-based McDipper.
The following is a translation:
Reasons and motivation for the change
Facebook has long used memcached as a cache in front of MySQL, but DRAM is too expensive. To keep the cache cost-effective, Facebook reworked it into the flash-based McDipper. Compared with DRAM, flash offers the following:
20 times the capacity of memory
Although it is not as fast as memory, it still supports tens of thousands of operations per second
To fit Facebook's low-cost "Cold Storage" strategy, memcached has been reborn as McDipper.
McDipper is compatible with the memcache protocol. It was designed to make better use of flash memory and to replace memcached where that makes sense. It has been in production use for nearly a year.
McDipper storage layer
Every operation is verified with checksums, which protect all of McDipper's data structures. The cache replacement policy is configurable: you can choose either FIFO (first in, first out) or LRU (least recently used). Depending on the workload, you can also enable Bloom filters to avoid unnecessary reads, compression to trade computation for capacity, and encryption to protect data if a drive is lost or stolen, and you can tune storage utilization through limits at several levels.
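As an illustration of how a Bloom filter can avoid unnecessary reads, here is a minimal sketch. It is not McDipper's implementation: the class names, the filter size, the use of SHA-256, and the dictionary standing in for the flash device are all assumptions made for the example.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: may give false positives, never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Derive several bit positions from salted SHA-256 digests of the key.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(key)
        )


class FlashCache:
    """Hypothetical cache; a dict stands in for the (slow) flash device."""

    def __init__(self):
        self.filter = BloomFilter()
        self.flash = {}
        self.flash_reads = 0  # counts simulated flash reads

    def set(self, key, value):
        self.filter.add(key)
        self.flash[key] = value

    def get(self, key):
        if not self.filter.might_contain(key):
            return None  # definitely absent: no flash read needed
        self.flash_reads += 1
        return self.flash.get(key)
```

A lookup for a key that was never stored is usually answered from the in-memory filter alone. A false positive costs one wasted read, but never a wrong answer.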
Rebuilding memcached
Unfortunately, implementing get, set, and delete for memcached is not as simple as it sounds. When many clients are connected to a memcache pool at the same time, a number of problems appear, most of them recurring trade-offs between consistency and performance. Races between sets are a key issue.
In short, when you write a new value to the backing store, you need a way to invalidate the old value in memcached. One option is to set the new value in memcached directly. This breaks down when the value is modified twice in quick succession, because the sets may reach the memcached instance out of order and the older value can overwrite the newer one. For this reason, Facebook invalidates with a delete and lets readers refill the cache on demand. A race is still possible with this method, although it is very unlikely:
A stale set that completes after the delete still leaves the wrong value in the cache
To solve this problem, Facebook uses a delete hold-off: a special delete operation that carries a wait time. Memcached refuses to write any new value for that key until the hold-off period has ended. A stale set can therefore only take effect if it arrives several seconds after the corresponding delete, which in practice preserves overall cache consistency.
Using a delete hold-off to block stale sets
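The hold-off idea can be sketched as follows. This is a single-threaded illustration, not Facebook's code: the class name, the method signatures, and the injectable clock are assumptions, and the check in `set` is deliberately not atomic.

```python
import time


class HoldOffCache:
    """Hypothetical in-memory cache with a per-key delete hold-off."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock          # injectable so the example is testable
        self._values = {}
        self._blocked_until = {}     # key -> time at which sets are allowed again

    def delete(self, key, hold_off=0.0):
        # Remove the value and, if requested, block sets for hold_off seconds.
        self._values.pop(key, None)
        if hold_off > 0:
            self._blocked_until[key] = self._clock() + hold_off

    def set(self, key, value):
        deadline = self._blocked_until.get(key)
        if deadline is not None:
            if self._clock() < deadline:
                return False         # stale set rejected during the hold-off
            del self._blocked_until[key]
        self._values[key] = value
        return True

    def get(self, key):
        return self._values.get(key)


# Usage with a fake clock: a set arriving during the hold-off is dropped.
now = [0.0]
cache = HoldOffCache(clock=lambda: now[0])
cache.set("user:1", "old")
cache.delete("user:1", hold_off=2.0)
rejected = cache.set("user:1", "stale")   # arrives too soon, returns False
now[0] = 3.0
accepted = cache.set("user:1", "fresh")   # hold-off expired, returns True
```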
To implement the delete hold-off, the cache must confirm that no hold-off is in effect before it writes a value. A simple query followed by a set is not enough, because a hold-off can be created after you have checked for one but before you have written the old value. To eliminate this race condition, Facebook implemented a read-modify-write primitive. It takes a function (a function pointer) that maps the old value to the new value. On top of this primitive, Facebook built the rest of the memcache interface, which includes many atomic operations (increment, append, and so on).
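A read-modify-write primitive of this kind might look like the sketch below. It is only an illustration under stated assumptions: the `AtomicStore` class, the single global lock, and the simplified semantics of `incr` and `append` are choices made for the example, not a description of McDipper's internals.

```python
import threading


class AtomicStore:
    """Hypothetical store whose operations are all built on one primitive."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}

    def read_modify_write(self, key, fn):
        """Atomically replace the value at key with fn(old_value).

        old_value is None when the key is absent; if fn returns None the
        key is removed (or stays absent).
        """
        with self._lock:
            new_value = fn(self._data.get(key))
            if new_value is None:
                self._data.pop(key, None)
            else:
                self._data[key] = new_value
            return new_value

    def get(self, key):
        with self._lock:
            return self._data.get(key)

    def incr(self, key, delta=1):
        # Simplification: a missing key is treated as 0.
        return self.read_modify_write(key, lambda old: (old or 0) + delta)

    def append(self, key, suffix):
        # Append only to existing values; a missing key stays missing.
        return self.read_modify_write(
            key, lambda old: None if old is None else old + suffix
        )
```

Because the check and the write happen under the same lock, no other operation can slip in between them, which is the property the query-then-set approach lacks.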
Building on this work, Facebook has deployed McDipper to several large-footprint, low-request-rate pools. This has reduced the number of servers in some pools by about 90%, while about 90% of get responses are still served with sub-millisecond latency.
Applying McDipper to Facebook's photo infrastructure
The most important use of McDipper at Facebook over the past year has been serving Facebook's photo infrastructure. Between an end user's request to the Facebook CDN (content delivery network) and Haystack (Facebook's small-file storage software), there are usually two layers of McDipper. HTTP and HTTPS requests to Facebook's public-facing HTTP load balancers are converted into memcache requests and forwarded to McDipper. On a miss, a request is forwarded to Facebook's origin load balancers, which have similarly configured McDipper instances. If the origin cache also misses, the request is forwarded to Haystack.
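The lookup order described above can be sketched as a small function. Plain dictionaries stand in for the two McDipper layers and for Haystack, and the function name is hypothetical. Filling the caches on the way back from a miss is an assumption of this sketch; the article does not say how or when the layers are populated.

```python
def fetch_photo(key, edge_cache, origin_cache, haystack):
    """Look up key in the edge layer, then the origin layer, then Haystack.

    Returns (value, source), where source names the layer that answered.
    Raises KeyError if the photo does not exist in Haystack.
    """
    value = edge_cache.get(key)
    if value is not None:
        return value, "edge"

    value = origin_cache.get(key)
    if value is not None:
        source = "origin"
    else:
        value = haystack[key]          # final fallback: the backing store
        origin_cache[key] = value      # assumed fill-on-miss
        source = "haystack"

    edge_cache[key] = value            # assumed fill-on-miss
    return value, source
```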
To optimize for the photo use case, Facebook runs McDipper with small buckets whose metadata is kept in memory. All images are stored on flash, and the bucket-related hash metadata is stored in memory. Using the memcache protocol let Facebook quickly adopt existing software (nginx and srcache) and reuse parts of its existing stack, so that the load balancers can hand connections directly to the HTTP load balancers.
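The split between metadata in memory and values on flash can be illustrated with a minimal sketch. It assumes an append-only log and a plain dictionary as the in-memory index; the `FlashLog` class and the `io.BytesIO` buffer standing in for the flash device are hypothetical, and McDipper's real bucket layout is not described in the article.

```python
import io


class FlashLog:
    """Hypothetical store: values on 'flash', index metadata in memory."""

    def __init__(self):
        self._flash = io.BytesIO()   # stand-in for the flash device
        self._index = {}             # in memory: key -> (offset, length)

    def set(self, key, value):
        # Append the value sequentially and remember where it landed.
        offset = self._flash.seek(0, io.SEEK_END)
        self._flash.write(value)
        self._index[key] = (offset, len(value))

    def get(self, key):
        entry = self._index.get(key)
        if entry is None:
            return None              # miss answered from memory alone
        offset, length = entry
        self._flash.seek(offset)
        return self._flash.read(length)   # a single read from 'flash'
```

With the index in memory, a hit needs one read from the device and a miss needs none. Space taken by overwritten values is never reclaimed in this sketch.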
The Facebook CDN serves more than 150 GB of cached content per second from its forward caches (put another way, roughly 10 TB per minute), yet it uses only a small cluster of servers.
About open source:
After reading this introduction to McDipper, many readers, like the author, will wonder whether Facebook plans to open-source it. Some users have asked this question on Facebook's official page. Unfortunately, they have not received a definite answer from Facebook. Given Facebook's open-source track record, however, an open-source release of McDipper can still be hoped for.
Original article: McDipper: A key-value cache for flash storage (compiled by ZhongHao, reviewed by Wang Xudong)
This article was compiled by CSDN and may not be reproduced without permission. To reprint it, please contact market@csdn.net