Application scenarios and implementation principles of Memcache


The problem we face

Database access bottlenecks have always been a headache for high-concurrency, high-traffic web applications. Especially when your architecture is built on a single database and the peak number of connections in the pool has reached 500, your application is not far from the edge of collapse. Many small-site developers focus on product requirements and design while neglecting overall performance and scalability; they watch their site grow day after day, only to suddenly find it has crashed under the traffic, and by then it is too late to cry. So we must prepare for a rainy day: before the database goes on strike, find ways to lighten its burden. That is the main topic of this article.

As we all know, when a request comes in, the web server hands it to the app server, and the app processes it and fetches the relevant data from the DB, but the cost of DB access is quite high. In particular, fetching the same data over and over forces the database to repeat the same expensive work each time; if the database could speak, it would surely grumble, "You have asked for this so many times, can't you remember it?" Indeed, if the app kept the data in memory after the first fetch and read it directly from memory the next time instead of bothering the database, wouldn't that offload the database? And reading data from memory is many times faster than reading it from disk, so it also improves the application's performance.

Therefore, we can add a cache layer between the web/app layer and the DB layer, with two main purposes: 1. reduce the database's read load; 2. improve data-read speed. In addition, the cache medium is memory, and the memory capacity of a single server is limited, unlike hard disks, which can reach the TB level. Therefore, consider a distributed cache layer, which makes it easy to break through the memory-capacity limit of a single machine while adding flexibility.
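For concreteness, here is a minimal sketch of an application talking to such a distributed cache pool, assuming the python-memcached client library and two hypothetical cache hosts:

# A minimal sketch of a distributed cache pool, assuming the
# python-memcached library and two hypothetical cache hosts.
import memcache

# The client spreads keys across all listed servers by hashing,
# so the pool's total capacity is the sum of the nodes' memory.
mc = memcache.Client(["10.0.0.1:11211", "10.0.0.2:11211"])

mc.set("greeting", "hello", time=60)   # cached for 60 seconds
print(mc.get("greeting"))              # read back from whichever node holds it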

Memcached Introduction

Memcached is an open-source distributed cache system. Many large web applications, including Facebook, YouTube, Wikipedia, and Yahoo, use Memcached to support hundreds of millions of page views every day. By integrating a cache layer into their web architectures, these applications improve performance while significantly reducing the load on the database. Detailed Memcached information is available on its official website [1]. Here is a brief introduction to how Memcached works:

The atom Memcached operates on is the (key, value) pair (hereinafter referred to as a KV pair). A key is converted to a hash key by a hashing algorithm, which makes lookups and comparisons fast and spreads the data as evenly as possible. At the same time, Memcached uses two-level hashing, maintained by one large hash table.

Memcached has two core components: the server (MS) and the client (MC). In a Memcached query, the MC first computes the key's hash value to determine which MS holds the KV pair. Once the MS is determined, the client sends the query request to that MS, which looks up the actual data. Because there is no inter-server communication or multicast protocol, the impact of a Memcached query on the network is minimal.

For example, consider the following scenario: three MCs, X, Y, and Z, and three MSs, A, B, and C.

Setting a KV pair:

1. X wants to set key = "foo", value = "Seattle".
2. X gets the list of MSs and hashes the key; the hash value determines which MS the KV pair should go to, and B is selected.
3. X connects to B; B receives the request and saves (key = "foo", value = "Seattle").

Getting a KV pair:

1. Z wants the value for key = "foo".
2. Z computes the hash value with the same hashing algorithm and determines that key = "foo" lives on B.
3. Z connects to B and gets value = "Seattle" from B. Any later request for key = "foo" from X, Y, or Z is likewise sent to B.
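The selection step can be sketched in a few lines of plain Python (illustrative only; real clients use their own hash functions, but the principle is identical): every client runs the same hash over the same server list, so X, Y, and Z all arrive at the same node for a given key.

import hashlib

SERVERS = ["A", "B", "C"]  # the three MS nodes from the example

def choose_server(key: str) -> str:
    # First-level hash: every client runs the same function over the
    # same server list, so X, Y, and Z all pick the same node for a key.
    digest = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return SERVERS[digest % len(SERVERS)]

# Whoever sets or gets "foo", the request always lands on one node.
print(choose_server("foo"))  # e.g. "B" -- the same answer for X, Y, and Z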

Memcached Server (MS)

Memory allocation

By default, MS allocates memory with a built-in component called the "block allocator" (the slab allocator), discarding the standard C/C++ malloc/free memory management. The main purpose of the slab allocator is to avoid memory fragmentation, which would otherwise force the operating system to spend more time searching for logically contiguous (but physically scattered) blocks of memory. With the slab allocator, MS allocates memory in large chunks and reuses them continuously. Of course, because the chunks come in fixed sizes, memory can be wasted when the size of the data does not match the chunk size.
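To make the trade-off concrete, here is an illustrative sketch, not memcached's actual source, of how a slab-style allocator picks a fixed chunk size for an item using geometrically growing size classes (the growth factor and minimum chunk size below are assumptions):

# Illustrative slab-class selection (not memcached's actual source).
# Chunk sizes grow geometrically; an item goes into the smallest
# class whose chunk is large enough, and the leftover space is wasted.
GROWTH_FACTOR = 1.25
MIN_CHUNK, MAX_CHUNK = 96, 1024 * 1024  # 1 MB cap on item size

def slab_classes():
    size, classes = MIN_CHUNK, []
    while size < MAX_CHUNK:
        classes.append(size)
        size = int(size * GROWTH_FACTOR)
    classes.append(MAX_CHUNK)
    return classes

def chunk_for(item_size: int, classes=slab_classes()) -> int:
    for c in classes:
        if c >= item_size:
            return c
    raise ValueError("item larger than 1 MB cannot be stored")

# A 100-byte item lands in a 120-byte chunk: 20 bytes are wasted.
print(chunk_for(100))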

MS also imposes limits on keys and data: a key cannot exceed 250 bytes, and a value cannot exceed the chunk-size limit of 1 MB. Because the hashing algorithm used by MC does not take the memory size of each MS into account, in theory MC assigns KV pairs to every MS with equal probability; if the MSs have different amounts of memory, this can lower overall memory utilization. One solution is to find the greatest common divisor of the MSs' memory sizes and open n instances of capacity = greatest common divisor on each MS, which is equivalent to having multiple sub-MSs of equal capacity and improves overall memory utilization.
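A quick worked example of the greatest-common-divisor trick, assuming two hypothetical servers with 2 GB and 3 GB of cache memory:

from math import gcd

# Hypothetical MS memory sizes in GB.
memories = [2, 3]

unit = gcd(*memories)                 # greatest common divisor: 1 GB
instances = [m // unit for m in memories]
print(unit, instances)                # 1 GB per instance; run 2 and 3 instances

# The MC now sees 5 equal-capacity sub-MSs, so the uniform key
# distribution matches capacity and no memory sits idle.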

Caching policies

When the MS hash table is full, newly inserted data replaces old data. The replacement strategy is LRU (least recently used), combined with an expiration time for each KV pair. The expiration time of a stored KV pair is set in the MC, driven by the application, and passed to MS as a parameter.

MS uses a lazy expiration and replacement strategy: it does not start an extra process to monitor expired KV pairs and delete them in real time; instead, removal happens only when new data is inserted and there is no free space left for it.
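Both policies fit in a short self-contained sketch (plain Python, illustrative only): expired entries die lazily when they are next touched, and eviction happens only when an insert finds the cache full, removing the least recently used entry.

import time
from collections import OrderedDict

class LazyLRUCache:
    """Illustrative sketch of memcached-style lazy expiry + LRU eviction."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.store = OrderedDict()      # key -> (value, expires_at)

    def get(self, key):
        item = self.store.get(key)
        if item is None:
            return None
        value, expires_at = item
        if expires_at and time.time() > expires_at:
            del self.store[key]         # expired entries die lazily, on access
            return None
        self.store.move_to_end(key)     # mark as most recently used
        return value

    def set(self, key, value, ttl=0):
        if key in self.store:
            del self.store[key]
        elif len(self.store) >= self.capacity:
            self.store.popitem(last=False)  # evict the LRU entry only when full
        self.store[key] = (value, time.time() + ttl if ttl else 0)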

Cache database queries

The most popular way to use memcached today is to cache database query results. Here is a simple example to illustrate:

The app needs to get the user information for userid=xxx, and the corresponding query statement looks like:

"SELECT * from users WHERE userid = xxx"

The app first asks the cache whether it has the data for "user:userid" (the key format can follow a predefined convention). If it does, the cached data is returned; if not, the app reads the data from the database and calls the cache's add function to put the data into the cache.

When the data needs to be updated, the app calls the cache's update function to keep the cache in sync with the database.
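Putting the read and update paths together gives this cache-aside pattern. The sketch below assumes the python-memcached client library; db_query and db_update are hypothetical stand-ins for your real database access functions:

# Cache-aside sketch. Assumes python-memcached; db_query() and
# db_update() are hypothetical stand-ins for real database access.
import memcache

mc = memcache.Client(["127.0.0.1:11211"])
TTL = 300  # cache entries expire after 5 minutes

def get_user(userid):
    key = f"user:{userid}"                 # predefined key convention
    user = mc.get(key)
    if user is None:                       # cache miss
        user = db_query("SELECT * FROM users WHERE userid = %s", userid)
        mc.add(key, user, time=TTL)        # populate the cache
    return user

def update_user(userid, fields):
    db_update("UPDATE users SET ... WHERE userid = %s", userid)  # write DB first
    mc.set(f"user:{userid}", fields, time=TTL)  # then keep the cache in sync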

From the example above we can also see that whenever the data in the database changes, the data in the cache must be updated promptly so that the app reads correct, consistent data. Of course, we could record an expiration time for the cached data and trigger an update event when the timer fires, but there is always some delay, during which the app may read stale ("dirty") data from the cache; this is also known as the dog hole problem. (I will describe this problem in detail later.)

Data redundancy and fault prevention

From a design point of view, memcached has no data-redundancy mechanism; it is a large-scale, high-performance cache layer, and adding data redundancy would bring nothing but design complexity and higher system cost.

When data is lost on an MS, the app can still get it from the database. However, it is more prudent to provision extra MSs so that when some MSs stop working properly, the cache does not shrink so much that cache misses flood the database with load.

To reduce the impact of an MS failure, you can use a "hot backup" scheme: replace the faulty MS with a standby MS that takes over the original MS's IP address; at worst, the data has to be loaded into the cache again.

Another way is to increase the number of MS nodes and have the MC detect the state of each node in real time. If a node is found to be unresponsive for a long time, it is removed from the MC's list of available servers, and keys are re-hashed across the remaining nodes. Of course, this has its own problem: a key originally stored on B is now hashed to C, so its cached value is effectively lost. This solution is therefore best combined with the "hot backup" scheme, which minimizes the impact of a failure.
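The weakness is easy to quantify. The sketch below (plain Python, naive modulo hashing for illustration) counts how many keys land on a different server once a node is dropped from the list:

import hashlib

def server_for(key, servers):
    digest = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return servers[digest % len(servers)]

keys = [f"key-{i}" for i in range(10_000)]
before = {k: server_for(k, ["A", "B", "C"]) for k in keys}
after  = {k: server_for(k, ["A", "C"]) for k in keys}  # B removed

moved = sum(1 for k in keys if before[k] != after[k])
print(f"{moved / len(keys):.0%} of keys now hash to a different server")
# With naive modulo hashing, roughly two-thirds of keys move --
# their old cached values are effectively lost until re-fetched.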

Memcached Client (MC)

Memcached clients are available in a variety of languages, including Java, C, PHP, and .NET; see the Memcached API page [2]. You can choose the right client to integrate according to the needs of your project.

A cached web application architecture

With caching support, we can insert the cache layer between the traditional app layer and the DB layer. Each app server binds an MC; every read first tries to fetch the data from MS and falls back to the DB layer on a miss. When data is updated, besides sending the update SQL to the DB layer, the app also hands the updated data to the MC, which updates the data in MS.

If, in the future, our database could communicate with MS directly, the update task could be handed entirely to the DB layer: each time the database updates its data, it would automatically update the data in MS, further reducing the logic complexity of the app layer.

But every time we fail to read the data from the cache, we still have to bother the database. To minimize the load on the database, we can deploy database replication, use the slave database to serve read operations, and leave the master database responsible for only three things: 1. updating data; 2. synchronizing the slave databases; 3. updating the cache. A sketch of this division of labor follows.
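This is a minimal sketch of the routing rule; master, slave, and refresh_cache are hypothetical handles standing in for real database connections and cache-update logic:

# Illustrative read/write splitting. master, slave, and refresh_cache
# are hypothetical stand-ins for real connections and cache logic.
def route(sql: str, *params):
    if sql.lstrip().upper().startswith("SELECT"):
        return slave.execute(sql, *params)   # reads go to the slave
    result = master.execute(sql, *params)    # writes go to the master,
    refresh_cache(sql, *params)              # which also updates the cache
    return result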

These cached web architectures have proven effective in real-world applications, significantly reducing database load while improving web performance. Of course, they can also be adapted to the specific application environment to achieve optimal performance under different hardware conditions.
