Memcache Application Scenario Introduction


The problem we face

Database access bottlenecks have always been a headache for high-concurrency, high-traffic web applications. This is especially true when your architecture is built around a single database: once the connection pool peaks at around 500 connections, your application is not far from the edge of collapse. Many small-site developers concentrate on product and feature design while neglecting overall performance and scalability, then watch their traffic climb until one day the site crashes under the load. By then it is too late to cry about it. So we must prepare for a rainy day and find ways to lighten the database's load before it goes on strike. That is the main topic of this article.

As we all know, when a request comes in, the web server hands it to the app server, and the app processes it and fetches the relevant data from the DB. But DB access is quite expensive, and fetching the same data over and over means the database keeps doing costly, redundant work. If the database could speak, I'm sure it would grumble: "You've asked this so many times, can't you just remember it?" Exactly. Suppose the app keeps the data in memory after the first fetch and reads it directly from memory the next time, without bothering the database. Wouldn't that offload the database? And reading data from memory is much faster than reading it from disk, so it also improves the application's performance.

Therefore, we can add a cache layer between the web/app layer and the DB layer, with two main goals: 1. reduce the database's read load; 2. improve data read speed. However, the cache's storage medium is memory, and the memory capacity of a single server is limited, unlike hard disks, which can reach TB scale.

So we can consider using a distributed cache layer, which makes it easy to break through the memory limits of a single machine while adding flexibility.

Memcached Introduction

Memcached is an open-source distributed cache system. Today many large web applications, including Facebook, YouTube, Wikipedia, and Yahoo, use memcached to support hundreds of millions of page views every day.

By integrating a cache layer into their web architectures, these applications significantly reduce the load on the database while improving performance.

Detailed information about memcached can be found on its official site [1]. Here I will briefly introduce how memcached works:

The atomic unit memcached handles is the (key, value) pair (hereafter "KV pair"). A key is converted into a hash key by a hash algorithm, which makes lookups fast and spreads keys out as uniformly as possible. Memcached actually uses two levels of hashing, maintained together as one large hash table.

Memcached has two core components: the server (MS) and the client (MC). In a memcached query, the MC first determines which MS holds the KV pair by computing a hash of the key.

Once the MS is determined, the client sends its query request to that MS, which looks up the exact data.
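To make the first-level hash concrete, here is a minimal Python sketch of how a client might map a key to a server; the server list is hypothetical, and real clients may use CRC32 or consistent hashing rather than MD5.

import hashlib

# Hypothetical server list; a real deployment reads this from configuration.
SERVERS = ["10.0.0.1:11211", "10.0.0.2:11211", "10.0.0.3:11211"]

def server_for(key):
    # First-level hash: pick the memcached server for this key. The second
    # level (the server's own internal hash table) then locates the item.
    h = int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)
    return SERVERS[h % len(SERVERS)]

print(server_for("foo"))  # every client with the same list picks the same server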

Because the servers do not interact with each other and no multicast protocol is involved, the impact of memcached traffic on the network is minimized.

For example, consider the following scenario with three MCs, X, Y, and Z, and three MSs, A, B, and C:

Setting a KV pair: X wants to set key = "foo", value = "Seattle". X obtains the MS list and hashes the key; the hash value determines which MS will hold this KV pair, and B is selected. X connects to B, and B receives the request and stores (key = "foo", value = "Seattle").

Getting a KV pair: Z wants to read the value for key = "foo". Z computes the hash with the same algorithm, determines that key = "foo" lives on B, connects to B, and fetches value = "Seattle" from it. Any later request from X, Y, or Z for the value of key = "foo" will likewise be sent to B.
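The same scenario can be sketched with the python-memcached client library (pip install python-memcached); the server addresses below are placeholders standing in for A, B, and C.

import memcache

# X and Z configure the same server list, so both hash "foo" to the same
# MS (say, B) without ever talking to each other.
mc = memcache.Client(["10.0.0.1:11211", "10.0.0.2:11211", "10.0.0.3:11211"])

mc.set("foo", "Seattle")  # X: the client hashes "foo", connects to B, stores the pair
value = mc.get("foo")     # Z: same hash, same server B, returns "Seattle"
print(value)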

Memcached server (MS)

Memory allocation

By default, MS allocates memory with a built-in slab allocator, forgoing the C library's standard malloc/free. The main purpose of the slab allocator is to avoid memory fragmentation, which would otherwise force the operating system to spend a lot of time hunting for blocks that are logically contiguous but physically scattered. With the slab allocator, MS allocates memory in large chunks and reuses them continuously. Of course, because the chunks come in fixed size classes, memory can still be wasted when an item's size does not match its chunk's size.
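As a rough illustration of that trade-off, the sketch below derives chunk-size classes the way a slab allocator might; the base size and growth factor are assumptions for illustration (memcached's actual defaults are configurable).

# Derive chunk-size classes: each class grows by a fixed factor, and an
# item is stored in the smallest chunk that fits it, so a 100-byte item
# in a 120-byte chunk wastes 20 bytes.
def slab_classes(base=96, factor=1.25, max_size=1024 * 1024):
    sizes = []
    size = base
    while size < max_size:
        sizes.append(int(size))
        size *= factor
    sizes.append(max_size)  # the largest class matches the 1 MB item limit
    return sizes

classes = slab_classes()
print(classes[:5], "...", classes[-1])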

MS also imposes limits on keys and data: a key cannot exceed 250 bytes in length, and a value cannot exceed the chunk-size limit of 1 MB. Because the hash algorithm used by MC does not take each MS's memory size into account, the MC in theory assigns an equal share of KV pairs to every MS; if the MSs have different amounts of memory, overall memory utilization drops. One alternative is to take the memory size of each MS, find their greatest common divisor, and start on each MS a number of instances whose capacity equals that divisor. This is equivalent to having many sub-MSs of identical capacity, which improves overall memory utilization.
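A minimal sketch of that sizing calculation, with hypothetical per-server memory figures:

from functools import reduce
from math import gcd

# Hypothetical cache memory per server, in GB.
memory = {"ms1": 2, "ms2": 4, "ms3": 6}

unit = reduce(gcd, memory.values())  # greatest common divisor: 2 GB
instances = {ms: size // unit for ms, size in memory.items()}
print(unit, instances)  # 2 {'ms1': 1, 'ms2': 2, 'ms3': 3}

The client then lists six equal 2 GB instances instead of three unequal servers, so a uniform hash spreads keys in proportion to each machine's real capacity.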

Caching policies

When an MS's hash table is full, newly inserted data replaces old data. The replacement strategy is LRU (least recently used), and each KV pair additionally has an expiration time. That expiration time is set in the MC by the application and passed to the MS as a parameter.

At the same time, MS expires items lazily: it does not start an extra process to monitor expired KV pairs and remove them in real time. Instead, cleanup is performed only when new data is inserted and no free space is left.
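The policy is easy to picture with a toy in-memory cache. The following is an illustrative Python sketch of LRU eviction plus lazy expiration, not memcached's actual C implementation:

import time
from collections import OrderedDict

class ToyCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()  # key -> (value, expires_at)

    def set(self, key, value, ttl):
        if key not in self.items and len(self.items) >= self.capacity:
            self.items.popitem(last=False)  # evict the least recently used item
        self.items[key] = (value, time.time() + ttl)
        self.items.move_to_end(key)

    def get(self, key):
        entry = self.items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.time() > expires_at:  # lazy expiry: checked on access,
            del self.items[key]       # no background sweeper process
            return None
        self.items.move_to_end(key)   # a hit makes the item "recent" again
        return value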

Caching database queries

Today, one of the most popular ways to use memcached is to cache database queries. Here is a simple example:

The app needs to fetch the user data for userid = xxx, using a query similar to:

"SELECT * from users WHERE userid = xxx"

The app first asks the cache whether data exists under "user:userid" (the key convention can be agreed upon in advance). If it does, the cached data is returned; if not, the app reads the data from the database and calls the cache's add function to put it into the cache.

When the data needs to be updated, the app calls the cache's update function to keep the cache in sync with the database.
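Put together, the pattern looks like the sketch below, again using python-memcached; load_user_from_db and save_user_to_db are hypothetical stand-ins for the real data layer, and the 300-second TTL is arbitrary.

import memcache

mc = memcache.Client(["127.0.0.1:11211"])

def load_user_from_db(userid):
    # Stand-in for: SELECT * FROM users WHERE userid = ...
    return {"userid": userid, "name": "example"}

def save_user_to_db(userid, user):
    pass  # stand-in for the real UPDATE statement

def get_user(userid):
    key = "user:%d" % userid          # key convention agreed in advance
    user = mc.get(key)
    if user is None:                  # cache miss: fall back to the database
        user = load_user_from_db(userid)
        mc.add(key, user, time=300)   # add to the cache with a TTL
    return user

def update_user(userid, user):
    save_user_to_db(userid, user)     # write to the database first
    mc.set("user:%d" % userid, user, time=300)  # keep the cache in sync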

From the example above we can also see that once data changes in the database, we must update the cached copy promptly so that the app reads correct, consistent data. Of course, we could simply rely on each cache entry's expiration time and let its expiry trigger a refresh, but there is always a delay during which the app may read stale ("dirty") data from the cache; this is also known as the dog-hole problem. (I will write about this topic separately later.)

Data redundancy and fault prevention

From a design standpoint, memcached has no data-redundancy mechanism: it is meant to be a large-scale, high-performance cache layer, and adding data redundancy would bring only design complexity and extra system cost.

If data is lost on an MS, the app can still get it from the database.

It is simply prudent to provision additional MSs to support the cache when some MSs stop working properly; otherwise the app, unable to get data from the cache, could suddenly put too much load on the database.

Also, to reduce the impact of an MS failure, you can use a "hot backup" approach: bring up a new MS to replace the failed one, keeping the failed MS's original IP address. At worst, the data just has to be loaded into the cache again.

The second approach is to increase the number of MS nodes and have the MC detect the state of each node in real time. If a node is unresponsive for too long, it is removed from the MC's list of available servers, and subsequent hashes land on a different server node. Of course, this creates its own problem: a key originally stored on B is now looked up (and stored) on C. So this approach has its weaknesses, and it is best combined with the "hot backup" scheme to minimize the impact of a failure.
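The weakness is easy to demonstrate: with a simple modulo hash, removing one node remaps many keys, not just the ones that lived on the failed node. A sketch (server names are illustrative; consistent hashing is the usual way to soften this effect):

import hashlib

def server_for(key, nodes):
    h = int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)
    return nodes[h % len(nodes)]

servers = ["A:11211", "B:11211", "C:11211"]
print(server_for("foo", servers))  # say this picks B

servers.remove("B:11211")          # B stops responding and is dropped by the MC
print(server_for("foo", servers))  # "foo" now hashes elsewhere; any copy on B
                                   # is unreachable until it is re-cached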

Memcached client (MC)

The memcached client is available in many language versions, including Java, C, PHP, .NET, and others; see the memcached API page [2] for details. You can choose the client that fits the needs of your project.

Cache-enabled web application architecture

With caching support, we can add the cache layer between the traditional app layer and the DB layer, with each app server bound to an MC. Every read is attempted from MS first and, if the data is not there, falls back to the DB layer. When data is updated, besides sending the update SQL to the DB layer, the app also sends the updated data to the MC, which updates it in the MS.

If, in the future, our database can communicate with MS directly, the update task can be handed entirely to the DB layer: each time the database updates data, it would proactively update the data in MS as well, further reducing the logical complexity of the app layer.

However, whenever no data can be read from the cache, we still have to bother the database.

To reduce the load on the database further, we can deploy database replication and use the slave database for read operations, while the master database is responsible for only three things: 1. updating data; 2. synchronizing the slave database; 3. updating the cache.
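A sketch of that read/write split follows; db_master and db_slave are hypothetical stand-ins for real database connections.

import memcache

mc = memcache.Client(["127.0.0.1:11211"])

class FakeDB:
    # Stand-in for a real database connection.
    def query(self, userid):
        return {"userid": userid, "name": "example"}
    def execute(self, userid, user):
        pass

db_master, db_slave = FakeDB(), FakeDB()

def read_user(userid):
    key = "user:%d" % userid
    user = mc.get(key)
    if user is None:
        user = db_slave.query(userid)  # reads go to the slave
        mc.add(key, user)
    return user

def write_user(userid, user):
    db_master.execute(userid, user)    # 1. the master updates the data
    # 2. replication syncs the slave (handled by the database itself)
    mc.set("user:%d" % userid, user)   # 3. the cache is updated in step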

These cache-enabled web architectures have proven effective in real applications, significantly reducing database load while improving web performance.

Of course, these architectures can also be adapted to the specific application environment to achieve optimal performance under different hardware conditions.
