A Typical, Efficient Distributed Database Cache Solution

Source: Internet
Author: User

Original article: http://topic.csdn.net/u/20080722/22/3a63114f-31ea-4174-ba9f-0c0d0c8cb293.html

Why cache? If you have to ask, you are probably a newbie. Database throughput is limited; 5,000 reads and writes per second is already impressive. Without a cache, suppose a page needs 100 database operations: the database then tops out at about 50 pages per second, so over 15 busy hours a day it can serve at most 50 * 3600 * 15 = 2.7 million PVs, with the database server worked to exhaustion. My cache system is more powerful than memcached alone, because it is effectively a second level of cache layered on top of memcached. We all know memcached is strong, but its throughput is still limited, roughly 20,000 gets and puts per second, which becomes a bottleneck in ultra-large-scale applications; a local HashMap, by contrast, can do hundreds of thousands of puts and gets per second, so its overhead is almost negligible. Tip: do not run in distributed mode when you do not need it; when you do need it, switch to memcached. My cache system already supports both; you only need to change the configuration. If you are interested, test it carefully!
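The two-level idea described above (a fast local HashMap consulted before the shared memcached tier) can be sketched roughly as follows. This is a minimal illustration, not the author's actual code; RemoteCache is a hypothetical interface standing in for whatever memcached client is used:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a real memcached client.
interface RemoteCache {
    Object get(String key);
    void put(String key, Object value);
}

// Minimal sketch of the two-level cache: the local map is consulted
// first (hundreds of thousands of ops/sec), and only misses fall
// through to the slower shared tier (~20,000 ops/sec per node).
class TwoLevelCache {
    private final Map<String, Object> local = new ConcurrentHashMap<>();
    private final RemoteCache remote;

    TwoLevelCache(RemoteCache remote) { this.remote = remote; }

    Object get(String key) {
        Object v = local.get(key);           // level 1: local HashMap
        if (v == null) {
            v = remote.get(key);             // level 2: distributed cache
            if (v != null) local.put(key, v); // promote hits into level 1
        }
        return v;
    }

    void put(String key, Object value) {
        local.put(key, value);
        remote.put(key, value);
    }
}
```

In a real deployment the local level would also need bounded size and an invalidation strategy, which is exactly what the rest of the article works out.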

In my opinion, database caching generally falls into four types.

First: caching a single object (an object being one row in the database). This is done with a HashMap; a little more elaborately, wrap a HashMap with an LRU algorithm; for a more complex distributed architecture, memcached is easy to use.

Second: list caches, such as the list of posts in a forum.

Third: length caches, such as the number of posts in a forum, to make paging easy.

Fourth: complex group, sum, and count queries, such as a forum's most popular posts ranked by clicks.
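The "LRU algorithm wrapped around a HashMap" mentioned for the first type can be sketched with java.util.LinkedHashMap in access order, which is the standard trick; the capacity is illustrative:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal LRU wrapper for single-object caching (one row = one object).
// LinkedHashMap in access order moves each touched entry to the tail,
// and removeEldestEntry evicts the least recently used one once the
// capacity is exceeded.
class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    LruCache(int capacity) {
        super(16, 0.75f, true);   // true = order by access, not insertion
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity;
    }
}
```

Note that LinkedHashMap is not thread-safe; in a real server it would need external synchronization or a concurrent LRU implementation.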

The first type is fairly easy to implement; the last three are harder, and there seems to be no general solution. For now I will analyze the list cache (the second type).

 

In the general list caching done at the bottom layer of MySQL and Hibernate, list results are cached keyed by the query condition. But as soon as any record in the table changes (insert, delete, or update), the list cache must be cleared; so if a table's records change frequently (and they usually do), the list cache is invalidated almost constantly and the hit rate ends up far too low.

I worked out a way to improve the list cache: when a table's records change, traverse all list caches and delete only the affected ones, instead of clearing everything. For example, when a post is added to forum section id = 1, only the list caches for section id = 1 need to be cleared; those for section id = 2 are untouched. The advantage is that list caches for all kinds of query conditions (equals, greater than, not equals, less than) can be kept. The drawback is the traversal itself: if the list cache is capped at 10,000 entries, two 4-core CPUs manage only 300-odd full traversals per second, so with more than about 300 insert/update/delete operations per second the system cannot keep up.
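The traversal-based improvement might look roughly like this; the predicate deciding whether a cached condition is affected by a changed row is a hypothetical illustration:

```java
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Predicate;

// Sketch of the traversal approach: on every insert/update/delete,
// walk all cached (condition -> id list) entries and evict only those
// the changed row could affect. The cost is the full scan, which is
// why the text caps the cache at ~10,000 entries.
class TraversingListCache {
    private final Map<String, List<Long>> lists = new ConcurrentHashMap<>();

    void put(String condition, List<Long> ids) { lists.put(condition, ids); }

    List<Long> get(String condition) { return lists.get(condition); }

    // Called whenever a row changes; 'affects' is a hypothetical check,
    // e.g. condition.contains("forumId=1") when forum 1 changed.
    void invalidate(Predicate<String> affects) {
        Iterator<String> it = lists.keySet().iterator();
        while (it.hasNext()) {
            if (affects.test(it.next())) it.remove();  // O(cache size) scan
        }
    }
}
```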

Since neither of the previous two approaches is satisfactory, after several weeks of thinking my colleagues and I finally arrived at a way of hashing the cache on certain fields of the table. This method needs no large-scale traversal, so its CPU cost is tiny, and because the list cache is hashed by field, the hit rate is extremely high.

The idea is as follows. Each table gets three cache Maps (key-value stores). The first is object cache A: the key is the database id, the value is the database object (that is, one row of data). The second is the general list cache B, usually capped at around 1,000 entries: the key is a string spelled out from the query condition (such as "start=0,length=15#active=0#state=0"), and the value is the list of all ids matching that condition. The third is hash cache C: the key is a hash-field string (for example "userId=109"), and the value is a HashMap shaped like B. Only Map B ever needs to be traversed. If this is not clear yet, the following example should make it so. I will use a forum's reply table to illustrate: assume the reply table T has fields id, topicId, and postUserId (topicId is the id of the post, postUserId the id of the person replying).
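Under the assumptions above, the three maps for one table might be declared roughly like this; the Row type and field names are illustrative, not from any particular library:

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative row object for table T.
class Row { long id; /* ... other columns ... */ }

// Sketch of the three caches described above, for one table.
class TableCache {
    // A: object cache — database id -> row object
    final Map<Long, Row> objectCache = new ConcurrentHashMap<>();

    // B: general list cache — query-condition string -> list of ids,
    //    e.g. "start=0,length=15#active=0#state=0" -> [ids...]
    final Map<String, List<Long>> listCache = new ConcurrentHashMap<>();

    // C: hash cache — hash-field string (e.g. "userId=109") ->
    //    an inner map shaped like B, holding only that slice's lists
    final Map<String, Map<String, List<Long>>> hashCache = new ConcurrentHashMap<>();
}
```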

The first and most common case is fetching the replies to a given post. The SQL looks like:
Select id from T where topicId = 2008 order by createTime desc limit
Select id from T where topicId = 2008 order by createTime desc limit 5
Select id from T where topicId = 2008 order by createTime desc limit
Clearly, then, the best approach is to hash by topicId: the three lists above (there can be n of them) go into the map whose key is "topicId=2008". When post 2008 gets a new reply, the system automatically clears the hash map under key "topicId=2008". Because this hash needs no traversal, its capacity can be set very large, say 100,000, caching the reply lists of 100,000 posts at once; when one post gets a new reply, the reply lists of the other 99,999 posts are unaffected, so the cache hit rate is extremely high.
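The invalidation just described, clearing only one submap when a post gets a new reply, is then a single remove. A sketch, using a hash cache of the shape given earlier (names are illustrative):

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of hash-keyed invalidation: when topic 2008 gets a new reply,
// only the submap keyed "topicId=2008" is dropped; the cached reply
// lists of every other topic survive untouched.
class ReplyListCache {
    private final Map<String, Map<String, List<Long>>> hashCache = new ConcurrentHashMap<>();

    List<Long> get(String topicKey, String pageKey) {
        Map<String, List<Long>> sub = hashCache.get(topicKey);
        return sub == null ? null : sub.get(pageKey);
    }

    void put(String topicKey, String pageKey, List<Long> ids) {
        hashCache.computeIfAbsent(topicKey, k -> new ConcurrentHashMap<>())
                 .put(pageKey, ids);
    }

    // Called when a new reply is posted to the given topic.
    void onNewReply(long topicId) {
        hashCache.remove("topicId=" + topicId);  // O(1): no traversal needed
    }
}
```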

The second case is the back end displaying the latest replies. The SQL looks like:
Select id from T order by createTime desc limit 0, 50
In this case there is no need to hash: few people access the back end, and there will not be many such general lists, so they can go straight into general list cache B.

In the third case, fetch one user's replies. The SQL looks like:
Select id from T where userId = 2046 order by createTime desc limit
Select id from T where userId = 2046 order by createTime desc limit
Select id from T where userId = 2046 order by createTime desc limit
These lists are handled like the first case, except hashed by userId.

In the fourth case, fetch one user's replies within one post. The SQL looks like:
Select id from T where topicId = 2008 and userId = 2046 order by createTime desc limit
Select id from T where topicId = 2008 and userId = 2046 order by createTime desc limit 15, 15
This case is rare. Generally it is grouped under topicId = 2008 and placed in the hash map whose key is "topicId=2008".

The final cache structure should look like this:

Cache A is:
Key (long)    Value (object of type T)
11            T object with id = 11
22            T object with id = 22
133           T object with id = 133
......

List cache B is:
Key (String)                                      Value (ArrayList)
"from T order by createTime desc limit 0, 50"     ArrayList of all ids retrieved by the query
"from T order by createTime desc limit 50, 50"    ArrayList of all ids retrieved by the query
......

Hash cache C is:
Key (String)      Value (HashMap of String -> ArrayList)

"userId=2046":
    "userId=2046#0,5"                 list of ids
    "userId=2046#5,5"                 list of ids
    "userId=2046#15,5"                list of ids
    ......

"userId=2047":
    "userId=2047#0,5"                 list of ids
    "userId=2047#5,5"                 list of ids
    "userId=2047#15,5"                list of ids
    ......

"userId=2048":
    "userId=2048#topicId=2008#0,5"    list of ids
    "userId=2048#5,5"                 list of ids
    "userId=2048#15,5"                list of ids
    ......

......

Summary: this caching scheme can hold very large numbers of lists with a high hit rate, so it can withstand ultra-large-scale applications, but engineers must configure the hash fields according to their own business logic. Generally a table's index key is used as the hash (mind the order: put the most selective fields first). Taking userId as the hash, for example, m lists each for N users are stored; when one user's data changes, the lists of the other N-1 users remain untouched. The above covers list caching; length (count) caching works the same way: a length such as the result of "select count(*) from T where topicId = 2008" is also placed in the hash map under "topicId=2008". Combined with MySQL memory tables and memcached, plus F5 devices for distributed load balancing, the system can handle applications on the order of 10 million IPs per day; apart from search engines, ordinary websites never reach that scale.
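The count caching mentioned in the summary fits the same structure: the count for a condition can live in the same "topicId=2008" submap under a reserved sub-key, so one invalidation drops lists and count together. A sketch, with an illustrative reserved key:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch: counts stored beside the id lists, under the same hash key,
// so remove("topicId=2008") invalidates lists and count together.
class CountCache {
    private static final String COUNT_KEY = "count";  // reserved sub-key (illustrative)
    private final Map<String, Map<String, Object>> hashCache = new ConcurrentHashMap<>();

    void putCount(String hashKey, long count) {
        hashCache.computeIfAbsent(hashKey, k -> new ConcurrentHashMap<>())
                 .put(COUNT_KEY, count);
    }

    Long getCount(String hashKey) {
        Map<String, Object> sub = hashCache.get(hashKey);
        return sub == null ? null : (Long) sub.get(COUNT_KEY);
    }

    void invalidate(String hashKey) { hashCache.remove(hashKey); }
}
```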
