A database caching solution that supports 10 million PV

Source: Internet
Author: User
Tags: memcached

A database cache system that supports 10 million PV per day

This is an efficient database cache system built mainly on top of Hibernate, and it includes a distributed solution. The system has been running online without major problems, and I believe it is strong enough to handle applications on the scale of millions of IPs per day.

The code looks very simple, but it is actually the summary of two years of experience. The whole process ran into many difficulties, which were solved one by one. The system is very simple and easy to use; the main program, BaseManager.java, is less than 1,000 lines of code, and "lean" does not begin to describe it: those 1,000 lines cover caching of database objects, caching of lists and their lengths, hash caching by field, delayed updates, automatic clearing of list caches, and so on. It is enough to build most application websites such as forums, blogs, alumni networks, and friend communities.

Now let's cut to the chase.

Why use caching? If you are asking this question, you are still a novice. Database throughput is limited; being able to read and write 5,000 times per second is already very good. Without a cache, suppose one page involves 100 database operations: then 50 page views per second will bring the database to its knees, so the most PV it can support is about 50 * 3600 * 15 = 2.7 million per day, with the database server half dead from the load and liable to collapse at any moment. My cache system is also stronger than using memcached alone; it is equivalent to adding a second-level cache on top of memcached. Everyone knows memcached is powerful, but its throughput is still limited: at around 20,000 gets and puts per second it will still struggle when it meets ultra-large-scale applications, whereas a local HashMap can execute millions of puts and gets per second with negligible overhead. A friendly tip: do not go distributed when you do not have to; only consider memcached when you really need distribution. My cache system already supports this, and switching only takes a configuration change. If you are interested, test it carefully!
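
As a rough illustration of that two-level idea (a minimal sketch, not the author's BaseManager code; the RemoteCache interface below is a hypothetical stand-in for whatever memcached client is actually used):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical abstraction over a memcached client; not a real library API.
interface RemoteCache {
    Object get(String key);
    void set(String key, Object value);
}

// Minimal two-level cache: a local in-process map in front of a remote cache.
public class TwoLevelCache {
    private final Map<String, Object> local = new ConcurrentHashMap<>();
    private final RemoteCache remote;

    public TwoLevelCache(RemoteCache remote) {
        this.remote = remote;
    }

    public Object get(String key) {
        Object value = local.get(key);          // level 1: local HashMap, millions of ops/sec
        if (value == null && remote != null) {
            value = remote.get(key);            // level 2: memcached, shared across servers
            if (value != null) {
                local.put(key, value);          // warm the local cache
            }
        }
        return value;
    }

    public void put(String key, Object value) {
        local.put(key, value);
        if (remote != null) {
            remote.set(key, value);
        }
    }
}
```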

As I see it, database caching comes in four kinds. First: caching of single objects (an object being one database row). For single objects a HashMap is enough; a bit more complex is wrapping a HashMap with an LRU algorithm, and more complex still is going distributed with memcached; none of this is too difficult. Second: list caching, like the list of posts in a forum. Third: length caching, for example how many posts a forum section has, which makes paging easy to implement. Fourth: more complex group/sum/count queries, such as a forum's list of hottest posts ranked by click count. The first kind is easy to implement; the latter three are harder, and there seems to be no general solution. For now I will take the list cache (the second kind) as the example to analyze.
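
For the single-object case, wrapping a HashMap with an LRU policy can be as simple as extending the JDK's LinkedHashMap in access order; this is a generic sketch rather than the code from BaseManager.java:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Simple LRU cache for single objects (one database row each),
// built on LinkedHashMap in access order.
public class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxSize;

    public LruCache(int maxSize) {
        super(16, 0.75f, true);   // accessOrder = true gives LRU behavior
        this.maxSize = maxSize;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxSize;  // evict the least recently used entry when full
    }
}
```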

When MySQL and Hibernate do generic list caching under the hood, they cache the list result keyed on the query conditions, but as soon as any record of the table changes (insert/delete/update), the list cache is cleared. So as long as a table's records change frequently (which is usually the case), the list cache is practically dead and the hit rate is far too low.

I thought of a way to improve the list cache: when a table record changes, traverse all of that table's list caches and delete only those that are affected, instead of clearing every list cache. For example, when a post is added in forum section id=1, only the list caches corresponding to section id=1 need to be removed; the ones for section id=2 do not have to be cleared. The benefit is that list caches for all kinds of query conditions (equal, greater than, not equal, less than) can be kept. But there is a potential performance problem: the traversal puts a fairly heavy load on the CPU. If the maximum number of cached lists is set to 10,000, two 4-core CPUs can only traverse them about 300 times per second, so with more than 300 inserts/updates/deletes per second the system would be overwhelmed and block.
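
To make the traversal idea concrete, here is a minimal sketch (an illustration, not the author's implementation) that assumes each list-cache key embeds the query condition as a string such as "forumid=1#start=0,15"; invalidation removes only the keys that mention the affected condition, but it still has to scan every key, which is where the CPU cost comes from:

```java
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// List cache whose keys embed the query condition, e.g. "forumid=1#start=0,15".
// Invalidation scans every key, which is why CPU cost grows with cache size.
public class TraversingListCache {
    private final Map<String, List<Long>> lists = new ConcurrentHashMap<>();

    public List<Long> get(String conditionKey) {
        return lists.get(conditionKey);
    }

    public void put(String conditionKey, List<Long> ids) {
        lists.put(conditionKey, ids);
    }

    // Called after an insert/update/delete that affects, say, forum id=1:
    // remove only the cached lists whose condition mentions that forum.
    public void invalidate(String affectedCondition) {   // e.g. "forumid=1"
        Iterator<Map.Entry<String, List<Long>>> it = lists.entrySet().iterator();
        while (it.hasNext()) {
            if (it.next().getKey().contains(affectedCondition)) {
                it.remove();
            }
        }
    }
}
```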

Since neither of the first two solutions is perfect, my colleagues and I, after a few weeks of thinking, finally arrived at a caching method that hashes the table on certain fields. This method needs no large-scale traversal, so the CPU cost is very small, and because the list cache is hashed by field, the hit rate is very high. The idea is as follows. Each table has three cache maps (key/value pairs). The first map is object cache A: in A, the key is the database id and the value is the database object (that is, one row of data). The second map is the general list cache B, whose maximum length is usually around 1,000: in B, the key is a string spelled out from the query conditions (such as "start=0,length=15#active=0#state=0") and the value is the list of all ids returned by that conditional query. The third map is hash cache C: in C, the key is a string built from the hash field (for example, when hashing by userid, one key is the string "userid=109") and the value is a HashMap similar to B. Only map B needs to be traversed. I am not sure whether I have explained this clearly; the example below should make it clear. I will use a forum's reply table for illustration: assume the reply table T has the fields id, topicid, and postuserid (topicid is the id of the post, postuserid is the id of the user who posted the reply).
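
A minimal sketch of the three maps for one table, using the reply table T from the text; the key formats follow the examples quoted above, and the class is illustrative rather than the actual BaseManager code:

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Per-table cache layout described in the text, for the reply table T.
public class TableCache {
    // A: object cache, database id -> row object
    final Map<Long, Reply> objectCache = new ConcurrentHashMap<>();

    // B: general list cache, full condition string -> list of ids,
    //    e.g. "start=0,length=15#active=0#state=0"
    final Map<String, List<Long>> generalListCache = new ConcurrentHashMap<>();

    // C: hash cache, field hash (e.g. "topicid=2008") -> map of condition -> ids
    final Map<String, Map<String, List<Long>>> hashCache = new ConcurrentHashMap<>();

    // When a reply changes, only the affected hash buckets are dropped;
    // lists hashed under other topic/user ids stay untouched.
    void onReplyChanged(Reply r) {
        hashCache.remove("topicid=" + r.topicId);
        hashCache.remove("userid=" + r.postUserId);
        // The text traverses B and removes only the affected keys;
        // B is simply cleared here for brevity (it is small anyway).
        generalListCache.clear();
        objectCache.put(r.id, r);
    }
}

class Reply {
    long id;
    long topicId;
    long postUserId;
}
```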

The first and most common case is getting the replies to a post; the SQL statements look like:

select id from T where topicid=2008 order by createtime desc limit 0,5

select id from T where topicid=2008 order by createtime desc limit 5,5

select id from T where topicid=2008 order by createtime desc limit 10,5

Obviously it is best to hash this kind of list by topicid: the three list caches above (there can be N of them) are hashed into the map whose key is "topicid=2008". When the post with id 2008 gets a new reply, the system simply clears the hash map whose key is "topicid=2008". Since this hashing requires no traversal, its capacity can be set very large, for example 100,000, so the reply lists of 100,000 posts can all be cached. When one post gets a new reply, the reply lists of the other 99,999 posts are untouched, so the cache hit rate is extremely high.
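
A hedged sketch of how this first case could be served from hash cache C (the key formats are the ones quoted in the text; queryIds is a hypothetical placeholder for the real Hibernate/SQL call):

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ReplyListService {
    private final Map<String, Map<String, List<Long>>> hashCache = new ConcurrentHashMap<>();

    // Returns the ids of one page of replies for a topic, e.g. topicId=2008, start=0, length=5.
    public List<Long> getReplyIds(long topicId, int start, int length) {
        String bucketKey = "topicid=" + topicId;                 // hash key, e.g. "topicid=2008"
        String listKey = bucketKey + "#" + start + "," + length; // e.g. "topicid=2008#0,5"

        Map<String, List<Long>> bucket =
                hashCache.computeIfAbsent(bucketKey, k -> new ConcurrentHashMap<>());
        List<Long> ids = bucket.get(listKey);
        if (ids == null) {
            // Cache miss: run the query from the text and remember the result.
            // (SQL string built for illustration only; real code would use parameters.)
            ids = queryIds("select id from T where topicid=" + topicId
                    + " order by createtime desc limit " + start + "," + length);
            bucket.put(listKey, ids);
        }
        return ids;
    }

    // When topic 2008 gets a new reply, drop only its bucket; the other
    // cached topics are unaffected, which keeps the hit rate high.
    public void onNewReply(long topicId) {
        hashCache.remove("topicid=" + topicId);
    }

    // Placeholder for the real database/Hibernate call.
    private List<Long> queryIds(String sql) {
        throw new UnsupportedOperationException("database access omitted in this sketch");
    }
}
```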

The second case is that the back office needs to show the latest replies; the SQL statement looks like:

select id from T order by createtime desc limit 0,50

This case does not need hashing, because the back office cannot have many visitors and there are not many such lists, so they go directly into the general list cache B.

The third case is getting the replies of a particular user; the SQL statements look like:

select id from T where userid=2046 order by createtime desc limit 0,15

select id from T where userid=2046 order by createtime desc limit 15,15

select id from T where userid=2046 order by createtime desc limit 30,15

This list is handled like the first case, hashed by userid.

The fourth case is getting a particular user's replies within a particular post; the SQL statements look like:

select id from T where topicid=2008 and userid=2046 order by createtime desc limit 0,15

select id from T where topicid=2008 and userid=2046 order by createtime desc limit 15,15

This case is relatively rare and is generally restricted to topicid=2008, so these lists can also be put into the hash map whose key is "topicid=2008".

So the final cache structure should look like this:

Cache A (key: Long, value: object of type T):

11  -> the T object with id=11
22  -> the T object with id=22
133 -> the T object with id=133
......

List cache B (key: String, value: ArrayList of ids):

"from T order by createtime desc limit 0,50"   -> ArrayList of all the ids returned
"from T order by createtime desc limit 50,50"  -> ArrayList of all the ids returned
"from T order by createtime desc limit 100,50" -> ArrayList of all the ids returned
......

Hash cache C (key: String, value: a HashMap like B, whose key is a String and value is an ArrayList of ids):

"userid=2046" ->
    "userid=2046#0,5"  -> list of ids
    "userid=2046#5,5"  -> list of ids
    "userid=2046#15,5" -> list of ids
    ......

"userid=2047" ->
    "userid=2047#0,5"  -> list of ids
    "userid=2047#5,5"  -> list of ids
    "userid=2047#15,5" -> list of ids
    ......

"userid=2048" ->
    "userid=2048#topicid=2008#0,5" -> list of ids
    "userid=2048#5,5"  -> list of ids
    "userid=2048#15,5" -> list of ids
    ......

......
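
One detail the structure above implies: B and C only store ids, so a cached list still has to be turned into row objects through object cache A. A minimal sketch of that step, reusing the Reply class from the earlier sketch (loadFromDatabase is a hypothetical placeholder):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Resolves a cached id list (taken from list cache B or hash cache C)
// into full row objects via object cache A.
public class ObjectResolver {
    private final Map<Long, Reply> objectCacheA = new ConcurrentHashMap<>();

    public List<Reply> resolve(List<Long> ids) {
        List<Reply> rows = new ArrayList<>(ids.size());
        for (Long id : ids) {
            Reply row = objectCacheA.get(id);
            if (row == null) {
                row = loadFromDatabase(id);   // hypothetical single-row load (e.g. via Hibernate)
                objectCacheA.put(id, row);
            }
            rows.add(row);
        }
        return rows;
    }

    // Placeholder for the real "select * from T where id=?" / session.get() call.
    private Reply loadFromDatabase(long id) {
        throw new UnsupportedOperationException("database access omitted in this sketch");
    }
}
```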

Summary: this caching approach can store large-scale lists with a very high cache hit rate, so it can withstand ultra-large-scale applications, but it requires the developer to configure which fields to hash according to the business logic. Generally you hash on the table's index keys (mind the order: put the most scattered, i.e. most selective, field first). Taking userid as an example, you can store M kinds of lists for N users, and when the data of one user changes, the list caches of the other N-1 users do not budge. The above describes how to cache lists; caching lengths works exactly the same way. For example, the result of a length query such as "select count(*) from T where topicid=2008" is also placed into the hash map whose key is "topicid=2008". Combined with MySQL table sharding, memcached, and an F5 device for load balancing, the system will be sufficient for applications on the scale of 10 million IPs per day.
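
As a sketch of that length-cache point (the "#count" key format is my own illustration, not from the text), the count can live in the same hash bucket as the id lists, so one remove() invalidates both:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ReplyCountCache {
    // Same hash cache shape as before: "topicid=2008" -> (key -> cached value).
    private final Map<String, Map<String, Object>> hashCache = new ConcurrentHashMap<>();

    public long getReplyCount(long topicId) {
        String bucketKey = "topicid=" + topicId;
        String countKey = bucketKey + "#count";   // illustrative key format
        Map<String, Object> bucket =
                hashCache.computeIfAbsent(bucketKey, k -> new ConcurrentHashMap<>());
        Long count = (Long) bucket.get(countKey);
        if (count == null) {
            // Cache miss: run "select count(*) from T where topicid=?" and remember it.
            count = runCountQuery(topicId);
            bucket.put(countKey, count);
        }
        return count;
    }

    // A new reply for this topic clears the whole bucket, so the id lists
    // and the count are invalidated together.
    public void onNewReply(long topicId) {
        hashCache.remove("topicid=" + topicId);
    }

    // Placeholder for the real count query.
    private long runCountQuery(long topicId) {
        throw new UnsupportedOperationException("database access omitted in this sketch");
    }
}
```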

If you found this article useful, please repost it to your blog or bookmark it. Thank you.

