A database caching solution that supports 10 million PV

Last Update:2014-12-12 Source: Internet

Author: User

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

Intermediary transaction http://www.aliyun.com/zixun/aggregation/6858.html ">seo diagnose Taobao guest cloud host technology Hall

Willing to network support 10 million pv/database caching system

System is mainly built on the hibernate of efficient database caching system, which contains a distributed solution, the system has been used in the willing to online, did not find a big problem, I also believe that the system is strong enough to cope with millions of IP days of application is not a problem.

The code looks very simple, actually is two years experience summary, the whole process also encountered many difficulties, finally one by one solved. The system is very simple and easy to use, the main program Basemanager.java less than 1000 lines of code, with "lean" to describe the absolute, 1000 lines of code contains the database object cache, list and length of the cache, by field hash cache, update delay updates, Automatically clears the list caching function, uses it to realize like forum, the blog, the alumni, the friend community and so on most application website is enough.

Now get to the point ...

Why use caching? If you are still new to this question, database throughput is limited, read and write 5,000 times per second great, if not cache, assuming that a page has 100 database operations, 50 users concurrent database on the rest of the vegetables, so the most can support the PV also 50*3600*15= 2.7 million, and the database server tired to death, do not know when to be exhausted. My cache system is more powerful than the memcached cache alone, equivalent to a level two cache on memcached, we all know memcached is strong, but throughput is limited, 20,000 times per second get and put will still be off when faced with a super large application. , the local hashmap can execute millions of put and get per second, and the performance of this loss is almost negligible. Warm tip: Can not be distributed when not to use distributed, when not using distributed time to consider using memcached, my cache system in this respect has been achieved, change the configuration can be, interested can be carefully tested test!

The general database cache seems to me to contain four kinds. The first: a single object of the cache (one object is a database row Records), for a single object of the cache, with the HashMap can be a little more complex with LRU algorithm to package a hashmap, and then more complex distributed with memcached can, nothing too difficult; The second type: list caching, like a list of posts in a forum, and a third: length caching, such as how many posts are in a forum section, so that pagination can be easily implemented. Fourth: A complex group,sum,count query, such as a forum in the number of clicks ranked the most hot post list. The first is a better implementation, the latter three more difficult, there seems to be no general solution, I temporarily to list caching (second) as an example analysis.

The bottom of MySQL and Hibernate cache the list results based on query criteria when doing general list caching. But as long as the records of the table change (Add/delete/modify), the list cache is cleared, so that as long as the records of one table change frequently (as is often the case), The list cache is almost invalidated and the hit rate is too low.

I thought of a way to improve the list cache, when the record of the table is changed, traverse all the list cache, only those affected by the list cache will be deleted, rather than directly clear all the list cache, such as in a forum version (id=1) added a post, then just clear the id= 1 This version of the corresponding list of the cache can be, version of the id=2 will not be removed. This is a good way to cache various query criteria, such as equals, is greater than, not equal to, less than the list cache, but there is a potential performance problem, due to the need to traverse, the CPU is relatively large, if the list cache maximum length set to 10000, two 4 core CPU per second can only traverse over 300 times, So if there are more than 300 insert/update/delete per second, the system will be overwhelmed, blocked.

When the first two solutions are imperfect, I and my colleagues after a few weeks of thinking, finally came to the table based on a few fields to do hashing caching method, this approach without large-scale traversal, so the CPU is very small, because this list cache according to the field did hash, so the hit rate is extremely high. The idea is as follows: Each table has 3 cache map (key=value key value pairs), the first map is object cache A, in a, the key is the database Id,value is the database object (that is, a row of data), the second map is the Universal list cache b,b The maximum length of general 1000 or so, In B, key is a string (such as start=0,length=15#active=0#state=0) that the query condition spells out, value is a list of all IDs under the condition query, and the third map is the hash cache C, in C, A key is a hash field (such as a userid hash, where a key is a string such as userid=109) String,value is a similar hashmap to B. Only b This map is needed to traverse, do not know to understand that no, see the facet of this example should be clear, the reply table for the forum to explain, assume that the response table T in the assumption of fields such as Id,topicid,postuserid fields (TopicID is the ID of the post, Postuserid is the publisher ID).

The first and most commonly used situation is to get a post corresponding reply, the SQL statement should be like

Select ID from T where topicid=2008 order by createtime desc limit 0,5

Select ID from T where topicid=2008 order by createtime desc limit 5,5

Select ID from T where topicid=2008 order by createtime desc limit 10,5

Appearance, then this list is clearly with TopicID do hash is the best, the top three list cache (can be n) are hashed to the key is topicid=2008 this map, when the ID is 2008 of the Post has a new reply, the system automatically key is topicid= 2008 Hash map clears. Because the hash does not need to traverse, so can be set to large, such as 100000, so that all 100,000 posts corresponding to all the list of replies can be cached, when a post has a new reply, the remaining 99,999 posts corresponding to the reply list will not move, the cache hit rate is extremely high.

In the second case, the backend needs to display the most recent reply, and the SQL statement should be like

Select ID from T-order BY createtime DESC limit 0,50

, this situation does not need hashing, because the background can not be too many people access, the list is not too much, so directly into the universal list cache B.

In the third case, get a reply from a user, SQL statement like

Select ID from T where userid=2046 order by createtime desc limit 0,15

Select ID from T where userid=2046 order by createtime desc limit 15,15

Select ID from T where userid=2046 order by createtime desc limit 30,15

, then this list is similar to the first case, using UserID as a hash.

Fourth, get a user reply to a post, SQL statements like

Select ID from T where topicid=2008 and userid=2046 order by createtime desc limit 0,15

Select ID from T where topicid=2008 and userid=2046 order by createtime desc limit 15,15

Appearance, this kind of situation is relatively rare, generally take topicid=2008 as the standard, also put to key is topicid=2008 this hash map can.

The final caching structure should look like this:

Cache A is:

Key (long) value (type T)

11id=11 T Object

22id=22 T Object

133id=133 T Object

......

List Cache B is:

Key key (String type) value (ArrayList type)

From T-ORDER BY createtime desc limit 0,50arraylist, corresponding to all IDs taken out

From T-ORDER BY createtime desc limit 50,50arraylist, corresponding to all IDs taken out

From T-ORDER BY createtime desc limit 100,50arraylist, corresponding to all IDs taken out

......

The hash cache C is:

Key key (String type) value (HASHMAP)

Userid=2046key key (String type) value (ArrayList)

Userid=2046#0,5id consists of a list

Userid=2046#5,5id consists of a list

Userid=2046#15,5id consists of a list

......

Userid=2047key key (String type) value (ArrayList)

Userid=2047#0,5id consists of a list

Userid=2047#5,5id consists of a list

Userid=2047#15,5id consists of a list

......

Userid=2048key key (String type) value (ArrayList)

Userid=2048#topicid=2008#0,5id consists of a list

Userid=2048#5,5id consists of a list

Userid=2048#15,5id consists of a list

......

Summary: This caching mentality can store large lists, cache hit rate is very high, so can withstand large-scale applications, but need technicians according to their own business logic to configure the need to do hash of the field, generally use a table index key to do hashing (note order, the most scattered fields in front), assuming UserID as an example , you can store an m list of n users, and if a user's related data changes, the remaining N-1 user's list cache does not budge. All of the above shows how to cache the list, the cache length and the cached list mentality exactly the same, such as caching like Select COUNT (*) from T where topicid=2008, and also in the hash map of topicid=2008. If you work together to use MySQL's disassembly and memcached, plus the F5 device to do distributed load balancing, the system will be enough to deal with the scale-level applications like 10 million IP days.

If you feel good, please post the article to your blog or to collect this article, thank you. Thank you and luck.

QQ Group: 24561583

See example website: willing to download the source code

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More