How to use memcached in Python to reduce the number of database queries

This article describes how to use memcached in Python to reduce the number of database queries. memcached is a distributed in-memory cache that saves hard-disk I/O. While looking into how to optimize page performance, I realized I had never known how to compare the rendering speed of similar Python and PHP pages. Then I stumbled on a very simple, even stupid, method that I had somehow never used before (I have to BS myself): do what PHP applications such as the Discuz forum do and print "this page was generated in N seconds" right on the generated page. Hitting the page repeatedly during testing makes it easy to spot which operation causes the bottleneck, and to see whether a fix actually removes it.

That is how I discovered, to my surprise, that SimpleCD took about 0.2 seconds to generate its home page, which really cannot be tolerated: the Discuz forum generates its home page in about 0.02 seconds on average, and the Discuz home page is undoubtedly far more complex than SimpleCD's. That stings, because the gap is not caused by the Python language itself; it can only mean that I had done no optimization at all, while the Discuz code is well optimized.


Even without analysis, the database was clearly the drag. To generate the home page, SimpleCD performs more than 40 queries across three sqlite databases, an extremely inefficient design left over for historical reasons. Most of those 40-odd queries are actually very fast, though; careful analysis showed that two of them are the big spenders, and the rest are not slow at all.

The first big spender: counting the total number of records

SELECT count(*) FROM verycd

This operation is time-consuming every time, because counting over the primary key requires traversing the table on each call: the larger the data set, the longer it takes, O(N) where N is the size of the table. The fix is easy: keep a running total and adjust it whenever data is added or deleted, so reading the count becomes O(1).
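As a minimal sketch of that idea (assuming a hypothetical counters table; the real SimpleCD schema is not shown in this article):

import sqlite3

conn = sqlite3.connect('verycd.db')

def add_entry(title, brief, updtime):
    cur = conn.cursor()
    cur.execute('INSERT INTO verycd (title, brief, updtime) VALUES (?, ?, ?)',
                (title, brief, updtime))
    # keep the running total in sync instead of re-counting the whole table
    cur.execute("UPDATE counters SET value = value + 1 WHERE name = 'verycd_total'")
    conn.commit()

def total_entries():
    cur = conn.cursor()
    cur.execute("SELECT value FROM counters WHERE name = 'verycd_total'")
    return cur.fetchone()[0]   # O(1): one indexed row read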

The second big spender: fetching the list of the 20 most recently updated records

SELECT verycdid,title,brief,updtime FROM verycd ORDER BY updtime DESC LIMIT 20;

Since updtime is indexed, the actual query time is just the time spent walking the index. So why is this operation slow? Because my data is inserted in publish order but displayed in update order, the 20 rows live in at least 20 different places on disk, which means at least 20 separate I/O operations. The solution is to make it a single I/O: cache the result of this statement and invalidate the cache only when new data is added or existing data changes. That makes it roughly 20 times faster :)
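A minimal sketch of that cache-and-invalidate idea, kept in process memory (the function and variable names here are illustrative, not SimpleCD code):

LATEST_SQL = ('SELECT verycdid,title,brief,updtime FROM verycd '
              'ORDER BY updtime DESC LIMIT 20')

_latest_cache = None

def latest_20(cur):
    global _latest_cache
    if _latest_cache is None:          # only hit the database after a change
        cur.execute(LATEST_SQL)
        _latest_cache = cur.fetchall()
    return _latest_cache

def invalidate_latest():
    """Call this after any INSERT or UPDATE that touches verycd."""
    global _latest_cache
    _latest_cache = None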

Next come 20 small cases: fetching the publisher and the click count for each entry

SELECT owner FROM lock WHERE id=XXXX;
SELECT hits FROM stat WHERE id=XXXX;

Why not save time with an SQL join? Because of the architecture, the data lives in different databases. stat, the click-count database, is stored in MySQL because it takes frequent inserts; lock and verycd, which mostly serve large numbers of SELECTs, are stored in sqlite3 because its index usage and paging are more efficient for that workload. So no join is possible -.-

In short, this is not a real problem either; it is cached in the same way as the preceding case.
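A rough sketch of caching those per-id lookups so the 20 small queries do not hit the databases on every page view (the dict cache and function name are my own; the click counts change often, so in practice they would be cached with a short timeout rather than indefinitely):

_owner_cache = {}

def get_owner(lock_cur, verycd_id):
    if verycd_id not in _owner_cache:
        lock_cur.execute('SELECT owner FROM lock WHERE id=?', (verycd_id,))
        row = lock_cur.fetchone()
        _owner_cache[verycd_id] = row[0] if row else None
    return _owner_cache[verycd_id]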

So in my case, optimizing page performance boils down to caching database queries. I believe most web applications are in the same situation :)


Finally, it is memcached's turn. Since we plan to cache anyway, caching to files still means disk I/O; it is better to cache directly in memory, where I/O is much faster. And that is exactly what memcached is for.

memcached is a powerful tool because it supports a distributed shared-memory cache, which is what large sites use. For a small site it is just as useful, as long as there is memory to spare; the memory needed to cache the home page is estimated at no more than 10 KB, and with plenty of local memory free, who cares about that?

Configuration and startup: the default configuration is fine for a single machine; if needed, edit the memory size and port, then restart.

vi /etc/memcached.conf
/etc/init.d/memcached restart
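For reference, on Debian-style systems /etc/memcached.conf is just a list of flags passed to the daemon; something like the following would be plenty for this use case (the exact values here are my own guesses, not from the article):

# run as a daemon
-d
# memory limit in MB
-m 64
# TCP port to listen on
-p 11211
# bind to localhost only
-l 127.0.0.1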

Using it in a Python web application:

import memcache
mc = memcache.Client(['127.0.0.1:11211'], debug=0)

memcached is essentially a map structure, and two functions are used most often:

  1. set(key, value, timeout): simply maps key to value; timeout is the time after which the mapping expires (a short sketch follows this list).
  2. get(key): returns the value the key points to, or None if there is no (unexpired) mapping.
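A minimal sketch of those two calls with python-memcache (the key and value here are arbitrary examples):

import memcache

mc = memcache.Client(['127.0.0.1:11211'], debug=0)

mc.set('greeting', 'hello memcached', 60)   # cache for 60 seconds
print(mc.get('greeting'))                   # -> 'hello memcached'
print(mc.get('no-such-key'))                # -> None on a miss or after expiry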

So an ordinary SQL query can be handled like this:

import sqlite3
from hashlib import md5

SQL = 'SELECT count(*) FROM verycd'
c = sqlite3.connect('verycd.db').cursor()

# The original way
c.execute(SQL)
count = c.fetchone()[0]

# The cached way (mc is the memcache.Client created above)
key = md5(SQL.encode('utf-8')).hexdigest()
count = mc.get(key)
if count is None:               # cache miss: query once, then remember the result
    c.execute(SQL)
    count = c.fetchone()[0]
    mc.set(key, count, 60 * 5)  # keep it for 5 minutes

Here md5 is used to make the key distribution more even (it also keeps the key short and free of the spaces memcached keys cannot contain); the rest of the code is straightforward and needs no explanation.
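The same pattern can be wrapped in a small helper; this is my own generalization of the snippet above, not code from SimpleCD:

from hashlib import md5

def cached_scalar(mc, cur, sql, ttl=300):
    """Cache-aside wrapper for queries that return a single value."""
    key = md5(sql.encode('utf-8')).hexdigest()
    value = mc.get(key)
    if value is None:              # miss: run the query and remember the result
        cur.execute(sql)
        value = cur.fetchone()[0]
        mc.set(key, value, ttl)
    return value

# e.g. count = cached_scalar(mc, c, 'SELECT count(*) FROM verycd')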


After optimizing statements 1 and 2, the average home-page generation time dropped to 0.02 seconds, the same order of magnitude as Discuz; after optimizing statement 3 as well, the final result is a home-page generation time of about 0.006 seconds. With just a few lines of memcached code, performance improved by roughly 3300%. Now I can finally stand tall next to Discuz :)
