Memcached operations in Python



Preface

Many Web applications store data in a relational database management system such as MySQL. The application server reads data from the database and renders it in the browser. However, as the data volume grows and access becomes concentrated, the load on the database increases, its response times degrade, and the resulting page delays hurt the user experience. A distributed cache is an important means of optimizing website performance: many large sites serve hot data at scale through clusters of cache servers. By caching database query results, you can reduce the number of database accesses and significantly improve the speed and scalability of dynamic Web applications. Redis and memcached are the tools most commonly used in industry; today we will talk about how to use the memcached cache service in Python projects.

Introduction to memcached

Memcached is an open-source, high-performance, distributed memory object caching system that can be used in a variety of caching scenarios. Its main purpose is to speed up web applications by reducing database access.
Memcached itself does not provide a distributed solution. On the server side, a memcached "cluster" is really just a collection of independent memcached servers, so the environment is easy to build. Distribution is implemented on the client side, through client routing. The routing principle is simple: each time the application server accesses the value of a key, a routing algorithm maps the key to a particular memcached server, say node A, and all subsequent operations on that key go to node A. As long as that server still caches the data, cache hits are guaranteed.
However, resizing the memcached cluster causes problems. Suppose a website needs to expand from three cache servers to four. After the server list changes, if remainder hashing is still used, it is easy to calculate that about 75% of requests will no longer hit the cache, and the larger the cluster, the lower the post-expansion hit rate.

    1 % 3 = 1    1 % 4 = 1
    2 % 3 = 2    2 % 4 = 2
    3 % 3 = 0    3 % 4 = 3
    4 % 3 = 1    4 % 4 = 0
    # and so on

An expansion carried out this way is therefore highly risky: the sudden flood of cache misses may put enormous instantaneous pressure on the database, or even crash it. There are two ways to mitigate the problem: 1. resize during a low-traffic window and warm the cache with data afterwards; 2. use a better routing algorithm. The algorithm commonly used today is consistent hashing.
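
As a sanity check on the 75% figure above, here is a minimal sketch (assuming simple integer keys for illustration) that counts how many keys route to a different server when the cluster grows from three nodes to four:

    # Count keys whose remainder-hash target changes when the
    # cluster grows from 3 servers to 4.
    keys = range(10000)
    moved = sum(1 for k in keys if k % 3 != k % 4)
    print('%.0f%% of keys remapped' % (100.0 * moved / len(keys)))  # ~75%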

Consistent Hash

A memcached client can use the consistent hash algorithm as its routing policy. Compared with an ordinary hash algorithm (such as simple modulo), consistent hashing not only computes the hash of the key but also computes a hash for each server, and maps all of these hashes onto a fixed range of values (such as 0 to 2^32). To route a key, it finds the server whose hash is the smallest value greater than hash(key); if no server hash is greater, it wraps around and takes the server with the smallest hash overall. This largely solves the expansion problem: adding or removing a single node only remaps the keys adjacent to it on the ring and has little impact on the rest of the cluster.
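
Here is a minimal illustrative sketch of the idea. The ConsistentHashRing class is our own toy example, without the virtual-node replicas a production ring would add to balance load:

    import bisect
    import hashlib

    def _hash(value):
        # Map a string onto the 0 .. 2**32 - 1 ring.
        return int(hashlib.md5(value.encode('utf-8')).hexdigest(), 16) % 2 ** 32

    class ConsistentHashRing(object):
        def __init__(self, servers):
            # Place every server on the ring at the position of its hash.
            self._map = dict((_hash(s), s) for s in servers)
            self._keys = sorted(self._map)

        def get_server(self, key):
            # First server clockwise from hash(key), wrapping around at the end.
            index = bisect.bisect(self._keys, _hash(key)) % len(self._keys)
            return self._map[self._keys[index]]

    ring = ConsistentHashRing(['127.0.0.1:11211', '127.0.0.1:11212'])
    print(ring.get_server('some_key'))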

Memory management

Memcached manages memory with pre-allocation and grouping. The groups are called slab classes: slabs are divided into classes according to their chunk size. How does pre-allocation work? When an item is added, memcached first selects the most appropriate slab class for the item's size. Suppose, for example, that the item is 190 bytes, the chunk size of class 4 is 160 bytes, and the chunk size of class 5 is 200 bytes. 200 is the smallest chunk size not less than 190 bytes, so the item is placed in class 5 (inevitably wasting 10 bytes). Having chosen the class, memcached checks whether that class has a free chunk. If not, it allocates 1 MB (one slab, also called a page) and divides it into chunks of that class's size. So when we put a 190-byte item into memcached for the first time, memcached allocates a class 5 page and uses one chunk; the remaining 5241 chunks are reserved for future items of suitable size. Once all 5242 chunks are used and another item between 160 and 200 bytes arrives, memcached allocates another class 5 slab (so class 5 then holds two pages).
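
To make the arithmetic concrete, here is a small sketch of the selection logic. The chunk sizes are the example values above, not memcached's real defaults, which depend on its -n and -f (growth factor) options:

    PAGE_SIZE = 1024 * 1024            # each page is fixed at 1 MB

    # Hypothetical chunk sizes for a few slab classes (example values only).
    chunk_sizes = [128, 160, 200, 250]

    def pick_class(item_size):
        # memcached stores an item in the smallest chunk that can hold it.
        for size in chunk_sizes:
            if size >= item_size:
                return size
        return None                    # no class fits the item

    chunk = pick_class(190)
    print(chunk)                       # 200 -> 10 bytes wasted per item
    print(PAGE_SIZE // chunk)          # 5242 chunks in one page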

Notes
  • Chunks are carved out of a page, and a page is fixed at 1 MB, so a chunk cannot exceed 1 MB.

  • A chunk holds less payload than its nominal size, because the item data structure itself occupies about 48 bytes of every chunk.

  • By default memcached does not accept values larger than 1 MB; to store larger data, the application has to split it across multiple keys itself.

  • Allocated pages cannot be recycled.

  • For key-value data, it is best to keep each value under 1 MB, and to keep value sizes relatively balanced and stable to make the most of memory. Memcached uses an LRU eviction policy; setting reasonable expiration times helps improve the hit rate.

Use cases

When a key-value store meets your requirements, a memcached distributed cluster is a good choice: it is easy to set up and operate, and a single point of failure only affects the small share of data mapped to that node. The Magent cache proxy can currently be used for single-point backup to improve availability. The entire cache lives in memory, so response times are fast and no additional serialization or deserialization layer is required. However, because the data is memory-resident and not persistent, it cannot be recovered after the cluster restarts from a failure. Later versions of memcached support atomic operations through CAS (check-and-set), which solves concurrency-control problems at a low cost.
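
With python-memcached, a CAS update looks roughly like the following sketch. This assumes the client is created with cache_cas=True so it remembers the CAS tokens returned by gets():

    import memcache

    # cache_cas=True makes the client track CAS tokens from gets().
    mc = memcache.Client(['127.0.0.1:11211'], cache_cas=True)

    mc.set('counter', 1)
    value = mc.gets('counter')        # read the value and its CAS token
    if not mc.cas('counter', value + 1):
        # someone else changed 'counter' since our gets(); retry or give up
        pass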

Install and start
    $ sudo apt-get install memcached
    $ memcached -m 32 -p 11211 -d

The second command starts memcached as a daemon (-d), allocates 32 MB of memory to it (-m 32), and listens on port 11211 of localhost.
Using Python to operate memcached

In Python, you can use the python-memcached library (imported as memcache) to operate memcached. The library is easy to use: you declare a client and then read and write the memcached cache through it.

Accessing memcached using Python:

    #!/usr/bin/env python
    import memcache

    mc = memcache.Client(['127.0.0.1:11211'], debug=0)

    mc.set("some_key", "Some value")
    value = mc.get("some_key")

    mc.set("another_key", 3)
    mc.delete("another_key")

    mc.set("key", "1")  # note that the key used for incr/decr must be a string
    mc.incr("key")
    mc.decr("key")

However, the default routing policy for python-memcached does not use consistent hashing.

    def _get_server(self, key):
        if isinstance(key, tuple):
            serverhash, key = key
        else:
            serverhash = serverHashFunction(key)

        if not self.buckets:
            return None, None

        for i in range(Client._SERVER_RETRIES):
            server = self.buckets[serverhash % len(self.buckets)]
            if server.connect():
                # print("(using server %s)" % server,)
                return server, key
            serverhash = serverHashFunction(str(serverhash) + str(i))

        return None, None

From the source code we can see the line server = self.buckets[serverhash % len(self.buckets)]: the server is chosen by a simple modulo of the key's hash. We can override the _get_server method to make python-memcached support consistent hashing.

    import memcache
    from hash_ring import HashRing

    class MemcacheRing(memcache.Client):
        """Extends python-memcached so it uses consistent hashing to
        distribute the keys.
        """

        def __init__(self, servers, *k, **kw):
            self.hash_ring = HashRing(servers)
            memcache.Client.__init__(self, servers, *k, **kw)
            self.server_mapping = {}
            for server_uri, server_obj in zip(servers, self.servers):
                self.server_mapping[server_uri] = server_obj

        def _get_server(self, key):
            # A (serverhash, key) tuple bypasses the ring: fall back to
            # the default routing.
            if isinstance(key, tuple):
                return memcache.Client._get_server(self, key)

            for i in range(self._SERVER_RETRIES):
                # Walk the ring starting at the node this key hashes to,
                # falling through to the next node if one is down.
                for server_uri in self.hash_ring.iterate_nodes(key):
                    server_obj = self.server_mapping[server_uri]
                    if server_obj.connect():
                        return server_obj, key

            return None, None
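
A minimal usage sketch, assuming the hash_ring package is installed and two memcached instances are running locally on these ports:

    mc = MemcacheRing(['127.0.0.1:11211', '127.0.0.1:11212'], debug=0)
    mc.set('some_key', 'some value')
    print(mc.get('some_key'))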
Using memcached in a Tornado project

The caching policy is as follows: 1. the application first tries to read the data from the cache; on a miss it queries the database and, if the query succeeds, writes the result into the cache; 2. on a hit it returns the cached result directly. Updating the cache is a complicated problem; in general, the data is written to the database first and the cache entry is then invalidated. We will discuss memcached cache updating separately later.

Code
    # coding: utf-8
    import sys
    import json
    import urllib
    import logging

    import memcache
    import tornado.ioloop
    import tornado.web

    # initialize the memcache client
    mc = memcache.Client(['127.0.0.1:11211'], debug=0)
    mc_prefix = 'demo'


    class BaseHandler(tornado.web.RequestHandler):
        """Abstracts cache handling into a base class."""
        USE_CACHE = False  # controls whether this handler uses the cache

        def format_args(self):
            arg_list = []
            for a in self.request.arguments:
                for value in self.request.arguments[a]:
                    arg_list.append('%s=%s' % (a, urllib.quote(value.replace(' ', ''))))
            # generate the cache key from the path plus sorted query arguments
            arg_list.sort()
            key = '%s?%s' % (self.request.path, '&'.join(arg_list)) if arg_list else self.request.path
            key = '%s_%s' % (mc_prefix, key)
            # keys longer than 250 bytes cannot be cached
            if len(key) > 250:
                logging.error('key out of length: %s', key)
                return None
            return key

        def get(self, *args, **kwargs):
            self.key = None
            if self.USE_CACHE:
                try:
                    # build the cache key from the request
                    self.key = self.format_args()
                    if self.key:
                        data = mc.get(self.key)
                        # on a cache hit, return the data directly
                        if data:
                            logging.info('get data from memcache')
                            self.finish(data)
                            return
                except Exception as e:
                    logging.exception(e)
            # on a cache miss, call do_get to handle the request
            data = self.do_get()
            data_str = json.dumps(data)
            # store successfully fetched data in the cache for 60 seconds
            if self.USE_CACHE and data.get('result', -1) == 0 and self.key:
                try:
                    mc.set(self.key, data_str, 60)
                except Exception as e:
                    logging.exception(e)
            self.finish(data_str)

        def do_get(self):
            return None


    class DemoHandler(BaseHandler):
        USE_CACHE = True

        def do_get(self):
            a = self.get_argument('a', 'test')
            b = self.get_argument('b', 'test')
            # a real handler would query the database here
            data = {'result': 0, 'a': a, 'b': b}
            return data


    def make_app():
        return tornado.web.Application([(r"/", DemoHandler)])


    if __name__ == "__main__":
        logging.basicConfig(
            stream=sys.stdout,
            level=logging.INFO,
            format='%(asctime)s %(levelno)s %(message)s',
        )
        app = make_app()
        app.listen(8888)
        tornado.ioloop.IOLoop.current().start()
Test results

Accessing http://127.0.0.1:8888/?a=1&b=3 through a browser, the terminal prints the following log:

    2017-02-21 22:45:05,987 20 304 GET /?a=1&b=2 (127.0.0.1) 3.11ms
    2017-02-21 22:45:07,427 20 get data from memcache
    2017-02-21 22:45:07,427 20 304 GET /?a=1&b=2 (127.0.0.1) 0.71ms
    2017-02-21 22:45:10,350 20 200 GET /?a=1&b=3 (127.0.0.1) 0.82ms
    2017-02-21 22:45:13,586 20 get data from memcache

The log shows the cache being hit on repeated requests.

Summary

This article introduced the basic concepts of memcached, such as routing algorithms, memory management, and application scenarios, and showed how to use the memcached cache in a Python project. The cache update problem deserves further analysis and discussion.

