A detailed description of memcached in Python



Objective

Many Web applications save their data in a relational database management system such as MySQL; the application server reads the data from the database and renders it in the browser. As data volume grows and access becomes concentrated, however, the load on the database increases, its response degrades, and the whole site slows down. Distributed caching is one of the most important tools for optimizing Web site performance, and many large sites serve hot data from scalable clusters of cache servers. By caching database query results and cutting the number of database accesses, the speed and scalability of a dynamic Web application can be improved significantly. Redis and memcached are the most common choices in industry today; this article discusses how to use the memcached cache service in a Python project.

memcached Introduction

memcached is an open-source, high-performance, distributed memory object caching system. It fits a wide variety of scenarios that need caching, and its main purpose is to speed up Web applications by reducing database access.
memcached itself provides no distribution mechanism. On the server side, a memcached cluster is simply a collection of independent memcached servers, so the server environment is straightforward; the distribution of cached data is implemented entirely on the client side, through the client's routing logic. The principle of client-side routing is simple: every time the application server accesses the value of a key, a routing algorithm maps that key to a specific memcached server, say NodeA, and all subsequent operations on that key go to NodeA. As long as that server still holds the cached data, the lookup hits the cache.

Routing algorithms

Simple routing algorithm

The simplest routing algorithm is remainder hashing: take the hash value of the cached data's key, divide it by the number of servers, and use the remainder as the index into the server list. This algorithm spreads the cached data evenly across the memcached cluster and is adequate for most cache routing needs.
It causes problems, however, when the memcached cluster has to be expanded. Suppose a site grows from 3 cache servers to 4. After the server list changes, if remainder hashing is still used, it is easy to work out that about 75% of requests can no longer hit the cache; and the larger the cluster, the higher the miss rate after an expansion.

1 % 3 = 1    1 % 4 = 1
2 % 3 = 2    2 % 4 = 2
3 % 3 = 0    3 % 4 = 3
4 % 3 = 1    4 % 4 = 0
# etc.
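
The 75% figure is easy to verify with a quick simulation (a sketch that treats key hashes as uniformly distributed integers):

# fraction of keys that still map to the same server after scaling from 3 to 4
keys = range(100000)
same = sum(1 for k in keys if k % 3 == k % 4)
print('%.0f%% of keys keep their server' % (100.0 * same / len(keys)))
# prints 25%: about 75% of requests would miss the cache after the expansion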

An expansion that risky can put enormous instantaneous pressure on the database, and may even crash it. There are two ways to deal with the problem: 1. expand during an access trough and warm up the cache after the expansion; 2. use a better routing algorithm. The most widely used today is the consistent hash algorithm.

Consistent Hash

A memcached client can use the consistent hash algorithm as its routing strategy. Compared with an ordinary hash (such as simple modulo), consistent hashing computes not only the hash of the key but also the hash of every server, and maps both onto a fixed range of values (such as 0~2^32). The target server for a key is the server with the smallest hash value greater than hash(key); if no such server exists, the search wraps around and the server with the smallest hash value overall is used. This largely solves the expansion problem: adding or removing a single node has little impact on the cluster as a whole.
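
A minimal sketch of this lookup, using bisect over a sorted list of server hashes (the hash function and server addresses here are illustrative, not what any particular client uses):

import bisect
import hashlib

def ring_hash(value):
    # map a string onto the 0 ~ 2**32 range
    return int(hashlib.md5(value.encode('utf-8')).hexdigest(), 16) % 2 ** 32

servers = ['10.0.0.1:11211', '10.0.0.2:11211', '10.0.0.3:11211']
ring = sorted((ring_hash(s), s) for s in servers)
hashes = [pair[0] for pair in ring]

def get_server(key):
    # first server whose hash is >= hash(key); wrap around to the smallest
    i = bisect.bisect_left(hashes, ring_hash(key))
    return ring[i % len(ring)][1]

print(get_server('some_key'))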

Virtual Layer

Consistent hashing is still not perfect: when the cluster is expanded, the new node takes over keys from only one neighbor on the ring, which can leave the load unbalanced. Newer client versions therefore add virtual nodes, which improve this further: a newly added server relieves many existing servers a little each, keeping the load evenly distributed across the cluster.
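
Virtual nodes are a small change to the sketch above: hash each physical server onto the ring many times, so its load is spread across many positions and a new server takes a little load from every existing one (the replica count of 100 is an arbitrary illustrative choice):

REPLICAS = 100  # virtual nodes per physical server

ring = sorted(
    (ring_hash('%s#%d' % (server, n)), server)
    for server in servers
    for n in range(REPLICAS)
)
hashes = [pair[0] for pair in ring]
# get_server() works unchanged: it now lands on one of a server's many
# virtual nodes, giving a much smoother key distribution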

Memory management

Storage mode

To maximize performance, the data saved in memcached is kept in memcached's built-in in-memory storage. Because the data exists only in memory, restarting memcached or the operating system destroys all of it. In addition, once the cache reaches its configured capacity, unused entries are evicted automatically according to the LRU (least recently used) algorithm. memcached is designed purely as a cache, so losing data on restart is not a problem it tries to solve.
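
The idea behind LRU is simply to evict the entry that has gone unused the longest when space runs out. A toy sketch of the policy (memcached's real implementation tracks recency per slab class and is far more involved):

from collections import OrderedDict

class LRUCache(object):
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data[key] = self.data.pop(key)  # re-insert: now the most recent
        return self.data[key]

    def set(self, key, value):
        if key in self.data:
            self.data.pop(key)
        elif len(self.data) >= self.capacity:
            self.data.popitem(last=False)    # evict the least recently used
        self.data[key] = value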

Memory structure

memcached supports only simple key-value storage. Two concepts are central to its memory structure: the slab and the chunk.
A slab is a block of memory and is the smallest unit memcached requests from the system at a time. When starting memcached you normally pass the -m parameter to set the memory it may use, but that memory is not all allocated at startup; it is requested on demand, one slab per request. A slab has a fixed size of 1 MB (1,048,576 bytes) and consists of chunks of equal size; each chunk stores one item structure together with its key and value.

Although all chunks within one slab are the same size, chunks in different slabs need not be. memcached groups slabs into classes by chunk size; by default there are 40 classes (class 1 to class 40). In class 1 the chunk size is 80 bytes, and since a slab is a fixed 1,048,576 bytes (1 MB), a class 1 slab holds at most 13,107 chunks (that is, it can store up to 13,107 key-value items of less than 80 bytes each).
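
The chunk counts follow directly from the fixed 1 MB slab size; checking the article's own numbers:

SLAB_SIZE = 1048576  # 1 MB, fixed

for cls, chunk in [(1, 80), (4, 160), (5, 200)]:
    print('class %d: %d-byte chunks -> %d chunks per slab'
          % (cls, chunk, SLAB_SIZE // chunk))

# class 1: 80-byte chunks -> 13107 chunks per slab
# class 4: 160-byte chunks -> 6553 chunks per slab
# class 5: 200-byte chunks -> 5242 chunks per slab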

memcached manages memory by pre-allocation and grouping; the groups are the slab classes just described, which divide slabs by chunk size. How does pre-allocation work? When an item is added, memcached first picks the most suitable slab class for the item's size. Suppose the item is 190 bytes: by default class 4 has 160-byte chunks, which is obviously too small, while class 5 has 200-byte chunks, larger than 190 bytes, so the item goes into class 5 (wasting 10 bytes per item is unavoidable). Having chosen the class, memcached checks whether that class still has a free chunk; if not, it requests 1 MB (one slab, also called a page) and carves it into chunks of that class's size. For example, the first time a 190-byte item is stored, memcached allocates a page for class 5 and uses one chunk, leaving 5,241 chunks for subsequent items of similar size. Once all 5,242 chunks are used, the next item between 161 and 200 bytes causes memcached to allocate another class 5 slab (so that class then has 2 pages).
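
The allocation flow can be modeled in a few lines (a toy model, not memcached's actual code; the chunk sizes for classes 2 and 3 are made up for illustration):

CHUNK_SIZES = {1: 80, 2: 100, 3: 128, 4: 160, 5: 200}  # bytes per chunk
SLAB_SIZE = 1048576
free_chunks = {}  # class number -> free chunks remaining

def store(item_size):
    # pick the smallest class whose chunks can hold the item
    cls = min(c for c, size in CHUNK_SIZES.items() if size >= item_size)
    # no free chunk in that class: allocate a fresh 1 MB page and carve it up
    if not free_chunks.get(cls):
        free_chunks[cls] = SLAB_SIZE // CHUNK_SIZES[cls]
    free_chunks[cls] -= 1
    return cls

print(store(190))   # -> 5: the first 190-byte item allocates a class 5 page
print(free_chunks)  # -> {5: 5241}, the 5,241 free chunks described above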

Precautions

    • Chunks are carved out of a page, and a page is fixed at 1 MB, so a chunk can never exceed 1 MB.

    • Every chunk loses 48 bytes to overhead, because the item data structure at the start of each chunk occupies 48 bytes.

    • User data larger than 1 MB does not fit in a single item, so the client has to split it across multiple keys (see the sketch after this list).

    • A page, once assigned to a slab class, is never reclaimed or moved to another class.

    • Key-value items are best kept under 1 MB, with relatively balanced and stable sizes, so that memory is used to best effect. Since memcached evicts with LRU, also set reasonable expiration times to improve the hit ratio.
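
A minimal sketch of the splitting workaround mentioned above (the fragment size and key naming scheme are ad hoc choices, not a library feature):

CHUNK = 1000 * 1000  # stay safely below memcached's 1 MB item limit

def set_big(mc, key, value):
    parts = [value[i:i + CHUNK] for i in range(0, len(value), CHUNK)]
    mc.set('%s.count' % key, len(parts))
    for n, part in enumerate(parts):
        mc.set('%s.%d' % (key, n), part)

def get_big(mc, key):
    count = mc.get('%s.count' % key)
    if count is None:
        return None
    parts = [mc.get('%s.%d' % (key, n)) for n in range(count)]
    if any(p is None for p in parts):
        return None  # a fragment was evicted; treat the whole value as a miss
    return ''.join(parts)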

Usage Scenarios

When plain key-value storage meets the requirements, a distributed memcached cluster is a good choice: it is simple to build and operate. A single point of failure in the cluster affects only a small portion of the data, and a cache proxy such as magent can provide single-point backups to improve availability. Because the entire cache lives in memory and needs no extra serialization or deserialization step, response times are very fast; but for the same reason the data is not persisted, and after a cluster failure and restart it cannot be recovered. Recent versions of memcached also support atomic CAS (check-and-set) operations, which solve concurrency control problems at low cost.
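
python-memcached exposes CAS through its gets/cas methods; the client must be created with cache_cas=True so it remembers the CAS token that gets returns. A sketch of a race-safe update:

import memcache

mc = memcache.Client(['127.0.0.1:11211'], cache_cas=True)

mc.set('counter', 1)
value = mc.gets('counter')        # fetch the value and remember its CAS token
# cas succeeds only if no other client has changed the key in the meantime
if not mc.cas('counter', value + 1):
    pass  # lost the race: re-read with gets() and retry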

Installation and startup

$ sudo apt-get install memcached
$ memcached -m 32 -p 11211 -d
# starts memcached as a daemon (-d), gives it 32 MB of memory (-m 32),
# and listens on port 11211 (-p 11211) on localhost

Operating memcached from Python

In Python, memcached can be driven through the memcache library (python-memcached). Its use is simple: declare a client, then read and write the memcached cache through it.

Accessing memcached from Python

#!/usr/bin/env python
import memcache

mc = memcache.Client(['127.0.0.1:12000'], debug=0)

mc.set("some_key", "some value")
value = mc.get("some_key")

mc.set("another_key", 3)
mc.delete("another_key")

mc.set("key", "1")  # note that the key used for incr/decr must be a string
mc.incr("key")
mc.decr("key")

However, python-memcached's default routing policy does not use consistent hashing:

    def _get_server(self, key):
        if isinstance(key, tuple):
            serverhash, key = key
        else:
            serverhash = serverHashFunction(key)

        if not self.buckets:
            return None, None

        for i in range(Client._SERVER_RETRIES):
            server = self.buckets[serverhash % len(self.buckets)]
            if server.connect():
                # print("(using server %s)" % server,)
                return server, key
            serverhash = serverHashFunction(str(serverhash) + str(i))
        return None, None

As the source shows, server = self.buckets[serverhash % len(self.buckets)] simply takes the key's hash modulo the number of servers. We can make python-memcached use consistent hashing by overriding the _get_server method.

import types

import memcache
from hash_ring import HashRing

class MemcacheRing(memcache.Client):
    """Extends python-memcached so it uses consistent hashing to distribute the keys."""

    def __init__(self, servers, *k, **kw):
        self.hash_ring = HashRing(servers)
        memcache.Client.__init__(self, servers, *k, **kw)
        self.server_mapping = {}
        for server_uri, server_obj in zip(servers, self.servers):
            self.server_mapping[server_uri] = server_obj

    def _get_server(self, key):
        if type(key) == types.TupleType:
            return memcache.Client._get_server(self, key)
        for i in range(self._SERVER_RETRIES):
            iterator = self.hash_ring.iterate_nodes(key)
            for server_uri in iterator:
                server_obj = self.server_mapping[server_uri]
                if server_obj.connect():
                    return server_obj, key
        return None, None

Using memcached in Tornado projects

The strategy used here is: 1. the application tries the cache first; if the data is not there, it reads the database and, on success, puts the result into the cache; 2. if the data is in the cache, it is returned directly. Updating a cache correctly is a complex problem; the usual approach is to write the data to the database first and invalidate the cache only after the write succeeds. A later article will discuss memcached cache updates separately.

Code

# coding: utf-8
import sys
import json
import logging
import urllib

import memcache
import tornado.ioloop
import tornado.web

# initialize the memcache client
mc = memcache.Client(['127.0.0.1:11211'], debug=0)
mc_prefix = 'demo'


class BaseHandler(tornado.web.RequestHandler):
    """Abstracts the cache handling into the BaseHandler base class."""

    USE_CACHE = False  # controls whether this handler uses the cache

    def format_args(self):
        arg_list = []
        for a in self.request.arguments:
            for value in self.request.arguments[a]:
                arg_list.append('%s=%s' % (a, urllib.quote(value.replace(' ', ''))))
        # generate the key from the requested URL
        arg_list.sort()
        key = '%s?%s' % (self.request.path, '&'.join(arg_list)) if arg_list else self.request.path
        key = '%s_%s' % (mc_prefix, key)
        # if the key is too long, skip caching
        if len(key) > 250:
            logging.error('key out of length: %s', key)
            return None
        return key

    def get(self, *args, **kwargs):
        self.key = None
        if self.USE_CACHE:
            try:
                # build the cache key from the request
                self.key = self.format_args()
                if self.key:
                    data = mc.get(self.key)
                    # on a cache hit, return the cached data directly
                    if data:
                        logging.info('get data from memcache')
                        self.finish(data)
                        return
            except Exception, e:
                logging.exception(e)

        # on a cache miss, call do_get to handle the request and fetch the data
        data = self.do_get()
        data_str = json.dumps(data)

        # if the fetch succeeded, put the data into the memcache cache
        if self.USE_CACHE and data and data.get('result', -1) == 0 and self.key:
            try:
                mc.set(self.key, data_str)
            except Exception, e:
                logging.exception(e)

        self.finish(data_str)

    def do_get(self):
        return None


class DemoHandler(BaseHandler):
    USE_CACHE = True

    def do_get(self):
        a = self.get_argument('a', 'test')
        b = self.get_argument('b', 'test')
        # fetch the data from the database, omitted here
        data = {'result': 0, 'a': a, 'b': b}
        return data


def make_app():
    return tornado.web.Application([
        (r"/", DemoHandler),
    ])


if __name__ == "__main__":
    logging.basicConfig(stream=sys.stdout, level=logging.INFO,
                        format='%(asctime)s %(levelno)s %(message)s')
    app = make_app()
    app.listen(8888)
    tornado.ioloop.IOLoop.current().start()

Test results

Accessing URLs such as http://127.0.0.1:8888/?a=1&b=3 in a browser, the terminal prints logs like the following:

2017-02-21 22:45:05,987 304 GET /?a=1&b=2 (127.0.0.1) 3.11ms
2017-02-21 22:45:07,427 get data from memcache
2017-02-21 22:45:07,427 304 GET /?a=1&b=2 (127.0.0.1) 0.71ms
2017-02-21 22:45:10,350 200 GET /?a=1&b=3 (127.0.0.1) 0.82ms
2017-02-21 22:45:13,586 get data from memcache

From the log you can see which requests hit the cache.

Summary

This article covered memcached's basic concepts: routing algorithms, memory management, and usage scenarios, and then showed how to use the memcached cache in a Python project. Cache updating is a separate issue that deserves further discussion.
