Basic questions
1. Basic setup of memcached
1) Start the memcached server:
# /usr/local/bin/memcached -d -m 10 -u root -l 192.168.0.200 -p 12000 -c 256 -P /tmp/memcached.pid
The -d option starts memcached as a daemon.
-m is the amount of memory allocated to memcached, in megabytes; here it is 10MB.
-u is the user that runs memcached; here it is root.
-l is the IP address the server listens on. If the machine has multiple addresses, specify the one to use; here it is 192.168.0.200.
-p is the port memcached listens on; here it is 12000. A port above 1024 is preferable.
-c is the maximum number of concurrent connections; the default is 1024. Here it is set to 256; tune it according to your server's load.
-P sets the file in which memcached saves its PID; here it is /tmp/memcached.pid.
2) To stop the memcached process, execute:
# kill `cat /tmp/memcached.pid`
A hashing algorithm maps a binary value of arbitrary length to a small, fixed-length binary value called the hash value. A hash value is a compact numeric representation of a piece of data that is, in practice, unique to it. If you hash a piece of plaintext and change even a single letter of it, the subsequent hash produces a different value. It is computationally infeasible to find two different inputs that hash to the same value.
2. The consistent hashing algorithm has two goals: first, when a node is added or removed, the other nodes should be affected as little as possible; second, after the node set changes, data should be redistributed as evenly as possible.
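Both goals can be seen in a minimal consistent hash ring. The sketch below is written in Python (the document's own examples are PHP) and is hypothetical, not taken from any particular client library: when a node is removed, only the keys that lived on that node move, while a naive `hash % n` scheme would move roughly two thirds of them.

```python
import bisect
import hashlib

class HashRing:
    """Minimal consistent hash ring: each node is placed at several
    points on a ring; a key maps to the first node clockwise of it."""

    def __init__(self, nodes, replicas=100):
        self.replicas = replicas
        self.ring = {}            # point -> node
        self.sorted_points = []
        for node in nodes:
            self.add_node(node)

    def _hash(self, value):
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def add_node(self, node):
        for i in range(self.replicas):
            point = self._hash("%s:%d" % (node, i))
            self.ring[point] = node
            bisect.insort(self.sorted_points, point)

    def remove_node(self, node):
        for i in range(self.replicas):
            point = self._hash("%s:%d" % (node, i))
            del self.ring[point]
            self.sorted_points.remove(point)

    def get_node(self, key):
        point = self._hash(key)
        idx = bisect.bisect(self.sorted_points, point) % len(self.sorted_points)
        return self.ring[self.sorted_points[idx]]

ring = HashRing(["10.0.0.1:11211", "10.0.0.2:11211", "10.0.0.3:11211"])
keys = ["user:%d" % i for i in range(1000)]
before = {k: ring.get_node(k) for k in keys}
ring.remove_node("10.0.0.3:11211")
after = {k: ring.get_node(k) for k in keys}
moved = sum(1 for k in keys if before[k] != after[k])
# Only the keys that lived on the removed node move (about a third);
# every key that was on another node stays exactly where it was.
```

The `replicas` (virtual node) count controls how evenly keys spread across the ring; real clients typically use 100 or more points per server.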
3. Why run memcached?
If the site has high traffic and most accesses cause a high database load, using memcached can reduce the pressure on the database.
4. What business scenarios is memcached suited to?
1) If the site contains dynamic pages with heavy traffic, the load on the database will be high. Because most database requests are reads, memcached can significantly reduce the database load.
2) If the load on the database server is low but CPU usage is high, you can cache computed results (computed objects) and rendered page templates (rendered templates).
3) Use memcached to cache session data and temporary data to reduce write operations to the database.
4) Cache small files that are accessed frequently.
5) Cache the results of web services (meaning external HTTP APIs here, not a vendor's branded "Web Services") or RSS feeds.
5. What business scenarios is memcached not suited to?
1) The cached objects are larger than 1MB.
Memcached itself is not designed to handle huge multimedia files (large media) or huge binary blobs.
2) The keys are longer than 250 characters.
3) The hosting environment does not allow running the memcached service.
If the application is hosted on a low-end virtual private server, virtualization technologies such as VMware are not a good fit for memcached. Memcached needs to take over and control large chunks of memory; if the memory managed by memcached is swapped out by the OS or hypervisor, memcached's performance will suffer greatly.
4) The application runs in an insecure environment.
Memcached provides no security mechanism of its own; anyone who can reach it, even just via telnet, can access it. If your application runs on a shared system, pay attention to security.
5) The business itself requires persistent data, or what it really needs is a database.
6. Can I traverse all the items in memcached?
No. This operation is relatively slow and blocks other operations (slow here meaning relative to memcached's other commands). All of memcached's non-debug commands, such as add, set, get, and flush, take constant time regardless of how much data is stored in memcached. The time taken by a command that traverses all items, however, grows with the amount of data in memcached. While such a command runs, other commands have to wait for it and cannot execute, so blocking occurs.
Cluster-related issues
7. How does memcached work?
Memcached's high performance comes from its two-stage hash structure. Memcached behaves like a huge hash table that stores many <key, value> pairs; given a key, you can store or retrieve arbitrary data. A client can store data across multiple memcached servers. When querying, the client first computes a hash of the key (stage one) against the node list, and sends the request to the selected node; that memcached node then uses an internal hash (stage two) to locate the actual data (item) and return it to the client. From an implementation point of view, memcached is a non-blocking, event-based server program.
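The two stages can be sketched in a few lines of Python (a toy model, not a real client: each "server" is simulated by a plain dict standing in for the server's internal hash table):

```python
import hashlib

def stage_one(key, servers):
    """Client-side hash: pick which server holds the key."""
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return servers[h % len(servers)]

# Stage two, the server's own hash table, is simulated with a dict.
servers = ["cache1", "cache2", "cache3"]
storage = {s: {} for s in servers}

def mc_set(key, value):
    server = stage_one(key, servers)   # stage one: pick the node
    storage[server][key] = value       # stage two: the node's own table

def mc_get(key):
    server = stage_one(key, servers)
    return storage[server].get(key)    # miss -> None

mc_set("user:42", {"name": "alice"})
```

Because both stages are deterministic, any client that uses the same stage-one hash will find the same value on the same server.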
8. What is memcached's biggest advantage?
The biggest advantage of memcached is the excellent scalability it brings, especially in a huge system. Since the clients themselves do the hashing, it is easy to add many memcached servers to the cluster. The memcached servers do not communicate with each other, so adding servers does not increase memcached's load; there is no multicast protocol and no explosion of network traffic.
9. What are the pros and cons of memcached compared to MySQL's query cache?
Disadvantages:
1) Compared with MySQL's query cache, introducing memcached into an application takes much more work. MySQL's query cache automatically caches the results of SQL queries, so repeated queries execute quickly with no application changes.
Advantages:
1) When a table is modified, MySQL's query cache is immediately flushed. When write operations are frequent, MySQL's query cache constantly invalidates all of its cached data.
2) On multi-core CPUs, MySQL's query cache runs into scalability issues. The query cache uses a global lock, so as more cached data needs to be invalidated, it becomes slower and slower.
3) MySQL's query cache can only store SQL query results, not arbitrary data. With memcached we can build all kinds of efficient caches. For example, you can execute several independent queries, build a user object from the results, and then cache that user object in memcached. The query cache works at the level of SQL statements and cannot do this. In a small site the query cache helps, but as the site grows, it does more harm than good.
4) The memory the query cache can use is limited by the free memory of the MySQL server. Adding more memory to the database server in order to cache data is certainly good, but with memcached, as long as there is free memory anywhere, you can use it to grow the memcached cluster and cache more data.
10. What are the pros and cons of memcached compared to a server's local cache (such as PHP's APC, mmap'd files, etc.)?
1) First, a local cache faces severe memory constraints: the memory it can use is limited to the free memory of a single server.
2) A local cache does have one advantage over both memcached and the query cache: it can store arbitrary data and has no network access latency, so lookups are faster. Consider putting extremely common data into the local cache. If each page needs to load a small amount of such data, consider keeping it in the local cache.
3) A local cache lacks group invalidation. In a memcached cluster, deleting or updating a key makes the change visible to all clients. With a local cache, we can only notify all servers to flush their caches (which is slow and not scalable) or rely solely on the cache timeout mechanism.
11. What is memcached's cache eviction mechanism?
Memcached's main eviction mechanism is the LRU (least recently used) algorithm plus expiration. When you store data in memcached, you can specify how long it may stay in the cache: forever, or until some point in the future. If memcached runs out of memory, expired items are replaced first, followed by the least recently used items.
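The combination of lazy expiration and LRU replacement can be sketched with a toy Python cache (a simplification: real memcached tracks LRU per slab class, not globally):

```python
import time
from collections import OrderedDict

class TinyCache:
    """Toy cache with memcached-style eviction: expired items are
    reclaimed first, then the least recently used ones."""

    def __init__(self, max_items):
        self.max_items = max_items
        self.data = OrderedDict()   # key -> (value, expires_at or None)

    def set(self, key, value, ttl=None):
        expires = time.time() + ttl if ttl else None
        if key in self.data:
            del self.data[key]
        elif len(self.data) >= self.max_items:
            self._evict()
        self.data[key] = (value, expires)

    def get(self, key):
        item = self.data.get(key)
        if item is None:
            return None
        value, expires = item
        if expires is not None and time.time() >= expires:  # lazy expiry
            del self.data[key]
            return None
        self.data.move_to_end(key)   # mark as recently used
        return value

    def _evict(self):
        now = time.time()
        for key, (_, expires) in list(self.data.items()):
            if expires is not None and now >= expires:
                del self.data[key]          # expired items go first
                return
        self.data.popitem(last=False)       # otherwise drop the LRU item

cache = TinyCache(max_items=2)
cache.set("a", 1)
cache.set("b", 2)
cache.get("a")     # touch "a", so "b" is now least recently used
cache.set("c", 3)  # cache is full: "b" gets evicted
```
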
12. How does memcached implement redundancy?
It doesn't! Memcached should be your application's cache layer; by design, it has no redundancy mechanism whatsoever. If a memcached node loses all of its data, you should be able to fetch it again from the source of truth (such as a database). Your system should tolerate the failure of any node. If you are concerned that node failure will greatly increase the burden on your database, you can take some precautions: for example, add more nodes (to reduce the impact of losing any one node), or use hot-spare nodes (which take over a node's IP when it goes down), and so on.
13. How does memcached handle fault tolerance?
It doesn't. In the case of a node failure, the cluster does not need to do any fault-tolerant processing of its own; what to do when a node fails is entirely up to the user.
When a node fails, here are a few options:
1) Ignore it! There are plenty of other nodes that can absorb the impact of the failure until the failed node is restored or replaced.
2) Remove the failed node from the node list. Be careful with this operation! Under the default (remainder-based) hashing algorithm, adding or removing a node makes almost all cached data unusable: because the node list used by the hash changes, most keys now map to different nodes than before.
3) Bring up a hot-spare node that takes over the IP of the failed node. This prevents hashing chaos.
4) If you want to add and remove nodes without disturbing the original hash mapping, use a consistent hashing algorithm.
5) Rehash. When a client finds a node down while accessing data, it hashes again (with a different hash algorithm than before) and picks another node (note that the client does not remove the down node from the node list; it may still be chosen by the first hash next time). If a node flaps between up and down, this rehashing approach is risky: stale data may end up on both the good node and the bad one.
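The danger in option 2 is easy to quantify. This small Python experiment (a sketch, not a real client) uses the default remainder-based selection and counts how many keys change nodes when a 3-node list shrinks to 2:

```python
import hashlib

def pick(key, n_nodes):
    """Default remainder-based node selection: hash(key) % n."""
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return h % n_nodes

keys = ["item:%d" % i for i in range(10000)]
moved = sum(1 for k in keys if pick(k, 3) != pick(k, 2))
fraction = moved / len(keys)
# Roughly two thirds of all keys change nodes, so nearly all cached
# data becomes unreachable at its old location after one removal.
```

A key stays put only when `h % 3 == h % 2`, which holds for just 2 of every 6 hash residues, hence the roughly 2/3 remap rate; consistent hashing (question 2) avoids this.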
14. How do I bulk import and export memcached items?
You shouldn't! Memcached is a non-blocking server; any operation that could pause memcached or momentarily deny service deserves careful thought. Bulk importing data into memcached is often not what you really want. Imagine: if the cached data changes between export and import, you have to deal with stale data; and if the cached data expires between export and import, what do you do with it?
So bulk export and import is less useful than you might think. There is one scenario where it is very useful, though: if you have a large amount of data that never changes and you want the cache to warm up quickly, bulk importing cached data helps.
15. But I really do need to bulk export and import items. What do I do?
If you really need bulk export and import, the most likely reason is that rebuilding the cached data takes too long, or that the database is bad and is making you suffer.
If a memcached node going down makes you miserable, you must do some optimization work on the database, for example dealing with the "thundering herd" problem (a memcached node fails and repeated queries overwhelm the database) or fixing poorly performing queries. Memcached is not an excuse for, or a way around, optimizing queries.
Here are a few suggestions:
Use MogileFS (or similar software such as CouchDB) to store the items: compute the items and dump them to disk. MogileFS can easily overwrite items and provide fast access to them. You can even cache the items from MogileFS in memcached to speed up reads. Combining MogileFS with memcached speeds up response times on cache misses and improves the site's availability.
Alternatively, just reuse MySQL. MySQL's InnoDB primary-key lookups are very fast. If most of the cached data fits in a VARCHAR field, primary-key lookup performance is even better. Querying memcached by key is nearly equivalent to a MySQL primary-key query: hash the key to a 64-bit integer and store the data in MySQL under that integer. You can store the original (unhashed) key in an ordinary column with a secondary index to speed up queries, expire keys passively, delete expired keys in bulk, and so on.
16. How does memcached do authentication?
It doesn't! Memcached is software that runs below the application layer (authentication should be the responsibility of the upper application layers). Memcached's client and server are lightweight partly because no authentication mechanism is implemented at all; this lets memcached create new connections quickly, with no server-side configuration. If you want to restrict access, use a firewall, or have memcached listen on a UNIX domain socket.
17. What about memcached's multithreading? How do I use it?
Threads rule! Thanks to the efforts of Steven Grimm and Facebook, memcached 1.2 and later have a multithreaded model. Multithreaded mode lets memcached make full use of multiple CPUs and share all cached data among them. Memcached uses a simple locking mechanism to keep data updates mutually exclusive. This is a more efficient way to handle multiple gets than running several memcached instances on the same physical machine.
If your system is not heavily loaded, you do not need to enable multithreaded mode. If you run a large site with large-scale hardware, you will see the benefits of multithreading. For more information, see:
http://code.sixapart.com/svn/memcached/trunk/server/doc/threads.txt
In short: command parsing (where memcached spends most of its time) can run multithreaded. Memcached's internal data operations are protected by a number of global locks (so that part of the work is not multithreaded). Future improvements to the multithreaded mode will remove many of the global locks and improve memcached's performance under extremely heavy load.
18. What is the maximum key length memcached accepts?
The maximum key length memcached accepts is 250 characters. Note that 250 is a limit inside the memcached server. If you use a client that supports "key prefixes" or similar features, the maximum total length of the key (prefix + original key) can exceed 250 characters. Shorter keys are recommended: they save memory and bandwidth.
19. What limits does memcached place on item expiration times?
An item can expire at most 30 days into the future when the expiration is given as a time span. Memcached interprets the passed-in expiration as an absolute point in time; once that point is reached, memcached marks the item as expired. This is a simple but somewhat obscure mechanism.
20. How big can a single item stored in memcached be?
Memcached stores single items of at most 1MB. If you need to cache data larger than 1MB, consider compressing it on the client side or splitting it across multiple keys.
21. Why is the size of a single item limited to 1MB?
The short answer: because that is how the memory allocator's algorithm works.
The detailed answer:
1) Memcached's memory storage engine uses slabs to manage memory. Memory is divided into slabs and chunks of unequal size (memory is first divided into equal-size slabs, then each slab is divided into equal-size chunks; the chunk size differs from slab to slab). Chunk sizes start at a minimum value and grow by a factor until they reach the maximum possible value. For example, if the minimum is 400B, the maximum is 1MB, and the factor is 1.20, the chunk sizes of the slabs are: slab1 - 400B, slab2 - 480B, slab3 - 576B, and so on. The larger a slab's chunks, the larger the gap between it and the previous slab. So the larger the maximum value, the lower the memory utilization. Memcached must also pre-allocate memory for every slab, so setting a smaller factor and a larger maximum value requires giving memcached more memory.
2) Don't try to store large data in memcached, such as putting huge web pages into it. Loading and unpacking large data into memory takes a long time, so system performance suffers. If you really do need to store data larger than 1MB, you can modify the value of POWER_BLOCK in slabs.c and recompile memcached, or use the inefficient malloc/free allocator. Alternatively, use a database, MogileFS, or another system instead of memcached.
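The growth-factor arithmetic in the example above can be checked with a few lines of Python (using the example's 400B/1.20 figures; memcached's actual defaults differ):

```python
def chunk_sizes(minimum, maximum, factor):
    """Chunk size per slab class: start at `minimum` and multiply by
    `factor` until the maximum item size is reached."""
    sizes = []
    size = minimum
    while size < maximum:
        sizes.append(int(size))
        size *= factor
    sizes.append(maximum)   # the final class holds the largest items
    return sizes

sizes = chunk_sizes(400, 1024 * 1024, 1.20)
# The first classes are 400, 480, 576, ... exactly as in the example.
gaps = [b - a for a, b in zip(sizes, sizes[1:])]
# The gap between consecutive classes grows with the chunk size,
# which is why a larger maximum value lowers memory utilization.
```
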
22. Can I use caches of different sizes on different memcached nodes? If I do, can memcached use memory more efficiently?
The memcached client decides which node stores a given key based solely on the hashing algorithm, regardless of each node's memory size. So you can use different amounts of memory on different nodes as cache space. A common practice: run multiple memcached instances on the nodes with more memory, with each instance using the same amount of memory as the instances on the other nodes.
23. What is the binary protocol? Is it worth paying attention to?
The binary protocol attempts to provide a more efficient and reliable protocol for both ends, reducing the CPU time the client and server spend processing the protocol. According to Facebook's testing, parsing the ASCII protocol is the single biggest consumer of CPU time in memcached.
24. How does memcached's memory allocator work? Why not just use malloc/free? Why use slabs?
This is actually a compile-time option. The internal slab allocator is used by default, and it really should be. In the earliest days, memcached used only malloc/free to manage memory. However, that approach did not work well with the OS's memory management: repeated malloc/free caused memory fragmentation, and the OS ended up spending a great deal of time searching for contiguous memory blocks to satisfy malloc requests instead of running the memcached process. The slab allocator was born to solve this problem. Memory is allocated once and divided into chunks, which are reused ever after. Because memory is divided into chunks of different fixed sizes, some memory is wasted when an item's size does not exactly fit the chunk chosen to store it.
25. Is memcached atomic?
Every individual command sent to memcached is completely atomic. If you send a set command and a get command for the same data at the same time, they do not interfere with each other; they are serialized and executed one after another. Even in multithreaded mode, every command is atomic. However, a sequence of commands is not atomic. If you first fetch an item with get, modify it, and then set it back into memcached, the system cannot guarantee that the item was not modified by another process in the meantime ("process" here not necessarily meaning an operating-system process). Memcached 1.2.5 and later provide the gets and cas commands, which solve this problem. If you query an item with gets, memcached returns a unique identifier for the item's current value. If the client overwrites this item and wants to write it back, it sends that unique identifier along with the cas command. If the identifier stored in memcached still matches the one you supplied, the write succeeds. If the item was modified by another process in the meantime, the identifier stored in memcached has changed, and the write fails.
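The gets/cas flow can be sketched against an in-memory stand-in (real clients expose similar `gets`/`cas` calls; the class and its internals here are hypothetical):

```python
import itertools

class CasStore:
    """In-memory stand-in for a memcached server's gets/cas behavior."""

    def __init__(self):
        self._counter = itertools.count(1)
        self._data = {}   # key -> (value, cas_id)

    def set(self, key, value):
        self._data[key] = (value, next(self._counter))

    def gets(self, key):
        """Return (value, unique identifier of the current value)."""
        return self._data.get(key, (None, None))

    def cas(self, key, value, cas_id):
        """Write only if the stored identifier still matches."""
        if key in self._data and self._data[key][1] == cas_id:
            self._data[key] = (value, next(self._counter))
            return True
        return False   # someone modified the item in between

store = CasStore()
store.set("counter", 10)

value, token = store.gets("counter")   # client A reads value + token
store.set("counter", 11)               # client B sneaks in an update
ok = store.cas("counter", value + 1, token)
# ok is False: the item changed since client A's gets, so the write
# is rejected instead of silently clobbering client B's update.
```
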
Performance and client-side library issues
26. Memcached isn't faster than my database. Why?
In a one-to-one comparison, memcached may not be faster than a SQL query. But that is not memcached's design goal; its goal is scalability. As connections and requests increase, memcached's performance holds up better than most databases' query performance. Test your code in a high-load environment (with many concurrent connections and requests) before deciding whether memcached is right for you.
27. Can I access the same data in memcached from different client libraries?
Technically, yes. However, you may run into the following three problems:
1) Different libraries serialize data differently. For example, Perl's Cache::Memcached uses Storable to serialize complex data (such as hash references, objects, and so on); client libraries in other languages may not be able to read data in that format. If you want to store complex data and have it readable by a variety of client libraries, store it as a simple string in a format that can be parsed everywhere, such as JSON or XML.
2) Data compressed by one client may not be decompressible by another.
3) Each client library may use a different hashing algorithm (for the stage-one hash). When connecting to multiple memcached servers, a client library maps a key to a server using its own hashing algorithm. Precisely because different client libraries use different hashing algorithms, a key mapped to memcached A by the Perl library may be mapped to memcached B by the Python library, and so on. The Perl client library also allows a different weight to be specified for each server, which is another source of this problem.
28. What is client-side consistent hashing?
Here is an article that explains its usefulness well: http://www.last.fm/user/RJ/journal/2007/04/10/392555
A client can give keys a domain (namespace) via a "prefix". For example, in a shared-hosting environment, the customer name can be used as a prefix to create a dedicated domain for its keys. When storing data, the prefix is included in the key but should not take part in the hash computation. Memcached itself has not implemented any serialization for complex structured data; JSON is a widely used object serialization format.
Hash/Key Distribution
29. When are expired items removed from the cache?
Memcached uses lazy expiration. When a client requests an item, memcached checks its expiration time before returning it, to determine whether the item has expired. Similarly, when a new item is added and the cache is full, memcached replaces expired items first, then the least recently used items in the cache.
Namespaces
30. Memcached does not support namespaces. Here are a few ways to simulate them:
1) Simulate a namespace with a key prefix: add a meaningful prefix in front of the real key.
2) Delete all items in a namespace: although memcached does not support deleting by wildcard or by namespace, there is a trick that achieves the same effect, a versioned namespace. Using a namespace called foo in PHP:
$ns_key = $memcache->get("foo_namespace_key");
// if it is not set yet, initialize it
if ($ns_key === false) $memcache->set("foo_namespace_key", rand(1, 10000));
$my_key = "foo_" . $ns_key . "_12345"; // the key under which the content is cached
// to clear the namespace:
$memcache->increment("foo_namespace_key"); // the version increases by 1; the old keys are never requested again and are eventually evicted by memcached's LRU
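The same versioned-namespace trick, sketched in Python with a plain dict standing in for a memcached client (the helper names are illustrative, not from any library):

```python
import random

cache = {}   # dict standing in for a memcached client

def ns_version(namespace):
    """Current version number of the namespace, creating it if absent."""
    version = cache.get(namespace + "_namespace_key")
    if version is None:
        version = random.randint(1, 10000)
        cache[namespace + "_namespace_key"] = version
    return version

def make_key(namespace, item_id):
    """Real cache key = namespace + current version + item id."""
    return "%s_%d_%s" % (namespace, ns_version(namespace), item_id)

def clear_namespace(namespace):
    """Bump the version: old keys are never built again, so they
    simply age out of the cache via LRU/expiry."""
    cache[namespace + "_namespace_key"] += 1

cache[make_key("foo", "12345")] = "cached page"
old_key = make_key("foo", "12345")
clear_namespace("foo")
new_key = make_key("foo", "12345")
# new_key differs from old_key, so the old entry is now invisible.
```
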
Application design
31. When designing an application, what content can be cached with memcached?
1) Cache simple query results: a query cache stores the entire result set corresponding to a given query statement. This is most suitable for SQL statements that are run frequently but whose result sets do not change, such as loading a specific filtered set of content.
$key = md5('SELECT * FROM rest_of_sql_statement_goes_here');
if ($result = $memcache->get($key)) {
    return $result;
} else {
    // run the query and transform the result data into your final dataset form
    $result = $query_results_mangled_into_most_likely_an_array;
    $memcache->set($key, $result, TRUE, 86400); // store the result of the query for a day
    return $result;
}
Remember: if the query's result set changes, the cache will not reflect it. This method is not always useful, but it does make things faster.
2) Cache simple row-based query results: row-based caching checks a list of keys against the cache. Rows that are in the cache are fetched directly; rows that are not are fetched from the database and cached under unique keys; finally everything is merged into the final data set. Over time most data ends up cached, which means queries are more likely to fetch their rows from memcached than from the database. If the data is fairly static, we can set a longer cache time.
Row-based caching is particularly useful for this kind of search pattern: the data set itself is large, or the data set is assembled from multiple tables; the data set depends on the query's input parameters, but the result sets of different queries overlap.
For example, suppose you have a data set for users A, B, C, D, E. You open a page that displays information about users A, B, E. The client first builds three different keys, one per user, and looks each up in the cache; none of them hit. It then fetches the three users' data rows from the database with a SQL query and caches them.
Next, you open another page that displays information about C, D, E. Looking in memcached, the data for C and D misses, but we hit E's data. We then fetch the rows for C and D from the database and cache them in memcached. From then on, no matter how this user information is combined, any page about users A, B, C, D, E can get its data from memcached.
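The row-based pattern above can be sketched in Python, with dicts standing in for the memcached client and the users table (all names here are illustrative):

```python
cache = {}                     # stand-in for a memcached client
database = {                   # stand-in for the users table
    "A": {"name": "Ann"}, "B": {"name": "Bob"}, "C": {"name": "Cy"},
    "D": {"name": "Dee"}, "E": {"name": "Eve"},
}
db_queries = []                # records which rows each DB query fetched

def get_users(user_ids):
    rows, misses = {}, []
    for uid in user_ids:                    # check the cache row by row
        cached = cache.get("user:" + uid)
        if cached is not None:
            rows[uid] = cached
        else:
            misses.append(uid)
    if misses:                              # one query covers all misses
        db_queries.append(misses)
        for uid in misses:
            rows[uid] = cache["user:" + uid] = database[uid]
    return rows

get_users(["A", "B", "E"])   # all three rows come from the database
get_users(["C", "D", "E"])   # E hits the cache; only C and D hit the DB
```

After these two page views, every user's row is cached, and any further combination of A..E is served entirely from the cache.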
3) Cache more than just SQL data: cache the final, fully rendered parts of pages to save CPU time.
For example, suppose you are building a page that displays user information. You might fetch the user's details (name, birthday, home address, bio), and then convert the bio from, say, XML into HTML, or do some other work. Rather than storing these attributes separately, you might prefer to store the rendered data block. You can then simply fetch the preprocessed HTML and drop it straight into the page, saving valuable CPU time.
32. Use tiered caching
Memcached can handle a huge amount of cached data at high speed, but depending on your system, it is still worth considering a multi-tier cache structure. For example, in addition to the memcached cache, you can add a local cache (such as Ehcache, OSCache, etc.) to form a multi-level cache. A local cache can hold basic data that is small but frequently accessed, such as product categories, connection information, server state variables, application configuration variables, and so on. Caching such data and keeping it as close to the processor as possible makes sense: it helps reduce page-generation time and improves reliability in the event of a memcached failure.
33. Update the cache when the data is updated
When a user edits their own information and it is saved to the database, you need to update the cached copy or simply delete the old data. If you update it immediately, you prevent the just-updated data from being read back stale from the database. When the user habitually reloads their profile page to confirm the change succeeded, the data comes from the cache, and they see the latest values.
34. Simulate locks with the add command
If you really need a lock, you can use the add command to simulate one. It is not that useful on cache misses, but it helps when you are caching routinely shared data (such as metadata about the application's server pool).
For example, suppose you want to update key A:
1. add a key "lock:A" with an expiration of a few seconds (long enough for you to finish the computation and the update, but not so long that, if the locking process dies, the lock stays held forever).
2. If the add succeeds, you hold the lock: fetch key A's data from the cache, modify it in the client program, update key A in the cache, and delete the key "lock:A". If you don't need to release it immediately, you can also let it live until it expires.
3. If the add fails, someone else holds the lock. Have the application do something appropriate, such as returning the old data, waiting and retrying, or something else.
These operations are similar to calling MySQL's GET_LOCK with a timeout of 0. There is no way in memcached to simulate GET_LOCK()'s timeout behavior with a mutex.
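The three steps above can be sketched in Python against a fake client whose add fails if the key already exists, which is the essential memcached `add` semantic (everything here is a toy model, not a real client API):

```python
import time

cache = {}   # key -> (value, expires_at); stands in for a memcached client

def add(key, value, ttl):
    """Store only if the key is absent or expired (`add` semantics)."""
    item = cache.get(key)
    if item is not None and item[1] > time.time():
        return False               # key exists: someone holds the lock
    cache[key] = (value, time.time() + ttl)
    return True

def delete(key):
    cache.pop(key, None)

def update_with_lock(key, new_value):
    if not add("lock:" + key, 1, ttl=3):   # step 1: try to take the lock
        return False                        # step 3: locked, back off
    try:                                    # step 2: we hold the lock
        cache[key] = (new_value, float("inf"))
    finally:
        delete("lock:" + key)               # release promptly
    return True

ok_first = update_with_lock("A", "fresh")
add("lock:A", 1, ttl=3)                     # simulate another lock holder
ok_second = update_with_lock("A", "conflicting")
# ok_first is True; ok_second is False, so the second writer backs off.
```
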
35. Warm your cache
If you have a high-traffic site and you are adding recovery functionality or a new feature, you may run into the empty-cache problem: at first the cache is empty, then a crowd of visitors hits your site, and during the time it takes to fill the cache your database may not withstand the pressure. To solve this, try any feasible way to "warm" your memcached. Method: write scripts that cache the common pages, or write a command-line tool to populate the cache. You can also populate the cache with selected content ahead of peak times.
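A warming script can be as simple as iterating over the known hot items and priming each one before traffic arrives. A minimal Python sketch, with dicts standing in for the memcached client and the expensive work:

```python
cache = {}       # stand-in for a memcached client

def expensive_query(page):
    """Pretend this is a slow database query or page render."""
    return "rendered:%s" % page

HOT_PAGES = ["home", "products", "about"]   # known high-traffic pages

def warm_cache():
    """Prime the cache so the first real visitors all hit it."""
    for page in HOT_PAGES:
        if page not in cache:               # don't redo existing work
            cache[page] = expensive_query(page)

warm_cache()
# After warming, every hot page is served from the cache, and the
# database never sees the initial stampede.
```
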
Reference pages:
http://shwangking-126-com.iteye.com/blog/284937
Best-practice scenarios for memcached (repost)