Memcached is a high-performance, distributed memory object caching system developed by Danga Interactive. How can you best use memcached to improve site performance? Let's take a look.
I. Introduction to memcached
Memcached is often used to speed up applications. Here we focus on best practices for deploying it within your applications and environments. This includes what should and should not be stored, how to handle the flexible distribution of data, and how to adjust the methods used to update memcached and the stored data. We also cover support for high-availability solutions, such as IBM WebSphere eXtreme Scale.
All applications, especially Web applications, need to optimize the speed at which they access information and return it to the client. Often, however, the same information is returned each time. Loading data from a data source (a database or file system) is inefficient, especially if you run the same query every time you want to access that information.
While many Web servers can be configured to use caching to send back information, that does not fit the dynamic nature of most applications. That is where memcached comes in. It provides a general-purpose memory store that can hold anything, including native-language objects, which lets you store a wide variety of information and access it from many applications and environments.
II. What is memcached?
Memcached is a distributed caching system originally developed by Danga Interactive for LiveJournal and now used by many other software projects (such as MediaWiki). It is open source software released under a BSD license.
Memcached lacks authentication and security controls, which means that the memcached server should be placed behind a firewall.
Memcached client APIs use a 32-bit cyclic redundancy check (CRC-32) of the key to spread data across different machines. When the table is full, subsequent additions cause older entries to be evicted using a least recently used (LRU) mechanism. Because memcached is usually used only as a cache, applications that use it need additional code to update the cached data when they write back to a slower system, such as a back-end database.
III. When is memcached suitable?
Memcached is abused in many cases, and this naturally draws complaints. I often see forum posts along the lines of "How do I improve efficiency?" where the reply is simply "use memcached", without a word on how to use it, where to use it, or what to use it for. Memcached is not a cure-all, and it is not suitable for every occasion.
Memcached is a "distributed" memory object caching system. For applications that do not need to be distributed, do not need to share data, or are simply small enough to run on a single server, memcached brings no benefit and can even slow the system down, because network connections consume resources too, even UNIX local connections. In my earlier test data, local memcached reads and writes were dozens of times slower than a direct in-memory PHP array, while APC and shared-memory approaches were about as fast as a direct array. Using memcached purely as a local-level cache is therefore very uneconomical.
Memcached is often used as a front-end cache for a database. Because it avoids the overhead of SQL parsing and disk operations and manages its data in memory, it can perform better than reading the database directly. In large systems, where the same data is accessed very frequently, memcached can greatly reduce the pressure on the database and improve overall system efficiency. Memcached is also often used as a medium for sharing data between servers. For example, the data that holds single sign-on state in an SSO system can be stored in memcached and shared by multiple applications.
It is important to note that memcached manages data in memory, so it is volatile: when the server restarts or the memcached process terminates, the data is lost. Memcached therefore cannot be used to persist data. A common misunderstanding is that memcached is as much faster than a database as memory is faster than a hard disk. In fact, using memory does not give memcached read and write speeds hundreds of times faster; its actual bottleneck is the network connection. Its advantage over a disk-based database system is that it is very "light": with little overhead and direct reads and writes, it can easily cope with very large volumes of data exchange. This is why you will often see two gigabit network links fully saturated while the memcached process itself uses very little CPU.
IV. Basic knowledge
Memcached is an open source project designed to take advantage of spare RAM across multiple servers to act as a memory cache for frequently accessed information. The key word here is cache: memcached provides temporary storage, in memory, for information that can be loaded from elsewhere.
Consider, for example, a typical Web-based application. Even a dynamic site is likely to have components or information that stay constant throughout the life of the page. Within a blog site, the list of categories for a single blog post is unlikely to change frequently between page views. Loading this information through a database query each time is relatively expensive, especially when the data has not changed. Figure 1 shows some of the page fragments within a blog site that could be cached.
Figure 1: Cacheable elements in a typical blog page
Extrapolate this structure to the other elements of the blog site (poster information, comments, even the blog posts themselves) and you can see that displaying the home page probably takes 10-20 database queries plus formatting. Repeat this for hundreds or even thousands of page views per day, and your servers and applications are performing far more queries than are needed to display the page content.
By using memcached, you can store the formatted information loaded from the database in a form that can be used directly on the Web page. And because the information is loaded from RAM rather than from disk through the database and other processing, access to it is almost instantaneous.
Again, memcached is a cache for commonly used information, and it saves you from loading and processing that information from slow sources such as disks or databases.
The interface to memcached is provided over a network connection. This means that you can share a single memcached server (or multiple servers, as shown later in this article) among multiple clients. The network interface is very fast and, to improve performance, the server deliberately does not support authentication or secure communication. This should not limit your deployment options, however: the memcached server should simply live inside your network. The practicality of the network interface and the ease of deploying multiple memcached instances let you increase the overall size of your cache by using spare RAM on multiple machines.
V. Storage method
The memcached storage method is a simple key/value pair, similar to the hash or associative array found in many languages. You store information in memcached by supplying a key and a value, and you retrieve it by requesting the information by that key.
The information is kept in the cache indefinitely unless one of the following happens:
The memory allocated for the cache is exhausted - in this case, memcached uses the LRU (least recently used) method to remove entries from the cache. Entries that have not been used recently are deleted, those with the oldest access first.
An entry is explicitly deleted - you can always delete an entry from the cache.
An entry expires - each entry has an expiry time so that the information stored under its key can be purged from the cache when it is too old.
These conditions can be combined with the logic of your application to ensure that the information in the cache is up to date. With these basics in place, let's look at how memcached can best be used within an application.
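The three removal conditions above can be modeled in a few lines of Python. This is only an illustrative in-process sketch, not how memcached is implemented: the class name TinyCache is hypothetical, the capacity is counted in entries rather than bytes of memory, and the optional now argument exists only so the behavior can be demonstrated without waiting.

```python
import time
from collections import OrderedDict


class TinyCache:
    """Toy model of LRU eviction, explicit deletion, and expiry (not memcached itself)."""

    def __init__(self, max_items):
        self.max_items = max_items      # assumption: capacity counted in entries
        self.items = OrderedDict()      # key -> (value, expires_at or None)

    def set(self, key, value, ttl=None, now=None):
        now = time.time() if now is None else now
        expires_at = None if ttl is None else now + ttl
        self.items[key] = (value, expires_at)
        self.items.move_to_end(key)             # a write counts as a use
        while len(self.items) > self.max_items:
            self.items.popitem(last=False)      # evict the least recently used entry

    def get(self, key, now=None):
        now = time.time() if now is None else now
        entry = self.items.get(key)
        if entry is None:
            return None                         # miss
        value, expires_at = entry
        if expires_at is not None and now >= expires_at:
            del self.items[key]                 # entry is too old: purge it
            return None
        self.items.move_to_end(key)             # a read counts as a use
        return value

    def delete(self, key):
        self.items.pop(key, None)               # explicit deletion
```

With a capacity of two entries, storing a third entry pushes out whichever of the first two was read least recently, which is the behavior the LRU condition describes.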
VI. When to use memcached?
When you use memcached to improve application performance, you modify some of the application's key processes and steps.
When loading information, the typical sequence is as shown in Figure 2.
Figure 2: Typical sequence for loading information to be displayed
Generally, these steps are:
Execute one or more queries to load information from the database
Format the information so that it is suitable for display (or further processing)
Use or display the formatted data
When using memcached, the application logic can be modified slightly to work with the cache:
Try to load the information from the cache
If it exists, use the cached version of the information
If it does not exist:
1. Execute one or more queries to load information from the database
2. Format the information so that it is suitable for display or further processing
3. Store the information in the cache
4. Use the formatted data
Figure 3 is a summary of these steps.
Figure 3: Loading information for display when using memcached
Data loading becomes a process of at most three steps: load the data from the cache, or load it from the database (as appropriate) and store it in the cache.
The first time this process runs, the data is loaded normally from the database or other data source and then stored in memcached. The next time you access the information, it is pulled from memcached instead of being loaded from the database, saving you time and CPU cycles.
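The read path described above can be sketched as follows. This is a minimal illustration under stated assumptions: a plain dictionary stands in for the memcached client, and load_categories_from_db, format_categories, and the sample rows are hypothetical placeholders for the application's real query and formatting code.

```python
cache = {}                     # stand-in for a memcached client (assumption)
db_calls = {"count": 0}        # counts how often the "database" is queried


def load_categories_from_db():
    """Hypothetical slow query returning raw rows."""
    db_calls["count"] += 1
    return [(1, "News"), (2, "Tech")]


def format_categories(rows):
    """Hypothetical formatting step producing display-ready text."""
    return ", ".join(name for _id, name in rows)


def get_category_list():
    key = "category-list"
    cached = cache.get(key)            # try the cache first
    if cached is not None:
        return cached                  # hit: use the cached version
    rows = load_categories_from_db()   # 1. query the database
    formatted = format_categories(rows)  # 2. format for display
    cache[key] = formatted             # 3. store in the cache
    return formatted                   # 4. use the formatted data


first = get_category_list()    # miss: loads from the database and fills the cache
second = get_category_list()   # hit: served from the cache
```

Only the first call reaches the database; the second is answered from the cache, which is where the time and CPU savings come from.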
The other side of the equation is to make sure that when you change information that is stored in memcached, you update the memcached version at the same time as you update the back-end information. This slightly changes the typical sequence shown in Figure 4 into the one shown in Figure 5.
Figure 4: Updating or storing data within a typical application
Figure 5 shows the modified process when memcached is used.
Figure 5: Updating or storing data when using memcached
Taking the blog site as an example again, when the blog system updates the list of categories in the database, the update should follow this sequence:
Update the list of categories in the database
Store the information in memcached
Return the information to the client
Storage operations within memcached are atomic, so an update never leaves a client with only part of the data; a client gets either the old version or the new version.
For most applications, these two operations are the only things you need to worry about. When someone accesses data, it is automatically added to the cache, and when that data changes, the cache is automatically updated.
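The update sequence for the category list might look like the sketch below. It assumes dictionaries as stand-ins for both the database and the memcached client, and add_category is a hypothetical application function; a real application would issue an SQL update and a client set call instead.

```python
database = {"categories": ["News"]}   # stand-in for the back-end database (assumption)
cache = {}                            # stand-in for a memcached client (assumption)


def add_category(name):
    database["categories"].append(name)             # 1. update the database
    formatted = ", ".join(database["categories"])   # format the new list for display
    cache["category-list"] = formatted              # 2. store the new version in the cache
    return formatted                                # 3. return it to the client


result = add_category("Tech")
```

Writing the new value to the cache as part of the same operation means later readers never see a category list that is older than the database.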
VII. Keys, namespaces, and values
Another important consideration with memcached is how to organize and name the data stored in the cache. From the preceding blog site example, it is clear that you need a consistent naming structure so that you can load the blog categories, history, and other information, and then use the same names when loading the information (and updating the cache) or when updating the data (and, again, updating the cache).
The exact naming system is application-specific, but you can usually use a structure similar to that of your existing application, most likely based on some kind of unique identifier. Such identifiers arise when you pull information from the database or when you organize sets of information.
Using the blog post example, you could store the list of categories in an item with the key category-list. The values related to a single post could be stored under a key that uses the post ID, such as blogpost-29, and the comments for that post could be stored in blogcomments-29, where 29 is the ID of the blog post. In this way, you can store a wide variety of information in the cache, using different prefixes to identify the information.
The simplicity of the memcached key/value store (and its lack of security) means that if you want to support multiple applications on the same memcached server, you might consider using some other form of qualifier to identify the data as belonging to a particular application. For example, you can add an application prefix, such as blogapp:blogpost-29. Keys have no required format, so you can use almost any string as a key name.
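One way to keep such names consistent is to build every key through a single helper. The function below is a hypothetical sketch, assuming the prefix:kind-id layout used in the examples above; the name make_key is not part of any memcached library.

```python
def make_key(app, kind, ident=None):
    """Build a namespaced cache key such as 'blogapp:blogpost-29'."""
    name = kind if ident is None else f"{kind}-{ident}"
    return f"{app}:{name}"


post_key = make_key("blogapp", "blogpost", 29)
comments_key = make_key("blogapp", "blogcomments", 29)
categories_key = make_key("blogapp", "category-list")
```

Because the loading code and the updating code both call the same helper, they cannot drift apart and end up reading one key while writing another.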
When it comes to storing values, make sure that the information stored in the cache is appropriate for your application. For this blog system, for example, you might want to store the object that the blog application uses to format the blog information rather than the raw HTML. This is more practical if the same underlying structure is used in multiple places within the application.
VIII. Populating and using memcached
As an open source product, and one originally developed to work within an existing open source environment, memcached is supported on a wide range of environments and platforms. There are many interfaces for communicating with a memcached server, often with multiple implementations for a given language. See resources for common libraries and toolkits.
It would be impractical to list all the supported interfaces and environments, but they all support the basic API provided by the memcached protocol. The descriptions below are simplified and should be read in the context of each language, where different values may be used to indicate errors. The main functions are:
get(key) - gets the information stored under the specified key from memcached. If the key does not exist, an error is returned.
set(key, value [, expiry]) - stores the specified value in the cache under the given key. If the key already exists, it is updated. The expiry time is in seconds: a value of less than 30 days (30*24*60*60) is treated as a relative time, and a value greater than 30 days is treated as an absolute time (epoch).
add(key, value [, expiry]) - adds the key to the cache if it does not exist, and returns an error if it already exists. This is useful when you want to explicitly add a new key without updating it if it is already there.
replace(key, value [, expiry]) - updates the value of the specified key, and returns an error if the key does not exist.
delete(key [, time]) - removes the key/value pair from the cache. If you supply a time, adding a new value with this key is blocked for the specified period. The timeout lets you ensure that the value is always re-read from your data source.
incr(key [, value]) - increments the specified key by 1 or by the given value. Works only on numeric values.
decr(key [, value]) - decrements the specified key by 1 or by the given value. Works only on numeric values.
flush_all - invalidates (expires) all current entries in the cache.
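To make the semantics of these functions concrete, the following Python sketch models them in memory. It is not a memcached client: FakeMemcache and expiry_to_epoch are hypothetical names, failures are reported as None or False (real libraries signal errors in their own ways), expiry handling is kept out of the class for brevity, and treating exactly 30 days as relative is an assumption because the description above leaves that boundary open.

```python
THIRTY_DAYS = 30 * 24 * 60 * 60


def expiry_to_epoch(expiry, now):
    """Turn an expiry argument into an absolute epoch time, per the 30-day rule."""
    if expiry <= THIRTY_DAYS:       # assumption: exactly 30 days counts as relative
        return now + expiry         # relative: seconds from now
    return expiry                   # absolute: already an epoch timestamp


class FakeMemcache:
    """In-memory model of the basic commands (illustrative only)."""

    def __init__(self):
        self.data = {}

    def get(self, key):
        return self.data.get(key)           # None models "key does not exist"

    def set(self, key, value):
        self.data[key] = value              # create or overwrite
        return True

    def add(self, key, value):
        if key in self.data:
            return False                    # refuses to overwrite an existing key
        self.data[key] = value
        return True

    def replace(self, key, value):
        if key not in self.data:
            return False                    # refuses to create a new key
        self.data[key] = value
        return True

    def delete(self, key):
        return self.data.pop(key, None) is not None

    def incr(self, key, delta=1):
        if key not in self.data:
            return None
        self.data[key] = int(self.data[key]) + delta
        return self.data[key]

    def decr(self, key, delta=1):
        if key not in self.data:
            return None
        # memcached does not let a counter go below zero
        self.data[key] = max(0, int(self.data[key]) - delta)
        return self.data[key]

    def flush_all(self):
        self.data.clear()
```

The difference between set, add, and replace is only in how they treat an existing key, which is what makes add useful for "create once" logic and replace for "update only if present".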
For example, within Perl, the basic set operation can be handled as shown in Listing 1.
IX. Resiliency and availability
One of the most common questions about memcached is: "What happens if the cache is not available?" As the preceding sections make clear, the cache should never be the only source of the information. You must always be able to load the cached data from its original location.
Although being unable to reach the cache slows the application down, it should not stop the application from working. There are several scenarios that may occur:
If the memcached service is down, the application should fall back to loading the information from the original data source and formatting it as required for display. The application should also keep trying to load and store information in memcached.
Once the memcached server becomes available again, the application should automatically try to store data in it. There is no need to force a reload of the cached data; normal access will load and populate the cache. Eventually, the cache will be repopulated with the most commonly used data.
Again, memcached is a cache of information, not the only source of the data. An unavailable memcached server should not bring the application down, although performance will be degraded until the server is back to normal. In practice, the memcached server is relatively simple and, although not completely fault-free, its simplicity means that it rarely goes wrong.
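The fallback behavior can be sketched as shown below. The example assumes that an unreachable cache surfaces as a ConnectionError; real client libraries report failures in different ways, so the exception type, the two stand-in cache classes, and load_from_source are all hypothetical.

```python
class UnavailableCache:
    """Hypothetical client whose server is down: every call fails."""

    def get(self, key):
        raise ConnectionError("cache unreachable")

    def set(self, key, value):
        raise ConnectionError("cache unreachable")


class DictCache:
    """Hypothetical healthy cache backed by a dictionary."""

    def __init__(self):
        self.data = {}

    def get(self, key):
        return self.data.get(key)

    def set(self, key, value):
        self.data[key] = value


def load_from_source(key):
    """Stand-in for loading and formatting data from the original source."""
    return f"value-for-{key}"


def fetch(cache, key):
    try:
        cached = cache.get(key)
        if cached is not None:
            return cached
    except ConnectionError:
        pass                        # cache is down: fall back to the source
    value = load_from_source(key)
    try:
        cache.set(key, value)       # keep trying to populate the cache
    except ConnectionError:
        pass                        # still down: serve the value anyway
    return value
```

Because every request still attempts the set, the cache refills through normal traffic as soon as the server comes back, with no special reload step.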
X. Distributing the cache
The memcached server is just a cache that stores values against keys over the network. If you have more than one machine, it is tempting to set up an instance of memcached on every spare machine to provide one large networked RAM cache.
With this idea comes the temptation to assume that you need some sort of distribution or replication mechanism to copy key/value pairs between machines. The problem with this approach is that it reduces the available RAM cache instead of increasing it. Figure 6 shows three application servers, each of which has access to a memcached instance.
Figure 6: Incorrect use of multiple memcached instances
Although each memcached instance is 1 GB in size (giving 3 GB of RAM cache in total), if each application server has only its own cache (or if data is replicated between the memcached instances), the entire installation still has only 1 GB of cache, duplicated on each instance.
Because memcached provides information over a network interface, a single client can access data from any memcached instance it can reach. If the data is not replicated across the instances, each application server ends up with 3 GB of RAM cache available, as shown in Figure 7.
Figure 7: Correct use of multiple memcached instances
The problem with this approach is choosing which server stores a given key/value pair, and deciding which memcached server to talk to when you want to retrieve a value. The solution is to ignore complexities such as lookup tables, or expecting the memcached server to handle the process for you. Instead, the memcached client has to keep things simple.
Rather than having to look this information up, the memcached client simply applies a hashing algorithm to the key specified when the information is stored. When you store or get information from a list of memcached servers, the client derives a numeric value from the key using a hashing algorithm that is consistent, that is, one that always yields the same value for the same key. For example, the key MyKey might be converted to the value 23875. It does not matter whether you are saving or getting information: the key is always used as the unique identifier for loading from the memcached server, so in this example "MyKey" always hashes to 23875.
If there are two servers, the memcached client performs a simple operation (for example, a modulus) on this value to determine whether to store the value on the first or the second configured memcached instance.
When a value is stored, the client computes the hash value from the key and thereby determines which server to store it on. When the value is retrieved, the client computes the same hash value from the key and so selects the same server to get the information.
If you use the same server list (in the same order) on every application server, each application server will select the same server when it needs to save or retrieve a given key. In this example, there is now 3 GB of memcached space to share instead of the same 1 GB replicated three times, which gives you more cache and is likely to improve the performance of applications with many users.
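The server selection just described can be sketched with a hash and a modulus. This example assumes CRC-32 as the hash function (via Python's zlib.crc32) and uses made-up host names; real client libraries may use different hash functions and more elaborate distribution schemes.

```python
import zlib

# Hypothetical server list; it must be identical, and in the same order, on every client.
SERVERS = ["cache1:11211", "cache2:11211", "cache3:11211"]


def pick_server(key, servers=SERVERS):
    """Map a key to one server: hash the key, then take the modulus by the server count."""
    hash_value = zlib.crc32(key.encode("utf-8"))    # same key always gives the same value
    return servers[hash_value % len(servers)]


store_target = pick_server("MyKey")   # server chosen when storing
fetch_target = pick_server("MyKey")   # server chosen when retrieving
```

Storing and retrieving pick the same server because both start from the same key and the same list. One consequence of the simple modulus is that adding or removing a server changes the result for most keys, so much of the cache has to be repopulated after the list changes.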
XI. How not to use memcached
Although memcached is simple, memcached instances are sometimes used incorrectly.
Memcached is not a database
The most common misuse of memcached is to use it as a data store rather than as a cache. The primary purpose of memcached is to improve the response time for data that would otherwise take a long time to build or retrieve from another data source. A typical example is retrieving information from a database, especially when the information needs to be formatted or processed before it is displayed to the user. Memcached is designed to hold that information in memory so that the same work is not repeated every time the data is needed.
You must never use memcached as the only source of the information needed to run the application; the data should always be obtainable from another source. Also remember that memcached is just a key/value store. You cannot run queries against the data or iterate over the contents to extract information. Use it to store blocks of data or objects that are used as a whole.
Do not cache database rows or files
Although you can use memcached to store rows of data loaded from a database, this is really query caching, and most databases provide their own query caching mechanisms. The same applies to other objects, such as images or files from the file system. Many applications and Web servers already have well-optimized solutions for this kind of work.
You get more utility and performance benefit from memcached if you use it to store whole blocks of information after loading and formatting. Staying with the blog site example, the best point at which to store information is after the blog categories have been formatted as an object, or even after they have been formatted into HTML. The blog page can then be built by loading the individual components (blog posts, category list, post history, and so on) from memcached and writing the completed HTML back to the client.
Memcached is not secure
To ensure maximum performance, memcached provides no form of security: no authentication and no encryption. This means that access to memcached servers should be restricted in one of two ways: first, by placing them on the same private side of the application deployment environment; second, if security is required, by using a UNIX socket and allowing only applications on the current host to access the memcached server.
The second option sacrifices some flexibility and resiliency, as well as the ability to share the RAM cache across multiple machines on the network, but it is the only way to secure memcached data in the current situation.
XII. Do not limit yourself
Despite the situations in which memcached should not be used, its flexibility should not be overlooked. Because memcached sits at the same architectural level as the application, it is easy to integrate and connect to. Changing an application to take advantage of memcached is not complicated either. In addition, because memcached is only a cache, a problem with it does not stop the application from running. Used correctly, it reduces the load on the rest of the server infrastructure (fewer reads from databases and data sources), which means that more clients can be supported without additional hardware.
But keep in mind that it's just a cache!
In this article, we looked at memcached and how best to use it. We saw how information is stored, how to choose sensible keys, and how to choose which information to store. We also discussed some of the key deployment issues facing all memcached users, including the use of multiple servers, what to do when a memcached instance dies, and, perhaps most importantly, the situations in which memcached should not be used.
As an open source application with a simple and straightforward purpose, memcached draws its power and practicality from that simplicity. By providing a large amount of RAM storage for information, making it available over the network, and allowing it to be accessed through a variety of interfaces and languages, memcached can be integrated into a wide variety of installations and environments.