Reduce reads from databases and data sources
The open source memcached tool is a cache for storing frequently used information, so that you do not have to load (and process) that information from slow resources such as disks or databases. It can be deployed on dedicated machines or as a way to use up spare memory in an existing environment. Although memcached is easy to use, it is sometimes misused or deployed in the wrong type of environment. In this article, learn when memcached is the right tool.
Martin Brown, freelance writer, Freelance Developer
September 06, 2010
Introduction
Memcached is often used to speed up applications. This article focuses on best practices for deploying it within your applications and environments: what you should and should not store, how to handle the flexible distribution of data, and how to handle updates to memcached and to the stored version of the data. It also covers support for high-availability solutions, such as IBM WebSphere® eXtreme Scale.
All applications, and Web applications in particular, need to optimize the speed at which they access information and return it to the client. Often, however, the same information is returned each time. Loading that data from a data source (a database or file system) is inefficient, especially if you run the same query every time the information is requested.
Although many Web servers can be configured to use a cache to send back information, that does not fit the dynamic nature of most applications. This is where memcached comes in. It provides a general-purpose memory store that can hold anything, including native language objects, which lets you store a wide variety of information and access it from many applications and environments.
The basics
Memcached is an open source project designed to make use of the spare RAM in multiple servers as a memory cache that holds frequently accessed information. The key word here is cache: memcached provides temporary storage, in memory, for information that can be loaded from elsewhere.
Consider, for example, a typical web-based application. Even a dynamic site is likely to have components or pieces of information that remain constant for the life of the page. Within a blog site, the list of categories for a single blog post is unlikely to change frequently between page views. Loading this information through a database query each time is relatively expensive, especially if the data has not changed. Figure 1 shows the parts of a blog page that can be cached.
Figure 1. A typical cache element within a blog page
Extrapolate this structure to the other elements of the blog site (poster information, comments, even the blog posts themselves), and you can see that 10-20 database queries and formatting operations are likely to take place just to display the contents of the home page. Repeat this for hundreds or even thousands of page views per day, and your servers and applications execute far more queries than are needed to display the page content.
By using memcached, you can store the formatted information loaded from the database in a form that can be used directly on the Web page. And because the information is loaded from RAM, rather than from disk through the database and other processing, access to it is almost instantaneous.
Again, memcached is a cache used to store common information, and with it, you don't have to load and process information from slow resources such as disks or databases.
The interface to memcached is provided over a network connection. This means that you can share a single memcached server (or multiple servers, as shown later in this article) among multiple clients. The network interface is very fast, and to improve performance the server intentionally does not support authentication or secure communication. However, this should not limit your deployment options: the memcached server should live inside your network. The practicality of the network interface, and the ease with which multiple memcached instances can be deployed, lets you use the spare RAM on multiple machines to increase the overall size of your cache.
Memcached stores data as simple key/value pairs, similar to a hash or associative array in many languages. You store information in memcached by supplying a key and a value, and you retrieve it by requesting the information for that key.
Information is kept in the cache indefinitely unless one of the following happens:
- The memory allocated for the cache is exhausted: in this case, memcached uses the LRU (least recently used) method to remove entries from the cache. Entries that have not been used recently are deleted, starting with those accessed longest ago.
- The entry is explicitly deleted: you can always delete an entry from the cache.
- The entry expires: each entry has an expiry time, so that the information stored for a key can be purged from the cache when it is too old.
These conditions can be combined with the logic of your application to ensure that the information in the cache is up to date. With these basics in place, let's look at how memcached can best be used within an application.
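The three removal conditions above can be modelled with a small in-process class. This is only an illustrative sketch: `TinyCache`, its `max_entries` limit, and the `now` parameter (used to make time explicit) are assumptions for the example, and the real server manages memory by size rather than by entry count.

```python
import time
from collections import OrderedDict


class TinyCache:
    """Toy model of the three removal rules; this is not memcached itself."""

    def __init__(self, max_entries):
        self.max_entries = max_entries
        self._items = OrderedDict()  # key -> (value, expires_at or None)

    def set(self, key, value, ttl=None, now=None):
        now = time.time() if now is None else now
        expires_at = None if ttl is None else now + ttl
        self._items[key] = (value, expires_at)
        self._items.move_to_end(key)           # most recently used sits at the end
        while len(self._items) > self.max_entries:
            self._items.popitem(last=False)    # memory exhausted: evict the LRU entry

    def get(self, key, now=None):
        now = time.time() if now is None else now
        if key not in self._items:
            return None
        value, expires_at = self._items[key]
        if expires_at is not None and now >= expires_at:
            del self._items[key]               # entry has expired
            return None
        self._items.move_to_end(key)           # a read counts as a use
        return value

    def delete(self, key):
        self._items.pop(key, None)             # explicit deletion


cache = TinyCache(max_entries=2)
cache.set("a", 1, now=0)
cache.set("b", 2, ttl=10, now=0)
cache.get("a", now=1)      # "a" is now more recently used than "b"
cache.set("c", 3, now=2)   # the cache is full, so "b" is evicted
```

Passing `now` explicitly keeps the example deterministic; in normal use it would default to the current time.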
When to use memcached
There are a number of key processes and steps that you might modify when you use memcached to improve application performance.
When loading information, the typical scenario is shown in Figure 2.
Figure 2. Typical sequence for loading information to display
In general terms, the steps are:
- Execute one or more queries to load the information from the database
- Format the information for display (or further processing)
- Use or display the formatted data
When using memcached, the application logic changes slightly to take the cache into account:
- Try to load the information from the cache
- If it exists, use the cached version of the information
- If it does not exist:
  - Execute one or more queries to load the information from the database
  - Format the information for display or further processing
  - Store the information in the cache
- Use the formatted data
Figure 3 is a summary of these steps.
Figure 3. Loading information appropriate for display when using memcached
Data loading becomes, at most, a three-step process: load the data from the cache, or load it from the database (as appropriate) and store it in the cache.
The first time this process runs, the data is loaded normally from the database or other data source and then stored in memcached. The next time the information is needed, it is pulled from memcached instead of being loaded from the database, saving you time and CPU cycles.
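A minimal sketch of this read path follows. `DictCache` is a stand-in for a memcached client (assumed to return `None` on a miss), and `query_categories` and `format_categories` are hypothetical placeholders for the database query and formatting steps.

```python
class DictCache:
    """Stand-in for a memcached client: get() returns None on a miss."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value


query_count = 0  # counts how often the "database" is hit


def query_categories():
    """Hypothetical database query returning the blog's category names."""
    global query_count
    query_count += 1
    return ["linux", "perl", "caching"]


def format_categories(names):
    """Format the rows into the HTML fragment used on the page."""
    return "<ul>" + "".join(f"<li>{name}</li>" for name in names) + "</ul>"


def load_category_list(cache):
    html = cache.get("category-list")       # 1. try the cache first
    if html is None:                         # 2. on a miss, do the slow work
        html = format_categories(query_categories())
        cache.set("category-list", html)     # 3. store it for the next request
    return html


cache = DictCache()
first = load_category_list(cache)    # loads from the database and fills the cache
second = load_category_list(cache)   # served from the cache; no query is run
```

After the two calls, `query_count` is 1, which is the saving the text describes.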
The other side of the equation is to make sure that when you change information that may be stored in memcached, you update the memcached version at the same time as you update the back-end information. This slightly changes the typical sequence shown in Figure 4 into the one shown in Figure 5.
Figure 4. Updating or storing data in a typical application
Figure 5 shows the modified sequence when using memcached.
Figure 5. Update or store data when using memcached
Taking the blog site as an example again, when the blog system updates the list of categories in the database, the update should follow this sequence:
- Update the list of categories in the database
- Format the information
- Store the information in memcached
- Return the information to the client
Storage operations in memcached are atomic, so an update never lets a client obtain only part of the data; clients get either the old version or the new version.
For most applications, these two operations are the only things you need to worry about. When data is accessed, it is automatically added to the cache, and when that data changes, the cached copy is automatically updated.
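The update sequence can be sketched as follows. Both `database` and `cache` are plain dictionaries standing in for the real back end and for a memcached client, and `format_categories` is a hypothetical formatter.

```python
database = {"categories": ["linux", "perl"]}  # stand-in for the real database
cache = {}                                     # stand-in for a memcached client


def format_categories(names):
    """Hypothetical formatting step: a sorted, comma-separated list."""
    return ", ".join(sorted(names))


def update_categories(names):
    database["categories"] = list(names)   # 1. update the database
    formatted = format_categories(names)   # 2. format the information
    cache["category-list"] = formatted     # 3. replace the whole cached value
    return formatted                       # 4. return the information to the client


result = update_categories(["perl", "linux", "caching"])
```

Because the cached value is replaced in a single store operation, a reader sees either the previous list or the new one, never a mixture.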
Keys, namespaces, and values
Another important consideration with memcached is how you organize and name the data you store in the cache. From the blog site example, it is clear that you need a consistent naming structure so that you can identify the blog categories, history, and other information, both when loading it (and updating the cache) and when updating the data (and, again, updating the cache).
The exact naming scheme is application-specific, but you can usually follow a structure that already exists in your application, most likely one based on a unique identifier of some kind. Such identifiers arise naturally when you pull information from a database or organize a collection of information.
In the blog example, you could store the list of categories in an item with the key category-list. The values for a single post could be stored under a key based on the post ID, such as blogpost-29, and the comments for that post could be stored under blogcomments-29, where 29 is the ID of the blog post. In this way, you can store a wide variety of information in the cache, using different prefixes to identify it.
The simplicity of the memcached key/value store (and its lack of security) means that if you want to support multiple applications on the same memcached servers, you should consider adding some other form of qualifier to identify data as belonging to a particular application. For example, you can add an application prefix, as in blogapp:blogpost-29. Keys have no imposed structure, so you can use almost any string as the key name (memcached limits keys to 250 bytes and does not allow spaces or control characters in them).
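One way to keep key names consistent is to build them in a single helper. The function below is a hypothetical sketch: the name `cache_key` and the `app:kind-id` layout simply mirror the examples in the text, and the validity check reflects the memcached limits on key length and characters.

```python
def cache_key(app, kind, ident=None):
    """Build keys such as 'blogapp:blogpost-29' (the layout is illustrative)."""
    name = kind if ident is None else f"{kind}-{ident}"
    key = f"{app}:{name}"
    # memcached keys may be at most 250 bytes and may not contain
    # whitespace or control characters.
    too_long = len(key.encode("utf-8")) > 250
    bad_char = any(c.isspace() or ord(c) < 32 or ord(c) == 127 for c in key)
    if too_long or bad_char:
        raise ValueError(f"invalid memcached key: {key!r}")
    return key


post_key = cache_key("blogapp", "blogpost", 29)        # 'blogapp:blogpost-29'
comments_key = cache_key("blogapp", "blogcomments", 29)
categories_key = cache_key("blogapp", "category-list")
```

Using the same helper on both the load path and the update path avoids the two drifting apart.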
When it comes to storing values, you should make sure that the information stored in the cache is in a form appropriate for your application. For the blog system, for example, you might want to store the object that the blog application uses to format the blog information, rather than the raw HTML. This is more practical if the same basic structure is used in multiple places within the application.
Most language interfaces, including those for Java™, Perl, PHP, and others, can serialize language objects for storage in memcached. This lets you store whole objects in the memory store and retrieve them later, rather than reconstructing them by hand in your application. Many objects, or the structures they use, are based on some kind of hash or array structure. For cross-language environments, such as sharing the same information between a JSP environment and a JavaScript environment, you can use an architecture-neutral format such as JavaScript Object Notation (JSON) or even XML.
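As a small illustration of a language-neutral value format, the sketch below serializes a structure to JSON before storing it. The `cache` dictionary stands in for a memcached client, and the `post` structure and its field names are invented for the example.

```python
import json

cache = {}  # stand-in for a memcached client

# Hypothetical blog post structure; the field names are illustrative.
post = {"id": 29, "title": "Using memcached", "categories": ["caching", "perl"]}

# Store a JSON string rather than a language-specific serialized object,
# so that clients written in other languages can read the same entry.
cache["blogapp:blogpost-29"] = json.dumps(post)

# Any client with a JSON parser can rebuild the structure.
restored = json.loads(cache["blogapp:blogpost-29"])
```

The trade-off is that only plain data (strings, numbers, lists, and maps) survives the round trip; native object serialization preserves more but ties the entry to one language.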
Populating and using memcached
As an open source product, and one originally developed to work within an existing open source environment, memcached is supported in a wide range of environments and platforms. There are many interfaces for communicating with a memcached server, often with multiple implementations for a given language. See Resources for common libraries and toolkits.
It would be impossible to list all of the supported interfaces and environments, but they all support the basic API provided by the memcached protocol. The descriptions below are simplified and should be read in the context of each language, where different values may be used to indicate errors. The main functions are:
- get(key): Gets the information stored in memcached for the specified key. Returns an error if the key does not exist.
- set(key, value [, expiry]): Stores the specified value in the cache under the given key. If the key already exists, it is updated. The expiry time is in seconds; if the value is less than 30 days (30*24*60*60), it is treated as a relative time, and if it is greater than 30 days, it is treated as an absolute time (epoch).
- add(key, value [, expiry]): Adds the key to the cache if it does not exist, and returns an error if the key already exists. This is useful if you want to explicitly add a new key without updating it if it already exists.
- replace(key, value [, expiry]): Updates the value of the specified key, and returns an error if the key does not exist.
- delete(key [, time]): Removes the key/value pair from the cache. If you supply a time, adding a new value with this key is blocked for the specified period. The timeout lets you ensure that the value is always re-read from your data source.
- incr(key [, value]): Increments the specified key by 1 or by the specified value. Works only on numeric values.
- decr(key [, value]): Decrements the specified key by 1 or by the specified value. Works only on numeric values.
- flush_all: Invalidates (or expires) all current entries in the cache.
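The semantics of these functions can be modelled in a few lines. The sketch below is not a client library: `ModelCache` is a hypothetical in-memory class that signals the "error" cases by returning `False` or `None` (real interfaces differ by language), and `absolute_expiry` shows one reading of the 30-day rule, with the exact boundary treated as an assumption.

```python
THIRTY_DAYS = 30 * 24 * 60 * 60


def absolute_expiry(expiry, now):
    """Return the epoch second at which an entry expires, or None for never."""
    if not expiry:
        return None
    if expiry > THIRTY_DAYS:
        return expiry          # large values are taken as an absolute epoch time
    return now + expiry        # small values are relative to the current time


class ModelCache:
    """In-memory model of the basic operations; errors are False or None."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)            # None means the key is missing

    def set(self, key, value):
        self._data[key] = value               # create or overwrite
        return True

    def add(self, key, value):
        if key in self._data:
            return False                      # refuses to overwrite
        self._data[key] = value
        return True

    def replace(self, key, value):
        if key not in self._data:
            return False                      # refuses to create
        self._data[key] = value
        return True

    def delete(self, key):
        return self._data.pop(key, None) is not None

    def incr(self, key, value=1):
        if key not in self._data:
            return None
        self._data[key] = int(self._data[key]) + value
        return self._data[key]

    def decr(self, key, value=1):
        if key not in self._data:
            return None
        # memcached does not let a counter go below zero
        self._data[key] = max(0, int(self._data[key]) - value)
        return self._data[key]

    def flush_all(self):
        self._data.clear()
```

The difference between `set`, `add`, and `replace` is only in how each treats an existing or missing key, which is why the three are easy to confuse.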
For example, in Perl, a basic set operation can be handled as shown in Listing 1.
Listing 1. Basic set operation in Perl
    use Cache::Memcached;

    # Connect to a memcached server on the local machine
    my $cache = new Cache::Memcached {
        'servers' => [
            'localhost:11211',
        ],
    };

    # Store the value 'myvalue' under the key 'mykey'
    $cache->set('mykey', 'myvalue');
The same basic operation in Ruby is shown in Listing 2.
Listing 2. Basic set operation in Ruby
    require 'memcache'

    # Connect to the memcached server, then store a value under 'mykey'
    memc = MemCache.new('192.168.0.100:11211')
    memc["mykey"] = "myvalue"
Both examples follow the same basic structure: set up the memcached server, and then assign or set the value. Other interfaces are available, including ones for Java technology, which let you use memcached from a WebSphere application. The memcached interface class lets you serialize Java objects straight into memcached, so that complex structures are easy to store and load. When deploying in an environment such as WebSphere, two things are important: the resiliency of the service (what to do when memcached is not available), and how to increase the amount of cache storage to improve performance when you use multiple application servers or an environment such as WebSphere eXtreme Scale. Let's look at both of these issues next.
Resiliency and Availability
One of the most common questions about memcached is: "What happens if the cache is not available?" As the previous sections make clear, the information in the cache should never be the only source of that information. You must be able to load the data stored in the cache from somewhere else.
Although being unable to reach the cache slows down the application, it should not stop the application from working. Several scenarios can occur:
- If the memcached service goes down, the application should fall back to loading the information from the original data source and formatting it as required for display. The application should also keep trying to load and store information in memcached.
- Once the memcached server is available again, the application should automatically try to store data in it. There is no need to force a reload of the cached data; standard accesses will load the information and populate the cache. Eventually, the cache is refilled with the most commonly used data.
To repeat, memcached is a cache of information, not the only source of the data. An unavailable memcached server should not be the end of your application, although performance will be degraded until the server is back. In practice, the memcached server is relatively simple, and although it is not completely fault-free, that simplicity means it rarely goes wrong.
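The fallback behaviour described above can be sketched like this. `DownCache` simulates an unreachable server by raising `ConnectionError` (an assumption; real client libraries report failures in their own ways), and `load_from_source` is a hypothetical placeholder for the original data source.

```python
class DownCache:
    """Simulates a memcached server that cannot be reached."""

    def get(self, key):
        raise ConnectionError("memcached unavailable")

    def set(self, key, value):
        raise ConnectionError("memcached unavailable")


class MemoryCache:
    """Simulates a healthy memcached server."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value


def load_from_source(key):
    """Hypothetical slow path: query and format the data for this key."""
    return f"formatted data for {key}"


def fetch(cache, key):
    try:
        value = cache.get(key)
    except ConnectionError:
        value = None                    # treat an outage like a cache miss
    if value is not None:
        return value
    value = load_from_source(key)       # the source of truth always works
    try:
        cache.set(key, value)           # keep trying to repopulate the cache
    except ConnectionError:
        pass                            # still down: carry on without it
    return value


during_outage = fetch(DownCache(), "category-list")

recovered = MemoryCache()
after_recovery = fetch(recovered, "category-list")   # refills the cache
```

Because every request still attempts the store, the cache refills through normal traffic once the server returns, with no explicit reload step.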
Distributing your cache
The memcached server is simply a cache that stores values against keys over a network. If you have more than one machine, it is natural to want to set up an instance of memcached on all of your spare machines to provide one large, networked RAM cache.
With this idea comes the temptation to use some kind of distribution or replication mechanism to copy the key/value pairs between machines. The problem with this approach is that it reduces the available RAM cache rather than increasing it. Figure 6 shows three application servers, each with access to a memcached instance.
Figure 6. Incorrect use of multiple memcached instances
Although each memcached instance is 1 GB in size (giving 3 GB of RAM cache in total), if each application server has its own cache (or if data is replicated between the memcached instances), the entire installation still has only 1 GB of cache, duplicated on each instance.
Because memcached provides its information over a network interface, a single client can access the data on any memcached instance it can reach. If the data is not copied or duplicated on each instance, you end up with 3 GB of RAM cache available to every application server, as shown in Figure 7.
Figure 7. Correct use of multiple memcached instances
The difficulty with this approach is choosing which server stores a given key/value pair, and deciding which memcached server to talk to when you want to get a value back. The solution is to avoid complexities such as lookup tables, or expecting the memcached server to handle the process for you. Instead, memcached clients keep things simple.
Rather than tracking where each item lives, the memcached client applies a simple hashing algorithm to the key you specify when storing the information. When you store or retrieve information against a list of memcached servers, the client derives a numeric value from the key using a hashing algorithm that always produces the same result for the same key. For example, the key mykey might be converted to the value 23875. It does not matter whether you are storing or retrieving information: the key is always used as the unique identifier for the data on the memcached servers, so in this example mykey always hashes to 23875.
If there are two servers, the memcached client performs a simple calculation on this value (for example, a modulus) to decide whether the value should be stored on the first or the second configured memcached instance.
When you store a value, the client computes the hash value from the key and from it determines which server the value is stored on. When you get a value, the client computes the same hash value from the key and therefore selects the same server to retrieve the information.
If you use the same list of servers (in the same order) on every application server, each application server selects the same server when it needs to store or retrieve the same key. In this example, you now have 3 GB of shared memcached space instead of the same 1 GB of space duplicated three times, which gives you more cache to use and is likely to improve the performance of an application with many users.
There are complexities in this process (such as what happens when a server is unavailable); for more information, see the related documentation (see Resources).
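The hash-and-modulus selection can be sketched in a few lines. The choice of CRC32 is an assumption made for the example (client libraries use various hash functions), the second and third server addresses are invented, and the value 23875 in the text is illustrative rather than the output of this function.

```python
import zlib

# The server list must be identical, and in the same order, on every client.
SERVERS = [
    "192.168.0.100:11211",
    "192.168.0.101:11211",   # hypothetical additional servers
    "192.168.0.102:11211",
]


def server_for(key, servers=SERVERS):
    """Pick the server responsible for a key using hash modulo server count."""
    hashed = zlib.crc32(key.encode("utf-8"))   # same key always gives same number
    return servers[hashed % len(servers)]


store_target = server_for("mykey")   # chosen when the value is stored
fetch_target = server_for("mykey")   # chosen again when the value is fetched
```

Because the result depends on both the list contents and its order, changing the server list on only some clients would send the same key to different servers, which is the complexity the text alludes to.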
How not to use memcached
Although memcached is simple, memcached instances can sometimes be used incorrectly.
Memcached is not a database
The most common misuse of memcached is to treat it as a data store rather than as a cache. The primary purpose of memcached is to improve the response time for data that would otherwise take a long time to build or retrieve from another data source. A typical example is retrieving information from a database, especially when the information has to be formatted or processed before it is shown to the user. Memcached is designed to keep that information in memory so that the same work does not have to be repeated every time the data is needed.
You must never use memcached as the only source of the information your application needs; the data should always be obtainable from another source. Also, remember that memcached is just a key/value store. You cannot run queries over the data or iterate over the contents to extract information. Use it to store blocks of data or objects that you need to use wholesale.
Do not cache database rows or files
Although you can use memcached to store rows of data as loaded from a database, that is really query caching, and most databases provide their own query caching mechanisms. The same is true of other objects, such as images or files from the file system. Many applications and Web servers already have well-optimized solutions for this kind of work.
You get more value and a greater performance improvement from memcached if you use it to store whole blocks of information after they have been loaded and formatted. In the blog example, the best point at which to store information is after the blog categories have been formatted as an object, or even after they have been formatted into HTML. The blog page can then be built by loading the individual components (the blog post, category list, post history, and so on) from memcached and writing the completed HTML back to the client.
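As a rough sketch of building a page from cached components, the example below assembles three fragments, rebuilding only the one that is missing. The `cache` dictionary stands in for a memcached client, the key names follow the earlier examples, and `render_fragment` is a hypothetical placeholder for the query-and-format work.

```python
# Stand-in for a memcached client that already holds two formatted fragments.
cache = {
    "blogapp:category-list": "<ul><li>caching</li></ul>",
    "blogapp:blogpost-29": "<article>Using memcached</article>",
}


def render_fragment(key):
    """Hypothetical slow path: query the database and format one component."""
    return f"<!-- rebuilt {key} -->"


def build_page(keys):
    parts = []
    for key in keys:
        fragment = cache.get(key)
        if fragment is None:                 # only missing parts are rebuilt
            fragment = render_fragment(key)
            cache[key] = fragment
        parts.append(fragment)
    return "\n".join(parts)


page = build_page([
    "blogapp:category-list",
    "blogapp:blogpost-29",
    "blogapp:post-history",                  # not cached yet
])
```

Caching at the fragment level means a change to one component, such as a new comment, invalidates only that entry rather than the whole page.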
Memcached is not secure
To ensure maximum performance, memcached does not provide any form of security: there is no authentication and no encryption. This means that access to your memcached servers should be controlled in two ways: first, by placing them on the same private side of your application deployment environment, and second, if security is a requirement, by using UNIX® sockets and allowing only applications on the current host to access the memcached server.
This sacrifices some flexibility and resiliency, along with the ability to share the RAM cache across multiple machines on the network, but it is the only way to secure your memcached data as things currently stand.
Don't limit yourself
Despite the situations in which you should not use memcached, do not overlook its flexibility. Because memcached sits at the same architectural level as your application, it is easy to integrate and connect to, and changing an application to take advantage of memcached is not complicated. In addition, because memcached is only a cache, a problem with it does not stop your application from running. Used correctly, it lightens the load on the rest of your server infrastructure (by reducing reads from databases and data sources), which means you can support more clients without additional hardware.
But keep in mind that it's just a cache!
Conclusion
In this article, we looked at memcached and how best to use it. We saw how information is stored, how to choose sensible keys, and how to choose what information to store. We also discussed some of the key deployment issues that all memcached users face, including the use of multiple servers, what to do when a memcached instance dies, and, perhaps most importantly, the situations in which memcached should not be used.
As an open source application with a simple and straightforward aim, memcached draws its power and usefulness from that simplicity. By providing a large amount of RAM storage for information, making it available over the network, and allowing access from a wide variety of interfaces and languages, memcached can be integrated into a huge range of installations and environments.
Resources
Learn
- The MySQL memcached documentation provides a lot of information about using memcached in a typical database deployment environment.
- Learn about solidDB®, IBM's commercial caching solution, by exploring the IBM solidDB product family.
- Stay current with developerWorks technical events and webcasts.
- Check out upcoming conferences, trade shows, webcasts, and other events around the world that are of interest to IBM open source developers.
- Visit the developerWorks Open source zone for extensive how-to information, tools, and project updates, as well as the most popular articles and tutorials, to help you develop with open source technologies and use them with IBM products.
- Watch the free developerWorks On demand demos to learn about IBM and open source technologies and product functions.
Get products and technologies
- memcached.org provides information about memcached and how to download and install it.
- The Cache::Memcached interface for Perl provides a broad interface to memcached.
- For Java technology, you can use the com.danga.MemCached class, which provides some additional failover and multi-instance scaling.
- Use IBM product evaluation trial software, available for download, to improve your next development project.
- Download IBM product evaluation trial software or try the IBM SOA Sandbox, and start using application development tools and middleware products from DB2®, Lotus®, Rational®, Tivoli®, and WebSphere®.
Discuss
- Participate in the developerWorks blogs and get involved in the developerWorks community.
- Welcome to the My developerWorks Chinese community.
Apply memcached to improve site performance