Contents
- Stage 1: Encapsulating the whalin client
- Stage 2: Optimization
What is memcached?
Memcached is a centralized cache that supports distributed horizontal scaling. Many developers think of memcached as a distributed cache system, but the memcached server itself is a single instance; it is the client that partitions data across instances based on the storage key, so that one or more servers form a single logical cache. Only if you count the client as part of memcached can it be called partly distributed. It is worth contrasting this with genuinely decentralized architectures, which come in two flavors: the balanced node mesh (for example JBoss TreeCache), which synchronizes data through JGroups' multicast communication, and the master-slave model (common in distributed file systems), where the master manages the slaves, deciding, for instance, which slave to select and how to migrate data, at the cost of the master being a single point of failure. The following features summarize memcached's advantages and limitations.
Memory storage: obviously fast, but memory-hungry. CPU demands, by contrast, are very low, so memcached servers are often co-deployed with applications that consume a lot of CPU but little memory. (One of our products has exactly such an environment: several interface servers need plenty of CPU because they use WS-Security, but little memory, so they double as memcached hosts.)
Centralized cache: this avoids the synchronization overhead that spreads across a distributed cache, but reliability then demands that the cache not be a single point of failure. That is what the cluster support described later provides: multiple memcached instances back one another up as a virtual cluster, and the cluster's read/write performance is no different from that of an ordinary memcached instance.
Distributed scaling: memcached scales out in a distributed fashion. A virtual cache can be composed of several memcached instances deployed on one machine, or of instances spread across several machines, and this is completely shielded from and transparent to callers. It both raises the memory utilization of a single machine and provides a scale-out path.
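To make the partitioning idea concrete, here is a minimal sketch of how a client can map every key to one server so that several instances appear as a single virtual cache. The class and method names are illustrative only; real clients usually prefer consistent hashing over simple modulo so that adding or removing a server relocates fewer keys.

```java
import java.util.List;

/** Minimal sketch: deterministically map each key to one memcached instance. */
public class NodeSelector {
    private final List<String> servers; // e.g. "10.0.0.1:11211", "10.0.0.2:11211"

    public NodeSelector(List<String> servers) {
        this.servers = servers;
    }

    /** The same key always lands on the same instance, so callers
     *  see one logical cache spread across several servers. */
    public String serverFor(String key) {
        int h = key.hashCode();
        int index = (h == Integer.MIN_VALUE) ? 0 : Math.abs(h) % servers.size();
        return servers.get(index);
    }
}
```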
Socket communication: watch the size and serialization cost of what you transmit. memcached usually sits on the intranet as a cache, so raw socket throughput is relatively high (TCP and UDP are both supported, and depending on the client you can choose synchronous or asynchronous NIO calls), but serialization and bandwidth costs still deserve attention. On serialization: the performance of Java object serialization is a perennial headache, but the expensive part is the first serialization of a class; serializing further objects of the same class is much cheaper afterwards. In other words, the biggest cost is not serializing the object but serializing the class. If you only ever store strings, you skip object serialization entirely, which is the ideal case, and it is one reason memcached usually holds small pieces of content.
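The class-versus-object cost is easy to observe with standard Java serialization. The snippet below only illustrates the principle: within one ObjectOutputStream the class descriptor is written once, so the first object of a class costs far more bytes than the ones that follow.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.util.Date;

/** Illustration: the first object of a class carries the class descriptor;
 *  later objects of the same class reference it by handle and are much smaller. */
public class SerializationCostDemo {
    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        ObjectOutputStream out = new ObjectOutputStream(buf);

        out.writeObject(new Date());    // stream header + class descriptor + data
        out.flush();
        int afterFirst = buf.size();

        out.writeObject(new Date(1L));  // descriptor already known: far fewer bytes
        out.flush();
        int afterSecond = buf.size();

        System.out.println("first object:  " + afterFirst + " bytes");
        System.out.println("second object: " + (afterSecond - afterFirst) + " bytes");
    }
}
```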
Special memory allocation mechanism: note that memcached caps a stored object at 1 MB. Its memory allocation is quite unusual, but the design is performance-driven: a simple allocation mechanism makes recycling and reallocation cheap, saving CPU. A wine cellar illustrates it well. When memcached starts, a parameter fixes the total memory it may use: the cellar. When wine arrives, memcached first requests a block of space (usually 1 MB) to build a wine shelf. Each shelf divides itself into slots of one size range, and bottles within the same size range share a shelf type: a bottle of 20 cm radius goes on shelf A, which holds 20-25 cm bottles, while a 30 cm bottle goes on shelf B, which holds 25-30 cm bottles. Recycling is equally simple: first check whether a suitable shelf has a reusable slot and use it directly if so; otherwise request a new block; and if no memory remains, fall back on the configured expiration policy. Seen this way, if the sizes of your items are scattered and the size gradient is steep, space utilization can be poor: a bottle may well sit on shelf A in a slot meant for a larger bottle, wasting the difference.
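The arithmetic behind the wine shelves is easy to sketch. The numbers below (1 MB pages, a 1.25 growth factor, an 80-byte smallest chunk) roughly mirror memcached's defaults but are assumptions for illustration, not its actual source:

```java
/** Sketch of memcached-style slab classes: each "shelf" (slab class) holds
 *  fixed-size chunks, and chunk sizes grow geometrically between classes. */
public class SlabClasses {
    public static void main(String[] args) {
        final int pageSize = 1024 * 1024;   // one shelf is carved out of a 1 MB page
        final double growthFactor = 1.25;   // size step between neighboring classes
        int chunk = 80;                     // smallest chunk (assumed for illustration)

        for (int clazz = 1; chunk <= pageSize / 2; clazz++) {
            // an item falls into the smallest class whose chunk still fits it,
            // so an 81-byte item stored in a 100-byte chunk wastes 19 bytes
            System.out.printf("class %2d: chunk %7d bytes, %5d chunks per page%n",
                    clazz, chunk, pageSize / chunk);
            chunk = (int) (chunk * growthFactor);
        }
    }
}
```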
Simple cache mechanism: many open-source projects try to cover every base and end up with performance dragged down by features nobody needs. memcached's strength is its simplicity: no synchronization, no message distribution, no two-phase commit. It is a very plain cache: put something in, then take it out. If the key misses, memcached tells you so directly, and you fetch the data from the database or elsewhere and put it straight into the cache so that the next lookup hits. There are two common ways to keep cached data in sync: update the cache immediately whenever the underlying data changes, so changes take effect at once; or attach an expiration time, so the entry is naturally dropped when it expires, misses on the next read, and is then refreshed from the source. The latter suits scenarios where real-time requirements are relaxed and writes are infrequent.
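The second pattern, expiry-based refresh, is the classic read-through cache. Here is a small self-contained sketch; the CacheClient interface and loadFromSource method are stand-ins for a real memcached client and the backing database:

```java
/** Sketch of expiry-based synchronization: entries simply time out,
 *  miss on the next read, and are reloaded from the source of truth. */
public class ReadThroughExample {
    /** Stand-in for a real memcached client API. */
    interface CacheClient {
        Object get(String key);
        void set(String key, Object value, int ttlSeconds);
    }

    private final CacheClient cache;

    ReadThroughExample(CacheClient cache) {
        this.cache = cache;
    }

    Object get(String key) {
        Object value = cache.get(key);
        if (value == null) {                 // miss: never cached, or the TTL lapsed
            value = loadFromSource(key);     // go back to the database (or elsewhere)
            if (value != null) {
                cache.set(key, value, 300);  // hits again for the next five minutes
            }
        }
        return value;
    }

    private Object loadFromSource(String key) {
        return "value-for-" + key;           // stand-in for a real database query
    }
}
```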
Client importance: the memcached server is written in C, and nothing beyond socket transport is stipulated for clients: any language that supports socket communication can talk to memcached through its simple command protocol. That makes sound client design very important, and it leaves users plenty of room to extend the client for different scenarios: fault tolerance, weighting, efficiency, special functional requirements, and framework integration.
Typical applications: caching small objects (user tokens, permission information, resource information); caching small static resources; caching SQL result sets (used well, this improves performance greatly, and since memcached itself scales out, it is a good answer to the old difficulty of scaling a database); caching ESB messages.
Why optimize a Java client for memcached
Memcached is widely used on large websites, and clients for various languages are listed on the official site, though the Java options are few. The current memcached server is written in C, and since I am not familiar with C there is no way for me to optimize the server itself, although I have picked up some understanding of details such as its memory allocation mechanism from articles online. This article therefore focuses on two stages of optimizing a Java client for memcached.
Stage 1: Encapsulating the whalin client
The first stage re-encapsulated the whalin open-source implementation, one of the officially recommended Java clients.
- Interface-based cache service: an IMemCache interface is defined and applications depend only on the interface, laying the groundwork for swapping in a different cache implementation later.
- Configuration-driven initialization: the client and socket I/O pool attributes are set through configuration and handed to a CacheManager, which maintains the lifecycle of the cache client pool and makes unit testing easier.
- keySet implementation: memcached itself provides no keySet method. Early in the encapsulation, when colleagues asked for one, I personally felt it was unnecessary: polling a cache is inefficient, and in such scenarios you can usually obtain the key set from the data source rather than from memcached. A scenario in SIP, however, forced me to implement keySet.
When SIP throttles service access, it must record access counts and traffic within each control interval. Because SIP runs as a cluster, that data must live in centralized storage or a cache, and no database can sustain such a high update rate, so memcached's elegant counter operations (storeCounter, getCounter, incr, decr) were the natural fit. But how do you enumerate all current counters when it is time to check them? I considered a database or files, but both are too slow, and packing all the keys into a single field causes concurrency problems, so keySet had to be implemented. keySet takes a boolean parameter because deletion in memcached is not physical removal but a mark, which means an enumerated key may refer to data that has already been deleted. If strictness matters less than speed, skip validating whether each key is still live; if every returned key must genuinely exist, a second round of lookups is required.
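As a sketch of that counter pattern (the method names follow the description above, but the exact signatures are my assumption, not the published API):

```java
import java.util.Set;

/** Assumed counter/keySet surface of the wrapped client, per the description above. */
interface CounterCache {
    void storeCounter(String key, long initial); // create the counter up front
    long incr(String key);                       // atomic increment on the server
    /** checkValidity=false trusts memcached's lazy deletion and may return
     *  already-deleted keys; pass true to re-verify each key at extra cost. */
    Set<String> keySet(boolean checkValidity);
}

/** SIP-style frequency control: one global counter per service key. */
class AccessControl {
    private final CounterCache cache;

    AccessControl(CounterCache cache) {
        this.cache = cache;
    }

    void startInterval(String serviceKey) {
        cache.storeCounter("count:" + serviceKey, 0); // incr fails on absent keys
    }

    void recordAccess(String serviceKey) {
        cache.incr("count:" + serviceKey);
    }

    void inspectAll() {
        for (String key : cache.keySet(false)) {      // fast, possibly stale keys
            System.out.println(key);
        }
    }
}
```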
- Cluster implementation: as a centralized cache, memcached has one critical problem: the single point of failure. Spreading instances across machines only prevents total data loss; when one machine fails, its share of the data is still gone, just as dropping one basket still breaks some of the eggs. A backup mechanism is therefore needed so that data remains available after part of memcached fails. Most of the time a cache miss simply falls back to the data source, but in the SIP scenario missing control data could let requests slip past SIP, and SIP treats the data in memcached as authoritative, so building cluster support was necessary.
- Combining a local cache with memcached to speed up reads: the first stress test showed that memcached is not the zero-cost resource originally hoped for. It exchanges data over sockets, so the machine's bandwidth, network I/O, and socket connections all limit what memcached can deliver. One of memcached's great conveniences is the timeout setting: data can be stored with a validity period, so data that is not change-sensitive need not be refreshed within that window, which improves efficiency. Following the same idea, each memcached client in the cluster can keep a local cache of fetched data with its own expiration time, reducing the number of memcached round trips and improving overall performance.
Each client therefore embeds a local cache with a timeout (implemented as a lazy timeout): on a read it first checks whether the data exists locally, and only on a local miss does it call memcached, after which it caches the result locally with a validity period. The method is defined as follows:
```java
/**
 * Reduces the performance cost of frequent memcached round trips by
 * combining a local cache with memcached.
 *
 * @param key      cache key
 * @param localTTL local cache expiration time, in seconds
 * @return the cached value, or null on a miss
 */
public Object get(String key, int localTTL);
```
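One plausible implementation of that contract is a lazy-expiry map in front of memcached; everything below except the method signature is my sketch, and the real client's internals may differ:

```java
import java.util.concurrent.ConcurrentHashMap;

/** Sketch: a local cache with lazy timeouts in front of the remote cache. */
class LocalFrontedCache {
    private static final class Entry {
        final Object value;
        final long expiresAt;
        Entry(Object value, long expiresAt) {
            this.value = value;
            this.expiresAt = expiresAt;
        }
    }

    private final ConcurrentHashMap<String, Entry> local = new ConcurrentHashMap<>();

    public Object get(String key, int localTTL) {
        long now = System.currentTimeMillis();
        Entry e = local.get(key);
        if (e != null && e.expiresAt > now) {
            return e.value;                    // fresh local hit: no socket round trip
        }
        if (e != null) {
            local.remove(key, e);              // lazy expiry: evict only when touched
        }
        Object value = remoteGet(key);         // fall back to memcached
        if (value != null) {
            local.put(key, new Entry(value, now + localTTL * 1000L));
        }
        return value;
    }

    private Object remoteGet(String key) {
        return null;                           // stand-in for the real memcached call
    }
}
```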
Stage 2: Optimization
The first stage's encapsulation met the existing requirements and was adopted by my own project and other product lines. A casual remark kicked off the second stage: a colleague mentioned that the memcached client's socket I/O code contains a lot of synchronized blocks, which probably hurts performance to some degree. I had read that code before but had only paid attention to its hash algorithm. As my colleague suggested, the heavy synchronization likely reflects the older JDK the client was written against; now that java.util.concurrent is widely available, the optimization is not especially hard. However, since the original whalin provides no extension interface, I had to pull everything except the socket I/O layer into my encapsulated client and then rework the sockio part.
The result is the open-source client on Google Code: http://code.google.com/p/memcache-client-forjava/.
- Optimizing synchronized: in the original code, the socket I/O resource pool is split into three pools backed by ordinary Maps (free, busy, and dead), and sockets are shuffled among them as their state changes. The optimization simplifies this to a single resource pool plus a status pool, so a state change only updates an entry's status rather than moving it between pools; a ConcurrentMap then replaces the Map, and its putIfAbsent method replaces synchronized blocks (sketched below). For the full code, see the source on Google Code.
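A condensed sketch of that idea, with illustrative names rather than the project's actual code:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** One pool plus one status map: state changes flip an entry atomically
 *  instead of moving sockets between free/busy/dead maps under locks. */
class SocketPoolSketch {
    enum Status { FREE, BUSY }

    private final ConcurrentMap<String, Status> status = new ConcurrentHashMap<>();

    /** putIfAbsent replaces a synchronized check-then-put: only the
     *  first thread to register a given socket wins. */
    void register(String socketId) {
        status.putIfAbsent(socketId, Status.FREE);
    }

    /** Atomically claims a free socket; fails if another thread got it first. */
    boolean tryAcquire(String socketId) {
        return status.replace(socketId, Status.FREE, Status.BUSY);
    }

    void release(String socketId) {
        status.replace(socketId, Status.BUSY, Status.FREE);
    }

    void markDead(String socketId) {
        status.remove(socketId);   // dead sockets simply drop out of the pool
    }
}
```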
- I assumed this optimization would lift efficiency considerably, but the initial stress test showed little improvement: evidently the time spent elsewhere far exceeded the connection pool's resource bookkeeping. Profiling with JProfiler exposed the biggest bottleneck: reading data. The original design read responses one byte at a time, parsing step by step, merely to recognize the delimiters in the protocol; byte-by-byte reads perform far worse than batched, page-sized reads. I therefore added a built-in read buffer (its size configurable) and parse the protocol out of that buffer, which improved efficiency dramatically; see the stress test results in the last section for the numbers.
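The difference between the two read styles can be sketched as follows; the buffer size, names, and simplified CR/LF handling are illustrative, not the client's actual parser:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Sketch of the batch-read fix: fill a page-sized buffer with one bulk read
 *  and scan it for \r\n, instead of issuing one InputStream.read() per byte. */
class BufferedLineReader {
    private final InputStream in;
    private final byte[] page;
    private int pos, limit;

    BufferedLineReader(InputStream in, int pageSize) {
        this.in = in;
        this.page = new byte[pageSize];
    }

    /** Reads one \r\n-terminated protocol line, refilling the page as needed. */
    String readLine() throws IOException {
        StringBuilder line = new StringBuilder();
        while (true) {
            if (pos >= limit) {                    // page exhausted: one bulk read
                limit = in.read(page, 0, page.length);
                pos = 0;
                if (limit < 0) throw new IOException("stream closed mid-line");
            }
            byte b = page[pos++];
            if (b == '\r') continue;               // simplified: swallow CR
            if (b == '\n') return line.toString(); // delimiter found
            line.append((char) b);
        }
    }

    public static void main(String[] args) throws IOException {
        InputStream s = new ByteArrayInputStream("VALUE key 0 5\r\nhello\r\nEND\r\n".getBytes());
        BufferedLineReader r = new BufferedLineReader(s, 4096);
        System.out.println(r.readLine());          // prints: VALUE key 0 5
    }
}
```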
Whether or not these two changes improve a given application's performance, they are worthwhile for the client itself, and of course the performance gain makes the client more attractive to applications. See the code for the implementation details; the changes are transparent to callers.
Stress test comparison
I had actually run many stress tests before this one. The numbers do not measure memcached itself, because the tests ran on my own machine, whose performance, bandwidth, memory, and network I/O are nowhere near server grade; they only compare the original third-party client with the reworked one. The scenario simulates many users issuing cache operations concurrently from multiple threads and records the results.
Two client versions appear in the tests: 2.0 and 2.2. Version 2.0 is the encapsulation that delegates to whalin's memcached client 2.0.1; version 2.2 is the implementation built on the new socket I/O, with no third-party dependency. checkAlive means a connection is verified (send a request and receive a response) before each use; enabling it costs significant performance, but it is still the recommended setting.
Comparison of various configurations and operations on a single cache server instance:
| Cache configuration | Users | Operations | Client version | Total time (ms) | Time per thread (ms) | Throughput gain (2.2 vs 2.0) |
| --- | --- | --- | --- | --- | --- | --- |
| checkAlive | 100 | 1000 put simple obj, 1000 get simple obj | 2.0 | 13242565 | 132425 | |
| | | | 2.2 | 7772767 | 77727 | +41.3% |
| No checkAlive | 100 | 1000 put simple obj, 1000 get simple obj | 2.0 | 7200285 | 72002 | |
| | | | 2.2 | 4667239 | 46672 | +35.2% |
| checkAlive | 100 | 1000 put simple obj, 2000 get simple obj | 2.0 | 20385457 | 203854 | |
| | | | 2.2 | 11494383 | 114943 | +43.6% |
| No checkAlive | 100 | 1000 put simple obj, 2000 get simple obj | 2.0 | 11259185 | 112591 | |
| | | | 2.2 | 7256594 | 72565 | +35.6% |
| checkAlive | 100 | 1000 put complex obj, 1000 get complex obj | 2.0 | 15004906 | 150049 | |
| | | | 2.2 | 9501571 | 95015 | +36.7% |
| No checkAlive | 100 | 1000 put complex obj, 1000 get complex obj | 2.0 | 9022578 | 90225 | |
| | | | 2.2 | 6775981 | 67759 | +24.9% |
From these stress tests, the socket I/O rework delivers a large overall gain, and the read-path optimization mainly benefits get, with little effect on put. One might expect the gain to grow as the cached objects get larger, but the numbers show that is not always the case.
Comparison between a single cache instance and two cache instances:
| Cache configuration | Users | Operations | Client version | Total time (ms) | Time per thread (ms) | Throughput gain (2.2 vs 2.0) |
| --- | --- | --- | --- | --- | --- | --- |
| One cache instance, checkAlive | 100 | 1000 put simple obj, 1000 get simple obj | 2.0 | 13242565 | 132425 | |
| | | | 2.2 | 7772767 | 77727 | +41.3% |
| Two cache instances, checkAlive | 100 | 1000 put simple obj, 1000 get simple obj | 2.0 | 13596841 | 135968 | |
| | | | 2.2 | 7696684 | 76966 | +43.4% |
The results show that, with the new client, a single client working against two server instances gains slightly more than against a single server instance.
Comparison with and without a cache cluster:
| Cache configuration | Users | Operations | Client version | Total time (ms) | Time per thread (ms) | Throughput gain (2.2 vs 2.0) |
| --- | --- | --- | --- | --- | --- | --- |
| No cluster, checkAlive | 100 | 1000 put simple obj, 1000 get simple obj | 2.0 | 13242565 | 132425 | |
| | | | 2.2 | 7772767 | 77727 | +41.3% |
| Cluster, checkAlive | 100 | 1000 put simple obj, 1000 get simple obj | 2.0 | 25044268 | 250442 | |
| | | | 2.2 | 8404606 | 84046 | +66.5% |
This result has nothing to do with the socket I/O optimization. Version 2.0 follows the policy of returning only after every node in the cluster has been updated successfully, whereas 2.2 updates the other nodes asynchronously and spreads reads across the cluster's nodes to distribute the load, hence the much larger improvement.
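A sketch of the asynchronous strategy described here (all names are illustrative; the real client's replication code is more involved):

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Sketch: write one node synchronously, replicate to the rest in the
 *  background, and spread reads across nodes to distribute the load. */
class AsyncClusterWriter {
    interface Node {
        void set(String key, Object value);
        Object get(String key);
    }

    private final List<Node> nodes;
    private final ExecutorService replicator = Executors.newSingleThreadExecutor();

    AsyncClusterWriter(List<Node> nodes) {
        this.nodes = nodes;
    }

    void set(String key, Object value) {
        nodes.get(0).set(key, value);          // the caller waits only for this write
        for (int i = 1; i < nodes.size(); i++) {
            Node replica = nodes.get(i);
            replicator.submit(() -> replica.set(key, value));  // fire-and-forget copy
        }
    }

    Object get(String key) {
        int start = Math.floorMod(key.hashCode(), nodes.size()); // spread the reads
        for (int i = 0; i < nodes.size(); i++) {
            Object v = nodes.get((start + i) % nodes.size()).get(key);
            if (v != null) {
                return v;                      // try the next node on a miss
            }
        }
        return null;
    }
}
```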
Open-source code download
The encapsulated client had in fact already been in use internally, and after this second round of optimization I felt it should be open-sourced: first, so that the client code itself can be improved; second, to share the experience with more developers. The application code, examples, and documentation are now up on Google Code. If you are interested, download and test it to see whether it improves on the usability and performance of the Java memcached client you currently use; more feedback on this open-source work will help it improve.
Link: http://code.google.com/p/memcache-client-forjava/.