Optimizing a Java client for memcached

Source: Internet
Author: User
Document directory
  • Stage 1: Wrapping whalin
  • Stage 2: Optimization

What is memcached?

Memcached is a centralized cache that supports distributed horizontal scaling. It is worth clarifying this point, because many developers assume memcached is a distributed cache system; in fact the memcached server itself is a single instance. Only in the client implementation can storage be partitioned by key, and it is that partitioning across many memcached server instances which, seen from the client, makes memcached look like a distributed whole. Looking back at centralized and distributed cache architectures, there are essentially two designs: one is a mesh of equal nodes (such as JBoss TreeCache), which uses JGroups' multicast communication mechanism to synchronize data; the other is master-slave mode (as in distributed file systems), where the master manages the slaves, for example choosing a slave or migrating data, but the master node is itself a single point of failure. Next, several of memcached's characteristics are summarized to understand its strengths and limits.

Memory storage: fast, but demanding on memory. memcached makes very low demands on the CPU, so a common practice is to deploy the memcached server together with applications that consume a lot of CPU but little memory. (One of our products happens to fit: our interface servers, several machines, need a lot of CPU because they use WS-Security, but very little memory, so they double as memcached servers.)

Centralized cache: it sidesteps the propagation problems of a distributed cache, but a non-single-point deployment is needed to guarantee reliability; that is handled by the cluster work described later, in which multiple memcached instances are combined into a pseudo cluster. At the same time, the read/write performance of such a cluster is no different from that of an ordinary memcached instance.

Distributed scaling: memcached's highlight is its support for a distributed-scaling model. You can run multiple memcached server processes on one machine, or combine processes on several machines into one pseudo server, completely transparently to the caller. This both raises the memory utilization of a single machine and provides a scale-out path. A minimal sketch of the client-side key partitioning behind this follows.
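To make the partitioning concrete, here is a minimal sketch of how a client can map each key to one of several independent server instances. The KeyPartitioner helper is hypothetical, not whalin's actual routing code:

import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.zip.CRC32;

// Hypothetical helper: maps each key to one of several independent
// memcached instances, which is what makes the whole look like one
// pseudo-distributed cache to the caller.
public final class KeyPartitioner {
    private final List<String> servers; // e.g. "10.0.0.1:11211", "10.0.0.2:11211"

    public KeyPartitioner(List<String> servers) {
        this.servers = servers;
    }

    public String serverFor(String key) {
        CRC32 crc = new CRC32();
        crc.update(key.getBytes(StandardCharsets.UTF_8));
        return servers.get((int) (crc.getValue() % servers.size()));
    }
}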

Socket communication: the details to watch here are the size of the transmitted content and serialization. Although memcached is usually deployed on the intranet as a cache and socket transmission is fast (TCP and UDP are currently supported, and depending on the client you can choose synchronous or asynchronous NIO), serialization cost and bandwidth cost still deserve attention. On serialization: the first time a class is serialized takes noticeably long, and only subsequent serializations are optimized; in other words, the biggest cost of serialization is not serializing the object but serializing the class. If we transmit plain strings, that is the ideal situation, since it saves the serialization work entirely; so keep each piece of content stored in memcached small.
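A small, runnable illustration of why plain strings are the ideal payload; the UserToken class is made up for the example:

import java.io.ByteArrayOutputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.charset.StandardCharsets;

public class SerializationCostDemo {
    // Made-up value object standing in for whatever an application caches.
    static class UserToken implements Serializable {
        String user = "alice";
        long expiresAt = 1_700_000_000L;
    }

    public static void main(String[] args) throws Exception {
        // Java serialization writes the class descriptor along with the
        // instance, which is where the heavy first-time cost lies.
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(new UserToken());
        }
        System.out.println("serialized object: " + bos.size() + " bytes");

        // A plain string form skips object serialization entirely.
        byte[] raw = "alice|1700000000".getBytes(StandardCharsets.UTF_8);
        System.out.println("plain string: " + raw.length + " bytes");
    }
}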

Special memory allocation mechanism: the largest object memcached stores is 1 MB. Its memory allocation ratio is unusual, but the scheme is designed for performance: a simple allocation mechanism makes allocation and reclamation easier to manage and saves CPU. A wine cellar analogy describes it well. When memcached starts, the total memory available is set through a startup parameter; that is the cellar. When wine arrives, a fixed space (1 MB) is first requested to build a wine rack, and the rack is divided into several small slots, sized to the bottles, to hold them; bottles within the same size range go on the same class of rack. For example, a 20 cm bottle goes on rack A, which holds 20-25 cm bottles, and a 30 cm bottle goes on rack B, which holds 25-30 cm bottles. Reclamation is just as simple: when new wine comes in, check whether its rack class has a free slot; if so, use it directly; if not, request a new rack; and if no memory is left, apply the configured expiry strategy. Seen this way, if the stored content varies widely in size and the size gradient is steep, space utilization suffers: a bottle placed on rack A occupies a full rack-A slot even if it does not fill it.
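The wine-rack idea can be modeled in a few lines. This is only a Java analogy of memcached's C slab allocator: the 1 MB page matches the limit above, while the base slot size and the 1.25 growth factor are typical but illustrative values:

public final class SlabClassesDemo {
    static final int PAGE_SIZE = 1024 * 1024; // one "wine rack" = 1 MB page

    public static void main(String[] args) {
        double size = 80;     // smallest slot; illustrative base value
        double factor = 1.25; // growth factor between rack classes
        int clazz = 1;
        while (size <= PAGE_SIZE) {
            int chunk = (int) size;
            System.out.printf("class %2d: slot %7d bytes, %6d slots per rack%n",
                    clazz++, chunk, PAGE_SIZE / chunk);
            size *= factor;
        }
        // An item goes into the smallest class whose slot fits it, so an
        // 81-byte value lands in the 100-byte class and wastes 19 bytes:
        // the half-empty slot on the wine rack described above.
    }
}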

The cache mechanism is simple: it often happens that an open-source project aims to be universally applicable, only to have its performance dragged down by emphasis on features nobody needs. memcached stays simple. It has no synchronization, no message broadcasting, no two-phase commit, and so on; it is a very simple cache. You put an object in and you can get it out; if the key you supply does not hit, it plainly tells you that your key has no corresponding object in the cache, and you fetch it from somewhere else, such as the database; once you have the content from the external data source, you put it straight into the cache so that the next lookup hits. There are two ways to keep data in sync: one is to update the cache content immediately whenever the data changes, so the change takes effect at once; the other is to give the data an expiry, so the content is deleted naturally when it expires, the next lookup misses, and fresh content is placed into the cache again, updating it. The latter suits scenarios with low timeliness requirements and frequent writes. A small cache-aside sketch of the second style follows.
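A minimal cache-aside sketch of that second synchronization style, letting entries expire and reloading on a miss. CacheFacade and loadFromDatabase are stand-ins, not whalin's API:

interface CacheFacade {
    Object get(String key);                               // null on miss
    void set(String key, Object value, int expirySeconds);
}

class CacheAsideExample {
    private final CacheFacade cache;

    CacheAsideExample(CacheFacade cache) {
        this.cache = cache;
    }

    // Stand-in for the external data source mentioned above.
    Object loadFromDatabase(String key) {
        return "row-for-" + key;
    }

    Object getWithReload(String key) {
        Object value = cache.get(key);     // a miss is reported plainly as null
        if (value == null) {
            value = loadFromDatabase(key); // fall back to the source of truth
            cache.set(key, value, 300);    // expire after e.g. 5 minutes
        }
        return value;
    }
}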

The client really matters: memcached's server is written in C, but the client is not restricted in any way; the transport is plain sockets, so any language that supports socket communication can talk to it through a simple command protocol. The quality of the client's design, however, is very important, and there is plenty of room for users to extend and tailor the client to all kinds of scenarios, including fault tolerance, weighting, efficiency, special functional requirements, and embedding in frameworks.

Several typical uses: caching small objects (user tokens, permission information, resource information); caching small static resources; caching SQL results (used well, this part improves performance greatly, and since memcached itself scales horizontally, it is undoubtedly a good answer to the problem of scaling the database horizontally); dynamic and static content caching for an ESB.

Why optimize a Java client for memcached

Memcached is widely used on large websites, and clients in various languages are listed on the official site, but Java developers have few choices. Because the memcached server is written in C and I am not familiar with C, optimizing the server side is out of the question for me; I do understand details such as its memory allocation mechanism, but there are already plenty of articles on that and no need to spend more words on it. Here I focus on the two stages of optimizing a Java client for memcached.

Stage 1: Wrapping whalin

The first stage consists mainly of re-wrapping whalin, one of the Java clients listed on the official site.

  1. Interface-based cache: an IMemCache interface is defined, and the application layer uses only the interface, laying the groundwork for swapping in a different cache implementation later.
  2. Configuration-driven initialization instead of code: the client and SocketIO pool properties are set through configuration, and the lifecycle of the cache client pool is handed to a CacheManager, which also makes unit testing easy.
  3. KeySet implementation: memcached does not provide a keySet operation. In the early days of the interface wrapper, when a colleague raised this requirement, I felt there was no need to provide it: traversing a cache is inefficient by nature, and in such scenarios you can get the key set from the data source instead of from memcached. However, a SIP scenario made it necessary to implement keySet after all.
    During a service-frequency control period, SIP must record the number of calls and the traffic within that period. Because it runs as a cluster, the data has to live in centralized storage such as a cache; a database could not sustain updates at that frequency, whereas memcached's global counters (storeCounter, getCounter, inc, dec) handle it well. But how do you obtain all the current counters when it is time to inspect them? I considered a database or files, but both have performance problems, and putting everything into one field raises concurrency problems, so keySet had to be implemented. keySet takes a boolean parameter. This flag exists because deletes in memcached are not physical: an item is merely marked as deleted, which means keySet may return keys whose data is already gone. When strict accuracy is not required and speed is, there is no need to verify whether each key is really live; when the keys must be exactly the live set, each key has to be checked with an extra lookup.
  4. Cluster implementation: memcached, as a centralized cache, has a single point of failure (SPOF). Running multiple instances on multiple machines only solves the problem of losing all the data at once; when one machine fails, part of the data is still lost, just as eggs spread over several baskets still break when one basket falls. A backup mechanism is therefore needed to guarantee that data remains usable when part of memcached fails. Many users simply fall back to the data source on a miss; in the SIP scenario, however, falling through to the database for the missing part could easily bring SIP down, so the data SIP keeps in memcached has to be reliable, and a cluster implementation is required.
  5. LocalCache combined with memcached to improve data-fetch performance: the first stress test showed that memcached is not as omnipotent as originally expected. memcached communicates through socket data exchange, so machine bandwidth, network IO and the number of socket connections are the bottlenecks that keep memcached from realizing its potential. One of memcached's highlights is the timeout setting: data can be stored with a validity period, so change-insensitive data is not refreshed within the tolerable window, which improves performance. Following the same idea, every memcached client in the cluster can also use a local cache to hold fetched data with a definite expiry, reducing the number of requests to memcached and improving overall performance.

Therefore, each client has a built-in local cache with expiry (using a lazy timeout mechanism). When data is requested, the client first checks whether it exists locally; if not, it asks memcached and then caches the result locally with a validity period (a sketch of this local cache front follows the method definition). The method is defined as follows:

/**
 * Reduce the performance loss of frequent interaction with memcached
 * by placing a local cache in front of it.
 *
 * @param key      cache key
 * @param localTTL validity period of the local copy, in seconds
 * @return the cached object, or null if not found
 */
public Object get(String key, int localTTL);
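A minimal sketch of how such a lazy-timeout local cache front might work. This is illustrative only; remoteGet stands in for the real memcached fetch and the class is not the project's actual code:

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public abstract class LocalFrontCache {
    private static final class Entry {
        final Object value;
        final long expiresAt; // millisecond timestamp

        Entry(Object value, long expiresAt) {
            this.value = value;
            this.expiresAt = expiresAt;
        }
    }

    private final ConcurrentMap<String, Entry> local = new ConcurrentHashMap<>();

    // The real memcached round trip goes here.
    protected abstract Object remoteGet(String key);

    public Object get(String key, int localTTL) {
        long now = System.currentTimeMillis();
        Entry e = local.get(key);
        if (e != null && e.expiresAt > now) {
            return e.value;            // served locally, no socket round trip
        }
        local.remove(key);             // lazy eviction: expired entries die on read
        Object value = remoteGet(key); // fall back to memcached
        if (value != null) {
            local.put(key, new Entry(value, now + localTTL * 1000L));
        }
        return value;
    }
}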
Stage 2: Optimization

The first-stage wrapper satisfied the existing needs and was adopted by the project and several other product lines. A casual remark, however, led me into the second stage of optimization. A colleague reported that the memcached client has a great deal of synchronized code in its socket IO, which may hurt performance. I had looked at that code before, but at the time only the hash algorithm interested me. Following my colleague's hint, I found that there really is a lot of synchronization, probably because the author wrote the client against an old JDK version; now that java.util.concurrent is widely available, the optimization is not a difficult task. However, because the original whalin provides no interface for extension, everything except the SockIO part had to be pulled from whalin into the wrapped client before the SockIO part could be reworked.

The result is the open-source client on Google Code: http://code.google.com/p/memcache-client-forjava/.

  1. Optimization of synchronized: in the original code, the SockIO resource pool is divided into three pools implemented with plain Maps: free, busy and dead, and the three pools are maintained according to each SockIO connection's state. The optimization simplifies this to a single resource pool plus a state pool, and the pool contents are modified only when a resource's state actually changes. A ConcurrentMap then replaces the plain Map, and putIfAbsent does away with the synchronized blocks (see the sketches after this list). For the details, see the source published on Google Code.
  2. I expected a big performance gain from that change, but the first stress test showed little improvement; evidently the time spent elsewhere far outweighs the maintenance of the connection pool. So I analyzed performance with JProfiler and found the biggest bottleneck: the data-reading part. In the original design, data is read one byte at a time and parsed as it goes in order to recognize the delimiters in the protocol. Single-byte reads and batched, paged reads differ enormously in cost, so I built an internal buffer page (whose size can be configured) and parse the data out of it as the protocol requires; performance improved substantially. The concrete numbers appear in the stress-test results in the last section.
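Two sketches of the techniques just described. First, the single-pool idea: rather than moving a connection between three synchronized maps, keep one ConcurrentMap and flip a state flag atomically. The pool layout and SockState names are illustrative, not the project's actual classes:

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

enum SockState { FREE, BUSY, DEAD }

class SingleSockPool {
    private final ConcurrentMap<String, SockState> pool = new ConcurrentHashMap<>();

    void register(String socketId) {
        pool.putIfAbsent(socketId, SockState.FREE); // atomic add-if-missing, no synchronized
    }

    boolean tryCheckout(String socketId) {
        // atomic FREE -> BUSY flip; the pool is touched only on a state change
        return pool.replace(socketId, SockState.FREE, SockState.BUSY);
    }

    void checkin(String socketId) {
        pool.replace(socketId, SockState.BUSY, SockState.FREE);
    }

    void markDead(String socketId) {
        pool.put(socketId, SockState.DEAD);
    }
}

Second, the read-side change: scan for the protocol's terminator through a buffer page instead of hitting the socket one byte at a time. BufferedInputStream stands in for the hand-rolled, size-configurable page described above:

import java.io.BufferedInputStream;
import java.io.IOException;

class PagedReader {
    // Create once per connection, e.g.:
    // new BufferedInputStream(socket.getInputStream(), pageSize)
    static String readLine(BufferedInputStream in) throws IOException {
        StringBuilder line = new StringBuilder();
        int prev = -1;
        int b;
        while ((b = in.read()) != -1) {        // reads hit the buffer page, not the socket
            if (prev == '\r' && b == '\n') {
                line.setLength(line.length() - 1); // drop the trailing '\r'
                return line.toString();
            }
            line.append((char) b);
            prev = b;
        }
        return line.toString();
    }
}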

Both optimizations above are internal to the client: callers see no functional change at all, only better performance, though that performance is precisely what makes the client more attractive to applications. For the details, refer to the code.

Stress test comparisons

Before this stress test I had already run many rounds. The numbers here do not measure memcached in absolute terms, because the tests used my own machine, where performance, bandwidth, memory and network IO are not at server level; we only compare the original third-party client with the new client. The scenario simulates multiple users with multiple threads operating the cache concurrently.

Two client versions appear in the tests: 2.0 and 2.2. Version 2.0 is the client that wraps and calls whalin memcached client 2.0.1; version 2.2 is the implementation with the new SockIO and no third-party dependency. checkAlive means verifying that a connection is still usable before using it (sending a request and receiving the response); enabling this setting costs a lot of performance, but it is still what production uses.

Comparison under various configurations and operations, against a single cache service instance:

| Cache config | Users | Operations | Client version | Total time (ms) | Per-thread time (ms) | Throughput gain |
|---|---|---|---|---|---|---|
| checkAlive | 100 | 1000 put simple obj, 1000 get simple obj | 2.0 | 13242565 | 132425 | |
| | | | 2.2 | 7772767 | 77727 | +41.3% |
| no checkAlive | 100 | 1000 put simple obj, 1000 get simple obj | 2.0 | 7200285 | 72002 | |
| | | | 2.2 | 4667239 | 46672 | +35.2% |
| checkAlive | 100 | 1000 put simple obj, 2000 get simple obj | 2.0 | 20385457 | 203854 | |
| | | | 2.2 | 11494383 | 114943 | +43.6% |
| no checkAlive | 100 | 1000 put simple obj, 2000 get simple obj | 2.0 | 11259185 | 112591 | |
| | | | 2.2 | 7256594 | 72565 | +35.6% |
| checkAlive | 100 | 1000 put complex obj, 1000 get complex obj | 2.0 | 15004906 | 150049 | |
| | | | 2.2 | 9501571 | 95015 | +36.7% |
| no checkAlive | 100 | 1000 put complex obj, 1000 get complex obj | 2.0 | 9022578 | 90225 | |
| | | | 2.2 | 6775981 | 67759 | +24.9% |

Several points emerge from the stress test: first, the SockIO optimization improves performance considerably; second, it benefits get operations far more than put, where the effect is small. I had expected the improvement to grow with the data size, but the results show otherwise.

Comparison between a single cache instance and two cache instances:

| Cache config | Users | Operations | Client version | Total time (ms) | Per-thread time (ms) | Throughput gain |
|---|---|---|---|---|---|---|
| one cache instance, checkAlive | 100 | 1000 put simple obj, 1000 get simple obj | 2.0 | 13242565 | 132425 | |
| | | | 2.2 | 7772767 | 77727 | +41.3% |
| two cache instances, checkAlive | 100 | 1000 put simple obj, 1000 get simple obj | 2.0 | 13596841 | 135968 | |
| | | | 2.2 | 7696684 | 76966 | +43.4% |

Result: a single client working against multiple server instances performs slightly better than a single client against a single instance.

Comparison with and without the cache cluster:

| Cache config | Users | Operations | Client version | Total time (ms) | Per-thread time (ms) | Throughput gain |
|---|---|---|---|---|---|---|
| no cluster, checkAlive | 100 | 1000 put simple obj, 1000 get simple obj | 2.0 | 13242565 | 132425 | |
| | | | 2.2 | 7772767 | 77727 | +41.3% |
| cluster, checkAlive | 100 | 1000 put simple obj, 1000 get simple obj | 2.0 | 25044268 | 250442 | |
| | | | 2.2 | 8404606 | 84046 | +66.5% |

This part has nothing to do with the SocketIO optimization. Version 2.0 uses the strategy of returning only after every client node in the cluster has been updated; 2.2 updates asynchronously, and reads are spread across the cluster's client nodes to distribute the load, hence the large gain. A sketch of the difference follows.
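A sketch of that difference, assuming a hypothetical Node abstraction rather than the project's actual cluster code: 2.0's behaviour corresponds to writing every node before returning, while 2.2 returns after the primary write and replicates in the background:

import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class ClusterWriter {
    interface Node {
        void set(String key, Object value);
    }

    private final ExecutorService replicator = Executors.newSingleThreadExecutor();

    // 2.0-style: the caller waits for every node in the cluster.
    void putSync(List<Node> nodes, String key, Object value) {
        for (Node node : nodes) {
            node.set(key, value);
        }
    }

    // 2.2-style: the caller only waits for the primary; replicas are updated
    // asynchronously, which is where the large gain in the table comes from.
    void putAsync(Node primary, List<Node> replicas, String key, Object value) {
        primary.set(key, value);
        for (Node replica : replicas) {
            replicator.execute(() -> replica.set(key, value));
        }
    }
}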

Open source code download

The wrapped client had been used internally all along. After this second round of optimization I felt it should be open-sourced: first, so that the client code can keep improving; second, to exchange experience with more developers. I have now uploaded the code, examples and documentation to Google Code. If you are interested, download it and test it, and compare it with the Java memcached client you currently use in terms of ease of use and performance; the more feedback this open-source work receives, the better it can become.

 

http://www.doudou8.net/post/cid~412
