I had not paid much attention to this before: this data used to live in Memcache, but since the switch to Redis, storing and reading it as a hash has been far more convenient than it ever was with Memcache. The catch is the size of the hash: when the field list is small, HGETALL causes almost no trouble, but once it grows past fifty or so fields the performance problem becomes plainly visible with HGETALL, even without any formal profiling to back it up.
Redis is single-threaded: while it is handling one request, every other request has to wait. Normally a request finishes quickly, but HGETALL must walk every field of the hash to build its reply, so its CPU cost grows in direct proportion to the field count, and combining it with pipelining only makes things worse. Roughly speaking:
performance = CPUs / operations
In other words, to improve performance in this scenario we either get more CPUs involved in the work, or cut down the number of operations the work requires. To keep using the hash structure and still avoid the problem, the most convenient approach is to store the hash in serialized form, deserialize the data after reading it back, and replace the HGETALL call with hget($key, array(...)).
For example:
....
$arrKey = array('dbfba184bef630526a75f2cd073a6098', 'dbfba184bef630526a75f2cd0dswet98'); // fields to fetch
$strKey = 'Test';
$obj->hget($strKey, $arrKey); // one targeted hget call instead of HGETALL
This turns the original HGETALL into a single HGET: we no longer traverse every field of the hash, so even without extra CPUs joining the work the number of operations drops sharply, and the performance gain is considerable; the obvious cost, as with any redundancy, is the extra memory it wastes. You might object that we have merely swapped the field traversal for a deserialization step, which is usually expensive, but the traversal all ran on a single CPU, while deserialization, in any language, can be spread across multiple processes or threads and therefore across multiple CPUs, so performance generally still improves. The other instinctive fix, running multiple Redis instances, does add CPUs, but HGETALL plus pipelining makes the operation count explode geometrically, so a handful of extra instances is a drop in the bucket and does not fundamentally solve the problem.
Remembering that deceptive Redis HGETALL
There were no pits in the world to begin with; once enough people have fallen in, a pit is what you get.
I had long heard people say that Redis's HGETALL is a pit, but I refused to take it on faith: whatever the pit, I have to step in it myself before I will let it go. Put kindly, that is persistence; put bluntly, it is asking for trouble.
At first the program ran rock solid, so solid that I wanted to give everyone who called HGETALL a pit a one-word reply: bah! Like the frog in slowly warming water I forgot the danger was even there. The days went by, and then one day the requirements changed: I had to grow the hash from a dozen or so fields to more than a hundred, while using pipelining to fetch hundreds of HGETALL results in one go. And so I fell into the pit: the server went down.
Why? Because Redis is single-threaded: while it is processing one request, every other request can only wait. Normally each request finishes quickly, but HGETALL must traverse every field of the hash to assemble its reply, so the CPU cost grows in direct proportion to the field count; add pipelining on top and it only gets worse. In my case that meant hundreds of pipelined HGETALLs over hundred-plus-field hashes, tens of thousands of field reads squeezed through a single thread.
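For concreteness, here is a minimal sketch of roughly that access pattern, assuming the phpredis extension; the user:<id> key names and the three hundred ids are invented for illustration:

<?php
// Minimal sketch (phpredis assumed) of the pattern that caused the outage:
// hundreds of HGETALLs over 100+ field hashes, queued in a single pipeline.
$redis = new Redis();
$redis->connect('127.0.0.1', 6379);

$userIds = range(1, 300);                  // hypothetical: several hundred ids
$pipe = $redis->multi(Redis::PIPELINE);
foreach ($userIds as $id) {
    $pipe->hGetAll('user:' . $id);         // each hash holds 100+ fields
}
// Redis's single thread now has to walk every field of every hash, back to back.
$results = $pipe->exec();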
How do we solve this? Allow me to show off a little and start with a formula:
performance = CPUs / operations
In other words, to improve performance in this scenario we either get more CPUs involved in the work, or reduce the number of operations the work requires. Concretely, I came up with a few lines of attack:
With the help of memcached
Leave the Redis storage exactly as it is and, alongside it, build a cache layer in memcached that holds the hashes we would otherwise fetch from Redis with HGETALL. Since memcached stores strings, what actually gets stored is the serialized hash, which is deserialized on query; most memcached client drivers handle this serialization and deserialization transparently. The advantage is that memcached is multi-threaded, so more CPUs can take part in the work, and because we no longer traverse each field the number of operations drops as well. The disadvantages are plentiful: introducing a new cache layer wastes memory and adds complexity, and sometimes, even when we only need a few fields, we have to fetch the full blob and filter it ourselves, which wastes bandwidth. We could of course query Redis directly in those cases, but that adds complexity of its own.
Incidentally, memcached supports multiget, which achieves an effect similar to pipelining, but you have to be extra careful of another memcached pit here: the multiget bottomless-pit problem.
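As a rough sketch of this cache layer, assuming the PHP Memcached extension (whose driver serializes arrays transparently); the key names, TTL, and sample data are invented for illustration:

<?php
// Sketch of the memcached cache layer in front of Redis (Memcached extension assumed).
$mc = new Memcached();
$mc->addServer('127.0.0.1', 11211);

$hashData = array('field1' => 'v1', 'field2' => 'v2'); // in practice, 100+ fields

// Write: store the whole hash as one value; the client serializes the array for us.
$mc->set('hash:dbfba184', $hashData, 600);             // hypothetical key, 10-minute TTL

// Read: a single get replaces the Redis HGETALL; deserialization is transparent.
$cached = $mc->get('hash:dbfba184');

// getMulti gives a pipelining-like effect, but mind the multiget bottomless pit.
$many = $mc->getMulti(array('hash:aaa', 'hash:bbb', 'hash:ccc'));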
Serialized field redundancy
When storing the hash in Redis, also save a field named "all" whose value is the serialized form of the whole hash; at query time, simply HGET that redundant field and deserialize it. The advantage is that the redundant serialized field turns the original HGETALL into a single HGET, so we no longer need to traverse every field of the hash; even without extra CPUs joining the work, the number of operations drops sharply, and the performance gain is still considerable. The disadvantage is just as obvious: like every redundancy scheme, it wastes a lot of memory.
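A minimal sketch of the redundant "all" field, again assuming phpredis and PHP's native serialize()/unserialize(); the key name and sample fields are invented for illustration:

<?php
// Sketch of the serialized-field redundancy (phpredis assumed).
$redis = new Redis();
$redis->connect('127.0.0.1', 6379);

$key  = 'user:42';                                  // hypothetical key
$data = array('name' => 'foo', 'age' => '30');      // in practice, 100+ fields

// Write: keep the ordinary fields, plus one redundant field holding the whole
// hash in serialized form.
$redis->hMSet($key, $data);
$redis->hSet($key, 'all', serialize($data));

// Read: a single HGET plus unserialize() replaces the HGETALL field walk.
$data = unserialize($redis->hGet($key, 'all'));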
Some will ask: the field traversal is gone, but a deserialization step has been added, and deserialization is usually expensive, so can this really improve performance? The crux is that the field traversal we started with is done entirely on one CPU, whereas the subsequent deserialization, in whatever language, can be carried out by multiple processes or threads and thus finished on multiple CPUs, so performance generally still improves.
...
In addition, many people's instinctive fix is to run multiple Redis instances. That does put more CPUs to work and does help performance, but note that HGETALL and pipelining tend to make the number of operations explode geometrically; by comparison, the extra CPUs that a few more Redis instances bring are a drop in the bucket, so in this situation the approach does not fundamentally solve the problem.
...
Pits are there to be stepped in. Don't be afraid of falling into one, provided, of course, that you can climb back out on your own!