Redis Cache Set usage and redis problems

Source: Internet
Author: User
Tags: character set, memory usage, redis set, redis server

Redis Cache Set usage

In Redis, we can view the Set type as an unordered collection of strings. Like the List type, it lets you add and remove elements and test whether an element exists, and these operations run in O(1), that is, constant, time. The maximum number of elements a Set can contain is 4294967295 (2^32 - 1).

Unlike the List type, a Set does not allow duplicate elements, just like the set container in the C++ standard library. In other words, if the same element is added multiple times, the Set retains only one copy of it. Compared with the List type, the Set type also has a very important functional feature: aggregation operations across multiple Sets, such as union, intersection, and difference, are completed on the server side. Because these operations run on the server, they are highly efficient and also save a great deal of network I/O overhead.
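These Set semantics can be mirrored in a few lines, a minimal sketch with plain Python sets standing in for a redis server, using illustrative follower data:

```python
# Duplicates are kept only once, and union / intersection / difference
# correspond to the server-side SUNION / SINTER / SDIFF commands.
followers_a = set()
for user in ["u1", "u2", "u2", "u3"]:
    followers_a.add(user)                     # like SADD: duplicate "u2" kept once

followers_b = {"u2", "u3", "u4"}

print(len(followers_a))                       # like SCARD -> 3
print(sorted(followers_a & followers_b))      # like SINTER -> ['u2', 'u3']
print(sorted(followers_a | followers_b))      # like SUNION
print(sorted(followers_a - followers_b))      # like SDIFF -> ['u1']
```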

Redis can cache many such sets. Start the redis server, then open the redis client and add a few members: the result is a set.

For the redis set commands, see http://redisdoc.com.

The following describes how to use redis in .NET.

1. Obtain the set

// Get all items in the sorted set setId, in reverse (descending) order
public List<string> GetAllItemsFromSortedSetDesc(string setId)
{
    List<string> result = ExecuteCommand<List<string>>(client =>
    {
        return client.GetAllItemsFromSortedSetDesc(setId);
    });
    return result;
}

// Get all items in the sorted set setId, in ascending order
public List<string> GetAllItemsFromSortedSet(string setId)
{
    List<string> result = ExecuteCommand<List<string>>(client =>
    {
        return client.GetAllItemsFromSortedSet(setId);
    });
    return result;
}

// Get all items and their scores in the sorted set setId
public IDictionary<string, double> GetAllWithScoresFromSortedSet(string setId)
{
    IDictionary<string, double> result = ExecuteCommand<IDictionary<string, double>>(client =>
    {
        return client.GetAllWithScoresFromSortedSet(setId);
    });
    return result;
}
2. Delete a set


// Delete the value of a key; returns true on success
public bool RemoveKey(string key)
{
    bool result = ExecuteCommand<bool>(client =>
    {
        return client.Remove(key);
    });
    return result;
}

// Delete one item from a Set
public bool RemoveItemFromSet(string setId, string item)
{
    byte[] bvalue = System.Text.Encoding.UTF8.GetBytes(item);
    bool result = ExecuteCommand<bool>(client =>
    {
        var rc = client as RedisClient;
        if (rc != null)
        {
            return rc.SRem(setId, bvalue) == 1;
        }
        return false;
    });
    return result;
}

3. Search


// Search for keys matching a pattern
public List<string> SearchKeys(string pattern)
{
    List<string> result = ExecuteCommand<List<string>>(client =>
    {
        return client.SearchKeys(pattern);
    });
    return result;
}

4. Add an element to the set


// Add one item to a Set; returns true if the item was newly added
public bool AddItemToSet(string setId, string item)
{
    byte[] bvalue = System.Text.Encoding.UTF8.GetBytes(item);
    bool result = ExecuteCommand<bool>(client =>
    {
        var rc = client as RedisClient;
        if (rc != null)
        {
            return rc.SAdd(setId, bvalue) == 1;
        }
        return false;
    });
    return result;
}


Only a few methods are shared here; in fact there are many more operations on sets.

Using the Sets data structure provided by Redis, you can store collection-style data. For example, in a Weibo-style application you can store all of a user's followees in one set and all of their fans in another. Redis also provides intersection, union, and difference operations on sets, which make it easy to implement features such as mutual follows, shared interests, and second-degree friends. For all of these set operations, different command variants let you choose whether to return the result to the client or save it into a new set.



Summary of some problems encountered while using redis



tpn (Taobao Push Notification) encountered a series of problems while using redis to calculate unread message counts. This article walks through that process as a reference for anyone using redis or building similar functionality.

Redis is mainly used in tpn to calculate unread message counts for the mobile Qianniu client (Android, iOS). tpn counts unread messages by bizId (the business id of each message, such as a product id or order id): even if the same bizId has multiple messages, it counts as at most 1 unread. Therefore, while receiving messages and computing unread counts for mobile Qianniu, tpn must deduplicate by bizId. This deduplication is implemented with redis. As message volume kept rising, the redis-based deduplication scheme evolved as well.

1. Unread calculation based on redis Set structure

As mentioned above, the biggest characteristic of tpn's unread calculation is deduplication by bizId. In Java we would naturally reach for a HashMap or HashSet for deduplication, so tpn initially used the redis Set structure for it, relying mainly on two commands: SADD and SCARD.

SADD key member [member ...]: adds one or more members to the set stored at key; members already present are ignored. If key does not exist, a new set containing the given members is created. For a single member, SADD returns 1 if the member was not yet in the set and 0 if it already existed.

SCARD key: returns the number of elements in the set stored at key.

With these two commands, the steps to calculate the unread count are as follows:



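These steps can be sketched as follows; a plain Python dict of sets stands in for the redis Set structure here, and the key and id names are illustrative:

```python
store = {}

def sadd(key, member):
    # Emulates SADD for a single member: returns 1 if newly added, else 0.
    s = store.setdefault(key, set())
    if member in s:
        return 0
    s.add(member)
    return 1

def scard(key):
    # Emulates SCARD: the size of the set is the unread count.
    return len(store.get(key, set()))

def on_message(user_id, biz_id):
    """On each incoming message, SADD its bizId; the unread count is SCARD."""
    key = "unread:" + user_id
    sadd(key, biz_id)
    return scard(key)

on_message("u1", "order:1")
on_message("u1", "order:1")        # same bizId again: still only 1 unread
print(on_message("u1", "item:9"))  # a new bizId -> unread count 2
```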
tpn keeps messages for 7 days, which means a bizId stored in the redis set expires after 7 days, and the corresponding set is cleared once the user views the messages (so if a user does not check Qianniu messages for several consecutive days, many bizIds accumulate in that user's redis set). tpn has 6 redis machines in total, each running 5 redis instances, and the maxmemory of each instance is set to 1G, so 30G of memory in total is used to store message bizIds. In tpn's early days, with few users and low message volume, redis could hold all bizIds within the 7-day window, and the scheme worked well. But as most active sellers on the network started using Qianniu, tpn's message volume skyrocketed, and the ever-growing number of bizIds put great pressure on redis: during message peaks, the tpn logs contained large numbers of redis timeout exceptions (tpn uses jedis, configured with a 300ms timeout). Analysis showed the following main causes:

Timeouts caused by cache eviction: as mentioned, the maxmemory of each tpn redis instance is set to 1G. Because bizIds keep growing, each instance's memory usage soon exceeds maxmemory. When redis handles a client request and finds current memory usage at or above maxmemory, it evicts some expired cache entries until usage drops below maxmemory. This memory-freeing work clearly affects redis's response time. During message peaks, memory usage hovers around maxmemory, so redis keeps evicting cache entries to free memory while also serving a large volume of requests, resulting in frequent timeouts.

With too many bizIds and not enough redis memory, redis requests time out en masse. The simplest fix is to add machines and deploy more redis instances to store the ever-growing bizIds. But a rough estimate showed that fully storing 7 days of message bizIds in memory would require hundreds of gigabytes: transaction messages and commodity messages are tpn's two main message types, and because most active sellers on the network use Qianniu, tpn would have to keep essentially all new transaction ids and commodity ids across the entire network for 7 days in redis memory. It was basically impossible for tpn to requisition that many redis machines, and even with the machines, the deployment and maintenance costs would be huge. Even replacing redis with tair's rdb, this cost was still unacceptable.

On the mobile Qianniu client, when a push does not arrive normally (for example when the long connection is broken), the client calls the messagecount.get interface to fetch the unread count and then prompts the user to fetch the latest messages manually. When redis memory usage is near the limit, the SADD and SCARD calls easily time out; if the unread count is then wrong, the user cannot get the latest messages in time.

In short, redis's memory capacity could not accommodate the ever-growing set of message bizIds, causing large numbers of redis requests to time out and unread counts to be computed incorrectly. The scheme above therefore needed optimization.

2. Use redis for message deduplication, and store unread counts in tair

According to the above analysis, redis is prone to timeouts once memory usage reaches the limit, and the main reason memory fills up so quickly is the large number of bizIds held in the set structures of inactive users. Without the ability to add redis machines quickly, the simplest remedy is to restart redis at night. Restarting redis has the following effects:

All bizIds stored in all users' sets are emptied, which causes misjudgments: a message with an already-seen bizId is treated as new and the user is reminded again. But this has little impact in practice: active users view their messages promptly, so their set structures are basically empty anyway; inactive users' sets do hold many bizIds, but precisely because they are inactive, even if the sets are emptied and refilled, those users are essentially unaware of it.

Because the set structures are cleared, every user's unread count is also cleared (the unread count is computed with SCARD). As analyzed earlier, when push cannot arrive normally, a correct unread count is what prompts the user to actively fetch the latest messages, so wiping users' unread counts on every redis restart is basically unacceptable.

Since arbitrarily clearing users' unread counts is unacceptable, we cannot restart redis periodically as-is. But what if we separate message deduplication from unread calculation? That is, the redis set structure is used only to decide whether a message is new and whether the unread count should grow, while the count itself is stored elsewhere, such as in tair. Then we can restart redis regularly. This leads to the following scheme:

Continue to use the redis set structure to decide whether a message is new, i.e. whether the unread count should be incremented.

No longer use the redis SCARD command to compute the unread count. Instead, use a tair-based counter: whenever the redis set structure determines that a message is new, perform incr unReadCountKey 1 on the unread counter stored in tair.
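A minimal sketch of this split, with in-memory dicts standing in for the redis sets and the tair counter (all names illustrative):

```python
seen = {}       # stands in for the redis set structure (dedup only)
counters = {}   # stands in for the tair incr counters (unread counts)

def handle_message(user_id, biz_id):
    members = seen.setdefault(user_id, set())
    if biz_id not in members:            # SADD would return 1: a new message
        members.add(biz_id)
        counters[user_id] = counters.get(user_id, 0) + 1   # tair incr by 1

handle_message("u1", "order:1")
handle_message("u1", "order:1")    # duplicate: counter untouched
handle_message("u1", "item:9")
seen.clear()                       # simulate the nightly redis restart
handle_message("u1", "order:1")    # re-counted after restart: the known tradeoff
print(counters["u1"])              # 3 -- the count itself survived the restart
```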



In this way, redis is only used to deduplicate message bizIds and no longer to calculate unread counts, which are stored separately in tair-based counters. So we boldly restarted redis at night on a regular schedule. This scheme worked well for a while, but then the application again started throwing large numbers of timeout exceptions on redis requests. Analysis showed the problem was still redis memory:

Although memory can be freed by periodically restarting redis, the growth rate of redis memory usage is unpredictable, and we cannot always manage to restart redis before usage hits the limit.

Sometimes, even though redis's overall memory usage has not reached the limit, if a single user's set structure holds too many bizIds, the SCARD command on it can still time out.

So this was not an optimal solution; a better way to reduce redis's memory usage was still needed.

3. A message deduplication scheme based on a redis bloomfilter

From scheme one to scheme two, what we always wanted was to determine, with minimal memory, whether a message bizId is new, that is, whether it already exists. Minimal-memory membership testing naturally suggests a bloomfilter. But in this scenario we cannot use a bloomfilter naively. Let's first calculate how much memory the "most direct" use of bloomfilters would need. The memory a bloomfilter occupies is determined by its bitSize, according to the formula:

bitSize = (int) Math.ceil(maxKey * (Math.log(errorRate) / Math.log(0.6185)));

If we create one bloomfilter for each message type of each user, with 5 million users each subscribing to 10 message types, the total memory occupied by these deduplication bloomfilters is:

totalBits = 5000000 * 10 * bitSize = 50000000 * Math.ceil(maxKey * (Math.log(errorRate) / Math.log(0.6185))), i.e. totalMemory = totalBits / 8 bytes

The size of this total depends on maxKey and errorRate. With errorRate held constant, the larger each bloomfilter's maxKey, the more memory it needs. Let's estimate how much memory this direct use of bloomfilters would require.

Taking commodity and transaction messages as examples, different sellers receive anywhere from a handful of messages to more than 70,000 messages in 7 days. Even with an estimated maxKey of 1,000, these 50 million bloomfilters would consume well over a hundred gigabytes of memory, which obviously does not work.

However, tpn's unread count has another business characteristic: once a user's unread count for a message type exceeds 99, the exact number is no longer displayed; it simply shows as 99+. And once a user's unread count exceeds 99, their sensitivity to it is low: even if a non-new message wrongly increments the count by 1, the user cannot notice.

Therefore, in the formula above, we can set each bloomfilter's maxKey to 100, which makes the memory footprint very acceptable: with errorRate = 0.0001 and maxKey = 100, the 50 million bloomfilters above need only about 11G of memory. This is an entirely acceptable memory cost.
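Plugging the numbers into the article's formula confirms both estimates (a sketch; the function and constant names are illustrative):

```python
import math

def bloom_bits(max_key, error_rate):
    # bitSize = ceil(maxKey * (log(errorRate) / log(0.6185)))
    return math.ceil(max_key * (math.log(error_rate) / math.log(0.6185)))

FILTERS = 5_000_000 * 10            # 5 million users x 10 message types

def total_gb(max_key, error_rate):
    # bits -> bytes -> gigabytes
    return FILTERS * bloom_bits(max_key, error_rate) / 8 / 1024**3

print(round(total_gb(1000, 0.0001)))   # maxKey=1000: well over 100 GB
print(round(total_gb(100, 0.0001)))    # maxKey=100: roughly 11 GB
```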

In this way, we arrived at the following deduplication scheme based on a redis bloomfilter:

Use redis's SETBIT command to implement a remote bloomfilter. For details, see this example: https://github.com/olylakers/RedisBloomFilter/blob/master/src/main/java/org/olylakers/bloomfilter/BloomFilter.java

Each time a new message arrives, use the redis bloomfilter to determine whether it is a new message.

If it is, increment the unread counter in tair by 1.

Each time the user reads their messages, clear the corresponding bloomfilter.
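A minimal sketch of this scheme: the bytearray below stands in for the redis bitmap that SETBIT/GETBIT would manipulate, and the hashing scheme and names are illustrative rather than taken from the linked implementation:

```python
import hashlib

class RedisStyleBloomFilter:
    """Bloomfilter over a redis-style bitmap (bytearray stands in for the key)."""

    def __init__(self, bit_size, num_hashes=4):
        self.bit_size = bit_size
        self.num_hashes = num_hashes
        self.bits = bytearray((bit_size + 7) // 8)

    def _offsets(self, member):
        # Derive num_hashes deterministic bit offsets for a member.
        for i in range(self.num_hashes):
            h = hashlib.md5(("%d:%s" % (i, member)).encode()).hexdigest()
            yield int(h, 16) % self.bit_size

    def _setbit(self, offset):                 # emulates SETBIT key offset 1
        self.bits[offset // 8] |= 1 << (offset % 8)

    def _getbit(self, offset):                 # emulates GETBIT key offset
        return (self.bits[offset // 8] >> (offset % 8)) & 1

    def add_if_new(self, biz_id):
        """Return True if biz_id was (probably) unseen, setting its bits."""
        new = False
        for off in self._offsets(biz_id):
            if not self._getbit(off):
                new = True
                self._setbit(off)
        return new

bf = RedisStyleBloomFilter(bit_size=1917)  # roughly maxKey=100, errorRate=0.0001
unread = 0                                 # would live in a tair counter
for biz_id in ["order:1", "order:2", "order:1"]:
    if bf.add_if_new(biz_id):
        unread += 1
print(unread)  # the duplicate "order:1" does not increment the count
```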



In this way, we finally achieved unread-count calculation within an acceptable memory budget, without worrying every day about redis running out of memory and the application throwing frequent timeout exceptions.

4. The strange Broken pipe

After scheme three went online, I thought these redis problems would finally quiet down, and indeed, for a while there were no timeout exceptions. But after running for some time, tpn started reporting this exception when writing commands to redis:

java.net.SocketException: Broken pipe. We know that if a socket connection has been closed by the remote end, but the client is unaware of it and keeps reading and writing through the connection, a Broken pipe exception occurs. Since tpn uses jedis with a connection pool built on commons-pool, my first reaction was that tpn was using the jedis pool incorrectly: a broken connection was being returned to the pool instead of destroyed, or a bug in the jedis pool was leaking connections, causing tpn to write to connections that had already been closed. But after carefully checking both tpn's code and the jedis pool code and finding no problem, it became clear that some connections really were being closed by the redis server, without the jedis pool noticing.

Since the client-side jedis pool was fine, it could basically be concluded that some connections were being closed on the redis server side. The first suspect was a misconfiguration of tpn's redis, specifically the timeout option in redis.conf:

# Close the connection after a client is idle for N seconds (0 to disable)
timeout 0

Checking the configuration files on all 6 tpn redis machines showed that this option was not set anywhere. However, tpn had two versions of redis deployed, redis-2.6.14 and redis-2.4. In redis-2.4, if this value is not configured, redis uses a default of 5 * 60 seconds, while in redis-2.6.14 the default is 0, i.e. the idle timeout is disabled. Meanwhile, the jedis commons-pool was configured with minEvictableIdleTimeMillis = 1000L * 60L * 60L * 5L ms, meaning a redis connection is only evicted from the pool after being idle for more than 5 hours. Clearly, the idle-timeout settings on the client and the server differed, so a connection could be closed by one end while the other end was unaware of it, producing the Broken pipe. The fix was to upgrade redis-2.4 to redis-2.6.14.
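In configuration terms, the mismatch looks like this (values taken from the text; the layout is illustrative):

```
# redis-2.4: when `timeout` is unset in redis.conf, the effective default is
timeout 300                             # 5 * 60 s: server closes idle connections

# jedis commons-pool setting on the client
minEvictableIdleTimeMillis = 18000000   # 1000*60*60*5 ms = 5 hours
```

Any connection idle for between 5 minutes and 5 hours was therefore already dead on the server but still considered live by the pool.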

5. Summary

From scheme one to scheme three, my biggest takeaway is that when solving problems and optimizing solutions, we should not fixate on the technology itself, but think in terms of the business. I had the idea of a redis bloomfilter for a long time, but I never connected it to the business rule that tpn displays unread counts above 99 as simply 99+. Instead I kept trying to shrink the message bizId to save memory, which only made things more complicated and led nowhere.
