Spool's developer blog describes how Spool uses Redis bitmap operations to collect statistics on active website users.
Original article: http://blog.getspool.com/2011/11/29/fast-easy-realtime-metrics-using-redis-bitmaps/
Redis supports setting individual bits of String-type values. A single value can record all active users: set the bit whose offset equals the user's id to 1. In the original article's illustration, nine bits of the bitmap are set to 1, meaning that the users at those nine positions were active today. The first bit represents the user whose uid is 0, and the last bit represents the user whose uid is 15. (If your uids do not start from 0 but from, say, 1000000, use the uid minus the starting value as the offset, so that user 1000000 maps to the first bit of the bitmap.)
The code looks something like this:
redis.setbit(play:yyyy-mm-dd, user_id, 1)
Each such write is O(1), which is very fast in Redis.
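As a minimal sketch of the uid-to-offset mapping described above, the following uses java.util.BitSet as an in-memory stand-in for the Redis String value. The class name DailyActivity, the baseUid parameter, and the method names are hypothetical and do not come from the original article. Like SETBIT, markActive returns the bit's previous value.

```java
import java.util.BitSet;

// In-memory stand-in for one day's Redis bitmap (assumption: a BitSet
// models the String value; no Redis server is involved).
class DailyActivity {
    private final BitSet bits = new BitSet();
    private final long baseUid; // hypothetical: smallest uid in use

    DailyActivity(long baseUid) {
        this.baseUid = baseUid;
    }

    // Maps a uid to its bit offset: uid minus the starting value.
    int offsetOf(long uid) {
        if (uid < baseUid) {
            throw new IllegalArgumentException("uid below base: " + uid);
        }
        return Math.toIntExact(uid - baseUid);
    }

    // Marks the user active and returns the previous bit, as SETBIT does.
    boolean markActive(long uid) {
        int offset = offsetOf(uid);
        boolean previous = bits.get(offset);
        bits.set(offset);
        return previous;
    }

    boolean isActive(long uid) {
        return bits.get(offsetOf(uid));
    }

    // Number of distinct active users recorded so far.
    int activeCount() {
        return bits.cardinality();
    }
}
```

With a base of 1000000, user 1000000 lands on offset 0 and user 1000015 on offset 15, so the bitmap stays as short as the range of uids actually in use.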
We use a different key for each day, so that each day's active-user records are kept separate. Statistics such as the users active on any of N days, or the users active on all N consecutive days, can then be calculated with bitwise OR and AND operations.
For example, in the original article's illustration the first row shows the active users on Monday, the second row shows Tuesday, and so on. The union (bitwise OR) of the N daily bitmaps gives the set of users who were active at some point during those N days.
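The two combinations can be sketched with java.util.BitSet, one BitSet per day. This is only an illustration under the assumption that each day's bitmap has already been loaded into memory; the class and method names are hypothetical.

```java
import java.util.BitSet;
import java.util.List;

class ActivityStats {
    // Users active on at least one of the given days (bitwise OR).
    static BitSet activeOnAnyDay(List<BitSet> days) {
        BitSet result = new BitSet();
        for (BitSet day : days) {
            result.or(day);
        }
        return result;
    }

    // Users active on every one of the given days (bitwise AND).
    static BitSet activeOnEveryDay(List<BitSet> days) {
        if (days.isEmpty()) {
            return new BitSet();
        }
        // Clone the first day so the caller's bitmaps are not modified.
        BitSet result = (BitSet) days.get(0).clone();
        for (BitSet day : days.subList(1, days.size())) {
            result.and(day);
        }
        return result;
    }
}
```

Calling cardinality() on either result gives the corresponding user count.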
The original article includes a table of the time taken to compute these counts over one day, one week, and one month.
The following are concrete Java code snippets:
Calculate the number of active users per day
import redis.clients.jedis.Jedis;
import java.util.BitSet;
...
Jedis redis = new Jedis("localhost");
...
// Number of unique users who performed the action on the given date.
public int uniqueCount(String action, String date) {
    String key = action + ":" + date;
    byte[] value = redis.get(key.getBytes());
    // A missing key means nobody was active that day.
    if (value == null) {
        return 0;
    }
    return BitSet.valueOf(value).cardinality();
}
Calculate the number of active users over a set of days (a user who was active on any one of the days is counted, i.e. the union)
import redis.clients.jedis.Jedis;
import java.util.BitSet;
...
Jedis redis = new Jedis("localhost");
...
// Number of unique users who performed the action on any of the dates.
public int uniqueCount(String action, String... dates) {
    BitSet all = new BitSet();
    for (String date : dates) {
        String key = action + ":" + date;
        byte[] value = redis.get(key.getBytes());
        // Skip days with no recorded activity (missing key).
        if (value != null) {
            all.or(BitSet.valueOf(value));
        }
    }
    return all.cardinality();
}
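One caveat when reading a Redis bitmap into a BitSet: Redis numbers bits starting from the most significant bit of each byte, while BitSet.valueOf(byte[]) treats the least significant bit of each byte as the lowest index. cardinality() is unaffected, so the counts above are still correct, but the bit positions in the resulting BitSet no longer equal the offsets that were passed to SETBIT. A minimal sketch of a conversion that preserves the Redis offsets follows; the class and method names are hypothetical.

```java
import java.util.BitSet;

class RedisBitmaps {
    // Builds a BitSet whose bit i equals the bit that GETBIT would
    // report at offset i for the given raw Redis String value.
    // Assumption: the value is small enough that offsets fit in an int.
    static BitSet fromRedisBytes(byte[] value) {
        BitSet bits = new BitSet();
        if (value == null) {
            return bits; // missing key: no bits set
        }
        for (int i = 0; i < value.length * 8; i++) {
            // Redis offset i lives in byte i / 8, counted from the top bit.
            int mask = 0x80 >> (i % 8);
            if ((value[i / 8] & mask) != 0) {
                bits.set(i);
            }
        }
        return bits;
    }
}
```

This matters only when you need to know which users are active, for example to list their ids, rather than how many there are.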
There are many other use cases. For example, you can keep a separate bitmap for each type of client so that you can count the active users of each client separately. The same result can be achieved with Redis sets, but sets use more memory.
========================================================== ========================================
After reading this article, I tested it:
redis> SETBIT <key> 10086 1
(integer) 0
redis> GETBIT <key> 10086
(integer) 1
For a SETBIT with a large offset, the initial memory allocation may block the Redis server, because Redis has to allocate a long zero-filled string that reaches up to that offset.
Issues:
If there are millions of active users, a Redis bitmap is very cost-effective.
If there are few active users and the user ids are integers of 10 or more digits, a bitmap wastes memory. It is better to use a set, on which you can then compute intersections.
We can estimate the memory: for offset = 999,999,999, the memory required is about 999,999,999 / 8 / 1024 / 1024 = 119 MB.
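The estimate can be reproduced with a small helper. This is a sketch of the arithmetic only, counting the raw bytes of the String value and ignoring any Redis-internal overhead; the class and method names are hypothetical.

```java
class BitmapMemory {
    // Bytes needed for a bitmap whose highest offset is maxOffset:
    // one byte per 8 bits, rounded up to a whole byte.
    static long bytesForOffset(long maxOffset) {
        return (maxOffset / 8) + 1;
    }

    // Converts a byte count to mebibytes (1024 * 1024 bytes).
    static double toMiB(long bytes) {
        return bytes / (1024.0 * 1024.0);
    }
}
```

For an offset of 999,999,999 this gives 125,000,000 bytes, roughly 119 MB, and that whole amount is needed even if only that one bit is set.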
If the statistics have many dimensions and there are thousands of dimension combinations, this method is not cost-effective either. We can, however, use bitmaps in another way to calculate user retention:
Retention metrics:
Next-day retention of registered users,
2-day retention of registered users...
N-day retention of registered users,
For example, if 300 of the 1000 users who registered yesterday log on to the website again today, the next-day retention for yesterday's registrations is 30%;
In general, these metrics all depend on one core variable: user access time.
So we can use a bitmap to record the days on which each user accessed the site:
If the statistical period starts on January 1, 2013, then 2013-01-01 is the first bit, and so on,
The last day of 2013 is the 365th bit.
In this way, each user's bitmap records every day on which that user logged on.
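The date-to-bit mapping can be sketched with java.time. The numbering is 1-based to match the text, so subtract 1 if you use it directly as a Redis offset, which is 0-based. The class name and the PERIOD_START constant are hypothetical.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;

class DayBits {
    // Start of the statistical period, taken from the example in the text.
    static final LocalDate PERIOD_START = LocalDate.of(2013, 1, 1);

    // 1-based bit number for a date: the period's first day is bit 1.
    static int bitFor(LocalDate date) {
        long days = ChronoUnit.DAYS.between(PERIOD_START, date);
        if (days < 0) {
            throw new IllegalArgumentException("date before period start: " + date);
        }
        return Math.toIntExact(days) + 1;
    }
}
```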
Then we calculate the retention:
Retention calculation:
1) Calculate the bit corresponding to the current day. If today is 2013-07-01, the bit is 182.
2) Next-day retention:
Check whether bit (182 - 1) = 181 is set. If so, increment the next-day retention count by 1.
N-day retention:
Check whether bit (182 - n) is set. If so, increment the n-day retention count by 1.
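The steps above can be sketched as follows, with one BitSet of access days per user. This is one reading of the procedure, under the assumption that a user counts as retained when both today's bit and the bit n days earlier are set; the class and method names are hypothetical.

```java
import java.util.BitSet;
import java.util.Collection;

class Retention {
    // Counts users whose access bitmap has both todayBit and
    // (todayBit - n) set, i.e. who visited n days ago and again today.
    static int retainedCount(Collection<BitSet> userDays, int todayBit, int n) {
        int earlierBit = todayBit - n;
        if (n < 0 || earlierBit < 0) {
            throw new IllegalArgumentException("invalid n: " + n);
        }
        int count = 0;
        for (BitSet days : userDays) {
            if (days.get(todayBit) && days.get(earlierBit)) {
                count++;
            }
        }
        return count;
    }

    // Retention rate: retained users divided by the size of the cohort.
    static double rate(int retained, int cohortSize) {
        if (cohortSize <= 0) {
            throw new IllegalArgumentException("cohort must be positive");
        }
        return (double) retained / cohortSize;
    }
}
```

With the figures from the earlier example, rate(300, 1000) gives 0.3, that is 30%.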
Let's estimate the space this takes. One year is 365 bits per user. With 10 million users, the space is 10,000,000 * 365 bits / 8 / 1024 / 1024, or about 435 MB.
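The estimate above treats all bits as packed together. If each user has a separate bitmap, each one occupies whole bytes, so 365 bits take 46 bytes and the raw total is slightly higher. The sketch below computes that figure under the assumption of one bitmap per user; per-key overhead inside Redis is not included, and the class and method names are hypothetical.

```java
class RetentionMemory {
    // Raw bitmap bytes when every user has one bitmap of `days` bits.
    // Each bitmap is rounded up to whole bytes. Per-key overhead in
    // Redis is NOT included (assumption: it is ignored here).
    static long perUserBitmapBytes(long users, int days) {
        long bytesPerUser = (days + 7) / 8;
        return users * bytesPerUser;
    }
}
```

For 10 million users and 365 days this gives 460,000,000 bytes, about 439 MB.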