Our company's statistics system received a requirement: count the total number of users who performed a given action within a time period, where the length of the period is variable. The user base is huge and the statistics are computed in real time, so both data storage and computation need an efficient solution. Below is an article from the Internet on using Redis bitmaps for this.
At getspool.com, important metrics are calculated in real time. Redis bitmaps let us compute such metrics in real time while using very little space. In a simulation with 128 million users, a typical metric such as daily unique users takes less than 50 ms to compute on a single MacBook Pro and uses only 16 MB of memory. Spool does not have 128 million users yet, but our solution can handle that scale. We want to share how this is done, in the hope that it helps other startups.
A crash course on bitmaps and Redis bitmaps
Bitmap (a.k.a. bitset)
A bitmap is a contiguous sequence of binary digits (0 or 1), each identified by its position, called the offset. Bitwise operations such as AND, OR, and XOR can be applied to bitmaps.
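As a rough illustration of these operations, the sketch below uses java.util.BitSet, Java's in-memory bitmap, rather than Redis itself. The class name BitmapOps and its methods are made up for this example.

```java
import java.util.BitSet;

// Hypothetical helper for illustration; not part of Redis or the original article.
class BitmapOps {
    // Builds a bitmap with the bits at the given offsets set to 1.
    static BitSet of(int... offsets) {
        BitSet bits = new BitSet();
        for (int offset : offsets) {
            bits.set(offset);
        }
        return bits;
    }

    // Returns a AND b without modifying either argument.
    static BitSet and(BitSet a, BitSet b) {
        BitSet result = (BitSet) a.clone();
        result.and(b);
        return result;
    }

    // Returns a OR b without modifying either argument.
    static BitSet or(BitSet a, BitSet b) {
        BitSet result = (BitSet) a.clone();
        result.or(b);
        return result;
    }

    // Returns a XOR b without modifying either argument.
    static BitSet xor(BitSet a, BitSet b) {
        BitSet result = (BitSet) a.clone();
        result.xor(b);
        return result;
    }
}
```

For example, with a = of(0, 2, 3) and b = of(2, 3, 5), and(a, b) holds offsets 2 and 3, or(a, b) holds 0, 2, 3, and 5, and xor(a, b) holds 0 and 5.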
Bitmap count (Population count)
The population count of a bitmap is the number of bits set to 1. It can be computed very efficiently: for example, counting a bitmap of 1 billion bits, 90% of which are set to 1, takes 21.1 ms on a MacBook Pro. SSE4 even includes a hardware instruction for computing the population count of an integer.
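A minimal sketch of a population count in Java, assuming the bitmap is held as an array of 64-bit words. Long.bitCount is a standard library method; the class PopCount is a name invented for this example. Whether the JVM maps it to a hardware instruction depends on the JVM and the CPU.

```java
// Hypothetical helper for illustration only.
class PopCount {
    // Number of 1 bits in a single 64-bit word.
    static int countWord(long word) {
        return Long.bitCount(word);
    }

    // Number of 1 bits across a bitmap stored as an array of 64-bit words.
    static long countAll(long[] words) {
        long total = 0;
        for (long word : words) {
            total += Long.bitCount(word);
        }
        return total;
    }
}
```

java.util.BitSet exposes the same operation as cardinality(), which is what the code later in this article relies on.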
Redis bitmaps
Redis allows binary data to be used for both keys and values, and a bitmap is simply a binary value. The Redis operation setbit(key, offset, value) sets the bit at the given offset of the value stored at key to 1 or 0, with O(1) time complexity.
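To make the offset semantics concrete, here is an in-memory model of how SETBIT and GETBIT address bits inside a Redis string: offset 0 is the most significant bit of the first byte, and the string grows with zero padding as needed. This is only a sketch for illustration; RedisBitmapModel is an invented class, not a Redis client.

```java
import java.util.Arrays;

// In-memory model of a Redis string used as a bitmap. Illustration only.
class RedisBitmapModel {
    private byte[] bytes = new byte[0];

    // Mimics SETBIT: sets the bit at offset to value and returns the previous bit.
    int setBit(int offset, int value) {
        if (value != 0 && value != 1) {
            throw new IllegalArgumentException("value must be 0 or 1");
        }
        int index = offset / 8;
        if (index >= bytes.length) {
            bytes = Arrays.copyOf(bytes, index + 1); // grow, zero-padded
        }
        int mask = 0x80 >> (offset % 8); // offset 0 is the most significant bit
        int previous = (bytes[index] & mask) != 0 ? 1 : 0;
        if (value == 1) {
            bytes[index] |= mask;
        } else {
            bytes[index] &= ~mask;
        }
        return previous;
    }

    // Mimics GETBIT: bits beyond the end of the string read as 0.
    int getBit(int offset) {
        int index = offset / 8;
        if (index >= bytes.length) {
            return 0;
        }
        return (bytes[index] & (0x80 >> (offset % 8))) != 0 ? 1 : 0;
    }

    // Copy of the raw bytes, as a GET on the key would return them.
    byte[] toBytes() {
        return bytes.clone();
    }
}
```

In this model, setting offset 7 on an empty bitmap produces the single byte 0x01, and setting offset 0 produces 0x80.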
A simple example: daily active users
To count the unique users who logged in today, we set up a bitmap in which each bit represents one user, with the user ID as the offset. When a user visits a page or performs an action, the bit for that user is set to 1. The Redis key for the bitmap is derived from the name of the action and a timestamp.
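With one bit per user ID, the memory needed follows directly from the largest ID. The sketch below checks the 16 MB figure quoted for 128 million users, assuming user IDs are dense integers starting at 0 (an assumption; the article does not say how IDs are assigned). BitmapSizing is a name invented for this example.

```java
// Hypothetical helper for illustration only.
class BitmapSizing {
    // Bytes needed to hold a bit at offset maxUserId (one bit per ID, 8 bits per byte).
    static long bytesFor(long maxUserId) {
        return maxUserId / 8 + 1;
    }
}
```

For IDs 0 through 127,999,999 this gives 16,000,000 bytes, which matches the 16 MB mentioned earlier.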
In this simple example, redis.setbit(daily_active_users, user_id, 1) is executed every time a user logs in. This sets the corresponding bit in the bitmap to 1 in O(1) time. A population count on the bitmap shows that 9 users logged in today. The bitmap's key is daily_active_users and its value is 1011110100100101.
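The example can be reproduced in memory with a short sketch. Here a java.util.BitSet stands in for the Redis value, and the user IDs are read off the 1 bits of 1011110100100101 from left to right, starting at offset 0. The class DailyActiveUsers and its methods are hypothetical names.

```java
import java.util.BitSet;

// In-memory stand-in for the daily_active_users bitmap. Illustration only.
class DailyActiveUsers {
    private final BitSet bitmap = new BitSet();

    // Stand-in for redis.setbit(daily_active_users, userId, 1).
    void recordLogin(int userId) {
        bitmap.set(userId);
    }

    // Population count: the number of distinct users seen.
    int count() {
        return bitmap.cardinality();
    }

    // Renders the first `length` bits as a string, offset 0 first.
    String render(int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append(bitmap.get(i) ? '1' : '0');
        }
        return sb.toString();
    }
}
```

Recording logins for user IDs 0, 2, 3, 4, 5, 7, 10, 13, and 15 yields the bitmap 1011110100100101 and a count of 9. Repeated logins by the same user do not change the count.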
Because the set of daily active users changes every day, a new bitmap is needed each day. We do this simply by appending the date to the key. For example, to count how many users listened to at least one song in a music app on a given day, the Redis key can be named play:yyyy-mm-dd (a finer-grained key such as play:yyyy-mm-dd-hh would give hourly buckets in the same way). When a user listens to a song, we simply set that user's bit in the bitmap to 1, which is an O(1) operation.
    redis.setbit(play:yyyy-mm-dd, user_id, 1)
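One way to build such date-stamped keys in Java is sketched below. The class MetricKeys and its method names are assumptions made for this example; only the key layout comes from the text above.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical key-naming helper for illustration only.
class MetricKeys {
    private static final DateTimeFormatter HOURLY =
            DateTimeFormatter.ofPattern("yyyy-MM-dd-HH");

    // Daily key, e.g. "play:2011-11-01". LocalDate.toString() is ISO-8601 (yyyy-MM-dd).
    static String dailyKey(String action, LocalDate date) {
        return action + ":" + date;
    }

    // Hourly key, e.g. "play:2011-11-01-09".
    static String hourlyKey(String action, LocalDateTime time) {
        return action + ":" + time.format(HOURLY);
    }
}
```

The resulting string would then be passed as the key of the setbit call, with the user ID as the offset.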
The number of unique users who listened to a song today is the population count of the bitmap stored at play:yyyy-mm-dd. To count by week or month, compute the union (bitwise OR) of all the daily bitmaps in that week or month and take the population count of the result.
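A minimal sketch of the weekly or monthly calculation, assuming the daily bitmaps have already been loaded into a map keyed by date. The map stands in for the per-day Redis keys, and PeriodUniques is an invented name.

```java
import java.time.LocalDate;
import java.util.BitSet;
import java.util.Map;

// Hypothetical helper for illustration only.
class PeriodUniques {
    // Unions the daily bitmaps for `days` consecutive days starting at `from`
    // and returns the number of distinct users in the union.
    static int uniqueUsers(Map<LocalDate, BitSet> daily, LocalDate from, int days) {
        BitSet union = new BitSet();
        for (int i = 0; i < days; i++) {
            BitSet day = daily.get(from.plusDays(i));
            if (day != null) { // a day with no activity has no bitmap
                union.or(day);
            }
        }
        return union.cardinality();
    }
}
```

Calling uniqueUsers(daily, monday, 7) gives a weekly count. A user who was active on several days is counted once, because OR merges the repeated bits. Redis 2.6 and later also provide BITOP and BITCOUNT commands, which can perform the OR and the count on the server instead.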
It is also easy to compute more complex metrics from these bitmaps. For example, the premium users who listened to a song in November are:
(play:2011-11-01 ∪ play:2011-11-02 ∪ ... ∪ play:2011-11-30) ∩ premium:2011-11
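The set expression above translates into an OR over the daily bitmaps followed by an AND with the premium bitmap. The sketch below assumes the bitmaps are already in memory as BitSet objects; CohortCount and activeInCohort are hypothetical names.

```java
import java.util.BitSet;
import java.util.List;

// Hypothetical helper for illustration only.
class CohortCount {
    // Counts users who appear in at least one daily bitmap AND in the cohort bitmap.
    static int activeInCohort(List<BitSet> dailyBitmaps, BitSet cohort) {
        BitSet active = new BitSet();
        for (BitSet day : dailyBitmaps) {
            active.or(day); // union of all days
        }
        active.and(cohort); // keep only cohort members
        return active.cardinality();
    }
}
```

Neither the daily bitmaps nor the cohort bitmap is modified, so the same inputs can be reused for other metrics.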
Performance comparison using 128 million users
The following table shows the time taken to compute the unique-user count over one day, one week, and one month for 128 million users.
Period  | Time (ms)
Daily   | 50.2
Weekly  | 392.0
Monthly | 1624.8
Optimizations
In the previous example, the daily, weekly, and monthly results can be cached in Redis to speed up the computation.
This is a very flexible approach. An added bonus of caching is that it enables further metrics, such as weekly active mobile users: the intersection of the mobile users' bitmap with the weekly active users' bitmap. Likewise, to count the active users over the past n days, take the cached daily bitmaps for the previous n-1 days and union them with today's bitmap, which takes only about 50 ms.
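The rolling n-day count might be sketched as follows. This version keeps the cache in a local map rather than in Redis, purely to keep the example self-contained; the loader function is a hypothetical hook that would fetch a day's bitmap from wherever it is stored.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical rolling-window counter for illustration only.
class RollingUniques {
    private final Map<String, BitSet> cache = new HashMap<>();
    private final Function<String, BitSet> loader; // fetches one past day's bitmap

    RollingUniques(Function<String, BitSet> loader) {
        this.loader = loader;
    }

    // Past days no longer change, so each is loaded once and then reused.
    // Today's bitmap is still being written to, so it is passed in fresh.
    int count(List<String> pastDays, BitSet today) {
        BitSet union = (BitSet) today.clone();
        for (String day : pastDays) {
            BitSet bits = cache.computeIfAbsent(day, d -> {
                BitSet loaded = loader.apply(d);
                return loaded == null ? new BitSet() : loaded; // no activity that day
            });
            union.or(bits);
        }
        return union.cardinality();
    }
}
```

On the second and later calls only today's bitmap is new work; the n-1 past days come straight from the cache.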
The following Java code counts the unique users who performed a given action on the specified dates.
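    import java.util.BitSet;
    import redis.clients.jedis.Jedis;

    public class UniqueCounter {
        private final Jedis redis = new Jedis("localhost");

        // Counts the distinct users who performed `action` on any of the given dates.
        public int uniqueCount(String action, String... dates) {
            BitSet all = new BitSet();
            for (String date : dates) {
                String key = action + ":" + date;
                byte[] raw = redis.get(key.getBytes());
                if (raw == null) continue; // no bitmap stored for that date
                // BitSet orders bits within a byte differently from Redis; the count is unaffected.
                all.or(BitSet.valueOf(raw));
            }
            return all.cardinality();
        }
    }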
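One caveat when reading Redis bitmaps into a java.util.BitSet: Redis numbers bits from the most significant bit of each byte, while BitSet.valueOf treats the least significant bit as index 0. Counts, unions, and intersections are unaffected as long as every bitmap is read the same way, but the bit indices no longer equal user IDs. If the IDs themselves are needed, a conversion along the following lines could be used. This is a sketch, and RedisBitOrder is an invented name.

```java
import java.util.BitSet;

// Hypothetical conversion helper for illustration only.
class RedisBitOrder {
    // Builds a BitSet whose index i is set exactly when Redis offset i is set.
    static BitSet fromRedisBytes(byte[] raw) {
        BitSet bits = new BitSet(raw.length * 8);
        for (int i = 0; i < raw.length * 8; i++) {
            int mask = 0x80 >> (i % 8); // Redis offset 0 is the top bit of byte 0
            if ((raw[i / 8] & mask) != 0) {
                bits.set(i);
            }
        }
        return bits;
    }
}
```

For the single byte 0x01, which is what Redis stores after setting offset 7, fromRedisBytes sets index 7, whereas BitSet.valueOf would set index 0.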
Using Redis setbit and bitmap to count the number of users