Using bit operations to calculate user retention rate

Source: Internet
Author: User

Before calculating the retention rate, let's look at what the term means. Baidu Encyclopedia defines it as follows:

Users who start using an application within a given period and are still using it after some time has passed are counted as retained. The proportion of these users to the new users added in that period is the retention rate, which is measured at intervals of one unit of time (such as a day, a week, or a month). As the name suggests, retention answers the question "how many users have stayed?" Retention and the retention rate reflect the quality of the application and its ability to keep users.

To put it simply: 100 new users are added on the first day. On the second day, 50 of them log in; on the third day, 30 of them log in; and so on.
The next-day retention rate is then 50%, and the third-day retention rate is 30%.
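
The arithmetic in this example can be written down as a small Python sketch. The function name `retention_rate` is made up for illustration and is not part of the original article.

```python
def retention_rate(new_users: int, returning_users: int) -> float:
    """Share of one day's new users who came back on a given later day."""
    if new_users <= 0:
        raise ValueError("new_users must be positive")
    return returning_users / new_users


# Figures from the example above: 100 new users on day 1,
# 50 of them back on day 2 and 30 of them back on day 3.
day2 = retention_rate(100, 50)
day3 = retention_rate(100, 30)
print(f"{day2:.0%} {day3:.0%}")  # prints "50% 30%"
```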

A statistics system often needs to calculate user retention. Here are several ways to implement retention statistics.

1. Using the last login time

A single table records new users. It contains at least three fields: uid, reg_time, and last_visited_time. The last access time (last_visited_time) is updated every time the user visits. Then, in the early morning after the day being measured, count the users whose reg_time falls on 3.6 and whose last_visited_time falls on 3.7. For details, refer to the SQL:

SELECT COUNT(*) FROM TBL_NAME WHERE DATE(reg_time) = '2014-03-06' AND DATE(last_visited_time) = '2014-03-07'

The implementation is simple, but the problem is just as obvious. If one of these users visits again in the early morning before the statistics run, last_visited_time is updated at that moment and the user is no longer counted as retained. The resulting deviation in the overall figure is small, so set it aside for now. A more serious problem is that the statistics cannot be re-run: if the script fails or the figures need to be recalculated, the earlier value of last_visited_time has already been overwritten. There are advantages too: the statistics are easy to compute, and N-day retention is easy to add.
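
The re-run problem is easier to see with data. Below is a minimal sketch using Python's built-in sqlite3 module; the table name `users` and the three sample rows are invented for illustration, and the query is the one from the article with the table name filled in.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (uid TEXT PRIMARY KEY, reg_time TEXT, last_visited_time TEXT)"
)
# Hypothetical sample data: three users registered on 2014-03-06.
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [
        ("a", "2014-03-06 09:00:00", "2014-03-07 10:00:00"),  # came back on 3.7
        ("b", "2014-03-06 12:00:00", "2014-03-06 12:00:00"),  # never came back
        ("c", "2014-03-06 18:00:00", "2014-03-07 08:30:00"),  # came back on 3.7
    ],
)

QUERY = """
SELECT COUNT(*) FROM users
WHERE DATE(reg_time) = '2014-03-06'
  AND DATE(last_visited_time) = '2014-03-07'
"""


def next_day_retained(connection):
    """Run the article's query and return the count."""
    return connection.execute(QUERY).fetchone()[0]


first_run = next_day_retained(conn)  # 2: users a and c

# User a visits again on 3.8, which overwrites the only record of the 3.7 visit.
conn.execute(
    "UPDATE users SET last_visited_time = '2014-03-08 00:05:00' WHERE uid = 'a'"
)
second_run = next_day_retained(conn)  # 1: the same query now undercounts
```

Running the same query twice gives two different answers, which is why this method cannot be used to recalculate past figures.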

2. Using independent fields

The table can be designed with one field per day: uid, reg_time, day_2, day_3, day_4, and so on. When the user visits on the second day, day_2 is set to 1; when the user visits on the third day, day_3 is set to 1. Each of these fields defaults to 0. To count next-day retention as before, the SQL looks like this:

SELECT COUNT(*) FROM TBL_NAME WHERE DATE(reg_time) = '2014-03-06' AND day_2 = 1

With this method the statistics can be re-run, but it is not convenient to extend. If 15-day retention was not planned for at the start, you have to modify the table structure and add day_15.
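
A minimal sketch of this design, again using sqlite3 with an invented `users` table and sample rows. The helpers `mark_visit` and `retained` are hypothetical names. The sketch shows that repeated updates are harmless, and that tracking a new day requires a schema change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE users (
           uid TEXT PRIMARY KEY,
           reg_time TEXT,
           day_2 INTEGER NOT NULL DEFAULT 0,
           day_3 INTEGER NOT NULL DEFAULT 0
       )"""
)
conn.executemany(
    "INSERT INTO users (uid, reg_time) VALUES (?, ?)",
    [("a", "2014-03-06 09:00:00"), ("b", "2014-03-06 12:00:00")],
)


def mark_visit(connection, uid, day):
    """Set the day_N flag for one user."""
    column = f"day_{int(day)}"  # column name is built from an int only
    connection.execute(f"UPDATE users SET {column} = 1 WHERE uid = ?", (uid,))


def retained(connection, reg_date, day):
    """Count users registered on reg_date whose day_N flag is set."""
    column = f"day_{int(day)}"
    row = connection.execute(
        f"SELECT COUNT(*) FROM users WHERE DATE(reg_time) = ? AND {column} = 1",
        (reg_date,),
    ).fetchone()
    return row[0]


mark_visit(conn, "a", 2)
mark_visit(conn, "a", 2)  # repeating the update changes nothing
day2_count = retained(conn, "2014-03-06", 2)  # 1

# Tracking day 15 was not planned, so the table itself has to change first.
conn.execute("ALTER TABLE users ADD COLUMN day_15 INTEGER NOT NULL DEFAULT 0")
mark_visit(conn, "b", 15)
day15_count = retained(conn, "2014-03-06", 15)  # 1
```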

3. Using bitwise operations

The values recorded in the table above are just 0s and 1s, so the bits of a single binary number can record whether the user visited on each day: 1 means there was a visit, 0 means there was not. The table is designed with the fields uid, reg_time, and retention. The retention value is recorded as follows:

    Day 1, registration:              0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1    decimal 1, so retention is 1
    Day 2, visit:                     0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1    retention is updated to 3
    Day 3, no visit; day 4, visit:    0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 1    retention is updated to 11

And so on. The next step is to compute the retention for a given day, taking the second day as an example. Perform a bitwise AND between the retention value and a mask whose 2nd bit is 1 and whose other bits are all 0.

    0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1     &    0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0    =    0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0

Bitwise AND yields 1 only where both bits are 1. Written with integers, the operation above is 3 & 2. If the result is 2, the user visited on the second day; if the result is 0, the user did not. The mask for day N is therefore 2 to the power of N-1 (for example, day 3 uses the 3rd bit, which is 2 to the power of 2, or 4). In some SQL dialects (MySQL, for example) the ^ operator is bitwise XOR rather than exponentiation, so the mask is written here as a left shift. The SQL for day-N retention is (N is the day number):

SELECT COUNT(*) FROM TBL_NAME WHERE DATE(reg_time) = 'XXXX-XX-XX' AND (retention & (1 << (N - 1))) <> 0
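
The read side can be sketched in Python as follows. The mask 2 to the power of N-1 is built with a left shift, `1 << (n - 1)`. The names `day_mask`, `visited_on_day`, and `retained_count`, the `users` table, and the sample rows are all assumptions made for this example.

```python
import sqlite3


def day_mask(n: int) -> int:
    """Mask with only bit n set (bit 1 is the registration day)."""
    if n < 1:
        raise ValueError("day numbers start at 1")
    return 1 << (n - 1)


def visited_on_day(retention: int, n: int) -> bool:
    """True if bit n of the retention value is set."""
    return (retention & day_mask(n)) != 0


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (uid TEXT, reg_time TEXT, retention INTEGER)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [
        ("a", "2014-03-06 09:00:00", 3),   # ...0011: days 1 and 2
        ("b", "2014-03-06 12:00:00", 1),   # ...0001: day 1 only
        ("c", "2014-03-06 18:00:00", 11),  # ...1011: days 1, 2 and 4
    ],
)


def retained_count(connection, reg_date, n):
    """Count users registered on reg_date who visited on day n."""
    row = connection.execute(
        "SELECT COUNT(*) FROM users "
        "WHERE DATE(reg_time) = ? AND (retention & ?) <> 0",
        (reg_date, day_mask(n)),
    ).fetchone()
    return row[0]


day2 = retained_count(conn, "2014-03-06", 2)  # 2: users a and c
day3 = retained_count(conn, "2014-03-06", 3)  # 0
day4 = retained_count(conn, "2014-03-06", 4)  # 1: user c
```

Passing the mask in as a query parameter keeps the SQL itself free of any dialect-specific power or shift syntax.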

Of course, which day each bit stands for is up to you. If you decide that the 10th bit stands for 30-day retention, then compute retention & 2^9 (that is, 512), and so on.
That covers reading; writing comes next. At registration the value is 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1. If the user visits on the second day, perform a bitwise OR between the previous value and a mask whose 2nd bit is 1 and whose other bits are 0. Bitwise OR yields 1 wherever either bit is 1.

    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1    |    0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0    =    0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1

With no visit on the third day and a visit on the fourth day, the update is:

    0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1    |    0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0    =    0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 1

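In SQL (N is the day on which the visit happens; the mask 2 to the power of N-1 is written as a left shift because ^ is bitwise XOR in some SQL dialects, MySQL for example):

UPDATE TBL_NAME SET retention = retention | (1 << (N - 1)) WHERE uid = 'XX'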
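
The write side can be sketched the same way. This is an illustration under assumptions: the `users` table, the default value of 1 for the registration day, and the helpers `record_visit` and `retention_of` are all invented for the example. It replays the sequence from the article (visit on day 2, no visit on day 3, visit on day 4) and repeats one update to show that the value does not change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE users (
           uid TEXT PRIMARY KEY,
           reg_time TEXT,
           retention INTEGER NOT NULL DEFAULT 1  -- bit 1 = registration day
       )"""
)
conn.execute("INSERT INTO users (uid, reg_time) VALUES ('a', '2014-03-06 09:00:00')")


def record_visit(connection, uid, n):
    """Set bit n of the user's retention value with a bitwise OR."""
    if n < 1:
        raise ValueError("day numbers start at 1")
    connection.execute(
        "UPDATE users SET retention = retention | (1 << (? - 1)) WHERE uid = ?",
        (n, uid),
    )


def retention_of(connection, uid):
    """Return the stored retention integer for one user."""
    return connection.execute(
        "SELECT retention FROM users WHERE uid = ?", (uid,)
    ).fetchone()[0]


record_visit(conn, "a", 2)
after_day2 = retention_of(conn, "a")    # 1 | 2 = 3

record_visit(conn, "a", 2)              # same day again
after_repeat = retention_of(conn, "a")  # 3 | 2 = 3, unchanged

record_visit(conn, "a", 4)              # no visit on day 3
after_day4 = retention_of(conn, "a")    # 3 | 8 = 11
```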

In addition, this update can safely be repeated on the same day, because bitwise OR only needs one of the two bits to be 1. On the second day, the first update gives 1 | 2 = 3 and a second update gives 3 | 2 = 3, so the value stays the same.
When I first heard of this solution, I was worried about efficiency. In practice, with millions of rows the statistics query runs about as fast as one that only uses the index on reg_time, so this is not a serious problem. An integer is 4 bytes, or 32 bits, so it can hold 32 separate flags; if that is not enough, an 8-byte long integer can be used. Overall, this method is extensible and the statistics can be re-run, so it is a workable approach.
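
Because all the flags live in one integer, the full visit history can be decoded back out of it. A minimal sketch, with `visited_days` as a made-up helper name and a 32-bit width assumed to match the 4-byte integer mentioned above:

```python
def visited_days(retention: int, width: int = 32) -> list:
    """Return the day numbers (1-based) whose bits are set."""
    return [n for n in range(1, width + 1) if retention & (1 << (n - 1))]


days = visited_days(11)     # [1, 2, 4]
bits = format(11, "016b")   # '0000000000001011'
all_32 = (1 << 32) - 1      # value with all 32 flags set: 4294967295
```

One caveat worth checking against your own database: in a signed 32-bit column the top bit is the sign bit, so using all 32 flags assumes an unsigned column or a wider integer type.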

Until now I had only seen bitwise operations used for permissions. This is another good use for them, and I look forward to hearing more ideas. The following are the basic bitwise operations:
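
As a quick refresher, here is a short Python sketch of the common bitwise operators, using the values from this article. It is an illustration added for reference, not the list from the original post.

```python
a, b = 0b0011, 0b0010         # 3 and 2, as in the next-day example

and_result = a & b            # 0b0010 = 2: 1 only where both bits are 1
or_result = a | b             # 0b0011 = 3: 1 where either bit is 1
xor_result = a ^ b            # 0b0001 = 1: 1 where the bits differ
shifted = 1 << 3              # 0b1000 = 8: the mask for day 4
cleared = 0b1011 & ~0b0010    # 0b1001 = 9: AND with NOT clears a bit
```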

Original article address: Bit operations enable user retention. Thanks to the original author for sharing.
