In website data analysis, time is one of the most common and indispensable dimensions. It is mostly used to limit the range and granularity of metric statistics, but it can also affect the statistical rules and details of the metrics themselves. In some analyses these effects are easy to overlook, and they may mislead the final conclusion.
I ran into this problem while working on a data extraction request. A site publishes a lot of new content every day, and that content needs to be recommended or it will be buried, so many sites have modules such as "Latest recommendations". The request was to analyze which new content should be recommended. New content varies in quality and has little accumulated data, and the recommendation module should feature the new content with the most potential so that it can grow into popular content. The job of the analysis is therefore to find the new content with potential. For a TOP 10 recommendation list, the easiest approach is to sort new content by traffic or conversion rate and take the top 10, though conversion rates come with caveats of their own, as discussed under the secrets behind key indicators.
What do we need to watch for if we use summary data for the last week? As you may have guessed, the reason for stressing new content is that each item has a publish time, like a person's date of birth, and the interval from publish time to now is the duration of the content. It can also be seen as the content's lifetime, like a person's age. The longer content has existed, the more data it accumulates and the greater its chance of high traffic. If we compare one-week totals for content published at different times during that week, we fall into the trap of a mismatched comparison.
A vivid metaphor is a duel between a newly enlisted recruit and a battle-hardened veteran. The recruit is not entirely without a chance: perhaps the recruit is naturally brave, or has the fearlessness of a newborn calf, and beats the veteran. In most cases, though, this is unlikely. It is an unfair duel, and in data analysis we should try to avoid such unfair duels (comparisons).
Content and Product Analysis
In fact, this kind of mistake is probably very common in everyday work. When I checked Google Analytics a few days after publishing a new blog post, the new article's page ranked relatively low. This was not because nobody read it, but because GA by default displays summary data for the last month and the report is sorted by pageviews, so new content cannot climb to the top few places in a short time. For sites that add new content or products infrequently, operators usually know what is new, so with some manual identification they are unlikely to fall into the trap. For sites that add hundreds of new items each week, however, such mistakes are likely to bury some of the best items.
We need a way to remove the effect of this time factor on the analysis. Usually, when choosing objects to compare, we make sure they all have the same duration. For example, to compare the popularity of new content, we uniformly take data for the last week: for content published earlier, the older data is discarded, and content published within the past week is left out of the comparison until it has a full week of data. This keeps comparisons on the same baseline, but it delays the assessment, so content that performs strikingly well cannot be spotted in time. Instead, we can use per-unit-time metrics: from each item's publish time, compute its duration (usually accurate to the day), divide the item's total visits by that duration to get visits per unit of time, and then compare:
The table above takes traffic data for the last 10 days for 5 newly published items and adds the number of days since each item was published. Dividing total visits by duration gives average daily visits, and the items are then ranked in descending order by total visits and by average daily visits, respectively. If we used only the first ranking, we would probably overlook the strong performance of content D. The ranking that accounts for the time factor gives a more accurate picture of the potential of new content.
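The per-day calculation described above can be sketched in Python. The figures from the table are not reproduced here, so the five items, their publish dates, and their visit totals are made-up illustrative values, and the names `duration_days`, `daily_average`, `AS_OF`, and `WINDOW_DAYS` are hypothetical.

```python
from datetime import date

# Hypothetical sample data (NOT the figures from the table above):
# (content name, publish date, total visits within the analysis window)
AS_OF = date(2024, 1, 11)   # assumed "today"
WINDOW_DAYS = 10            # analysis window: the last 10 days

items = [
    ("A", date(2024, 1, 1), 900),
    ("B", date(2024, 1, 3), 640),
    ("C", date(2024, 1, 5), 510),
    ("D", date(2024, 1, 9), 260),
    ("E", date(2024, 1, 7), 200),
]

def duration_days(published, as_of=AS_OF, window=WINDOW_DAYS):
    """Days the content has existed, capped at the window and at least 1."""
    return max(1, min((as_of - published).days, window))

def daily_average(total_visits, published):
    """Per-unit-time metric: total visits divided by duration in days."""
    return total_visits / duration_days(published)

# Ranking that ignores the time factor: sort by total visits.
by_total = [name for name, _, visits in
            sorted(items, key=lambda r: r[2], reverse=True)]

# Ranking that accounts for the time factor: sort by visits per day.
by_daily = [name for name, published, visits in
            sorted(items, key=lambda r: daily_average(r[2], r[1]), reverse=True)]

print("by total visits:  ", by_total)
print("by visits per day:", by_daily)
```

With this sample data, item D is fourth by total visits but first by visits per day, which is the kind of reversal the text describes. Capping the duration at the window length is an assumption made here so that older items are not divided by days that fall outside the data being summed.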
The same method applies to product analysis on e-commerce sites. Many e-commerce sites want to pick new products with enough potential for focused marketing, creating so-called "hot sellers", which drives order growth and increases sales and profit. Choosing promising new products requires a keen nose and a sharp eye on the one hand and data analysis on the other, and the analysis has to account for the time factor described above. Remember that a product that sold 20 units in a month is not necessarily worse than one that sold 50; the key is when each product went on the shelves. Use an effective method to assess real potential and find the products that offer valuable growth.
No website content or product lasts forever; each has its own life cycle, so sensible site operators are always looking for new growth points. If data analysis ignores the time factor, content and products with potential are likely to be suppressed for a long time by "veteran" content and products, making the site's metabolism too slow so that it falls behind other sites.
User Analysis
Time factors matter in user analysis as well. Models such as RFM analysis, customer loyalty and value scoring, and customer lifetime value are based on users' continuous behavior over a period of time, and they easily fall into the time trap. We cannot expect a new user who registered only a week ago to buy as often in the last month as a user who had the full 30 days, because the new user had only 7 days. Likewise, we should not compare the spending of a new user with one month of history against the three or six months of spending of an old user, because they did not start from the same line. New users have potential and may grow into higher-value loyal users, so we need to eliminate this effect when marketing to users. The same per-unit-time approach applies: divide each metric by the length of time the user has used the site (starting from the user's first visit or registration). Below, the RFM model is used to compare user assessments before and after considering the time factor:
Suppose the RFM model analyzes users with data for the last 100 days. As shown in the table above, a "duration" statistic is added: the number of days from the user's registration to now. If the user registered more than 100 days ago, the duration within the statistical period is 100 days (the maximum). Of the three RFM metrics, recency (R), the interval since the most recent purchase, is not affected by the user's duration. Purchase frequency (F) and monetary amount (M) are affected, so when the time factor is considered they are divided by the duration to give per-unit-time (here, per-day) values. This is the transformation between the "time factor not considered" and "time factor considered" metrics for each user in the table. Comparing before and after: user 1 is a long-standing user and has a clear advantage in purchase frequency and spending before the time factor is considered, but after the transformation user 2 shows higher stickiness and value. In other words, although user 2 has not used the site for long, user 2 buys more per unit of time than user 1. The radar chart shows the effect before and after considering the time factor:
The data in the chart has been standardized. The blue lines represent user 1 and the red lines represent user 2; dashed lines do not consider the time factor and solid lines do. User 2's value is clearly magnified once the time factor is considered, and the chart gives a sense of user 2's expected value. If we ignore the time factor, the analysis results will be significantly biased, which may lead to a wrong assessment of users.
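The transformation of F and M described above can be sketched as follows. This is a minimal illustration under stated assumptions: the two users and all of their numbers are invented for the example (they are not the values from the table), and `UserRFM`, `duration`, and `time_adjusted` are hypothetical names.

```python
from dataclasses import dataclass

WINDOW_DAYS = 100  # statistical period used in the text

@dataclass
class UserRFM:
    name: str
    days_since_registration: int
    recency_days: int   # R: days since the most recent purchase
    frequency: int      # F: number of purchases within the window
    monetary: float     # M: amount spent within the window

def duration(user, window=WINDOW_DAYS):
    """Days the user has existed within the window (capped at the window, at least 1)."""
    return max(1, min(user.days_since_registration, window))

def time_adjusted(user):
    """R is left unchanged; F and M are divided by duration to give per-day values."""
    d = duration(user)
    return {
        "R": user.recency_days,
        "F_per_day": user.frequency / d,
        "M_per_day": user.monetary / d,
    }

# Hypothetical users: user 1 is long-standing, user 2 registered 20 days ago.
users = [
    UserRFM("user 1", days_since_registration=150, recency_days=5,
            frequency=10, monetary=1000.0),
    UserRFM("user 2", days_since_registration=20, recency_days=3,
            frequency=4, monetary=500.0),
]

for u in users:
    print(u.name, "raw F/M:", u.frequency, u.monetary, "adjusted:", time_adjusted(u))
```

In this sample, user 1 leads on raw frequency and spending, while user 2 leads on both per-day values, mirroring the reversal discussed in the text. The standardization applied before drawing the radar chart is not specified in the article, so it is not reproduced here.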
In fact, the time factor discussed here comes down to following a basic principle of comparison: the objects being compared must be comparable, otherwise the result of the comparison is meaningless.