The author, Evan Miller, is a graduate student in economics at the University of Chicago.
This article was translated by Wu anshou.
If you run A/B tests on your website and regularly check ongoing experiments for significant results, you may be falling prey to what statisticians call the "repeated significance testing error." As a result, even though your reports say a result is statistically significant, there is a good chance that it is actually not. This article explains why.
Background
When an A/B testing report says there is a "95% chance of beating the original" or a "90% probability of statistical significance," it is answering the following question: assuming there is no underlying difference between A and B, how often would we see a difference like the one in our data just by chance? The answer to that question is called the significance level, and "statistically significant results" means that the significance level is low, for example 5% or 1%. Reports usually take the complement of this figure (for example, 95% or 99%) and present it as a "chance of beating the original" or something similar.
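To make the question above concrete, here is a minimal sketch of how such a figure could be computed. It assumes a two-sided, pooled two-proportion z-test, which is one common choice and not necessarily what any particular A/B testing package uses. The function name and the conversion counts are hypothetical.

```python
import math

def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for the hypothesis that A and B convert at the same rate.

    Assumption: a pooled two-proportion z-test with a normal approximation.
    """
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0  # no variation at all, so no evidence of a difference
    z = (conv_b / n_b - conv_a / n_a) / se
    # Tail area of the standard normal distribution on both sides of |z|.
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical data: 100 of 1,000 visitors convert on A, 130 of 1,000 on B.
p = two_proportion_p_value(conv_a=100, n_a=1000, conv_b=130, n_b=1000)
print(f"p-value: {p:.4f}")
print(f"figure a report might display: {100 * (1 - p):.1f}%")
```

With these made-up counts the p-value falls below 0.05, so a report using this test would call the difference significant at the 5% level.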
However, the significance calculation rests on a critical assumption that you have probably violated without even realizing it: that the sample size was fixed in advance. If, instead of deciding ahead of time that "this experiment will collect 1,000 observations," you say, "We will keep running it until we see a significant difference," all the reported significance levels become meaningless. This result is completely counterintuitive, and the A/B testing packages out there do not account for it, but I will try to explain the root of the problem in a simple way.
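A small simulation can illustrate the effect. The sketch below runs many A/A experiments, in which both variants share the same true conversion rate, so every "significant" result is a false positive. It compares testing once at a fixed sample size with stopping at the first significant checkpoint. The 20% conversion rate, the z-test, the checkpoints of 200 and 500 observations per variant, and all names are assumptions chosen for illustration.

```python
import math
import random

def p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided pooled two-proportion z-test (an assumed choice of test)."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0
    z = (conv_b / n_b - conv_a / n_a) / se
    return math.erfc(abs(z) / math.sqrt(2))

def simulate(checkpoints, rate=0.2, runs=2000, alpha=0.05, seed=1):
    """Return (false positive rate with peeking, false positive rate without)."""
    rng = random.Random(seed)
    last = max(checkpoints)
    peeking_hits = 0
    fixed_hits = 0
    for _ in range(runs):
        conv_a = conv_b = 0
        significant_at = []
        for n in range(1, last + 1):
            # Both variants convert at the same true rate: there is no real difference.
            conv_a += rng.random() < rate
            conv_b += rng.random() < rate
            if n in checkpoints:
                significant_at.append(p_value(conv_a, n, conv_b, n) < alpha)
        # Peeking: declare a winner if ANY checkpoint looked significant.
        peeking_hits += any(significant_at)
        # Fixed sample size: only the final checkpoint counts.
        fixed_hits += significant_at[-1]
    return peeking_hits / runs, fixed_hits / runs

peeking_rate, fixed_rate = simulate(checkpoints=(200, 500))
print(f"false positive rate, single test at 500: {fixed_rate:.3f}")
print(f"false positive rate, stop when significant at 200 or 500: {peeking_rate:.3f}")
```

Because the final checkpoint is included in both rules, the peeking rate can never be lower than the fixed rate. Adding more checkpoints to the tuple widens the gap.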
Example
Suppose you analyze the results of an experiment after 200 and after 500 observations. Four outcomes are possible:
| Scenario 1 | Scenario 2 | Scenario 3 | Scenario 4 |
After 200 observations | Not significant | Not significant | Significant | Significant |
After 500 observations |
Not significant |
Significant |