Website Data Analysis: Parameter Estimation and Confidence Intervals

Source: Internet
Author: User

We often want to infer the overall characteristics of a data set from a sample, and website data analysis is no exception. We try to judge the current state of the site, and whether there are any good or bad signals, from the last few days of data. But a few days of data cannot fully represent the whole, so all we can do is "estimate". Website data also fluctuates constantly, so a sample taken from the most recent period may well sit at an unusually low or high level. The estimate we get from such a sample cannot be assumed to match the true value exactly, and we also need to assess the range within which it could vary.

Parameter estimation is the method of estimating population parameters from sample statistics. It includes point estimation and interval estimation.

Point estimation

Point estimation is a statistical inference method that uses a statistic computed from the sample as a single-value estimate of an unknown population parameter.

Point estimates of population parameters generally come in two kinds. The first uses the sample mean to estimate the population mean. This corresponds to numerical metrics in website data, such as daily UV (unique visitors): we can use the average daily UV over the past week to estimate the site's current typical number of visitors per day. The second uses the sample proportion to estimate the population proportion. This corresponds to ratio metrics, such as the site's goal conversion rate: we can use the conversion rate over the past 3 days to estimate the site's current level of goal conversion. In both cases we also compute the sample standard deviation, which shows how much the data vary around the sample mean or proportion and so gives an estimate of the fluctuation in the population.
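As a minimal sketch of both kinds of point estimate, the snippet below uses made-up figures; the daily UV values and the (visits, conversions) pairs are illustrative assumptions, not real site data.

```python
from statistics import mean, stdev

# Hypothetical daily unique visitors (UV) for the last 7 days.
daily_uv = [1200, 1350, 1280, 1420, 1310, 1190, 1250]

# Point estimate of the population mean: the sample mean.
uv_estimate = mean(daily_uv)
# Sample standard deviation (n - 1 in the denominator) describes the spread.
uv_spread = stdev(daily_uv)

# Hypothetical (visits, conversions) pairs for the last 3 days.
last_3_days = [(1310, 42), (1190, 35), (1250, 40)]
visits = sum(v for v, _ in last_3_days)
conversions = sum(c for _, c in last_3_days)

# Point estimate of the population proportion: the pooled sample proportion.
conversion_rate = conversions / visits

print(f"estimated daily UV: {uv_estimate:.1f} (sd {uv_spread:.1f})")
print(f"estimated conversion rate: {conversion_rate:.4f}")
```

Pooling visits and conversions across the three days, rather than averaging three daily rates, is one reasonable choice here: it weights each day by its traffic.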

Point estimation also covers fitting the parameters of a linear regression line by the least squares method, and using maximum likelihood estimation to find the parameters of the probability density function that describes the distribution of the sample.
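Both techniques can be sketched in a few lines. The example below is an illustration under assumptions: the helper names `least_squares_line` and `normal_mle` are hypothetical, the data are invented, and a normal distribution is assumed for the maximum likelihood part.

```python
from statistics import mean

def least_squares_line(xs, ys):
    """Return (slope, intercept) minimizing the sum of squared residuals."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

def normal_mle(samples):
    """Maximum likelihood estimates (mu, sigma) for a normal distribution."""
    mu = mean(samples)
    # The MLE of the variance divides by n, not n - 1.
    var = sum((x - mu) ** 2 for x in samples) / len(samples)
    return mu, var ** 0.5

days = [1, 2, 3, 4, 5]
uv = [1000, 1100, 1200, 1300, 1400]  # hypothetical, perfectly linear growth

slope, intercept = least_squares_line(days, uv)
mu, sigma = normal_mle(uv)
print(f"fitted line: uv = {slope:.1f} * day + {intercept:.1f}")
print(f"normal MLE: mu = {mu:.1f}, sigma = {sigma:.1f}")
```

Note that the maximum likelihood estimate of the standard deviation divides by n, so it is slightly smaller than the usual sample standard deviation, which divides by n - 1.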

Interval estimation

Interval estimation estimates the range in which an unknown population parameter is likely to lie, subject to given requirements on accuracy and precision. It usually means computing a confidence interval for the population mean or population proportion at a given confidence level. The standard error is first computed from the sample size and the sample standard deviation. The sample mean or sample proportion then serves as the point estimate of the population mean or proportion, and the upper and lower bounds of the interval are obtained around it.

Let s be the sample standard deviation and n the number of observations in the sample. The standard error of the sample mean can then be estimated from the sample standard deviation:

$$\sigma_{\bar{x}} = \frac{s}{\sqrt{n}}$$

This formula shows the law of large numbers at work: the larger the sample size n, the smaller the standard error s/sqrt(n), and the closer the sample estimate is to the true population value. Excel charts also offer an option to add error bars.
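The effect of the sample size can be checked numerically. This is a small sketch, assuming a fixed, made-up sample standard deviation of 80; the function name `standard_error` is chosen here for illustration.

```python
from math import sqrt

def standard_error(s, n):
    """Standard error of the sample mean: s / sqrt(n)."""
    if n <= 0:
        raise ValueError("n must be positive")
    return s / sqrt(n)

s = 80.0  # hypothetical sample standard deviation of daily UV
for n in (7, 30, 90, 365):
    print(f"n = {n:3d}  standard error = {standard_error(s, n):.2f}")
```

Because the standard error falls with the square root of n, quadrupling the sample size only halves it.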

  

With the standard error in hand, we can use interval estimation to compute a confidence interval for the population parameter at a given confidence level. The confidence interval is the range of values within which the true population parameter falls with a given probability, and that probability is the confidence level.

Starting from the formula for the z statistic:

$$z = \frac{\bar{x} - \mu}{s / \sqrt{n}}$$

At a confidence level of 1-α, the confidence interval for the population mean μ is:

$$\bar{x} - z_{\alpha/2} \frac{s}{\sqrt{n}} \le \mu \le \bar{x} + z_{\alpha/2} \frac{s}{\sqrt{n}}$$

Both the sample mean and the sample standard deviation can be computed from the sample, so the only remaining step is to look up the z value for the chosen confidence level in a z table; the confidence interval for the population mean then follows. As for the choice of confidence level, statistics conventionally treats results at the 95% confidence level as statistically significant. In practice, data analysis in the Internet field does not always require that much confidence, and we sometimes choose 80% or 90% instead. The corresponding z values are shown in the following table:

Confidence level (1-α)    z value (z_{α/2})
95%                       1.96
90%                       1.65
80%                       1.28
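Instead of reading the z value from a table, it can be computed from the standard normal distribution. The sketch below assumes Python 3.8 or later (for `statistics.NormalDist`) and reuses made-up daily UV figures; the function names `z_value` and `mean_confidence_interval` are illustrative.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def z_value(confidence):
    """Two-sided critical value z_{alpha/2} for a confidence level in (0, 1)."""
    alpha = 1.0 - confidence
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)

def mean_confidence_interval(samples, confidence=0.95):
    """Return (lower, upper) for the population mean using the z statistic."""
    n = len(samples)
    x_bar = mean(samples)
    # Margin of error: z_{alpha/2} * s / sqrt(n)
    margin = z_value(confidence) * stdev(samples) / sqrt(n)
    return x_bar - margin, x_bar + margin

# Hypothetical daily unique visitors (UV) for the last 7 days.
daily_uv = [1200, 1350, 1280, 1420, 1310, 1190, 1250]

for level in (0.80, 0.90, 0.95):
    low, high = mean_confidence_interval(daily_uv, level)
    print(f"{level:.0%}: z = {z_value(level):.3f}, interval = ({low:.1f}, {high:.1f})")
```

The computed 90% value is about 1.645, which the table rounds to 1.65. With only seven observations, an interval based on the t distribution would be somewhat wider; the z-based version is shown because it is the one described here.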

For a population proportion, we use the sample proportion p as the point estimate. Its standard error is sqrt(p(1-p)/n), and the confidence interval is computed in the same way, as p ± z_{α/2} · sqrt(p(1-p)/n).
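The proportion case can be sketched the same way. The conversion counts below are invented for illustration, the function name `proportion_confidence_interval` is hypothetical, and Python 3.8 or later is assumed for `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist

def proportion_confidence_interval(successes, n, confidence=0.95):
    """Return (lower, upper) for a population proportion (normal approximation)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    # Margin of error: z_{alpha/2} * sqrt(p * (1 - p) / n)
    margin = z * sqrt(p * (1.0 - p) / n)
    # Clamp to [0, 1], since a proportion cannot leave that range.
    return max(0.0, p - margin), min(1.0, p + margin)

# Hypothetical: 117 conversions out of 3750 visits over 3 days.
low, high = proportion_confidence_interval(117, 3750, confidence=0.90)
print(f"90% interval for the conversion rate: ({low:.4f}, {high:.4f})")
```

This normal approximation works best when both n·p and n·(1-p) are reasonably large; for very low conversion rates on small samples it becomes unreliable.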

In fact, most of the content in this article can be found in statistics textbooks or on online wikis. This blog is not meant purely as popular science, though: each article in the "Data analysis methods" category is paired with an article on the corresponding application in website data analysis, and this one is no exception. If you are interested in the related content, watch for the follow-up post or subscribe to my blog.

This article is published under a license agreement; when reprinting, please credit the source: Website Data Analysis » "Parameter Estimation and Confidence Intervals"
