Statistical significance test of data in Web Analytics

Source: Internet
Author: User
Keywords Analysis nature and learning

In website analysis we often run optimization tests and compare the conversion metrics of different schemes, such as bounce rate, order purchase rate, and button clicks, and we also record detailed data on visitor or customer behavior. Often, however, the difference is very small, and it is hard to decide whether to keep the status quo or fully adopt the new scheme. This is where it becomes important to know whether the difference is statistically significant.

This article explains two ways to test data: using an Excel significance calculator whose formulas are already written, and using SPSS to run a significance test on detailed customer data.

Excel: Data Significance Calculator

Assume the following data:

Social media source | Visits | Orders | Order purchase rate
YouTube | 25000 | 890 | 3.56%
Facebook | 4800 | 240 | 5%

We can then check the result with the Excel significance calculator introduced by Avinash Kaushik; see http://www.kaushik.net/avinash/excellent-analytics-tip1-statistical-significance/

Excel files can be downloaded from here: http://vdisk.weibo.com/s/cz9E6

Enter the data (Number of Test Participants is the denominator, Number of Conversions is the numerator) and the result is calculated. Here the difference is significant, because the box shows "Yes".

  

The principle behind this method: if the difference between the two rates is larger than the confidence interval of the data can account for, the calculator reports the difference as significant.
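The same kind of check can be sketched outside Excel. The snippet below is a minimal sketch using a pooled two-proportion z-test on the YouTube and Facebook figures above. The z-test is an assumption chosen for illustration and is not necessarily the formula inside the spreadsheet; the function name `two_proportion_z_test` is hypothetical.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Pooled two-proportion z-test; returns (z, two-sided p value)."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both sources convert equally
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p value from the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# YouTube: 890 orders from 25000 visits; Facebook: 240 orders from 4800 visits
z, p_value = two_proportion_z_test(890, 25000, 240, 4800)
print(f"z = {z:.2f}, p = {p_value:.2g}, significant at 5%: {p_value < 0.05}")
```

With these inputs the p value falls far below 0.05, which agrees with the "Yes" shown by the calculator.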

The method above is suitable for a simple comparison of two ratios. Next comes something more advanced: hypothesis tests in SPSS that compare the means of two samples.

Two-independent-samples t-test

The methods for comparing means in SPSS include the following (method; samples; example):

* Means (one or two factors); one sample; the relationship between a website analyst's monthly salary and experience or place of work

* One-sample t-test; one sample; the body length of an iPhone

* Two-independent-samples t-test; two samples; the effect of different promotion schemes, or the heights of men and women

* Paired-samples t-test; two samples; the purchase behavior of the same group of visitors in different months

Before introducing the two-independent-samples t-test, we first look at the test of a mean, moving from the simple case to the more involved one.

1. Test of the mean

A hypothesis test generally follows these steps:

1) State the null hypothesis and the alternative hypothesis (the null hypothesis is an assumption about a proportion, mean, or distribution of the population)

2) Choose a test statistic

3) Calculate the p value: the probability, assuming the null hypothesis is true, of observing a result at least as extreme as the one measured

4) Given a significance level α: if p < α, a small-probability event has occurred, meaning the observed data would be very unlikely if the null hypothesis were true, so we reject the null hypothesis; if p > α, we cannot reject the null hypothesis.

Consider the following two scenarios:

1) A factory quality manager says the product defect rate is only 1/1000. You run a spot check, draw 5 pieces, and find 2 defective. That is a big problem.

Under this claim at most 1 piece in 1000 is defective, yet 2 have turned up, so something that should be practically impossible has happened.

Maximum number of defects | Null hypothesis | Test result | Conclusion
1 | 1/1000 | 2/5 | Rejected

2) The factory quality manager says the product defect rate is only 1/100. You run a spot check, draw 5 pieces, and find 2 defective. That still looks like a big problem.

Under this claim at most 10 pieces in 1000 are defective. 2 have turned up and 995 pieces remain unchecked, so there are two possibilities:

* The defect rate is far higher than 1% and the quality manager is misleading people;

* The sample happened to hit defective pieces, and few of the remaining 995 are defective.

Probability calculation:

  

Null hypothesis: the product defect rate is 1/100. Under this hypothesis, the probability that the first 5 pieces drawn contain 2 defective ones is 0.088%.

Maximum number of defects | Null hypothesis | Test result | Conclusion
10 | 1/100 | 2/5 | Undecided

If the defect rate really were 1/100, the probability of finding 2 defective pieces among 5 would be only 0.088%, which is below α. With α = 5%, 0.088% < 5%: a small-probability event has occurred in a single test, something unlikely has happened, so we reject the null hypothesis and conclude that the defect rate is higher than 1/100.
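The probabilities in the two spot-check scenarios can be reproduced with a few lines of Python. This is a minimal sketch that assumes a batch of 1000 pieces sampled without replacement (a hypergeometric draw). That reading is an assumption, since the original formula image is missing, but it does yield the 0.088% quoted above. The helper name `hypergeom_pmf` is hypothetical.

```python
from math import comb

def hypergeom_pmf(k, population, defective, draws):
    """P(exactly k defective pieces) when drawing without replacement."""
    return (comb(defective, k) * comb(population - defective, draws - k)
            / comb(population, draws))

alpha = 0.05

# Scenario 1: claimed rate 1/1000, so at most 1 defective piece in 1000
p_scenario_1 = hypergeom_pmf(2, 1000, 1, 5)

# Scenario 2: claimed rate 1/100, so at most 10 defective pieces in 1000
p_scenario_2 = hypergeom_pmf(2, 1000, 10, 5)

for label, p in (("1/1000", p_scenario_1), ("1/100", p_scenario_2)):
    verdict = "reject" if p < alpha else "cannot reject"
    print(f"claimed rate {label}: P(2 of 5 defective) = {p:.5%} -> {verdict}")
```

In scenario 1 the probability is exactly zero, because 2 defective pieces cannot be drawn from a batch that contains only 1. In scenario 2 the result is small but not zero, which is why a significance level is needed to reach a decision.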

Note: The data for this case comes from a teacher's SPSS materials.

2. Two-independent-samples t-test

The two-independent-samples t-test applies to two samples drawn from independent populations; its aim is to determine whether the difference between the means of the two samples is statistically significant. Case background for what follows: whether different optimization schemes affect visitors' order value.

I. Prerequisites

A two-independent-samples t-test requires the following conditions:

1) The populations are independent of each other

2) The populations follow a normal distribution

3) The two groups have equal variances

The original data is as follows:

  

1. Data settings

1) Select Analyze > Descriptive Statistics > Explore

  

2) Put visitor sales (sales) into the Dependent List and the scheme (test type) into the Factor List:

  

3) Click Plots, then check Histogram and "Normality plots with tests"

  

2. Data reporting

As the following figure shows, each of the two schemes has 200 samples:

  

The following figure shows that:

1) The mean sales of scheme 0 (1697) is greater than the mean sales of scheme 1 (1570)

2) The standard deviations of the two schemes are close: the ratio 657/610 is approximately 1.

  

The following figure is the histogram for scheme 0 (used to check that the data are approximately normal)

  

The following figure is the histogram for scheme 1 (used to check that the data are approximately normal)

  

The following figure shows that the p value (Sig.) of the normality test is greater than 0.05 for both scheme 0 and scheme 1, so normality cannot be rejected for either group.

  

II. Formal analysis: the two-independent-samples t-test

Click Analyze > Compare Means > Independent-Samples T Test:

  

Put sales into Test Variable(s) and test into Grouping Variable, then click "Define Groups" and enter 0 and 1 for the two groups:

  

The output of the independent-samples test gives t-test results for two cases: equal variances assumed and equal variances not assumed.

The p value of Levene's test for equality of variances is 0.94, greater than 0.1, which indicates that the variances of the two independent samples are homogeneous, so we read the row where equal variances are assumed.

With equal variances assumed, Sig. (the p value) is 0.047, below the significance level of 0.05. The order value under scheme 1 therefore differs significantly from that under scheme 0, and the higher mean of scheme 0 is statistically significant.
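As a rough cross-check, the equal-variance t-test can be recomputed from the summary figures reported earlier (means 1697 and 1570, standard deviations 657 and 610, 200 samples per scheme). The sketch below uses only the Python standard library and integrates the t density numerically; the function names are hypothetical, and assigning 657 to scheme 0 and 610 to scheme 1 is an assumption (with equal group sizes the pooled result does not depend on it).

```python
import math

def pooled_t_test(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Equal-variance two-sample t statistic and degrees of freedom."""
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / df
    se = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / se, df

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def two_sided_p(t, df, steps=2000):
    """Two-sided p value via Simpson's rule on [0, |t|]; steps must be even."""
    b = abs(t)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 1 - 2 * total * h / 3)

# Summary statistics as reported in the SPSS output described above
t_stat, df = pooled_t_test(1697, 657, 200, 1570, 610, 200)
p_value = two_sided_p(t_stat, df)
print(f"t = {t_stat:.3f}, df = {df}, p = {p_value:.3f}")
```

The p value comes out a little below 0.05, in line with the 0.047 that SPSS reports. A small difference is expected because the summary figures are rounded.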

  

Note: The SPSS data above was imported from Excel. The Excel data is fictional and can be generated randomly with the rand() function.
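For readers who want comparable practice data without Excel, here is a minimal sketch that generates two fictional groups of 200 order values in Python. It is not the author's procedure: Excel's rand() is uniform, and the article does not say how it was transformed, so drawing from a normal distribution with the means and standard deviations reported above is an assumption, as are the seed and the helper name `simulate_sales`.

```python
import random
import statistics

random.seed(42)  # arbitrary seed so the run is repeatable

def simulate_sales(mean, sd, n=200):
    """Draw n fictional order values; negative draws are clipped to 0."""
    return [max(0.0, random.gauss(mean, sd)) for _ in range(n)]

# Assumed parameters, taken from the means and standard deviations above
scheme_0 = simulate_sales(1697, 657)
scheme_1 = simulate_sales(1570, 610)

for name, data in (("scheme 0", scheme_0), ("scheme 1", scheme_1)):
    print(f"{name}: n = {len(data)}, mean = {statistics.mean(data):.0f}, "
          f"sd = {statistics.stdev(data):.0f}")
```

Because the values are random, the sample means and standard deviations will differ somewhat from the figures in the article.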

That covers checking data for statistical significance. With the arrival of the big data era, analysis of website front-end data alone is relatively simple, and demand for analysis of customer data and order data will keep growing. I hope that mastering these tools helps us make sense of it all.

Original: Shenzhen website analysis http://www.szwebanalytics.com/data-analysis-excel-spss.html
