Web Analytics: Is Our Data Accurate?

Source: Internet
Author: User

[Introduction] In my years of doing web analytics, the question I have been asked most often is "Is your data accurate?" This article answers two things: whether web analytics data is accurate, and how to think about the deviations it may contain.

[Body]

In Chinese, "accurate" and "precise" are near-synonyms that we mix freely in everyday speech. English is the same: people use "accurate" and "precise" interchangeably, without a second thought. However, since both words exist and neither has died out over the long evolution of the language, there must still be a subtle difference between them. In fact, accuracy and precision are definitely not the same concept. They are strictly distinguished in engineering, statistics, and many other sciences, and also in the new discipline of web analytics.

Let's look at exactly what the difference between accuracy and precision is, and then see whether web analytics tools can be accurate, precise, or both.

What is accuracy and what is precision

Wikipedia has an excellent explanation of accuracy and precision; it is a classic entry. Here is my paraphrase of it, as a tribute: accuracy is the degree to which an observation or measured value deviates little from the facts, which is what we mean by "close to the facts" or "in accordance with the facts"; precision is the degree to which, under the same conditions, an observation or measured value can be reproduced with little dispersion, which is what we mean by "the same every time". The following two figures are particularly classic, and are quoted from Wikipedia:

  

Figure 1: Relatively high accuracy, but relatively low precision

  

Figure 2: Relatively high precision, but relatively low accuracy

The red center in the two figures above represents the facts. In Figure 1 the measured values are scattered around the center. Although the distribution is dispersed, their average position clearly falls in the center (that is, the average of many measurements is close to the truth), so the measurement can be called accurate; but because the results are dispersed, it cannot be called precise. In Figure 2 the measured values deviate significantly from the center (and their average cannot be at the center either), so the measurement cannot be called accurate; but it can be called precise, because repeated measurements reproduce one another with little dispersion. This is an excellent illustration of accuracy and precision.
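The distinction can also be put in numbers. The following is a minimal sketch, not taken from the original article: it treats accuracy as the bias of the average (how far the mean of the measurements sits from the true value) and precision as the spread (the standard deviation of the measurements). The true value and both sets of readings are invented for illustration.

```python
from statistics import mean, pstdev

TRUE_VALUE = 100.0  # hypothetical "center of the target"


def bias_and_spread(measurements, true_value):
    """Return (bias, spread): mean offset from the truth, and standard deviation."""
    return mean(measurements) - true_value, pstdev(measurements)


# Hypothetical readings, invented for illustration only.
accurate_not_precise = [80.0, 120.0, 90.0, 110.0, 100.0]  # like Figure 1
precise_not_accurate = [79.0, 80.0, 81.0, 80.0, 80.0]     # like Figure 2

for name, data in [("Figure 1 style", accurate_not_precise),
                   ("Figure 2 style", precise_not_accurate)]:
    bias, spread = bias_and_spread(data, TRUE_VALUE)
    print(f"{name}: bias={bias:+.1f}, spread={spread:.1f}")
```

The first data set has no bias but a large spread; the second has a bias of -20 but a spread below 1.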

If we build a matrix of accuracy and precision as two different dimensions, we can get the following figure:

  

Figure 3: The accuracy and precision matrix (image source: www.wellesley.edu)

The upper-left quadrant is our favorite: both accurate and precise. That is the case for physics and most of science and technology. The bottom-right quadrant is the worst: neither precise nor accurate. It is also the most common in everyday life, since social life is highly dispersed and chaotic.

So, naturally, you ask, what quadrant does Web analytics belong to? It must be the quadrant in the upper left corner, right?

Is web analytics accurate?

First of all, there is no fixed answer to this question, because the accuracy of web analytics depends largely on your expectations and on the monitoring methods and tools used. However, for the web analytics we most commonly use, it definitely does not belong to either of the two quadrants on the left side of Figure 3 (that is, neither the accurate-and-precise quadrant nor the accurate-but-imprecise quadrant). To put it simply, web analytics data is not accurate.

This may disappoint you, but I believe it is not unexpected. You must have noticed that if we measure the same site with different web analytics tools, the results differ in puzzling ways (see our discussion of why the data in two monitoring tools' reports differs), and we have no way of knowing which tool comes closer to the actual numbers.

So, if GA shows that your site had 36,954 unique visitors in one month, the number of real visitors to your site (living, breathing people!) is definitely not 36,954!

In fact, we can hardly find any metric that can be counted accurately, not even the most basic and simple one, the page view!

Therefore, if your boss wants to know, 100% without error, how many people have visited the site, that wish is itself meaningless.

Why Web analytics data can't be accurate

This should not be too surprising: even physics is not 100% accurate, as anyone who has heard of the "uncertainty principle" knows. Similarly, web analytics cannot be accurate because of one basic fact: what web analytics monitors is the browser and the server, not the real person. That dooms any attempt to get accurate results.

Specifically, the two monitoring methods we usually use, server logs and page tags, cannot accurately count even some of the most basic metrics of web analytics.

Server log errors (bias):

Unique visitor error:

If you monitor with server logs, getting the real number of visitors is clearly an impossible task. A server log can only estimate visitors from IP addresses, which already carries a large error, and visits from web crawlers and robots increase that error further.
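To make the IP problem concrete, here is a small sketch under stated assumptions. The log records, the IP addresses (taken from documentation-only ranges), the user-agent strings, and the substring-based bot filter are all hypothetical; real log formats and real bot detection are more involved.

```python
# Hypothetical, simplified log records: (client IP, user agent).
LOG = [
    ("203.0.113.5", "Mozilla/5.0"),         # person A at an office
    ("203.0.113.5", "Mozilla/5.0"),         # person B behind the same office NAT
    ("203.0.113.5", "Mozilla/5.0"),         # person C behind the same office NAT
    ("198.51.100.7", "Mozilla/5.0"),        # person D at home
    ("198.51.100.9", "Mozilla/5.0"),        # person D again, new dynamic IP
    ("192.0.2.44", "ExampleBot/1.0"),       # a crawler, not a person
    ("192.0.2.45", "ExampleSpider/2.0"),    # another crawler
]
TRUE_PEOPLE = 4  # A, B, C and D in this made-up scenario

BOT_MARKERS = ("bot", "crawler", "spider")  # naive heuristic, an assumption


def is_bot(user_agent):
    """Flag a record as a robot if its user agent contains a known marker."""
    ua = user_agent.lower()
    return any(marker in ua for marker in BOT_MARKERS)


def unique_ips(records, skip_bots=False):
    """Count distinct client IPs, optionally ignoring records flagged as bots."""
    return len({ip for ip, ua in records if not (skip_bots and is_bot(ua))})


print("real people:", TRUE_PEOPLE)
print("unique IPs, bots included:", unique_ips(LOG))
print("unique IPs, bots filtered:", unique_ips(LOG, skip_bots=True))
```

In this toy scenario neither IP count matches the number of real people: shared IPs hide visitors, while changing IPs and robots add phantom ones.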

Page view error:

Server logs used to record page views very accurately, but unfortunately the advent of caching made that history. A cache can serve the page without the request ever reaching the server, so the server log may contain no record of the page view at all.

Time recording error:

In the absence of cache interference, the server can accurately record when a visit starts, but it has no way of knowing when the visit ends. A visit usually ends when the browser is closed, and closing the browser does not generate a new server log record.

Flash site error:

If a website consists mainly of a Flash file containing multiple pages, or of a combination of several such Flash files, the server log will not record actions inside Flash, and monitoring fails almost completely.

Page tag errors:

  

Page tag failure:

Page tags can fail. First, some browsers (such as some mobile phone browsers) do not support JavaScript or have JavaScript disabled. Second, a page tag may fail to run because of a JavaScript error earlier on the page. Third, we have seen cases where the page tag conflicted with other JavaScript on the page because of clashing variable names. Finally, on a slow network the visitor may close the browser or follow a link to a new page before the page tag has finished downloading.

Obviously, if the page tag fails, the web analytics tool loses some or all of the data.

Location of the page tag:

Where the page tag sits in the page affects the counts reported by the web analytics tool. If the page tag is at the top of the page, it executes sooner and is less exposed to other factors (such as failures in other JavaScript or network problems that occur before the page tag runs), so the count is higher. Stone Temple Consulting statistics show that with the code at the top of the page, the visitor count is 4.3% higher than with the code at the bottom.

Unique visitor error:

A computer may be used by more than one person; a computer may have multiple browsers (so the same person visits the same website with multiple cookies); people delete cookies (2007 comScore statistics show that 30% of U.S. users delete their browser cookies within a month); and cookies may be disabled (although web analytics tools generally use first-party cookies, about 10% of first-party cookies are still blocked by users).
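A back-of-the-envelope sketch shows how cookie deletion alone can inflate a cookie-based unique visitor count. The model is deliberately crude and is an assumption, not something stated in the article: every visitor uses one browser, and every visitor who deletes cookies returns afterwards and is counted under a fixed number of cookies (two by default). Only the 30% share comes from the comScore figure quoted above; the visitor total is invented.

```python
def reported_unique_visitors(real_visitors, delete_share, cookies_per_deleter=2):
    """Cookie-based unique-visitor count under a deliberately simple model.

    delete_share: fraction of visitors who delete cookies during the period.
    cookies_per_deleter: how many distinct cookies each of them ends up with
    (hypothetical parameter; assumes they come back after deleting).
    """
    keepers = real_visitors * (1 - delete_share)
    deleters = real_visitors * delete_share
    return round(keepers + deleters * cookies_per_deleter)


real = 10_000  # hypothetical number of real people
reported = reported_unique_visitors(real, delete_share=0.30)
print(f"real: {real}, reported: {reported}, inflation: {reported / real - 1:+.0%}")
```

Under these assumptions 10,000 real people show up as 13,000 unique visitors. The other effects in the list (shared computers, multiple browsers, blocked cookies) push the count in different directions, so the net error in a real report is not this simple.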

Page view error: mainly caused by page tag failure.

Time recording error: as with server logs, a page tag can accurately record when a visit starts, but it has no way of knowing when the visit ends, because the end of a visit generally does not trigger the page tag.

Because basic metrics such as page views, visitors, and visit time are virtually impossible to record accurately, more advanced metrics built on them, such as the common composite metrics (Bounce Rate, Avg. Time on Site), are even less accurate. However, knowing the causes of these errors helps us correct for them. Some monitoring tools, such as the ad-monitoring tool DoubleClick, have a self-correction feature that makes use of this principle.

Errors in other monitoring methods:

Other ways of collecting web analytics data, such as collecting data through client software (Alexa, iResearch, etc.) or packet sniffing, are even more error-prone because of the limits of their monitoring methods. For example, data collected through client software obviously suffers from sampling bias, and a sniffer is essentially a replica of the server log with the added problems of packet loss and limited data logging. They cannot be more accurate than the two methods above.

Are web analytics tools precise?

Now you know that web analytics tools cannot count accurately. So are they precise?

I would say that precision is the essential feature of web analytics tools. A web analytics tool need not be accurate, but it must be precise. A web analytics tool that is not precise is no better than garbage.

The reason web analytics tools must be precise is simple: we need the data to be highly consistent. As shown in Figure 4 (below), suppose the tool's precision has an error of -20% to +20%. If the true traffic on November 4 is 50 UV, the tool may report any number between 40 and 60. Similarly, if the true traffic on the next day (November 5) is 51 UV, the tool may report any number between 41 and 61. Because of this imprecision, the November 4 figure may come out as 40 and the November 5 figure as 61, so the tool erroneously reports gratifying growth that does not in fact exist. Conversely, if November 4 were reported as 60 and the next day as 41, things would look far worse than they really are.
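The arithmetic of this example can be checked with a short sketch. The function names are illustrative only; the numbers (50 UV, 51 UV, and an error band of plus or minus 20%) come from the paragraph above. The code keeps the unrounded bounds (40.8 and 61.2) where the text rounds to 41 and 61.

```python
def reported_range(true_value, error=0.20):
    """Lowest and highest value a tool with +/-error noise could report."""
    return true_value * (1 - error), true_value * (1 + error)


def growth(before, after):
    """Relative change from one day to the next."""
    return (after - before) / before


nov4_low, nov4_high = reported_range(50)
nov5_low, nov5_high = reported_range(51)

true_growth = growth(50, 51)
best_case = growth(nov4_low, nov5_high)    # lowest Nov 4 vs highest Nov 5
worst_case = growth(nov4_high, nov5_low)   # highest Nov 4 vs lowest Nov 5

print(f"true growth: {true_growth:+.1%}")
print(f"apparent growth could be anywhere from {worst_case:+.1%} to {best_case:+.1%}")
```

A true change of about +2% could therefore appear as anything from roughly -32% to +53%, which is why day-over-day comparisons are meaningless with an imprecise tool.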

  

Figure 4: An imprecise web analytics tool can have serious consequences

Therefore a web analytics tool must be precise: if it has a -20% error relative to the facts, then on any day and at any moment it must be 20% below the true value. Otherwise we will draw the wrong conclusions. Of course, 100% precision does not exist either. In general a systematic deviation of about XXX is tolerated, which in practice means a spread of at most 10%, and that is already a very loose standard.

The reasons web analytics tools cannot be 100% precise are the same factors described in the previous section, plus unknown anomalies caused by changes in some visitors' environments, such as changes in network bandwidth or abnormal data loss during transmission.

So how precise are the web analytics tools we actually use? If 5 stars is the most precise, then:

Google Analytics: precision 3.5 stars, passable. But our friend Ben (Zenghaiban) and I have found that Google Analytics is not entirely precise, which may be related to data reprocessing. Ben's case was +/-20%, but that is extremely rare. Have any other readers seen this?

Omniture: precision 4 stars, better. Of course I have no way to verify this; it is just that their data has caused me relatively little trouble and relatively few unexplained moments. The problem with Omniture is that it has too many definitions, and the definition of the same metric is not exactly the same in different contexts. It is a huge data system.

WebTrends: 4.5 stars. The reason for the higher score is that WebTrends is deployed on the website's own servers, or at least on servers owned by the site owner, so it is relatively less disturbed by the external environment. This is an innate advantage, no doubt.

How to deal with the inaccurate-but-precise nature of web analytics tools?

The inaccurate-but-precise nature of web analytics tools does not prevent us from gaining real insight. We need to follow three basic principles of web analytics (these are my best-kept secrets):

Principle one: trends.

Looking at trends rather than isolated numbers is the most important principle of web analytics. There is no reason to be ecstatic about 500 visits today, but if average traffic last month was 300 and average traffic this month is 500, then congratulations, you deserve to be happy. We have discussed this in previous articles.

Because web analytics tools are precise, they can reflect trends faithfully, even though they do not reflect the actual numbers accurately. This is why all of us web analysts consider trends the most important methodology.

Principle two: segmentation.

Because web analytics tools are precise, if the overall value is 20% lower than the true value, then each of the parts that make up the whole will also be 20% lower than its true value. The segments we need to compare therefore still meet the needs of analysis.

Principle three: conversion.

As with segmentation, precision ensures that every number in the conversion is magnified or reduced by the same proportion, so the conversion rate itself is accurate.
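Why a consistent bias leaves segment shares and conversion rates untouched can be seen in a short sketch. Everything in it is hypothetical: the segment names, the true counts, and the assumption that the tool under-reports both visits and orders by the same constant factor of 20%.

```python
BIAS = 0.80  # hypothetical: the tool consistently reports 80% of the truth

# Hypothetical true numbers, invented for illustration.
true_visits = {"search": 500, "direct": 300, "referral": 200}
true_orders = {"search": 25, "direct": 30, "referral": 5}

# What a precise-but-inaccurate tool would report under this assumption.
reported_visits = {k: v * BIAS for k, v in true_visits.items()}
reported_orders = {k: v * BIAS for k, v in true_orders.items()}


def shares(counts):
    """Each segment's share of the total."""
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}


def conversion_rates(orders, visits):
    """Orders divided by visits, per segment."""
    return {k: orders[k] / visits[k] for k in visits}


for segment in true_visits:
    print(segment,
          f"share true={shares(true_visits)[segment]:.1%}",
          f"reported={shares(reported_visits)[segment]:.1%};",
          f"conversion true={conversion_rates(true_orders, true_visits)[segment]:.1%}",
          f"reported={conversion_rates(reported_orders, reported_visits)[segment]:.1%}")
```

The constant factor cancels in every ratio, so the reported shares and conversion rates match the true ones even though every absolute count is 20% too low. If the bias differed between segments or between visits and orders, the ratios would no longer be reliable.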

So our final conclusion is that the data we really need for web analytics is accurate. Once we understand and learn to use the three principles, we shift web analytics tools into the upper-left quadrant, the one that is both accurate and precise. Really, the ultimate accuracy of a web analytics tool depends on whether you use it well. This may sound like an idealistic conclusion, but it is the truth.

Good luck to all of you.

[Copyright belongs to the author, Sidney Song (sing). Reprints are welcome, but please inform the author in advance and credit the source]
