Overview of website log collection methods [reposted from the Taobao data warehouse team blog]

Source: Internet
Author: User

Records of the click behavior of website users are usually called logs. There are currently roughly three methods for collecting them on the Internet.

1. Traditional Weblog
That is, when the web server receives an HTTP request from a user, it records the request and then returns the normal webpage content to the user.
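
As a rough illustration of what such a record looks like to an analysis tool, the sketch below parses one line in the Common Log Format that many web servers can write. The sample line, the regular expression, and the name parse_clf_line are made up for this example; a real server may be configured with a different format.

```python
import re
from datetime import datetime

# Common Log Format: host ident authuser [date] "request" status bytes
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)

def parse_clf_line(line):
    """Parse one access-log line; return a dict, or None if it does not match."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    record["status"] = int(record["status"])
    # Servers write "-" when no response body was sent.
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    return record

# Hypothetical sample line, invented for illustration only.
sample = '203.0.113.7 - - [10/Oct/2008:13:55:36 +0800] "GET /item.html HTTP/1.1" 200 2326'
record = parse_clf_line(sample)
print(record["path"], record["status"])
```

Returning None for lines that do not match lets a caller count and skip malformed entries instead of failing on them.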

Advantages:
1. Simple and convenient. You can use the logging function built into the web server software;
2. Ready-made open-source software is available for log analysis, such as AWStats (written in Perl; versatile, with an attractive interface, but slow) and Webalizer (written in C; fast, but with an ugly interface).

Disadvantages:
1. On a large website, regularly collecting and aggregating the logs generated by thousands of servers spread across several data centers becomes a major problem;
2. Caching layers such as Squid may produce logs in a different format, which is also a nuisance;
3. If a website contains a large number of pages composed of multiple IFRAME pages, the PV of user behavior cannot be counted accurately.
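
The IFRAME problem in point 3 can be seen in a tiny sketch. The request paths below are hypothetical and stand for what a web server might log during one visit to a page that embeds two IFRAMEs; naive_pv_count is an assumed, deliberately simple counting rule, not how any particular analyzer works.

```python
# Hypothetical request paths logged for ONE user visit to /home.html,
# a page that embeds two IFRAMEs. All names are made up.
logged_paths = [
    "/home.html",        # the page the user actually opened
    "/frames/nav.html",  # IFRAME 1, fetched automatically by the browser
    "/frames/ads.html",  # IFRAME 2, fetched automatically by the browser
    "/static/logo.png",  # image, excluded by the extension check below
]

def naive_pv_count(paths):
    """Count every HTML request as one page view."""
    return sum(1 for path in paths if path.endswith(".html"))

print(naive_pv_count(logged_paths))  # prints 3, although the user viewed 1 page
```

From the server's point of view the three HTML requests look alike, so separating real page views from embedded frames needs extra knowledge about the site's page structure.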

2. Beacon log
This is currently the most popular method on the Internet: a small piece of code embedded in the webpage makes the user's browser send a request to a beacon server when it loads the target page. Generally, a single server of ordinary configuration can easily handle logging for tens of millions of PVs. Google Analytics, a statistics tool commonly used by small websites, works this way. DoubleClick, which Google acquired for $3.1 billion, also uses this method to measure the effect of online advertising.
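
The server side of this scheme can be sketched as follows, assuming the embedded page code requests a URL such as /b.gif?page=...&ref=... on the beacon server. The path, the parameter names page and ref, and the names beacon_record, BeaconHandler, and serve are all assumptions made for this example. The handler answers with 204 No Content; real deployments often return a tiny image instead.

```python
import json
import time
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlsplit, parse_qs

def beacon_record(request_target, client_ip, user_agent, now=None):
    """Turn one beacon request into a flat log record (a dict)."""
    parts = urlsplit(request_target)
    # Keep the first value of each query parameter.
    params = {key: values[0] for key, values in parse_qs(parts.query).items()}
    return {
        "ts": time.time() if now is None else now,
        "ip": client_ip,
        "ua": user_agent,
        "page": params.get("page", ""),  # hypothetical parameter name
        "ref": params.get("ref", ""),    # hypothetical parameter name
    }

class BeaconHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        record = beacon_record(
            self.path, self.client_address[0], self.headers.get("User-Agent", "")
        )
        # One JSON object per line; a real server would append to a log file.
        print(json.dumps(record), flush=True)
        self.send_response(204)  # no body needed, the request itself is the data
        self.end_headers()

    def log_message(self, format, *args):
        pass  # silence the default per-request logging to stderr

def serve(port=8000):
    """Start the beacon server; not called automatically in this sketch."""
    HTTPServer(("", port), BeaconHandler).serve_forever()
```

Because every page reports to the same small set of beacon servers in one format, the logs are already centralized and uniform when they are written.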

Advantages:
1. Normally, only the behavior of real users is recorded. PVs generated by crawlers or website scanners are not counted, since such clients generally do not execute the embedded code, whereas in a weblog they are difficult to tell apart from real visits.
2. A requested page produces exactly one PV. IFRAME pages are not counted as additional PVs.
3. It is relatively easy to collect and summarize logs.

Disadvantages:
1. Data cannot be recorded in Ajax applications. Currently, there seems to be no good solution.
2. Browser performance is slightly affected and some network bandwidth is consumed. This is a minor issue and can be ignored.

The advantages and disadvantages are easy to compare. Small and medium-sized websites generally adopt the first method, which is simple and easy to use. The second method suits large websites, because it meets the needs of large-scale management. In fact, there is a third way: when the web server receives a user request, it actively sends an asynchronous request to the beacon server. This avoids the external bandwidth consumption and the browser performance overhead, but the IFRAME problem comes back. This approach does not seem to be widely used at present, and its specific advantages and disadvantages have not been well evaluated.
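
A minimal sketch of that third approach, assuming the web application is written in Python and hands each record to a background thread so the user's request is never delayed. The class name AsyncBeaconForwarder, its parameters, and the stand-in sender are hypothetical. The source does not describe how such a system is actually built.

```python
import queue
import threading

class AsyncBeaconForwarder:
    """Forward log records to a beacon server without blocking request handling."""

    def __init__(self, send, max_pending=10000):
        self._send = send  # callable that delivers one record
        self._pending = queue.Queue(maxsize=max_pending)
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def record(self, entry):
        """Called from the request path; drops the entry if the queue is full."""
        try:
            self._pending.put_nowait(entry)
            return True
        except queue.Full:
            return False

    def _run(self):
        while True:
            entry = self._pending.get()
            try:
                self._send(entry)
            except Exception:
                pass  # a failed delivery must never break page serving
            finally:
                self._pending.task_done()

    def flush(self):
        """Block until every queued entry has been handed to send()."""
        self._pending.join()

# Usage with a stand-in sender; a real one would make a network call.
delivered = []
forwarder = AsyncBeaconForwarder(send=delivered.append)
forwarder.record({"page": "/item.html", "ip": "203.0.113.7"})
forwarder.flush()
print(delivered)
```

Dropping records when the queue is full trades completeness for safety: a slow beacon server can cost some log lines but cannot slow down page responses.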
