The essentials of website log file analysis and analysis methods

Source: Internet
Author: User
Keywords: website analysis, website data, website log


By analyzing a website's log files, we can see behavioral data for both users and search engine spiders visiting the site, which lets us analyze their preferences and the overall health of the website. In web log analysis, we mainly focus on spider behavior.

During crawling and indexing, a search engine allocates a certain amount of crawl resources to a site according to its weight. A search-engine-friendly website should make full use of these resources, so that spiders can quickly, accurately, and comprehensively crawl the valuable content that users want, without wasting resources on useless content or pages that return errors.

However, because the amount of data in a site's logs is very large, we generally need a log analysis tool to review it. Common log analysis tools include the Light Year (Guangnian) log analysis tool and Web Log Explorer.

When analyzing the logs, the items to examine in a single day's log file are: number of visits, stay time, crawl volume, directory crawl statistics, page crawl statistics, spider access IPs, HTTP status codes, spider active periods, and spider crawl paths. Across multiple days of log files, we need to analyze: the trend in spider visits, the trend in stay time, the overall crawl trend, directory crawl trends, crawl time periods, and the spider's activity cycle.
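All of these statistics start from the raw log lines, so each line first has to be parsed into fields. Below is a minimal Python sketch, assuming the common Apache/Nginx "combined" log format; the spider list, sample line, and field names are illustrative and not taken from any particular tool.

    import re

    # Regex for the common Apache/Nginx "combined" log format.
    LOG_PATTERN = re.compile(
        r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
        r'"(?P<method>\S+) (?P<url>\S+)[^"]*" '
        r'(?P<status>\d{3}) (?P<bytes>\S+) '
        r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
    )

    # User-agent substrings of the spiders we care about (illustrative list).
    SPIDER_AGENTS = ("Baiduspider", "Googlebot", "Sogou web spider", "360Spider")

    def parse_line(line):
        """Return a dict of log fields, or None if the line does not match."""
        m = LOG_PATTERN.match(line)
        return m.groupdict() if m else None

    def is_spider(entry):
        """True if the user agent claims to be one of the known spiders."""
        return any(bot in entry["agent"] for bot in SPIDER_AGENTS)

    if __name__ == "__main__":
        sample = ('220.181.108.75 - - [10/Oct/2023:06:25:31 +0800] '
                  '"GET /news/123.html HTTP/1.1" 200 5123 "-" '
                  '"Mozilla/5.0 (compatible; Baiduspider/2.0; '
                  '+http://www.baidu.com/search/spider.html)"')
        entry = parse_line(sample)
        print(entry["url"], entry["status"], is_spider(entry))  # /news/123.html 200 True

The later sketches in this article assume a list of such parsed spider entries.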

Let's take a look at how to analyze the site log.

Website log data analysis and interpretation:

1. Number of visits, stay time, and crawl volume

From these three items we can derive: the average number of pages crawled per visit, the stay time on a single crawled page, and the average stay time per visit.

Average pages crawled per visit = total crawl volume / number of visits

Stay time per crawled page = total stay time / total crawl volume

Average stay time per visit = total stay time / number of visits
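As a concrete illustration, here is a minimal sketch of these three formulas, assuming the day's totals have already been counted from the parsed log; all numbers are made up for the example.

    # A minimal sketch of the three formulas above, assuming the day's totals
    # have already been counted from the parsed log. All numbers are made up.
    def crawl_metrics(visits, pages_crawled, stay_seconds):
        return {
            "pages_per_visit": pages_crawled / visits,      # average pages crawled per visit
            "stay_per_page": stay_seconds / pages_crawled,  # stay time on one crawled page
            "stay_per_visit": stay_seconds / visits,        # average stay time per visit
        }

    print(crawl_metrics(visits=120, pages_crawled=960, stay_seconds=14400))
    # {'pages_per_visit': 8.0, 'stay_per_page': 15.0, 'stay_per_visit': 120.0}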

From these figures we can gauge the spider's level of activity, its affinity for the site, and its crawl depth. The higher the total number of visits, total stay time, total crawl volume, average pages crawled per visit, and average stay time, the more the search engine favors the site. The stay time on a single page, by contrast, reflects page load speed: the longer it is, the slower the site responds, which hurts crawling and indexing. We should therefore try to improve page load speed and reduce the per-page stay time, so that the spider's limited resources go toward crawling and indexing more pages.

In addition, from these data we can chart the site's overall performance over a period of time, for example: the trend in spider visits, the trend in stay time, and the crawl trend.

2. Directory crawl statistics

Through log analysis we can see which directories of the website the spiders favor, the crawl depth per directory, the crawl status of important page directories, and the crawl status of worthless page directories. By comparing the pages crawled and indexed within each directory, we can uncover further problems. For important directories, we should raise their weight and crawl rate through internal and external adjustments; worthless pages should be blocked in robots.txt.
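A hedged sketch of how directory crawl statistics might be tallied, assuming the entries come from a parser like the parse_line sketch above; mapping a URL to its first path segment is a simplification.

    from collections import Counter
    from urllib.parse import urlparse

    def directory_of(url):
        """Map a crawled URL to its first-level directory, e.g. /news/123.html -> /news/."""
        parts = [p for p in urlparse(url).path.split("/") if p]
        return f"/{parts[0]}/" if len(parts) > 1 else "/"

    def crawls_by_directory(spider_entries):
        """Count spider hits per first-level directory."""
        return Counter(directory_of(e["url"]) for e in spider_entries)

    # Example result: Counter({'/news/': 532, '/product/': 210, '/': 48, '/tag/': 12})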

In addition, through day-by-day log statistics we can see how on-site and off-site actions affect each directory, whether the optimization is reasonable, and whether it has achieved the expected effect. For the same directory over a long period, we can track how its pages perform and infer the likely causes from the observed behavior.

3. Page crawl statistics

In the site log we can see exactly which pages a spider crawled. Among these pages, we can check whether the spider is crawling pages that should be blocked, pages with no indexing value, duplicate page URLs, and so on. To make full use of the spider's resources, these addresses should be disallowed in robots.txt.
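One possible way to surface such crawl-budget waste, assuming the spider entries from the earlier parsing sketch; the "waste" patterns here are only examples and must be adapted to the actual site.

    import re
    from collections import Counter

    # Illustrative patterns for URLs that usually waste crawl budget:
    # on-site search results, session IDs, print versions. Adjust per site.
    WASTE_PATTERNS = [
        re.compile(r"/search\?"),
        re.compile(r"[?&](sessionid|sid)="),
        re.compile(r"/print/"),
    ]

    def wasted_crawls(spider_entries):
        """Return (url, hit_count) pairs for spider-crawled URLs matching a waste pattern."""
        hits = Counter(e["url"] for e in spider_entries
                       if any(p.search(e["url"]) for p in WASTE_PATTERNS))
        return hits.most_common()

    # Every URL that shows up here is a candidate for a Disallow rule in robots.txt.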

We can also analyze why certain pages are not indexed. For a new article, is it not indexed because it has not yet been crawled, or because it was crawled but not yet released? For pages that have little reading value but may still be needed as crawl channels, should we add a noindex tag? Then again, would the spider be hindered if it had to rely on these meaningless channel pages to discover other pages, or does a sitemap make them unnecessary? The author is uncertain on this point and asks readers to share their experience.

4. Spider access IPs

Some have proposed judging whether a site has been penalized (demoted) from the IP segments the spiders come from. The author feels this is not very useful, because it is purely hindsight; a demotion is better judged from the first three metrics, and a single IP segment tells us little. IP analysis is more useful for identifying scraping spiders, fake spiders, and malicious click bots.
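A common way to distinguish genuine spiders from fakes is a reverse-then-forward DNS check on the visiting IP; Google and Baidu both publish the hostnames their crawlers resolve to. A minimal sketch follows; the suffix list is illustrative and not exhaustive.

    import socket

    def verify_spider_ip(ip, allowed_suffixes=(".googlebot.com", ".google.com",
                                               ".baidu.com", ".baidu.jp")):
        """Reverse-then-forward DNS check: a genuine spider IP resolves to a hostname
        under the engine's domain, and that hostname resolves back to the same IP.
        Anything else is likely a fake or impersonating spider."""
        try:
            host, _, _ = socket.gethostbyaddr(ip)            # reverse DNS lookup
            if not host.endswith(allowed_suffixes):
                return False
            forward_ips = socket.gethostbyname_ex(host)[2]   # forward DNS lookup
            return ip in forward_ips
        except OSError:
            return False

    # verify_spider_ip("66.249.66.1")  # an address in Google's published crawler range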

5. HTTP status codes

Status codes such as 301 and 404 frequently appear in spider requests. These should be dealt with promptly to avoid a negative impact on the site.
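A small sketch for summarizing status codes and collecting the 404 URLs that need attention, again assuming entries from the earlier parsing sketch.

    from collections import Counter

    def status_summary(spider_entries):
        """Count spider responses by HTTP status code and collect the 404 URLs."""
        codes = Counter(e["status"] for e in spider_entries)
        not_found = sorted({e["url"] for e in spider_entries if e["status"] == "404"})
        return codes, not_found

    # codes might look like Counter({'200': 1180, '301': 45, '404': 23, '500': 2});
    # every URL in not_found should be fixed, redirected, or removed from internal links.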

6. Crawl time periods

By tallying the number of spider crawls in each hour of a single day, we can find out when a particular spider is most active on this site. By comparing data across a week, we can see that spider's activity cycle over the week. Knowing this provides some guidance on when to update site content, and suggests that the earlier one-size-fits-all claims about the best update time are not scientific.
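A sketch of how the hourly crawl distribution could be computed from the parsed timestamps, assuming the combined-log time format shown earlier.

    from collections import Counter
    from datetime import datetime

    def crawls_by_hour(spider_entries):
        """Bucket spider hits by hour of day, using the combined-log timestamp
        format (e.g. '10/Oct/2023:06:25:31 +0800')."""
        hours = Counter()
        for e in spider_entries:
            ts = datetime.strptime(e["time"], "%d/%b/%Y:%H:%M:%S %z")
            hours[ts.hour] += 1
        return [hours.get(h, 0) for h in range(24)]

    # Returns a 24-element list of hit counts; the peaks show when the spider is most active.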

7. Spider crawl paths

In the site log we can trace the access path of a specific IP. By following the path of a particular spider, we can discover its crawling preferences with respect to the site structure. From this we can guide the spider's crawl path appropriately, so that it crawls more of the important, valuable, and newly updated pages. Within the crawl path we can analyze the spider's preference for the pages' physical directory structure and for the logical URL structure. This lets us look at our own website from the search engine's perspective.
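A sketch for reconstructing the crawl path of one spider IP by ordering its requests in time, again based on the fields from the earlier parse_line sketch.

    from datetime import datetime

    def crawl_path(spider_entries, ip):
        """Reconstruct the ordered sequence of URLs one spider IP requested,
        which approximates its crawl path through the site structure."""
        visits = [e for e in spider_entries if e["ip"] == ip]
        visits.sort(key=lambda e: datetime.strptime(e["time"], "%d/%b/%Y:%H:%M:%S %z"))
        return [e["url"] for e in visits]

    # crawl_path(entries, "220.181.108.75") -> ['/', '/news/', '/news/123.html', ...]  (illustrative)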

Much more can be drawn from website log analysis. Limited by the SEO knowledge behind Www.tingnv.com and the lack of better log tools, this article cannot go further in depth; friends who study this topic are welcome to leave a message and exchange ideas.
