Abstract: By analyzing the website log we can judge the health of our site, and we can also see the spider's crawl records and some of our users' behavior records. This data is undoubtedly a key to improving the site.
By analyzing the website log, we can judge the health of our website very well, and we can also see the spider's crawl records and some of our users' behavior records. This data is undoubtedly a key to improving the site, because it constantly shows us the site's shortcomings so that we can correct them. Today, however, I mainly want to share the spider's crawling behavior with you, and I hope it helps my fellow webmasters.
(i) The number of spider visits, the length of stay, and the number of pages crawled.
From the three items of data mentioned in this heading, we can learn the following:
1. Average pages crawled per visit = total pages crawled / number of visits
2. Average time per page crawl = total stay time / total pages crawled
3. Average stay time per visit = total stay time / number of visits
The three formulas above are taken from Baidu Encyclopedia.
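To make the three formulas concrete, here is a minimal Python sketch that computes them, assuming the totals have already been pulled out of a log report; the numbers below are placeholders, not real data.

```python
# Placeholder totals, assumed to have been extracted from the site's log report.
total_pages_crawled = 1200   # total pages the spider fetched
total_visits = 300           # number of spider visits
total_stay_seconds = 5400    # total time the spider spent on the site, in seconds

avg_pages_per_visit = total_pages_crawled / total_visits        # formula 1
avg_seconds_per_page = total_stay_seconds / total_pages_crawled  # formula 2
avg_seconds_per_visit = total_stay_seconds / total_visits        # formula 3

print(f"Average pages crawled per visit: {avg_pages_per_visit:.1f}")
print(f"Average time per page crawl:     {avg_seconds_per_page:.1f}s")
print(f"Average stay time per visit:     {avg_seconds_per_visit:.1f}s")
```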
From this data we can see how active the Baidu spider is on our site, how much affinity it has for the site, and how deeply it crawls our content. When the total number of visits, the spider's stay time, and the depth of its crawling are all high, we can tell that the site is one the spider favors. Likewise, the time the spider stays on a single page tells us whether the spider favors our article pages.
Tip: If you want to develop a website over the long term, I suggest you compile the site's data into regular reports; this will be a great help to the site's development.
(ii) Crawl statistics for the site's directories.
Through log analysis we can see very clearly which of our directories the spider prefers, which it crawls and indexes frequently, and which it gives ranking weight to. From this data we can also uncover many problems: for the columns we want to highlight, we can point more internal links at them through the site's structure, which effectively raises those columns' weight and how often the spider crawls them; for the pages we do not want the spider to crawl, we can block them.
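As an illustration, here is a rough Python sketch of per-directory crawl counts, assuming a combined-format access log (the `access.log` path is a placeholder) and that Baidu's spider can be recognized by the string `Baiduspider` in the user-agent field.

```python
import re
from collections import Counter

# Pull the request path out of a combined-format log line.
LINE_RE = re.compile(r'"(?:GET|POST) (?P<path>\S+) HTTP/[\d.]+"')

dir_hits = Counter()
with open("access.log", encoding="utf-8", errors="ignore") as log:
    for line in log:
        if "Baiduspider" not in line:   # keep only spider requests
            continue
        match = LINE_RE.search(line)
        if not match:
            continue
        segments = match.group("path").lstrip("/").split("/")
        # Use the first path segment as the directory, e.g. /news/123.html -> /news/
        top_dir = ("/" + segments[0] + "/") if len(segments) > 1 else "/"
        dir_hits[top_dir] += 1

for directory, hits in dir_hits.most_common(10):
    print(f"{directory:<20} {hits} crawls")
```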
(iii) Page-level crawl statistics.
Through daily log analysis we can see which pages of the site are more popular with the spider and understand the spider's crawling behavior on those pages, for example whether it is crawling pages with no indexing value or repeatedly crawling certain pages. As we all know, this affects how weight is passed to the other pages of our site. For example, the skin column on my site is usually crawled more diligently than the name column, and its inclusion rate is also healthier, so I now regularly add some pictures to the articles in the name column; this not only looks better but also effectively improves the inclusion of the name column's articles. Through this kind of analysis we can block the spider from crawling the worthless pages, which also improves how weight is passed to our other pages, and we can learn from the strengths of the pages the spider prefers to make up for the weaknesses of the others.
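As a small illustration of spotting repeatedly crawled, low-value pages, here is a hedged Python sketch; the `crawled_paths` list is placeholder data standing in for paths already extracted from the log (as in the directory sketch above), and the skin/name/tag paths are invented for the example.

```python
from collections import Counter

# Placeholder paths; in practice these would come from the spider's log lines.
crawled_paths = [
    "/skin/qq-skin-001.html",
    "/skin/qq-skin-001.html",
    "/name/cool-names.html",
    "/tag/?page=37",
    "/tag/?page=37",
    "/tag/?page=37",
]

hits = Counter(crawled_paths)

print("Pages crawled most often:")
for path, count in hits.most_common(5):
    print(f"  {count:>3}x {path}")

# Pages crawled repeatedly but with no indexing value are candidates for
# a robots.txt Disallow rule or a noindex meta tag.
low_value = [p for p, c in hits.items() if c > 1 and "?" in p]
print("Candidates to block:", low_value)
```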
(iv) Check whether the spider visits our pages and the status codes it receives.
Many friends find that their home-page snapshot is often out of date and the articles they publish are often not indexed; faced with this, we wonder whether the spider has come to crawl our site at all. At this point we can check the log for spider IP records to learn whether the spider has crawled our site, which helps us judge whether the cause of the missing pages lies in the quality of our content. We can also see the status codes the spider receives when visiting our pages, such as 301, 503, or 403; when this happens we should deal with it as soon as possible, lest it become a hidden danger that gets the site demoted.
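A rough Python sketch of this check might look like the following, assuming a combined-format access log (`access.log` is a placeholder path) where the status code follows the quoted request field and the spider is identified by `Baiduspider` in the user agent.

```python
import re
from collections import Counter

# The status code is the three-digit number right after the quoted request.
STATUS_RE = re.compile(r'" (?P<status>\d{3}) ')

status_counts = Counter()
with open("access.log", encoding="utf-8", errors="ignore") as log:
    for line in log:
        if "Baiduspider" not in line:
            continue
        match = STATUS_RE.search(line)
        if match:
            status_counts[match.group("status")] += 1

if not status_counts:
    print("No Baiduspider visits found in this log.")
else:
    for status, count in sorted(status_counts.items()):
        flag = "  <- investigate" if status in {"301", "403", "503"} else ""
        print(f"HTTP {status}: {count}{flag}")
```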
(v) Understand the time periods when the spider crawls.
Through daily log analysis you will discover something wonderful: every day there is a specific period in which the spider crawls the site very actively. Once we understand this, we can update the site's content during that period, which lets the spider pick up our new content much more effectively, to the point where it is indexed almost immediately.
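To find that active period, a minimal Python sketch like the one below can count the spider's crawls per hour, assuming log timestamps in the usual `[day/Mon/year:hour:...]` format and `access.log` as a placeholder path.

```python
import re
from collections import Counter

# Capture the hour field from a timestamp such as [10/Oct/2023:14:55:36 +0800].
HOUR_RE = re.compile(r'\[\d{2}/\w{3}/\d{4}:(?P<hour>\d{2}):')

visits_by_hour = Counter()
with open("access.log", encoding="utf-8", errors="ignore") as log:
    for line in log:
        if "Baiduspider" in line:
            match = HOUR_RE.search(line)
            if match:
                visits_by_hour[match.group("hour")] += 1

# Publish new content shortly before the busiest hours so it is picked up quickly.
for hour, count in sorted(visits_by_hour.items()):
    print(f"{hour}:00  {'#' * (count // 10)}  ({count} crawls)")
```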
Summary: If a website wants to develop over the long term, the webmaster must learn to analyze the website log, so as to understand the site's health every day and correct abnormal situations as soon as they are found. This not only helps the site a great deal but also effectively prevents the site from being demoted or K'd (removed from the index) because of those anomalies.