Web Capture Priority Policy
The web capture priority strategy, also known as page selection, usually means crawling the most important pages first, so that with limited resources the pages of high importance are covered as far as possible. So which pages are important, and how can importance be quantified?
Importance is measured through three aspects: link popularity, link importance, and average link depth.
Define link popularity as IB(P), which is determined mainly by the number and quality of backlinks. Consider the number first. Intuitively, the more links point to a page (that is, the more backlinks it has), the more other pages recognize it, and the more likely it is to be visited by users. Now consider quality: the more important the pages that link to a page, the higher that page's importance. Ignoring quality yields a locally optimal rather than a globally optimal result. The most typical case is the cheating web page: someone artificially places many links to their own page on other pages in order to inflate its importance. If link quality is not considered, such cheaters can exploit the metric.
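The difference between counting backlinks and weighting them by quality can be illustrated with a minimal sketch. The function names, the (source, target) edge format, and the per-page quality scores are all assumptions made for this illustration; the article does not specify how IB(P) is actually computed.

```python
from collections import defaultdict

def _backlink_sources(links):
    """Map each target page to the set of distinct pages linking to it."""
    sources = defaultdict(set)
    for src, dst in links:
        if src != dst:  # a page linking to itself is not a backlink
            sources[dst].add(src)
    return sources

def backlink_counts(links):
    """Quantity only: number of distinct linking pages per target."""
    return {page: len(srcs) for page, srcs in _backlink_sources(links).items()}

def weighted_popularity(links, quality, default_quality=0.0):
    """Quantity and quality: sum the (assumed) quality of each linking page.

    `quality` is a hypothetical {page: score} table; unknown pages get
    `default_quality`, so links from untrusted pages contribute nothing.
    """
    return {
        page: sum(quality.get(s, default_quality) for s in srcs)
        for page, srcs in _backlink_sources(links).items()
    }

links = [
    ("spam1", "cheat"), ("spam2", "cheat"), ("spam3", "cheat"),
    ("news", "blog"), ("portal", "blog"),
]
quality = {"news": 0.9, "portal": 0.8}  # hypothetical trust scores

print(backlink_counts(links))
print(weighted_popularity(links, quality))
```

By raw count the page "cheat" beats "blog" three to two, but once each backlink is weighted by the quality of its source, the three links from unknown pages add nothing and "blog" ranks higher.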
Define link importance as IL(P), which is a function of the URL string and examines only the string itself. Link importance is judged mainly through a number of patterns: for example, URLs that contain ".com" or "home" are considered more important, and URLs with fewer slashes are considered more important.
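A URL-only score of this kind might look like the following sketch. The patterns (".com", "home", few slashes) come from the text above; the function name and the specific weights are assumptions chosen only for illustration.

```python
from urllib.parse import urlsplit

def link_importance(url):
    """Score a URL from its string alone (weights are illustrative assumptions).

    Expects an absolute URL with a scheme, e.g. "http://example.com/".
    """
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    # Non-empty path segments stand in for "number of slashes".
    segments = [s for s in parts.path.split("/") if s]

    score = 0.0
    if host.endswith(".com"):
        score += 1.0                      # ".com" pattern
    if "home" in parts.path.lower():
        score += 1.0                      # "home" pattern
    score += 1.0 / (1 + len(segments))    # fewer path segments, higher score
    return score

for u in ("http://example.com/",
          "http://example.com/home",
          "http://example.org/a/b/c/d.html"):
    print(u, round(link_importance(u), 2))
```

Because the score depends only on the string, it can be computed for a URL before the page has ever been fetched, which is what makes it usable for ordering a crawl queue.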
Define average link depth as ID(P), a measure proposed by the author. Given a collection of seed sites, ID(P) reflects how many links away a page is, on average, from each seed site when links are followed under the breadth-first traversal rule. Average link depth is another important indicator of a page: the closer a page is to the seed sites, the more opportunities it has to be accessed, and the farther it is from them, the lower its importance. In fact, the breadth-first traversal rule already satisfies the need for such highly important pages to be crawled first.
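One way to read this definition is as a breadth-first search from each seed, followed by averaging the depths at which a page is reached. The sketch below assumes an adjacency-list graph and averages only over the seeds that can reach a page; both choices are assumptions, since the article does not give an exact formula for ID(P).

```python
from collections import deque

def bfs_depths(graph, seed):
    """Breadth-first traversal from one seed; returns {page: link depth}."""
    depths = {seed: 0}
    queue = deque([seed])
    while queue:
        page = queue.popleft()
        for nxt in graph.get(page, ()):
            if nxt not in depths:          # first visit is the shortest depth
                depths[nxt] = depths[page] + 1
                queue.append(nxt)
    return depths

def average_link_depth(graph, seeds):
    """Average depth of each page over the seeds that reach it (assumption)."""
    totals, counts = {}, {}
    for seed in seeds:
        for page, depth in bfs_depths(graph, seed).items():
            totals[page] = totals.get(page, 0) + depth
            counts[page] = counts.get(page, 0) + 1
    return {page: totals[page] / counts[page] for page in totals}

# Hypothetical link graph: page -> pages it links to.
graph = {"s1": ["a"], "s2": ["a", "b"], "a": ["b"]}
print(average_link_depth(graph, ["s1", "s2"]))
```

The queue in bfs_depths also shows why breadth-first crawling visits shallow pages first: every page at depth d is dequeued before any page at depth d + 1.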
Finally, define the overall importance of a web page as I(P), a linear combination of the first two quantities above, link popularity and link importance, namely:
I(P) = α * IB(P) + β * IL(P)
The average link depth is already guaranteed by the breadth-first traversal rule, so it is not used as an index in the importance evaluation. When crawling capacity is limited, crawling the pages of highest importance as far as possible is reasonable and sound, because the pages users eventually query are often those of high importance.
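As a rough sketch of how the linear score could drive a crawl with a limited budget, the example below ranks candidate URLs with a heap. The weights alpha and beta, the function names, and the input scores are hypothetical; the article gives no concrete values.

```python
import heapq

def importance(ib, il, alpha=0.7, beta=0.3):
    """I(P) = alpha * IB(P) + beta * IL(P); the default weights are assumptions."""
    return alpha * ib + beta * il

def crawl_order(candidates, budget, alpha=0.7, beta=0.3):
    """Return up to `budget` URLs, most important first.

    `candidates` maps url -> (IB, IL), both assumed to be precomputed.
    """
    # heapq is a min-heap, so push negated scores to pop the largest first.
    heap = [(-importance(ib, il, alpha, beta), url)
            for url, (ib, il) in candidates.items()]
    heapq.heapify(heap)
    order = []
    while heap and len(order) < budget:
        _, url = heapq.heappop(heap)
        order.append(url)
    return order

candidates = {"a": (0.9, 0.5), "b": (0.2, 1.0), "c": (0.1, 0.1)}
print(crawl_order(candidates, budget=2))
```

With a budget of two, the lowest-scoring page "c" is never fetched, which is the intended effect of a priority policy under limited crawling capacity.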
Although this seems complete, it ignores an important factor: time. Over time the World Wide Web changes dynamically. How are new pages crawled? How are modified pages revisited? How are deleted pages discovered? To keep up with changes on the web, there must be a web revisit strategy. Such a strategy can be used to identify three kinds of change: added, modified, and deleted pages.
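The article does not describe how a revisit strategy detects these changes. One simple possibility, shown here purely as an assumption, is to store a content fingerprint per URL on each crawl and compare two snapshots. The function names and the snapshot format are hypothetical.

```python
import hashlib

def fingerprint(content):
    """Hash page content so that any modification changes the fingerprint."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()

def diff_snapshots(old, new):
    """Compare two {url: fingerprint} snapshots.

    Returns (added, modified, deleted) as sorted lists of URLs.
    """
    added = sorted(u for u in new if u not in old)
    deleted = sorted(u for u in old if u not in new)
    modified = sorted(u for u in new if u in old and new[u] != old[u])
    return added, modified, deleted

old = {"/a": fingerprint("version 1"), "/b": fingerprint("stable")}
new = {"/a": fingerprint("version 2"), "/c": fingerprint("brand new")}
print(diff_snapshots(old, new))
```

A real crawler would also need to decide how often to revisit each page; this sketch covers only the classification of changes once a revisit has happened.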
First published by: Chuang billion website planning Organization (http://www.ccyyw.com)