Simply put, a web crawler is similar to the offline reading tools you may already use. Building one involves the following concerns:
1. URL traversal and recording.
2. Multi-process vs. multithreading.
3. Update-time control.
4. Crawl depth.
5. Crawlers generally do not fetch the target site's pages directly; requests usually go out through a proxy. The proxy relieves load: when the target page has not been updated, it is enough to fetch the header and check the tag in it, so the whole page does not have to be transferred again. This can greatly reduce network bandwidth.
6. When you have time, take care of robots.txt as well.
7. Storage structure.
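Points 1, 4 and 6 of the list can be sketched together in a few lines of Python. This is only a minimal illustration, not a production crawler: the `FAKE_WEB` link graph, the `ROBOTS_TXT` rules, the `demo-bot` agent name and the `crawl` function are all hypothetical, and pages are read from memory instead of the network so the example runs offline. The robots.txt rules are evaluated with the standard library's `urllib.robotparser`.

```python
from collections import deque
from urllib.parse import urljoin, urldefrag
from urllib.robotparser import RobotFileParser

# Hypothetical in-memory "web": URL -> links found on that page.
FAKE_WEB = {
    "http://example.com/": ["/a", "/b", "/private/x"],
    "http://example.com/a": ["/a/deep", "/"],
    "http://example.com/b": ["/a"],
    "http://example.com/a/deep": ["/a/deep/deeper"],
    "http://example.com/a/deep/deeper": [],
    "http://example.com/private/x": [],
}

# Hypothetical robots.txt content for the fake site.
ROBOTS_TXT = """User-agent: *
Disallow: /private/
"""


def crawl(start_url, max_depth, fetch_links, robots, agent="demo-bot"):
    """Breadth-first traversal: record each URL once, stop at max_depth."""
    visited = {start_url}          # URL record, prevents revisiting
    order = []                     # pages actually crawled, in order
    queue = deque([(start_url, 0)])
    while queue:
        url, depth = queue.popleft()
        if not robots.can_fetch(agent, url):
            continue               # respect robots.txt
        order.append(url)
        if depth == max_depth:
            continue               # do not follow links any deeper
        for href in fetch_links(url):
            # Resolve relative links and drop any #fragment.
            link, _ = urldefrag(urljoin(url, href))
            if link not in visited:
                visited.add(link)
                queue.append((link, depth + 1))
    return order


robots = RobotFileParser()
robots.parse(ROBOTS_TXT.splitlines())

result = crawl("http://example.com/", 2, lambda u: FAKE_WEB.get(u, []), robots)
for url in result:
    print(url)
```

With a depth limit of 2, the page under /a/deep/deeper is never reached, and /private/x is skipped because of the Disallow rule. A real crawler would replace the `fetch_links` callback with an HTTP fetch plus link extraction.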
How often a page is updated strongly affects how often search engine spiders crawl the site. The more often a site is crawled, the higher the probability that its pages get indexed, and the number of indexed pages is the most basic link in SEO.
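The header check from point 5 of the list is what lets a crawler notice updates cheaply. The sketch below simulates it, assuming that the "tag" in question works like an HTTP `ETag` compared through an `If-None-Match` request header (the source does not name the header, so this is an assumption). The toy `serve` function, the SHA-1 based `make_etag` scheme and the `CachingFetcher` class are all hypothetical and run entirely in memory.

```python
import hashlib


def make_etag(body: bytes) -> str:
    # Hypothetical ETag scheme: quoted SHA-1 hex digest of the body.
    return '"' + hashlib.sha1(body).hexdigest() + '"'


def serve(body: bytes, request_headers: dict):
    """Toy origin server: reply 304 with an empty body if the ETag matches."""
    etag = make_etag(body)
    if request_headers.get("If-None-Match") == etag:
        return 304, {"ETag": etag}, b""
    return 200, {"ETag": etag}, body


class CachingFetcher:
    """Keeps the last ETag and body per URL and sends conditional requests."""

    def __init__(self):
        self.cache = {}             # url -> (etag, body)
        self.bytes_transferred = 0  # body bytes received from the origin

    def fetch(self, url: str, origin_body: bytes) -> bytes:
        headers = {}
        if url in self.cache:
            headers["If-None-Match"] = self.cache[url][0]
        status, resp_headers, body = serve(origin_body, headers)
        self.bytes_transferred += len(body)
        if status == 304:
            return self.cache[url][1]   # unchanged: reuse the cached copy
        self.cache[url] = (resp_headers["ETag"], body)
        return body


fetcher = CachingFetcher()
page_v1 = b"<html>v1</html>"
fetcher.fetch("http://example.com/", page_v1)   # full transfer
fetcher.fetch("http://example.com/", page_v1)   # 304, no body transferred
print(fetcher.bytes_transferred == len(page_v1))
```

Only when the page content changes does the second kind of request cost a full transfer again, which is where the bandwidth saving comes from.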
Try to keep the site within three directory levels. Deep pages put great pressure on search engines. Of course, I think Google has enough servers to bear that pressure, but seen from another angle, pages below the third directory level are crawled and updated much less frequently. As I said before, find ways to make the site's physical structure and logical structure consistent, which shows in well-designed URLs. Now you can check how many directory levels your generated static pages actually sit in, and consider whether that can be optimized.
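A quick way to run that check is to count the directory levels in each URL path. The helper below is a rough sketch: `directory_depth` is a hypothetical name, and it assumes that a final path segment without a trailing slash is a file name rather than a directory, which may not hold for every site.

```python
from urllib.parse import urlparse


def directory_depth(url: str) -> int:
    """Count directory levels in the URL path, ignoring the final file name."""
    path = urlparse(url).path
    segments = [s for s in path.split("/") if s]
    # Assumption: without a trailing slash, the last segment is a file.
    if segments and not path.endswith("/"):
        segments = segments[:-1]
    return len(segments)


urls = [
    "http://example.com/index.html",
    "http://example.com/news/2010/05/page.html",
    "http://example.com/a/b/c/d/page.html",
]
for u in urls:
    depth = directory_depth(u)
    flag = "too deep" if depth > 3 else "ok"
    print(depth, flag, u)
```

Running this over a sitemap or a list of generated pages shows which URLs sit deeper than three levels and are candidates for restructuring.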
This article is reproduced from duocaigu.com. If you reproduce it, please retain the source and respect the copyright.