Analysis of Spider Design in Search Engines

Source: Internet
Author: User


Put simply, a web crawler is similar to an offline reading tool: it fetches pages and keeps a local copy. When designing one, several points need attention:

1. URL traversal and recording

2. Multi-process VS multithreading

3. Time Update control

4. Crawl depth

5. Crawlers generally do not fetch the target site's pages directly but go out through a proxy. The proxy relieves load: when the target page has not been updated, it is enough to fetch the header and check its tag, with no need to transfer the whole page again, which greatly reduces network bandwidth.

6. Respect robots.txt wherever possible.

7. Storage structure.
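Several of the points above (URL traversal and recording, crawl depth, and robots.txt) can be sketched together. The following is only a minimal illustration, not how any real search engine works: the `crawl` function, the `example-bot` agent name, the `get_links` callback, and the toy `site` dictionary are all assumptions made for this example, and no network access is performed. The robots.txt rules are parsed with Python's standard `urllib.robotparser`.

```python
from collections import deque
from urllib.robotparser import RobotFileParser


def crawl(start_url, get_links, robots, max_depth=3, agent="example-bot"):
    """Breadth-first traversal with a visited record and a depth limit.

    get_links is a hypothetical callback that returns the links found on a
    page; a real crawler would download and parse the page here.
    """
    visited = set()                    # record of URLs already seen
    order = []                         # URLs actually crawled, in order
    queue = deque([(start_url, 0)])    # (url, depth) pairs
    while queue:
        url, depth = queue.popleft()
        if url in visited or depth > max_depth:
            continue
        visited.add(url)
        if not robots.can_fetch(agent, url):
            continue                   # robots.txt disallows this URL
        order.append(url)
        for link in get_links(url):
            if link not in visited:
                queue.append((link, depth + 1))
    return order


# Toy robots.txt and link graph standing in for a real site (assumed data).
robots = RobotFileParser()
robots.parse(["User-agent: *", "Disallow: /private/"])

site = {
    "http://example.com/": ["http://example.com/a",
                            "http://example.com/private/x"],
    "http://example.com/a": ["http://example.com/a/b",
                             "http://example.com/"],
    "http://example.com/a/b": ["http://example.com/a/b/c"],
    "http://example.com/a/b/c": [],
}

pages = crawl("http://example.com/", lambda u: site.get(u, []), robots,
              max_depth=2)
print(pages)
```

With `max_depth=2`, the page three links away from the start is never fetched, and the `/private/` URL is skipped because of the Disallow rule. A breadth-first queue is used here so that shallow pages are reached first; a production crawler would also need politeness delays and persistent storage for the visited record.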

How often a page is updated strongly affects how often search engine spiders crawl the site. The more often a site is crawled, the higher the probability that its pages get indexed, and the number of indexed pages is the most basic link in SEO.
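The relationship between update frequency and crawl frequency, together with the header check from point 5, can be illustrated with a small sketch. It assumes that the "tag" mentioned above is an HTTP `ETag`, sent back in an `If-None-Match` request header, with the server answering `304 Not Modified` when nothing changed. The `PageRecord` class, the `update_schedule` function, and the doubling/halving policy are hypothetical choices for illustration, not a description of any real spider.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PageRecord:
    url: str
    etag: Optional[str] = None      # tag stored from the last full fetch
    interval_hours: float = 24.0    # how long to wait before revisiting


def conditional_headers(record):
    """Headers for a conditional request; empty if no tag is stored yet."""
    return {"If-None-Match": record.etag} if record.etag else {}


def update_schedule(record, status, new_etag,
                    min_hours=1.0, max_hours=24.0 * 30):
    """Adjust the revisit interval after a fetch; return True if changed.

    Assumed policy: visit unchanged pages half as often, and changed
    pages twice as often, within the given bounds.
    """
    unchanged = status == 304 or (new_etag is not None
                                  and new_etag == record.etag)
    if unchanged:
        record.interval_hours = min(record.interval_hours * 2, max_hours)
        return False
    record.etag = new_etag
    record.interval_hours = max(record.interval_hours / 2, min_hours)
    return True


rec = PageRecord("http://example.com/")
first = update_schedule(rec, 200, '"v1"')   # full page transferred
second = update_schedule(rec, 304, None)    # headers only, body skipped
```

Under this policy a frequently updated page is revisited more and more often, while a static page drifts toward the maximum interval, which matches the observation that update frequency drives crawl frequency.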

Try to keep the site within three directory levels. Deep pages put great pressure on the search engine. Of course, I think Google has enough servers to bear this pressure, but seen from another angle, pages three directory levels down are crawled and updated far less often. As I said before, find ways to make the site's physical structure consistent with its logical structure, which shows in well-designed URLs. You can now check how many directory levels your generated static pages actually have and consider whether that can be optimized.
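To check how many directory levels a set of generated pages actually has, a short script along these lines may help. It is a rough sketch: the function names and the sample URLs are made up for illustration, and it assumes that a path without a trailing slash ends in a file name rather than a directory.

```python
from urllib.parse import urlparse


def directory_depth(url):
    """Count the directory levels in a URL path.

    Assumption: if the path does not end with "/", its last segment is a
    file name and is not counted as a directory.
    """
    path = urlparse(url).path
    segments = [s for s in path.split("/") if s]
    if segments and not path.endswith("/"):
        segments = segments[:-1]    # drop the file name
    return len(segments)


def too_deep(urls, limit=3):
    """Return the URLs that sit deeper than `limit` directory levels."""
    return [u for u in urls if directory_depth(u) > limit]


# Hypothetical sample URLs.
urls = [
    "http://example.com/index.html",
    "http://example.com/news/2010/item.html",
    "http://example.com/a/b/c/d/page.html",
]
deep = too_deep(urls)
```

Here only the last URL, which sits four directories down, is reported. Running such a check over a sitemap or a list of generated files gives a quick picture of where the structure could be flattened.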

This article is reproduced from duocaigu.com. If you reproduce it, please retain the source and respect the copyright.
