Study Notes: How Search Engines Crawl Web Pages

Source: Internet
Author: User
Keywords: Search


Anyone doing serious SEO work will find that the way search engines crawl web pages is quite complex; one could write a lengthy essay on it. These are my own study notes on how web crawling works. They are not complete, just an introduction to the most important steps.

Crawling a page starts with the search engine dispatching a spider, also called a robot. Every search engine has a large number of spiders to send out, and with information being updated so quickly, none of them sits idle. The search engine assigns a large batch of URLs to each spider and makes sure their work does not overlap: each URL is crawled by one, and only one, designated spider.
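The "one URL, one spider" rule can be illustrated with a small sketch. These notes do not say how real search engines divide up URLs; hashing the URL, as below, is only one simple way to get a stable, non-overlapping assignment. The function name `assign_spider`, the spider count, and the example URLs are all hypothetical.

```python
import hashlib


def assign_spider(url: str, num_spiders: int) -> int:
    """Map a URL to exactly one spider index in the range [0, num_spiders)."""
    # A stable hash guarantees the same URL always lands on the same spider.
    digest = hashlib.sha256(url.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_spiders


# Hypothetical URLs, used only for illustration.
urls = [
    "http://example.com/computers",
    "http://example.com/desktops",
    "http://example.com/laptops",
    "http://example.com/tablets",
]

# Each URL appears once as a key, so no two spiders share a URL.
assignments = {url: assign_spider(url, 4) for url in urls}
for url, spider in assignments.items():
    print(f"spider {spider} <- {url}")
```

Because the assignment depends only on the URL itself, no coordination between spiders is needed to avoid duplicated work.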

When a spider receives its assignment and arrives at the target site, it can crawl using one of two strategies: depth-first or breadth-first.

Depth-first means following one path all the way to the end: find a link, then crawl into it. This is the order shown by the red numbers in the following figure. The spider arrives at the Computers page and sees that the first link is Desktops, so it enters the Desktops page. The first link on that page is Dell Desktop, so it continues into that page. When the Dell Desktop page has no more links to follow, the spider returns to the Desktops page and crawls the second link, Lenovo Desktop.
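A minimal sketch of depth-first crawling on the example site follows. The `SITE` dictionary is a hypothetical in-memory stand-in for the pages and their links; a real spider would fetch each page over HTTP and parse the links out of it.

```python
# Hypothetical link structure of the example site: page -> links found on it.
SITE = {
    "Computers": ["Desktops", "Laptops", "Tablets"],
    "Desktops": ["Dell Desktop", "Lenovo Desktop"],
    "Laptops": ["HP Notebook", "ASUS Notebook"],
    "Tablets": ["iPad"],
}


def crawl_depth_first(start, links):
    """Return pages in the order a depth-first spider would visit them."""
    order = []
    seen = set()

    def visit(page):
        if page in seen:
            return  # never crawl the same page twice
        seen.add(page)
        order.append(page)
        # Follow each link to its end before moving to the next link.
        for target in links.get(page, []):
            visit(target)

    visit(start)
    return order


print(crawl_depth_first("Computers", SITE))
# ['Computers', 'Desktops', 'Dell Desktop', 'Lenovo Desktop',
#  'Laptops', 'HP Notebook', 'ASUS Notebook', 'Tablets', 'iPad']
```

Note that the Laptops and Tablets pages are reached only after every page under Desktops has been crawled.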

The drawback of crawling this way is that pages are not indexed in order of importance. The Desktops, Laptops, and Tablets column pages are all more important, and updated more frequently, than pages such as Dell Desktop. The spider therefore needs to crawl the column pages first.

So breadth-first has become the main strategy search engines use for indexing. After arriving at the Computers home page and discovering its three links, the spider first puts these links into its task list and then indexes the content of the first one, the Desktops page. Along the way it records the links found on that page, such as the Dell Desktop and Lenovo Desktop addresses, and saves them to the task list to be crawled later.

After finishing the Desktops page, the spider moves on to the Laptops page and, after indexing its content, again saves the links it finds to the task list. Finally it reaches the Tablets page, indexes its content, and adds its links to the list.

Once all the column pages have been indexed, the spider works through the addresses saved earlier in the task list, in order: Dell Desktop, Lenovo Desktop, HP Notebook, ASUS Notebook, and iPad. This is the order shown by the blue numbers in the following figure.
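The breadth-first order can be sketched with a queue playing the role of the task list. As before, `SITE` is a hypothetical in-memory stand-in for the example site rather than anything a real search engine uses.

```python
from collections import deque

# Hypothetical link structure of the example site: page -> links found on it.
SITE = {
    "Computers": ["Desktops", "Laptops", "Tablets"],
    "Desktops": ["Dell Desktop", "Lenovo Desktop"],
    "Laptops": ["HP Notebook", "ASUS Notebook"],
    "Tablets": ["iPad"],
}


def crawl_breadth_first(start, links):
    """Return pages in the order a breadth-first spider would visit them."""
    order = []
    seen = {start}
    task_list = deque([start])  # the spider's task list, first in, first out
    while task_list:
        page = task_list.popleft()
        order.append(page)  # "index" the page
        # Save newly discovered links to the end of the task list.
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                task_list.append(target)
    return order


print(crawl_breadth_first("Computers", SITE))
# ['Computers', 'Desktops', 'Laptops', 'Tablets', 'Dell Desktop',
#  'Lenovo Desktop', 'HP Notebook', 'ASUS Notebook', 'iPad']
```

All three column pages are indexed before any product page, which is the property that makes breadth-first attractive here.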

  

Of course, on most sites every page shares the same header and footer, which show the column navigation and links. Links in these identical headers and footers, which point to the same targets on a large number of pages, are ignored by the spider; the links in the body of the page become what it records and crawls.
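One rough way to picture this filtering is to treat any link that shows up on nearly every page as header or footer navigation. This is an assumed heuristic for illustration only; the notes do not describe how spiders actually detect shared headers and footers. The function name `body_links`, the `threshold` value, and the sample pages are hypothetical.

```python
from collections import Counter


def body_links(pages, threshold=0.8):
    """Drop links that appear on at least `threshold` of all pages.

    `pages` maps each page to the list of link targets found on it.
    """
    counts = Counter()
    for links in pages.values():
        counts.update(set(links))  # count each target once per page
    cutoff = threshold * len(pages)
    return {
        page: [link for link in links if counts[link] < cutoff]
        for page, links in pages.items()
    }


# Hypothetical pages: "/" and "/about" sit in every header or footer.
pages = {
    "/desktops": ["/", "/about", "/dell-desktop", "/lenovo-desktop"],
    "/laptops": ["/", "/about", "/hp-notebook", "/asus-notebook"],
    "/tablets": ["/", "/about", "/ipad"],
}

print(body_links(pages)["/desktops"])
# ['/dell-desktop', '/lenovo-desktop']
```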

Of course, what gets indexed is not the complete page. Instead, the content is extracted and meaningless words and the like are removed, leaving the high-quality content. The filtered content is then passed to the analysis system to obtain the article's keywords and so on.
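As a toy illustration of this step, the sketch below drops a small set of stop words and then takes the most frequent remaining words as keywords. Real analysis systems are far more sophisticated; the stop-word list, the function name `extract_keywords`, and the frequency-based ranking are all assumptions made for the example.

```python
import re
from collections import Counter

# A tiny, hypothetical stop-word list; real systems use much larger ones.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "is", "in", "for", "on", "with"}


def extract_keywords(text, top_n=3):
    """Return the `top_n` most frequent words after removing stop words."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    meaningful = [word for word in words if word not in STOP_WORDS]
    return [word for word, _ in Counter(meaningful).most_common(top_n)]


text = "The Dell desktop is a desktop for the office and the Dell desktop is fast"
print(extract_keywords(text, top_n=2))
# ['desktop', 'dell']
```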

I'm flattered if you reprint this article and hope you will keep the original address, http://www.ijseo.net/?p=592, as an encouragement. Your comments are also welcome!
