Web spiders: Depth first or breadth first

Source: Internet
Author: User
Keywords: Web Spider

A "web spider", formally called a Spider, is also known as a "web crawler". I won't give a general overview of web spiders here; today I mainly want to talk about the ways a spider's crawling can be designed.

Crawling strategies fall into two kinds: the depth-first strategy and the breadth-first strategy. This article is built around these two points.

So what is depth first? What is breadth first? Shanghai SEO (SWJ) explains both below, and everyone is very welcome to join in, learn, and discuss!

My knowledge is limited, so I will stick to plain language and simple reasoning. If you find any errors, please contact me promptly, and please bear with me!

Depth first, as the name suggests, means the web spider digs as deep as it can into the pages it crawls. Note the emphasis on depth!

In other words: the web spider starts from a start page and follows one link after another along a single chain. Once it has finished that chain, it moves on to the next start page and continues following links!

Take a look at the figure below. (It is a simple web page link model diagram, showing where the spider begins indexing!)

There are 5 routes in total for the spider to crawl! Pay attention to the depth!

(The following is the optimized web page link model diagram: the improved depth-first crawl strategy map for the spider!)

Based on the two figures above, we can draw the following conclusions:

Figure 1:

Path 1 ==> A--> B--> E--> H

Path 2 ==> A--> B--> E--> I

Path 3 ==> A--> C

Path 4 ==> A--> D--> F--> K--> L

Path 5 ==> A--> D--> G--> K--> L
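The five routes of Figure 1 can be reproduced with a short sketch. The link graph below is an assumption reconstructed from the paths listed above, since the figure itself is not shown here, and the names `LINKS` and `depth_first_paths` are purely illustrative. The spider follows each chain of links to the end before backing up, and it does not remember which pages it has already crawled, which is why K and L show up in two routes.

```python
# Hypothetical link graph, reconstructed from the five paths of Figure 1.
LINKS = {
    "A": ["B", "C", "D"],
    "B": ["E"],
    "C": [],
    "D": ["F", "G"],
    "E": ["H", "I"],
    "F": ["K"],
    "G": ["K"],
    "H": [],
    "I": [],
    "K": ["L"],
    "L": [],
}


def depth_first_paths(links, start):
    """Return every route from `start` down to a page with no outgoing links."""
    paths = []

    def walk(page, trail):
        trail = trail + [page]          # copy, so sibling branches do not share state
        children = links.get(page, [])
        if not children:                # dead end: one complete route
            paths.append(trail)
            return
        for child in children:          # go deeper before trying the next sibling
            walk(child, trail)

    walk(start, [])
    return paths


for number, path in enumerate(depth_first_paths(LINKS, "A"), start=1):
    print(f"Path {number} ==> " + " --> ".join(path))
```

Running it prints the same five routes as the list above, in the same order.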

After optimization:

Figure 2: (the arrows in the figure mark the crawl direction for you!)

Path 1 ==> A--> B--> E--> H

Path 2 ==> I

Path 3 ==> C

Path 4 ==> D--> F--> K--> L

Path 5 ==> G
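One way to read the optimized routes of Figure 2 is as a depth-first crawl that remembers visited pages, so it never restarts from A and never crawls K and L twice. The sketch below is only an illustration under that assumption: the link graph is reconstructed from the paths above, and `depth_first_segments` is a made-up name, not the author's program.

```python
# Hypothetical link graph, reconstructed from the paths listed in the article.
LINKS = {
    "A": ["B", "C", "D"],
    "B": ["E"],
    "C": [],
    "D": ["F", "G"],
    "E": ["H", "I"],
    "F": ["K"],
    "G": ["K"],
    "H": [],
    "I": [],
    "K": ["L"],
    "L": [],
}


def depth_first_segments(links, start):
    """Depth-first crawl with a visited set, split into unbroken downward runs."""
    visited = set()
    segments = []

    def walk(page, new_segment):
        visited.add(page)
        if new_segment:
            segments.append([page])     # the spider backed up: start a new run
        else:
            segments[-1].append(page)   # still descending along the same run
        first = True                    # first unvisited child continues the run
        for child in links.get(page, []):
            if child in visited:        # already crawled, skip it
                continue
            walk(child, new_segment=not first)
            first = False

    walk(start, True)
    return segments


for number, segment in enumerate(depth_first_segments(LINKS, "A"), start=1):
    print(f"Path {number} ==> " + " --> ".join(segment))
```

With the visited set in place, the fifth route shrinks to G alone, because K and L were already reached through F.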

The advantages of depth-first crawling are:

The web spider program is relatively easy to design. Beyond that, I have not found any other advantages ... although the spider's "keep going forward" spirit is worth learning from! ^_^

The disadvantages of depth-first crawling are:

There are rather more of these! Every time it finishes a layer, the spider has to go back to the "spider home" database and ask whether it needs to crawl the next layer. It crawls one layer, then asks again ... To borrow a saying from an expert: if a spider charges ahead regardless of the consequences, it is likely to get lost, and even more likely to crawl onto foreign websites. The original target was Chinese websites, but because of IP issues, such as a Chinese site sitting on a foreign IP .... It is easy to wander into someone else's home. This not only increases the complexity of the system's data but also adds to the load on the server. I don't think any search company would want to put up with that ... unless it has lost its mind. ^_^

Next, let's look at the more commonly used breadth-first strategy. Grab a cup of coffee first: you are tired of reading, and I am tired of writing ... ^^

Breadth first is defined here as layer-by-layer crawling.

What is layer-by-layer crawling?

It means crawling one layer at a time: pages are indexed and crawled according to how they are distributed and laid out across the layers! Of course, a search engine (SE) will not rely on a single spider to walk through every layer; it sends one or more spiders to crawl the content!

(This is the breadth-first strategy (layer crawl) chart)

One look at the figure and you will understand; you hardly need to read the rest. ^ ^

Based on the figure above, we can derive the following routes:

Path 1 ==> A

Path 2 ==> B--> C--> D

Path 3 ==> E--> F--> G

Path 4 ==> H--> I--> K

Path 5 ==> L
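The layer-by-layer routes above can be sketched in a few lines. As before, the link graph is an assumption reconstructed from the paths in this article, and `breadth_first_layers` is an illustrative name only. The spider finishes every page in the current layer before it touches the next one.

```python
# Hypothetical link graph, reconstructed from the paths listed in the article.
LINKS = {
    "A": ["B", "C", "D"],
    "B": ["E"],
    "C": [],
    "D": ["F", "G"],
    "E": ["H", "I"],
    "F": ["K"],
    "G": ["K"],
    "H": [],
    "I": [],
    "K": ["L"],
    "L": [],
}


def breadth_first_layers(links, start):
    """Return the pages grouped by layer, in the order a layer crawl visits them."""
    visited = {start}
    layers = []
    current = [start]
    while current:
        layers.append(current)
        next_layer = []
        for page in current:                    # finish this layer first
            for child in links.get(page, []):
                if child not in visited:        # each page is crawled only once
                    visited.add(child)
                    next_layer.append(child)
        current = next_layer                    # then move one layer down
    return layers


for number, layer in enumerate(breadth_first_layers(LINKS, "A"), start=1):
    print(f"Path {number} ==> " + " --> ".join(layer))
```

The five layers it prints match Path 1 through Path 5 above. Because the spider always knows which layer it is on, stopping at a chosen depth is just a matter of ending the loop early.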

The advantages of breadth-first crawling are:

Compared with depth first, breadth first makes data capture easier to control! The load on the server is also significantly reduced! Distributed processing by the crawlers noticeably improves speed! You can probably think of the other advantages yourself!
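The point about several spiders sharing the work of one layer can be illustrated roughly as follows. This is only a sketch under stated assumptions: threads stand in for separate spiders, `fetch_links` is a hypothetical stand-in for downloading a page and extracting its links, and the link graph is the one reconstructed from the paths in this article.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical link graph, reconstructed from the paths listed in the article.
LINKS = {
    "A": ["B", "C", "D"],
    "B": ["E"],
    "C": [],
    "D": ["F", "G"],
    "E": ["H", "I"],
    "F": ["K"],
    "G": ["K"],
    "H": [],
    "I": [],
    "K": ["L"],
    "L": [],
}


def fetch_links(page):
    """Stand-in for a real download: return the links found on `page`."""
    return LINKS.get(page, [])


def crawl_by_layer(start, workers=3):
    """Crawl layer by layer, fetching all pages of one layer concurrently."""
    visited = {start}
    order = []
    layer = [start]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while layer:
            order.append(layer)
            # Every page in the layer is independent, so the spiders can
            # fetch them at the same time; map() keeps the input order.
            results = list(pool.map(fetch_links, layer))
            next_layer = []
            for children in results:
                for child in children:
                    if child not in visited:
                        visited.add(child)
                        next_layer.append(child)
            layer = next_layer
    return order


print(crawl_by_layer("A"))
```

The pages inside a layer do not depend on each other, which is what lets the work be split up; a depth-first chain, by contrast, has to be followed one link at a time.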

The disadvantages of breadth-first crawling are:

For the time being, I have not observed any shortcomings, much like div plus style sheet (layer) layout. Do you think there are any drawbacks?

Is all this a bit hard for newcomers? ^ ^

No matter: download this ebook and have a look. <> Download address: http://www.seo-sh.cn/zl/seoqita/122.html

If you have other suggestions, advice and criticism are welcome. Shanghai SEO director SWJ warmly invites SEO enthusiasts to exchange ideas, learn together, and explore SEO optimization techniques; site planning is welcome too ^_^. See the home page for contact details!

Reposted from Shanghai SEO: http://www.seo-sh.cn
