The "web spider" (formally just "spider") is also called a "web crawler". I won't give a general overview of web spiders here; today I mainly want to talk about the ways and methods of designing a spider's crawl.
Crawling strategies can be divided into 2 kinds:
One is the depth-first strategy and the other is the breadth-first strategy. The analysis below is organized around these 2 points, and SWJ welcomes everyone to exchange ideas and discuss.
So what is depth first? What is breadth first? Shanghai SEO (SWJ) explains below.
My knowledge is limited, so I will reason through this in plain words; if there are errors, please contact me promptly, and please forgive me.
Depth first, as the name suggests, means the web spider digs as deep as possible into the pages it crawls. The focus is on depth.
In other words: the web spider starts from the start page and follows one link after another down a single line; after finishing that line, it moves on to the next start page and continues following links.
Take a look at the diagrams below. (The first is a simple web page link model, which is the starting point of the spider's index.)
There are 5 routes in total for the spider to crawl. Pay attention to the depth!
(The second is the optimized web link model: the improved depth-first crawl strategy.)
Based on the 2 diagrams above, we can draw the following conclusions:
Figure 1:
Path 1 ==> A--> B--> E--> H
Path 2 ==> A--> B--> E--> I
Path 3 ==> A--> C
Path 4 ==> A--> D--> F--> K--> L
Path 5 ==> A--> D--> G--> K--> L
After optimization:
Figure 2: (the direction is already marked in the picture)
Path 1 ==> A--> B--> E--> H
Path 2 ==> I
Path 3 ==> C
Path 4 ==> D--> F--> K--> L
Path 5 ==> G
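The two path lists can be reproduced with a short Python sketch. The link graph below is an assumption: it is reconstructed from the Figure 1 routes, since the original diagram is not available, and the function names are hypothetical. The first function lists every route from the start page (Figure 1); the second visits each page only once, which gives the shortened paths of Figure 2.

```python
# Link graph reconstructed from the Figure 1 routes (an assumption:
# only the listed paths are available, not the original diagram).
LINKS = {
    "A": ["B", "C", "D"],
    "B": ["E"],
    "C": [],
    "D": ["F", "G"],
    "E": ["H", "I"],
    "F": ["K"],
    "G": ["K"],
    "H": [],
    "I": [],
    "K": ["L"],
    "L": [],
}


def all_routes(links, start):
    """Return every route from start to a dead end, revisiting shared pages."""
    routes = []

    def walk(page, trail):
        trail = trail + [page]
        if not links[page]:
            # No outgoing links: this route is complete.
            routes.append(trail)
            return
        for nxt in links[page]:
            walk(nxt, trail)

    walk(start, [])
    return routes


def depth_first_order(links, start):
    """Return pages in depth-first order, visiting each page only once."""
    seen = set()
    order = []
    stack = [start]
    while stack:
        page = stack.pop()
        if page in seen:
            continue
        seen.add(page)
        order.append(page)
        # Push links reversed so the first link on the page is followed first.
        stack.extend(reversed(links[page]))
    return order


if __name__ == "__main__":
    for route in all_routes(LINKS, "A"):
        print(" --> ".join(route))
    print(depth_first_order(LINKS, "A"))
```

With this graph, the once-only order is A, B, E, H, I, C, D, F, K, L, G. K and L are reached through F, so by the time the spider gets to G nothing new remains below it, which is why Path 5 in Figure 2 is just G.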
The advantages of depth-first crawling are:
The web spider program is relatively easy to design. Beyond that, I have not found any other advantages ... though the spider's "keep going forward" spirit is worth learning from! ^_^
The disadvantages of depth-first crawling are:
There are rather more shortcomings. After crawling each layer, the spider has to go back to the "spider home" database and ask whether it needs to climb the next layer, and it asks again after every layer ... As one expert put it, a spider that charges ahead regardless of the consequences is likely to get lost, and even more likely to crawl off to foreign websites. The original target is Chinese websites, but because of IP issues a site on a foreign IP can be taken for a Chinese site ... so it is easy to wander into someone else's home. This not only increases the complexity of the system's data but also increases the load on the server. I don't think a search company would want to put up with that ... unless it has lost its mind. ^_^
Next we introduce the more commonly used breadth-first strategy. Grab a cup of coffee first: reading is tiring, and writing is tiring too ... ^^
Breadth first is defined here as layer crawling.
What is layer crawling for a spider?
It means crawling layer by layer: pages are indexed and crawled according to how the layers are distributed and laid out. Of course, the SE (search engine) won't send a single spider through every layer; it sends one or more spiders to each layer to crawl the content.
(This is the breadth-first strategy (layer crawl) diagram)
Once you have seen it, you will understand, and you hardly need to read the rest. ^ ^
Based on the diagram above, we can draw the following route map:
Path 1 ==> A
Path 2 ==> B--> C--> D
Path 3 ==> E--> F--> G
Path 4 ==> H--> I--> K
Path 5 ==> L
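The layer-by-layer route map can be sketched in Python as follows. The link graph is an assumption reconstructed from the paths listed earlier in this article, and crawl_by_layer is a hypothetical name. Each pass collects every not-yet-seen page linked from the current layer, which is the breadth-first idea.

```python
# Link graph reconstructed from the listed paths (an assumption:
# the original diagram is not available).
LINKS = {
    "A": ["B", "C", "D"],
    "B": ["E"],
    "C": [],
    "D": ["F", "G"],
    "E": ["H", "I"],
    "F": ["K"],
    "G": ["K"],
    "H": [],
    "I": [],
    "K": ["L"],
    "L": [],
}


def crawl_by_layer(links, start):
    """Return the pages grouped by layer, in breadth-first order."""
    seen = {start}
    layers = []
    current = [start]
    while current:
        layers.append(current)
        next_layer = []
        for page in current:
            for child in links[page]:
                # A page reachable from two parents is crawled only once.
                if child not in seen:
                    seen.add(child)
                    next_layer.append(child)
        current = next_layer
    return layers


if __name__ == "__main__":
    for number, layer in enumerate(crawl_by_layer(LINKS, "A"), start=1):
        print("Path", number, "==>", " --> ".join(layer))
```

For this graph the layers come out as A, then B C D, then E F G, then H I K, then L, matching the five paths above.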
The advantages of breadth-first crawling are:
Compared with depth first, breadth first makes data capture easier to control. The load on the server is also significantly reduced. Distributed processing by the crawlers clearly improves the speed. You can probably think of other advantages yourself.
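One way to picture the "several spiders per layer" idea is the rough Python sketch below. It is only an illustration under stated assumptions: the link graph is reconstructed from the paths in this article, fetch_links is a hypothetical stand-in for downloading a page and extracting its links, and a thread pool from the standard library plays the role of the spiders. It is not how any particular search engine works.

```python
from concurrent.futures import ThreadPoolExecutor

# Link graph reconstructed from the listed paths (an assumption).
LINKS = {
    "A": ["B", "C", "D"],
    "B": ["E"],
    "C": [],
    "D": ["F", "G"],
    "E": ["H", "I"],
    "F": ["K"],
    "G": ["K"],
    "H": [],
    "I": [],
    "K": ["L"],
    "L": [],
}


def fetch_links(page):
    """Hypothetical stand-in for fetching a page and extracting its links."""
    return LINKS[page]


def crawl_layers_in_parallel(start, workers=3):
    """Crawl layer by layer, fetching the pages of one layer concurrently."""
    seen = {start}
    layer = [start]
    layers = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while layer:
            layers.append(layer)
            # Pages in the same layer do not depend on each other,
            # so several workers ("spiders") can fetch them at once.
            # pool.map returns results in the same order as the input.
            results = list(pool.map(fetch_links, layer))
            next_layer = []
            for children in results:
                for child in children:
                    if child not in seen:
                        seen.add(child)
                        next_layer.append(child)
            layer = next_layer
    return layers


if __name__ == "__main__":
    print(crawl_layers_in_parallel("A"))
```

The layer boundary acts as a natural checkpoint: the next layer is only scheduled after the current one has finished, which is what makes the crawl easy to control.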
The disadvantages of breadth-first crawling are:
For the time being, I haven't observed any shortcomings; it is much like a div-based (layer) layout. Do you think there are any drawbacks?
Is all this hard for newcomers to follow? ^ ^
Don't worry: download this ebook and take a look. Download address: http://www.seo-sh.cn/zl/seoqita/122.html
Other suggestions and criticism are welcome. SWJ of Shanghai SEO welcomes SEO enthusiasts to exchange ideas, learn together, and explore SEO optimization techniques; site planning is welcome too ^_^ For contact details, see the home page.
Reposted from Shanghai SEO http://www.seo-sh.cn