I have read many SEO reference books, and I always feel they explain how search engines work in terms too general to understand well. Today I spent a whole day working out the principles of search engines. If anything here is wrong, I hope you SEO experts will correct me; I would be very grateful.
What are search engine crawlers, spiders, and robots? How search engines collect information
A search engine wants its database to be as large and comprehensive as possible, so it searches the web day and night for new and more reliable information. Because the web has grown explosively, doing this by hand is impossible. Search engine operators therefore developed programs that fetch information day and night, then sort and classify it, and finally index it into their own databases.
The programs that crawl websites day and night go by several names: spiders, crawlers, robots, probes. A search engine usually runs many crawlers at the same time. Following URLs from page to page, a crawler grabs each site's title, description, images, content, and so on, then places what it fetched in a dedicated repository, where it waits to be indexed.
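The extraction step just described can be sketched in Python with the standard library's `html.parser`. This is a minimal illustration, not how Google or any other engine actually implements its crawler: it assumes the HTML has already been downloaded (for example with `urllib.request`), and `PageInfoParser`, `crawl_page`, and the dict used as the "repository" are hypothetical names.

```python
from html.parser import HTMLParser


class PageInfoParser(HTMLParser):
    """Hypothetical parser: collects title, meta description, image URLs and visible text."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = ""
        self.images = []
        self.text_parts = []
        self._in_title = False
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.description = attrs.get("content") or ""
        elif tag == "img" and attrs.get("src"):
            self.images.append(attrs["src"])
        elif tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif not self._skip_depth and data.strip():
            self.text_parts.append(data.strip())


def crawl_page(url, html, repository):
    """Parse one fetched page and store its fields in the repository to await indexing."""
    parser = PageInfoParser()
    parser.feed(html)
    parser.close()
    repository[url] = {
        "title": parser.title.strip(),
        "description": parser.description,
        "images": parser.images,
        "content": " ".join(parser.text_parts),
    }
    return repository[url]


if __name__ == "__main__":
    repo = {}
    page = ('<html><head><title>Demo</title><meta name="description" content="A demo page">'
            '</head><body><h1>Hello</h1><img src="/a.png"><p>Some text.</p></body></html>')
    print(crawl_page("http://example.com/", page, repo))
```

Script and style contents are skipped because they are not visible page text; a real crawler would also follow the links it finds and respect robots.txt.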
However, site designers cannot guarantee that their sites are flawless, and many problems can arise: a page may contain dead links, or a page may carry too much content. Either can keep the crawler from fetching the whole page correctly. The crawler may grab only the head of the page and then, while fetching the body, find that it has run out of the storage it allots to that page, and leave. So we should watch for these problems when designing a website and make our pages easy for crawlers to fetch.
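The two problems above (dead links and oversized pages) can be illustrated with a toy breadth-first crawl over an in-memory "website". This is a sketch under stated assumptions: the site is a hypothetical dict of URL to HTML, a missing URL stands in for a dead link, and `max_bytes` is a made-up per-page budget; real crawlers' limits are not given in this article.

```python
import re
from collections import deque

# Naive link extractor, good enough for this toy example
LINK_RE = re.compile(r'href="([^"]+)"')


def crawl_site(site, start_url, max_bytes=100_000):
    """Breadth-first crawl of a hypothetical in-memory site.

    Returns (stored pages, dead links, truncated pages).
    """
    seen = {start_url}
    queue = deque([start_url])
    stored, dead_links, truncated = {}, [], []
    while queue:
        url = queue.popleft()
        html = site.get(url)
        if html is None:
            # No page behind this URL: record it as a dead link
            dead_links.append(url)
            continue
        raw = html.encode("utf-8")
        if len(raw) > max_bytes:
            # Budget exhausted: only the beginning of the page is kept,
            # so content and links after the cut are never seen
            html = raw[:max_bytes].decode("utf-8", errors="ignore")
            truncated.append(url)
        stored[url] = html
        for link in LINK_RE.findall(html):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return stored, dead_links, truncated


if __name__ == "__main__":
    site = {
        "/": '<a href="/about">About</a> <a href="/old">Old</a>',
        "/about": "<p>" + "x" * 200 + "</p>",
    }
    stored, dead, cut = crawl_site(site, "/", max_bytes=100)
    print(dead, cut)
```

With a 100-byte budget, `/about` is cut off before its closing tag and `/old` is reported as dead, which is the situation the paragraph warns about.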
How Google's two kinds of crawlers work
Here we take Google, the best search engine, as an example and analyze how a search engine crawls and processes information.
Google has two kinds of crawlers: the fresh crawler and the deep crawler. The fresh crawler works day and night, placing what it crawls into a specific database, and it provides search results together with the main index. That is why you sometimes see an update to your page suddenly appear in the search results and then disappear a while later: the fresh crawler keeps fetching information and keeps overwriting what it stored. To me, the fresh crawler's storage behaves like a stack in data structures: first in, last out.
When an update disappears like this, SEO practitioners need not worry. Such updates used to appear stably in the search results after about a month, although the wait may be much shorter now. If your page is already in the search engine's index, the crawler will show your updates soon after it finds them, but they will not be stable until the deep crawler updates the main index.
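The "appears, then disappears, then comes back for good" behavior can be modeled with a toy two-tier index: a small, constantly overwritten fresh layer served in front of a stable main index. This is a hypothetical model, not Google's implementation. For simplicity it evicts the oldest fresh entry when the layer is full rather than following the stack analogy above, and `TwoTierIndex` and `fresh_capacity` are invented names.

```python
from collections import OrderedDict


class TwoTierIndex:
    """Toy model: a small 'fresh' layer on top of a stable main index."""

    def __init__(self, fresh_capacity=3):
        self.main = {}              # rewritten only by the (slow) deep crawl
        self.fresh = OrderedDict()  # rewritten continuously by the fresh crawler
        self.fresh_capacity = fresh_capacity

    def fresh_crawl(self, url, content):
        # The newest entry goes to the end; when the layer is full,
        # the oldest entry is dropped (an assumed eviction policy)
        self.fresh.pop(url, None)
        self.fresh[url] = content
        if len(self.fresh) > self.fresh_capacity:
            self.fresh.popitem(last=False)

    def deep_crawl(self, pages):
        # The deep crawl writes into the main index, so updates become stable
        self.main.update(pages)

    def lookup(self, url):
        # Fresh results are served first; otherwise fall back to the main index
        if url in self.fresh:
            return self.fresh[url]
        return self.main.get(url)


if __name__ == "__main__":
    idx = TwoTierIndex(fresh_capacity=2)
    idx.deep_crawl({"/a": "old A"})
    idx.fresh_crawl("/a", "new A")
    print(idx.lookup("/a"))   # new A (update appears)
    idx.fresh_crawl("/b", "B")
    idx.fresh_crawl("/c", "C")
    print(idx.lookup("/a"))   # old A (update pushed out of the fresh layer)
    idx.deep_crawl({"/a": "new A"})
    print(idx.lookup("/a"))   # new A (now stable in the main index)
```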
The following simple flow shows how a search engine collects pages:
Fresh crawler -> discovers information -> crawls the information -> places it in a private database -> waits for indexing -> indexing (the deep crawler brings it into the main index) -> indexing complete, keyword rankings calculated -> waits for users to search -> returns the results.
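The later stages of this flow (crawled pages stored, indexed, ranked, then searched) can be sketched as a tiny inverted index. This is a simplification for illustration: ranking here is just the summed count of query words on each page, whereas real engines combine many signals, and `build_index`, `search`, and the sample pages are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(repository):
    """Index collation: map each keyword to {url: number of occurrences}."""
    index = defaultdict(dict)
    for url, text in repository.items():
        for word in tokenize(text):
            index[word][url] = index[word].get(url, 0) + 1
    return index


def search(index, query):
    """Score pages by summed occurrences of the query words; best first."""
    scores = defaultdict(int)
    for word in tokenize(query):
        for url, count in index.get(word, {}).items():
            scores[url] += count
    # Ties are broken alphabetically so the output is deterministic
    return sorted(scores, key=lambda url: (-scores[url], url))


if __name__ == "__main__":
    # Pages already crawled into the private database (hypothetical data)
    repository = {
        "/seo": "SEO basics: how search engines crawl and index pages",
        "/spider": "A spider crawls pages; crawl budget matters for crawl depth",
    }
    index = build_index(repository)
    print(search(index, "crawl"))  # ['/spider', '/seo']
```

Computing the index ahead of time is what lets the engine answer a query quickly: at search time it only looks up the query words instead of rereading every page.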
What kinds of search results do search engines provide?
Search engines provide two kinds of search results, and I suggest every SEO practitioner work on both. I am still learning myself, so advice from experts is very welcome.
The two kinds are: 1. content index results and 2. special index results. The first indexes and compresses the text on a web page: keywords, title, description, link anchor text, and so on. The second includes image indexes, PDF file indexes, and other special indexes. I suggest SEO practitioners not underestimate the second kind; it can also bring considerable traffic.
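The split between a content index and special indexes, and the idea of compressing the content index, can be illustrated with two small pieces: routing a crawled URL by file type, and storing a keyword's list of document IDs as gaps encoded in variable-byte form, a standard textbook compression technique. Whether any particular engine uses exactly this scheme is not claimed here; the extension table and all function names are hypothetical.

```python
# Hypothetical mapping from file extension to special index
SPECIAL_TYPES = {".jpg": "image", ".png": "image", ".gif": "image", ".pdf": "pdf"}


def route(url):
    """Pick the index for a crawled URL by its extension (a simplification)."""
    lowered = url.lower()
    for ext, kind in SPECIAL_TYPES.items():
        if lowered.endswith(ext):
            return kind
    return "content"


def varint_encode(numbers):
    """Variable-byte encoding: 7 bits per byte, high bit set on all but the last byte."""
    out = bytearray()
    for n in numbers:
        while n >= 0x80:
            out.append((n & 0x7F) | 0x80)
            n >>= 7
        out.append(n)
    return bytes(out)


def varint_decode(data):
    """Inverse of varint_encode."""
    numbers, n, shift = [], 0, 0
    for byte in data:
        n |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            numbers.append(n)
            n, shift = 0, 0
    return numbers


def compress_postings(doc_ids):
    """Store sorted document IDs as gaps, which are small and encode in few bytes."""
    gaps, prev = [], 0
    for doc_id in sorted(doc_ids):
        gaps.append(doc_id - prev)
        prev = doc_id
    return varint_encode(gaps)


def decompress_postings(data):
    """Turn the gaps back into absolute document IDs."""
    doc_ids, total = [], 0
    for gap in varint_decode(data):
        total += gap
        doc_ids.append(total)
    return doc_ids


if __name__ == "__main__":
    print(route("http://example.com/report.PDF"), route("http://example.com/page.html"))
    packed = compress_postings([3, 7, 300, 1000])
    print(len(packed), decompress_postings(packed))
```

Four document IDs that would take 16 bytes as 32-bit integers fit in 6 bytes here, because the gaps 3, 4, 293, and 700 each need only one or two bytes.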
Summary: that is basically how search engines work. If anything here is wrong, please correct me and I will fix it.