There are many articles online now about search engine optimization, and more and more individuals explain optimization from the webmaster's perspective. In many forums, the liveliest section is the beginners' Q&A area, and the hottest topics are also the most basic optimization knowledge. Because of this, Zhao Gang thinks it is necessary for everyone to understand how a search engine actually crawls web pages: this is fundamental to a search engine's survival, and the basis of its development. When doing site optimization and promotion, you only need to grasp this core, which is also the most essential thing.
In fact, a search engine first crawls web pages, then indexes and processes them, and finally sorts the results and presents them to users; this is the basic principle of how a search engine works. Today, Zhao Gang will first explain how a search engine crawls pages.
The search engine first sends out a program called a "spider" or "robot", which scans the websites that exist on the Internet, following links from one web page to another and from one website to another according to certain rules. To keep its collected information up to date, it also revisits web pages it has already crawled.
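To make the spider's link-following step concrete, here is a minimal sketch in Python: it fetches a single page and extracts the URLs that page links to. The function name fetch_links and the use of the standard library are my own illustration, not anything a real search engine publishes; production spiders also handle robots.txt, character encodings, and many error cases omitted here.

```python
# Minimal sketch of the "spider" step: fetch one page and extract the
# links it points to. Standard library only; error handling kept minimal.
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen

class LinkExtractor(HTMLParser):
    """Collects the href targets of all <a> tags on a page."""
    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the page's own URL.
                    self.links.append(urljoin(self.base_url, value))

def fetch_links(url):
    """Download a page and return the list of URLs it links to."""
    html = urlopen(url, timeout=10).read().decode("utf-8", errors="replace")
    parser = LinkExtractor(url)
    parser.feed(html)
    return parser.links
```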
The page-collection process must ensure that no page is crawled repeatedly. Because one page may be linked from several other pages, the spider may obtain the same URL several times while crawling. An effective way to solve this problem is to use two data tables, unvisited_table and visited_table. The former holds URLs that have not yet been visited, while the latter records URLs that have already been visited. This article first appeared on Zhao Gang's website promotion blog; if you need to reprint it, please retain the copyright notice.
The system first puts the seed URLs into unvisited_table. The spider then takes URLs from that table and collects the corresponding web pages, moves each collected URL into visited_table, and adds any newly parsed URLs that are not already in visited_table back into unvisited_table.
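Below is a minimal sketch of this two-table loop, reusing the hypothetical fetch_links helper from the sketch above. The table names match the article, but representing them as an in-memory queue and set is my own simplification; a real engine would persist both tables and crawl politely, with rate limits and robots.txt checks.

```python
from collections import deque
# fetch_links(url) is the helper defined in the previous sketch.

def crawl(seed_urls, max_pages=100):
    """Two-table crawl loop: unvisited_table holds URLs still to fetch,
    visited_table records URLs already fetched, so no page is crawled twice."""
    unvisited_table = deque(seed_urls)   # seed URLs go in first
    visited_table = set()

    while unvisited_table and len(visited_table) < max_pages:
        url = unvisited_table.popleft()  # spider takes the next URL
        if url in visited_table:         # the same URL may be queued twice
            continue
        try:
            links = fetch_links(url)     # fetch the page, parse out its links
        except Exception:
            continue                     # skip pages that fail to load
        visited_table.add(url)           # record this URL as visited
        for link in links:
            # Newly parsed URLs not already visited join the queue.
            if link not in visited_table:
                unvisited_table.append(link)
    return visited_table
```

Calling, say, crawl(["http://www.example.com/"]) walks outward from the seed page until max_pages pages have been collected, which is exactly the unvisited_table/visited_table cycle the article describes.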
A search engine's automatic information collection works in one of two ways:
One is regular, scheduled search: every so often (for Google this was typically every 28 days), the search engine actively sends out its "spider" program to scan Internet sites within a certain range of IP addresses. Once it finds a new site, it automatically extracts the site's information and URL and adds them to its own database.
The other is submitted search: the website owner submits the site's URL to the search engine, which then dispatches a "spider" program to the site within a certain period (anywhere from 2 days to several months), scans the site, and stores the relevant information in its database, ready for user queries.
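As a hedged illustration of the submission route, the sketch below pings a search engine with a sitemap URL. The Google sitemap ping endpoint shown here did exist but has since been retired, so treat this purely as an example of the mechanism; today submission is normally done through the engine's webmaster or search-console tools.

```python
from urllib.parse import quote
from urllib.request import urlopen

def ping_sitemap(sitemap_url):
    """Notify a search engine that a sitemap is available for crawling.
    Uses Google's (now retired) sitemap ping endpoint as an illustration
    of how site owners once submitted URLs for crawling."""
    endpoint = "https://www.google.com/ping?sitemap=" + quote(sitemap_url, safe="")
    with urlopen(endpoint, timeout=10) as resp:
        return resp.status == 200
```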
The above is Zhao Gang's basic understanding of how search engines crawl web pages. In the next article, Zhao Gang will analyze for everyone how search engines index and process web pages!
This article first appeared on Zhao Gang's network promotion blog: http://www.cnzg5.com.cn/post/17.html (please retain this link when reprinting)