Maoseomao: A Deeper Understanding of Search Engine Principles, Lesson One

Source: Internet
Author: User


This first lesson in Maoseomao's search engine theory course explains how search engine crawlers (also called robots or spiders) took over the manual collection of information.

Search Engine Basics

What is a search engine? A search engine (Google is the best-known international example, Baidu the best-known in China) uses programs to crawl websites on the Internet, classifies the information it brings back, and then matches that information against the keywords its users type in.

For example, if you want to buy a mobile phone, you can type a keyword such as "nokia" straight into a search engine. In about 0.1 seconds, the search engine returns a list of sites that sell Nokia phones, and that list has already been strictly filtered. To rank sites for the keyword, each search engine applies its own set of algorithms, which the company treats as a closely guarded secret.
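The keyword-matching step described above can be illustrated with a toy inverted index. This is only a minimal sketch: the sample URLs, the page texts, and the function names `build_index` and `search` are made up for illustration, and the lookup does plain word matching with no ranking at all, since real ranking algorithms are not public.

```python
from collections import defaultdict

# Hypothetical crawled pages: URL -> page text (made-up sample data).
PAGES = {
    "http://shop-a.example/phones": "Buy Nokia phones online",
    "http://shop-b.example/deals": "Cheap laptops and tablets",
    "http://shop-c.example/nokia": "Nokia accessories and Nokia batteries",
}

def build_index(pages):
    """Map each lowercase word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def search(index, keyword):
    """Return the URLs containing the keyword, sorted by URL (no ranking)."""
    return sorted(index.get(keyword.lower(), set()))

index = build_index(PAGES)
results = search(index, "nokia")
print(results)
```

Building the index once and answering each query with a dictionary lookup is one reason a result list can come back so quickly: the slow work of reading pages happens before anyone searches.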

How Search Engines Collect Information

In the early days of search engines, there was not much information on the Internet, so many search engines relied entirely on people to build their listings. Large teams of editors visited websites all day long and added the sites they judged to be good. As the Internet took off and the number of sites exploded, collecting information by hand became impossible, so search engines wrote programs to gather information from the Internet. These programs are called search engine crawlers, robots, or spiders.

Below, the author uses Google, the world-famous search engine, as an example to show how crawlers replace the manual collection of website information (most search engines collect information in much the same way).

Update crawler

Google can send out many crawlers at the same time to visit the Internet. When a crawler finds new information, it puts it into its own store; we call this kind of crawler an "update crawler." An update crawler follows URLs across the Internet, crawling non-stop at high speed. Once its own store is full, it returns to a separate database provided by Google, unloads the information it has brought back, and then heads out again to collect more.
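The collect, unload, and collect-again cycle can be sketched in a few lines. This is an illustrative model only, not Google's implementation: the in-memory `WEB` dictionary stands in for the Internet, `central_db` stands in for the separate database, and `buffer_limit` is a hypothetical parameter for the crawler's own limited storage.

```python
from collections import deque

# Hypothetical miniature "web": URL -> (page text, outgoing links).
WEB = {
    "a.example": ("home page", ["b.example", "c.example"]),
    "b.example": ("news page", ["c.example", "d.example"]),
    "c.example": ("shop page", ["a.example"]),
    "d.example": ("blog page", []),
}

def update_crawl(seed, web, buffer_limit=2):
    """Follow links breadth-first, unloading the local buffer when it fills."""
    central_db = {}   # stands in for the shared database
    buffer = {}       # the crawler's own limited storage
    flushes = 0       # how many times the crawler went back to unload
    seen = {seed}
    queue = deque([seed])
    while queue:
        url = queue.popleft()
        text, links = web[url]
        buffer[url] = text
        if len(buffer) >= buffer_limit:
            central_db.update(buffer)  # return to the database and unload
            buffer.clear()
            flushes += 1
        for link in links:
            if link in web and link not in seen:
                seen.add(link)
                queue.append(link)
    if buffer:  # unload whatever is left at the end of the crawl
        central_db.update(buffer)
        flushes += 1
    return central_db, flushes

db, flushes = update_crawl("a.example", WEB)
print(sorted(db), flushes)
```

The `seen` set keeps the crawler from revisiting pages that link back to each other, which matters as soon as the link graph contains a cycle such as `c.example` pointing back to `a.example`.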

Because an update crawler carries only limited storage (Google's update crawler is thought to hold about 100KB), many SEO practitioners recommend keeping every page under 100KB when building a site. If a page is larger than 100KB, the update crawler cannot carry away the rest of the page in a single visit.
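To see what the 100KB guideline means for a given page, the arithmetic can be checked with a short script. This is a sketch under the article's own assumption of a 100KB limit, which is not a confirmed figure; the name `split_at_limit` and the sample pages are hypothetical.

```python
LIMIT_BYTES = 100 * 1024  # the 100KB figure quoted in the article (unverified)

def split_at_limit(html, limit=LIMIT_BYTES):
    """Return (bytes kept, bytes left behind) for a page encoded as UTF-8."""
    size = len(html.encode("utf-8"))
    kept = min(size, limit)
    return kept, size - kept

small_page = "<p>hello</p>" * 100      # 1,200 bytes
large_page = "<p>hello</p>" * 10_000   # 120,000 bytes

print(split_at_limit(small_page))  # (1200, 0)
print(split_at_limit(large_page))  # (102400, 17600)
```

Size is measured in encoded bytes rather than characters, because a page containing non-ASCII text takes more bytes than its character count suggests.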

For pages that are not yet in Google's main index: because what the update crawler collects is shown in the search results alongside Google's main index, you will see your site's information appear quickly in the results, disappear just as quickly, and then reappear after a while once it has entered the main index.

For pages that are already in Google's main index: after the update crawler revisits an updated page, the changes show up in the search results, but within a few days the page reverts to its earlier, un-updated state. Only after the deep crawler has crawled the page is the updated version displayed in full.

Deep crawler

The main task of the deep crawler is to revisit the sites that already exist in Google's main index and carry out a full update across the servers. Google's deep crawler currently updates almost every day, so you will not notice it unless you watch closely. Baidu's deep crawler takes longer, running roughly once a week. That is why many people doing SEO for Baidu look forward to its deep crawl: it means Baidu will recognize the work they did that week.

Today's review

Update crawlers are busy on the Internet every day, collecting newly updated information from websites. Because an update crawler has a limit on how much data it can store, it cannot carry away all the content of a page that exceeds that limit, which is why many SEO practitioners keep their pages as small as possible.

When the deep crawler updates the search engine's main index, keyword rankings for sites go through a major adjustment. Only the search results shown after a deep crawler update can be regarded as basically stable.

Next lesson preview

Today, using Google as the example, we covered the two most important search engine crawlers: the update crawler and the deep crawler. I hope this helps those new to SEO understand how search engines include pages in their index. In the next lesson I will explain the search engine workflow, so please stay tuned.

This article is an original work by Shaanxi Cornerstone Advertising Co., Ltd. (http://www.jishiguanggao.com). Copyright reserved; thank you for your cooperation.
