Submission portal addresses for inclusion in Baidu, Google, and other major websites

Source: Internet
Author: User

  Search engine

A search engine is a system that uses specific computer programs to collect information from the Internet according to a defined strategy, organizes and processes that information, and then provides a retrieval service that displays the relevant results to users. Types of search engines include full-text indexes, directory indexes, meta search engines, vertical search engines, aggregate search engines, portal search engines, and free link lists. Baidu and Google are representative examples.

  Working principle

Step One: Crawling

A search engine uses software that follows links in a particular pattern, moving from one link to the next like a spider crawling across its web, which is why this software is called a "spider" or a "robot". A spider's crawl is governed by predefined rules: it must obey certain directives or the contents of certain files (for example, a site's robots.txt file).
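The link-following behaviour described above can be sketched as a breadth-first traversal. This is only a minimal illustration, not how any particular search engine is implemented: the `PAGES` dictionary, the `ROBOTS_TXT` string, the `example.com` URLs, and the `ExampleSpider` user agent are all made up so that the sketch runs without network access.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser

# Hypothetical in-memory "web" standing in for real HTTP fetches.
PAGES = {
    "http://example.com/": '<a href="/a">A</a> <a href="/private/x">X</a>',
    "http://example.com/a": '<a href="/b">B</a> <a href="/">home</a>',
    "http://example.com/b": "<p>no links</p>",
    "http://example.com/private/x": '<a href="/b">B</a>',
}
# Hypothetical robots.txt content for the same site.
ROBOTS_TXT = "User-agent: *\nDisallow: /private/\n"


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, user_agent="ExampleSpider"):
    """Visit pages breadth-first, skipping URLs that robots.txt disallows."""
    robots = RobotFileParser()
    robots.parse(ROBOTS_TXT.splitlines())

    seen = {start_url}
    visited = []
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        if url not in PAGES or not robots.can_fetch(user_agent, url):
            continue
        visited.append(url)
        parser = LinkExtractor()
        parser.feed(PAGES[url])
        for href in parser.links:
            target = urljoin(url, href)  # resolve relative links
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return visited


print(crawl("http://example.com/"))
```

In this sketch the page under `/private/` is discovered through a link but never visited, because the rule file forbids it.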

Step Two: Crawling and storage

The spider follows links to reach web pages and stores the crawled data in a raw page database. The stored page data is exactly the same as the HTML that a user's browser receives. While crawling, the spider also performs some duplicate content detection: if a site with very low weight contains a large amount of plagiarized, scraped, or duplicated content, the spider is likely to stop crawling it.
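One common way to approximate duplicate content detection is to compare overlapping word sequences ("shingles") between two pages. The sketch below uses Jaccard similarity over 3-word shingles; the source does not say which method Baidu or Google actually use, so the technique, the function names, and the 0.8 threshold are assumptions chosen for illustration.

```python
def shingles(text, k=3):
    """Return the set of k-word sequences that occur in the text."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def is_near_duplicate(text_a, text_b, threshold=0.8, k=3):
    """Treat two texts as duplicates when their shingle sets overlap heavily."""
    return jaccard(shingles(text_a, k), shingles(text_b, k)) >= threshold


original = "search engines follow links from one page to another page"
copied = "search engines follow links from one page to another page"
unrelated = "a recipe for noodles with garlic and spring onions"

print(is_near_duplicate(original, copied))
print(is_near_duplicate(original, unrelated))
```

A crawler could run a check like this against pages it has already stored and lower the crawl priority of sites where most pages turn out to be copies.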

Step Three: Preprocessing

The search engine then runs the pages that the spider brought back through several preprocessing steps.

1. Text extraction

2. Chinese word segmentation

3. Stop-word removal

4. Noise removal (the search engine needs to identify and discard noise such as copyright notices, navigation bars, and advertisements)

5. Forward index

6. Inverted index

7. Link relationship calculation

8. Special document processing

In addition to HTML files, search engines can usually crawl and index a variety of text-based file types, such as PDF, Word, WPS, XLS, PPT, and TXT files, and these file types often appear in search results. However, search engines cannot process non-text content such as images, video, and Flash, and they cannot execute scripts or programs.
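Three of the preprocessing steps listed above (stop-word removal, the forward index, and the inverted index) can be sketched together. This is a simplified illustration: the stop-word list is a small made-up sample, and the tokenizer splits on whitespace, which works for English but not for Chinese, where a real word segmenter would be needed.

```python
from collections import defaultdict

# Illustrative stop-word list; real systems use much larger ones.
STOP_WORDS = {"the", "a", "of", "and", "to", "in", "is"}


def tokenize(text):
    """Lowercase, strip punctuation, and drop stop words."""
    words = (w.strip(".,;:!?\"'()").lower() for w in text.split())
    return [w for w in words if w and w not in STOP_WORDS]


def build_forward_index(docs):
    """Forward index: document ID -> list of terms in that document."""
    return {doc_id: tokenize(text) for doc_id, text in docs.items()}


def build_inverted_index(forward):
    """Inverted index: term -> {document ID: positions of the term}."""
    inverted = defaultdict(dict)
    for doc_id, terms in forward.items():
        for position, term in enumerate(terms):
            inverted[term].setdefault(doc_id, []).append(position)
    return dict(inverted)


docs = {
    1: "The spider crawls the web.",
    2: "The web is a graph of pages.",
}
forward = build_forward_index(docs)
inverted = build_inverted_index(forward)

print(forward)
print(inverted["web"])
```

The forward index answers "which terms does this document contain", while the inverted index answers "which documents contain this term", which is the lookup a query needs.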

Step Four: Ranking

After the user enters a keyword in the search box, the ranking program reads the index database, computes the ranking, and displays the results to the user; the ranking process interacts with the user directly. However, because search engines hold such a large volume of data, ranking updates are staggered: small updates happen daily, but in general the rankings are refreshed to different degrees on daily, weekly, and monthly cycles.
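As a rough illustration of the ranking step, the sketch below scores documents against a query with a simple TF-IDF formula. Real search engines combine many more signals (link relationships among them), and the source does not describe their formulas, so the scoring function, the `rank` name, and the sample documents are assumptions for demonstration only.

```python
import math
from collections import Counter


def rank(query_terms, docs):
    """Return IDs of matching documents, best match first.

    docs maps a document ID to its list of terms (a forward index).
    """
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter()
    for terms in docs.values():
        df.update(set(terms))

    scores = {}
    for doc_id, terms in docs.items():
        tf = Counter(terms)
        score = 0.0
        for term in query_terms:
            if tf[term]:
                # Smoothed inverse document frequency: rarer terms weigh more.
                idf = math.log((1 + n) / (1 + df[term])) + 1.0
                score += (tf[term] / len(terms)) * idf
        if score > 0:
            scores[doc_id] = score
    return sorted(scores, key=lambda doc_id: -scores[doc_id])


sample_docs = {
    "a": ["search", "engine", "ranking"],
    "b": ["search", "spider"],
    "c": ["cooking", "recipes"],
}
print(rank(["search", "ranking"], sample_docs))
```

Document "a" comes first because it matches both query terms, "b" matches only one, and "c" is left out because it matches none.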
