Google search result ranking algorithm: search engine technology

Matt Cutts is a software engineer in Google's search quality group. His job is to evaluate websites and develop technologies that keep fake or junk websites out of Google's search results.

One of the questions librarians ask most often is: "Which results should appear at the top of the list, and how does Google decide?" Here, quality engineer Matt Cutts gives a quick primer on how Google crawls and indexes the web and how it ranks search results. He also offers school librarians advice on how to coach students.

Crawling and indexing

Before you see a page of Google search results, a lot has already happened. The first step is crawling and indexing the billions of pages on the World Wide Web. This is done by Googlebot, which connects to web servers around the world to collect documents. The crawler does not actually roam the internet; instead, it asks a web server to return a specific page. It then scans that page for hyperlinks, which point to further pages to request, and assigns each page a number. Crawling gathers a huge number of documents, but these documents cannot be searched directly.
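The crawl loop described above can be sketched in Python as a breadth-first traversal of links. This is a minimal illustration, not Googlebot's design: it assumes a hypothetical in-memory WEB dictionary in place of real HTTP requests and HTML parsing, and the names crawl and fetch_links are made up for the example. Pages are numbered in the order they are first fetched.

```python
from collections import deque

# Hypothetical miniature "web": each URL maps to the URLs its page links to.
# A real crawler would request pages over HTTP and parse the HTML for links.
WEB = {
    "https://a.example/": ["https://b.example/", "https://c.example/"],
    "https://b.example/": ["https://c.example/", "https://d.example/"],
    "https://c.example/": ["https://a.example/"],
    "https://d.example/": [],
}


def fetch_links(url):
    """Stand-in for asking a server for a page and scanning it for hyperlinks."""
    return WEB.get(url, [])


def crawl(seed, max_pages=100):
    """Breadth-first crawl from a seed URL, giving each new page a document number."""
    doc_ids = {}  # URL -> document number
    frontier = deque([seed])
    while frontier and len(doc_ids) < max_pages:
        url = frontier.popleft()
        if url in doc_ids:
            continue  # already fetched via another link
        doc_ids[url] = len(doc_ids)  # number pages in fetch order
        for link in fetch_links(url):
            if link not in doc_ids:
                frontier.append(link)
    return doc_ids


if __name__ == "__main__":
    for url, doc_id in crawl("https://a.example/").items():
        print(doc_id, url)
```

A breadth-first queue is used here only because it is the simplest way to follow links outward from a seed; a production crawler also has to handle politeness, revisits, and failures.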

Without an index, Google's servers would have to read the content of every document each time you searched for something like "civil war". So the second step is to build an index, which means "inverting" the crawled data. Rather than scanning every word of every document at query time, you write the data out as lists that show, for each word, all the documents that contain it. For example, suppose the word "civil" appears in documents 3, 8, 22, 56, 68, and 92, and the word "war" appears in documents 2, 8, 15, 22, 68, and 77.
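The "inverting" step can be illustrated with a tiny inverted index: for each word, a sorted list of the document numbers that contain it. This Python sketch assumes a hypothetical four-document corpus and a naive lowercase word tokenizer; it shows the data structure, not how Google actually tokenizes or stores its index.

```python
import re
from collections import defaultdict


def build_inverted_index(docs):
    """Map each word to the sorted list of document IDs that contain it.

    docs: dict of document ID -> text (a hypothetical input format).
    """
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Naive tokenizer: lowercase runs of letters and digits.
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(doc_id)
    # Sorted posting lists make later intersections cheap.
    return {word: sorted(ids) for word, ids in index.items()}


# Hypothetical sample corpus, numbered as a crawler might number it.
DOCS = {
    1: "A history of the Civil War",
    2: "War and peace",
    3: "Civil engineering for beginners",
    4: "Reenacting a civil war battle",
}

if __name__ == "__main__":
    idx = build_inverted_index(DOCS)
    print(idx["civil"])  # [1, 3, 4]
    print(idx["war"])    # [1, 2, 4]
```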

Once the index is built, documents can be ranked by relevance. For example, when someone searches Google for "civil war", producing the results takes two steps: first, find the pages that contain the words in the query; second, order the matching pages by relevance. Google developed an interesting technique to speed up the first step: instead of storing the whole index on one computer, it spreads the index across hundreds of computers. Because the work is divided among many machines, results come back faster.
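Using the posting lists from the "civil war" example, the two query steps can be sketched as follows: a two-pointer merge of sorted lists finds documents containing both words, and the matches are then sorted by a score. The SCORES values are purely hypothetical placeholders; the article does not say how Google computes relevance, so the ranking step here only shows where such a score would plug in.

```python
def intersect(a, b):
    """Two-pointer merge of two sorted posting lists; returns IDs present in both."""
    i = j = 0
    result = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            result.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return result


def search(index, query, score):
    """Step 1: find docs containing every query word. Step 2: order them by score.

    score: hypothetical doc ID -> relevance value, a stand-in for real ranking signals.
    """
    words = query.lower().split()
    if not words:
        return []
    matches = index.get(words[0], [])
    for word in words[1:]:
        matches = intersect(matches, index.get(word, []))
    return sorted(matches, key=lambda d: score.get(d, 0.0), reverse=True)


# Posting lists taken from the example in the text.
INDEX = {"civil": [3, 8, 22, 56, 68, 92], "war": [2, 8, 15, 22, 68, 77]}
# Hypothetical relevance scores, for illustration only.
SCORES = {8: 0.4, 22: 0.9, 68: 0.7}

if __name__ == "__main__":
    print(intersect(INDEX["civil"], INDEX["war"]))  # [8, 22, 68]
    print(search(INDEX, "civil war", SCORES))       # [22, 68, 8]
```

Because both lists are sorted, the merge touches each entry at most once, which is why the index stores posting lists in order.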

To picture this process, imagine the index at the back of a book, 30 pages long. If one person has to look up several entries in that index, each lookup takes at least a few seconds. But what if each page of the index were handed to a different person? Thirty people each searching their own part of the index would be much faster than one person working alone. In the same way, Google distributes its data across many computers so that documents can be found more quickly.
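The "thirty people" idea can be sketched as a document-partitioned index: each shard holds the posting lists for its own subset of documents, every shard answers the query in parallel, and the partial answers are merged. This Python sketch assumes a simple doc_id % num_shards partitioning rule and uses threads to stand in for separate machines; the function names are hypothetical, and Google's real partitioning scheme is not described in the article.

```python
from concurrent.futures import ThreadPoolExecutor


def shard_index(index, num_shards):
    """Split a word -> posting-list index into document-partitioned shards.

    Every word of a given document lands in the same shard (doc_id % num_shards),
    so each shard can answer an AND query on its own.
    """
    shards = [dict() for _ in range(num_shards)]
    for word, postings in index.items():
        for doc_id in postings:
            shards[doc_id % num_shards].setdefault(word, []).append(doc_id)
    return shards


def search_shard(shard, words):
    """Return the docs in one shard that contain all of the query words."""
    if not words:
        return set()
    result = set(shard.get(words[0], []))
    for word in words[1:]:
        result &= set(shard.get(word, []))
    return result


def distributed_search(shards, query):
    """Query every shard in parallel (threads stand in for machines) and merge."""
    words = query.lower().split()
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = list(pool.map(lambda shard: search_shard(shard, words), shards))
    return sorted(set().union(*partials))


# Posting lists taken from the "civil war" example in the text.
INDEX = {"civil": [3, 8, 22, 56, 68, 92], "war": [2, 8, 15, 22, 68, 77]}

if __name__ == "__main__":
    shards = shard_index(INDEX, 3)
    print(distributed_search(shards, "civil war"))  # [8, 22, 68]
```

Partitioning by document rather than by word keeps each shard's answer complete for its own documents, so merging is just a union of the per-shard results.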
