How the Google search engine works

Source: Internet
Author: User

Ppcblog.com presents a flowchart carefully illustrated by Jess Bachman (of wallstats.com). It shows how the search engine handles the 300 million clicks per day on the Google search button, each in less than one second.

The flowchart shows what happens in the blink of an eye between your click on the Google search button and the return of the query results: how does Google process your search request? Search is the killer application behind the search giant's annual revenue of more than $20 billion. It is also the leading commercial and technological legend of the Internet. Everyone surely wants to know the secrets behind Google's money tree.

1. Google's official description of its search technology

The backend software of our search technology triggers a series of parallel computing tasks that take less than one second on the server side. Before Google, traditional search engines ranked results largely by how often a keyword appeared on a page. We use more than 200 signals, including our patented PageRank page-weighting algorithm, which examines the link structure of the World Wide Web to determine the importance of a web page (the original idea of Page and Brin was to model that link structure as a directed graph in the graph-theory sense). We assume that the importance of a web page depends on the references to it from other pages, much like citations in academic papers: important papers are cited by many other papers. Then we perform hypertext-matching analysis based on the search terms (an inverted keyword index lookup over the page content captured by the bots) to determine which web pages are most relevant to the search request. By combining these two aspects, the importance of a page and its relevance to the search request, we sort the query results and present them to our users.
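
The link-based importance idea described above can be illustrated with a small power-iteration sketch. This is not Google's implementation: the damping factor of 0.85, the iteration count, the handling of pages without outgoing links, and the four-page toy web are all assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Approximate PageRank for a dict {page: [pages it links to]}.

    damping and iterations are illustrative assumptions, not values
    taken from the article.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page starts each round with the "random jump" share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on evenly to the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical toy web: C is linked from A, B, and D, so it should rank highest.
toy_web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(toy_web)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this toy graph, page D has no incoming links and ends up with the lowest score, while C, which every other page links to, ends up with the highest.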

2. Data centers: the tower Google uses to index the world

Google's data centers are highly confidential, and not much is known about them:

1. There are at least 19 data centers in the United States, and another 17 are distributed around the rest of the world.

2. Each data center is as large as 500,000 square feet and costs about $600 million to build.

3. Google's data centers are among the most efficient facilities in the world and are also very environmentally friendly, with almost no carbon emissions.

4. A data center uses 50 to 100 megawatts of power and is usually built close to a water supply, which is needed for cooling.

5. Google servers are housed in standard shipping containers, each holding up to 1,160 servers.

3. Processing flow:

1. You write a blog post, post a tweet on Twitter, update a website, or otherwise add content to the Web.

2. Google crawlers (intelligent agent programs that serve as a search engine component) capture the title, description, keywords, and other content of your webpage.

(1) Googlebot travels the World Wide Web by following links. If no link leads to your site, your site will not be indexed.

(2) If you disallow crawling in robots.txt, Googlebot will not crawl your webpage.

(3) If an HTML link to your site carries the nofollow attribute, Googlebot will not follow that link to your site.

(4) Google can also find your website through blog software or an XML sitemap.

(5) The more websites link to your website, and the higher the PageRank of those websites, the higher the PageRank of your website.

(6) Google crawlers follow all links that are not marked as nofollow.
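
The crawling rules in (2), (3), and (6) can be sketched with Python's standard library. This is only an illustration of how a polite crawler might apply them, not Googlebot's actual logic; the user agent name `ExampleBot`, the sample robots.txt, the sample HTML, and the `example.com` URLs are all hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt and page content used only for this sketch.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

PAGE_HTML = """\
<a href="/about.html">About</a>
<a href="/private/notes.html">Notes</a>
<a href="http://example.org/ad" rel="nofollow">Sponsored</a>
"""


class LinkCollector(HTMLParser):
    """Collects href values of <a> tags that are not marked rel="nofollow"."""

    def __init__(self):
        super().__init__()
        self.followable = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        href = attr.get("href")
        rel_tokens = (attr.get("rel") or "").lower().split()
        if href and "nofollow" not in rel_tokens:
            self.followable.append(href)


def crawlable_links(base_url, html, robots_txt, user_agent="ExampleBot"):
    """Return absolute URLs the crawler may follow from this page."""
    robots = RobotFileParser()
    robots.parse(robots_txt.splitlines())

    collector = LinkCollector()
    collector.feed(html)

    absolute = [urljoin(base_url, href) for href in collector.followable]
    # For simplicity the same robots.txt is applied to every link here;
    # a real crawler would fetch the robots.txt of each host separately.
    return [url for url in absolute if robots.can_fetch(user_agent, url)]


links = crawlable_links("http://example.com/", PAGE_HTML, ROBOTS_TXT)
print(links)
```

Only the "About" link survives: the second link is blocked by robots.txt, and the third is skipped because of its nofollow attribute.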

3. Once visited by a Google crawler, a webpage is indexed within a few seconds.

(1) Webpage content is stored in an inverted index.

① The webpage title and link data are stored in one index, used for broad searches.

② The webpage content is stored in another index, used for long-tail, personalized, and in-depth searches that are requested less frequently.

(2) When you search with Google, you are not searching the live, constantly changing World Wide Web. Instead, you are searching Google's cached index. Google updates its index regularly, and competition from real-time Twitter search has pushed it to shorten the update cycle.
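
The inverted index mentioned in (1) maps each term to the set of documents that contain it, so a query is answered from the stored index rather than from the live pages. Here is a minimal sketch, assuming a simple lowercase word tokenizer and three made-up documents; it says nothing about how Google actually structures its indexes.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase alphanumeric terms (a simplifying assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents that contain every term of the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical documents standing in for crawled pages.
docs = {
    1: "Google crawls the web",
    2: "The web is indexed by Google",
    3: "Twitter offers real-time search",
}
index = build_inverted_index(docs)
print(search(index, "google web"))
print(search(index, "twitter web"))
```

The first query matches documents 1 and 2; the second matches nothing, because no single document contains both terms.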

4. Google evaluates the overall PageRank value of domain names and webpages based on links.

5. Webpages are checked to prevent cheating

(1) Google's search quality and anti-spam teams review and refine the algorithms.

(2) More than 10,000 remote testers evaluate the quality of search results.

(3) Google invites users to report spam suspected of PageRank manipulation.

(4) Google receives notices under the Digital Millennium Copyright Act (DMCA) asking it to act on pirated content.

6. After the page has been broken down and analyzed, it carries many pieces of data (such as search keywords) that are used to help users find it.

7. The user sends a search request

(1) According to Google search quality engineer Patrick Riley, in most Google searches your query is placed in one of many parallel control groups or experiment groups run by Google's innovation project teams. It can be said that every query takes part in some Google experiment.

8. Google uses synonyms to match results that have the same meaning as your search keywords.
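
Synonym matching as described in step 8 can be pictured as query expansion: each query term is widened to a group of equivalent terms, and a document matches if it contains at least one term from every group. The tiny synonym table and the function names below are hypothetical; how Google actually derives and applies synonyms is not described in the source.

```python
# Hypothetical synonym table; a real system would learn these from data.
SYNONYMS = {
    "car": {"automobile", "auto"},
    "photo": {"picture", "image"},
}


def expand_query(query, synonyms=SYNONYMS):
    """Turn each query term into a set holding the term and its synonyms."""
    groups = []
    for term in query.lower().split():
        groups.append({term} | synonyms.get(term, set()))
    return groups


def matches(document, groups):
    """A document matches if it contains a word from every term group."""
    words = set(document.lower().split())
    return all(words & group for group in groups)


groups = expand_query("car prices")
print(matches("Used automobile prices", groups))
print(matches("Used bicycle prices", groups))
```

The first document matches even though it never contains the word "car"; the second does not, because nothing in it is equivalent to "car".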

9. Generate preliminary query results

(1) Google may report thousands of matches, or a seemingly unlimited number, but generally fewer than 1,000 results are actually displayed, on the principle that fewer results are less confusing.

(2) The query results are localized, so local sites appear first.

10. Sort the query result set by authority and PageRank. Duplicate query results are excluded.

(1) Google selects the keyword ads that enter the real-time auction based on the keyword, the ad type, and the user's location.

(2) Keyword ads must comply with local laws.

① Advertisers' illegal ads are banned.

② If the search traffic for a keyword or the click-through rate of a keyword ad is too low, the ad is automatically disabled.

③ For strategic business reasons, customers such as Amazon are offered discounts.

(3) Keyword ads are ranked by revenue potential (ad quality continues to be evaluated after the keyword auction).

(4) For advertisers, the ad content is generally fixed, but dynamic keywords are sometimes used to make keyword ads more relevant to the search keywords.

① Some ads allow variable ancillary information to be added, such as website links, phone numbers, product links, and addresses.

(5) An ad with a high click-through rate is displayed above the search result list to make it more conspicuous.

(6) Other advertisements are displayed in the corresponding positions in sequence.
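
The ad steps above (disable low-traffic ads, rank by revenue potential, show high click-rate ads on top) can be sketched as follows. The scoring formula of bid times estimated click-through rate, both thresholds, and the sample ads are assumptions made for illustration; the source does not state how revenue potential is actually computed.

```python
from dataclasses import dataclass


@dataclass
class Ad:
    name: str
    bid: float  # hypothetical maximum price per click
    ctr: float  # hypothetical estimated click-through rate, between 0 and 1


def place_ads(ads, min_ctr=0.005, top_ctr=0.05):
    """Return (top_slots, other_slots) as lists of ad names.

    min_ctr and top_ctr are made-up thresholds: ads below min_ctr are
    disabled, ads at or above top_ctr are shown above the results.
    """
    eligible = [ad for ad in ads if ad.ctr >= min_ctr]
    # Assumed revenue potential: what a click pays times how likely it is.
    ranked = sorted(eligible, key=lambda ad: ad.bid * ad.ctr, reverse=True)
    top = [ad.name for ad in ranked if ad.ctr >= top_ctr]
    other = [ad.name for ad in ranked if ad.ctr < top_ctr]
    return top, other


ads = [
    Ad("shoes", bid=1.00, ctr=0.06),
    Ad("boots", bid=2.50, ctr=0.02),
    Ad("socks", bid=0.40, ctr=0.001),
    Ad("laces", bid=0.50, ctr=0.08),
]
top, other = place_ads(ads)
print("above results:", top)
print("other positions:", other)
```

Here "socks" is dropped for its low click rate, "shoes" and "laces" go above the results, and "boots" takes one of the other positions despite having the highest bid.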

11. Filter the query results

(1) For general queries (such as search requests sent from the Google homepage), Google adds related vertical search results (such as news, shopping, videos, books, and maps) to the returned query results.

(2) Personalization: websites the user has visited rank higher in the query result list.

(3) Websites that use a large number of anchor points may be removed from the query results.

(4) Clustering of search result sets: if a webpage is referenced by a website with a high PageRank, the importance of the webpage is greatly increased.

(5) Trend analysis: for search keywords with a burst of search traffic or a large amount of news coverage, Google gives new query results an additional PageRank weight. (Google has a Google Trends page that reflects keyword search traffic.)

(6) Multiple webpages under the same domain name that have the same PageRank are grouped together.
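
The sorting, duplicate removal, and per-domain grouping described in steps 10 and 11 can be combined into one small sketch. The scores and URLs are made up, and as a simplification pages are grouped by host name alone, without the additional PageRank condition mentioned in (6).

```python
from urllib.parse import urlsplit


def organize_results(results):
    """Sort (url, score) pairs, drop duplicate URLs, and group by host.

    Returns a list of (host, [urls]) ordered by each host's best score.
    """
    # Keep only the best score seen for each URL (duplicate removal).
    best = {}
    for url, score in results:
        if url not in best or score > best[url]:
            best[url] = score

    ordered = sorted(best.items(), key=lambda item: item[1], reverse=True)

    # dicts keep insertion order, so hosts appear in order of their top page.
    groups = {}
    for url, _score in ordered:
        host = urlsplit(url).hostname
        groups.setdefault(host, []).append(url)
    return list(groups.items())


# Hypothetical scored results; one URL appears twice.
results = [
    ("http://a.example/x", 0.9),
    ("http://b.example/y", 0.8),
    ("http://a.example/z", 0.5),
    ("http://a.example/x", 0.7),
]
for host, urls in organize_results(results):
    print(host, urls)
```

Both pages from `a.example` end up in one group that is listed first, because the best-scoring page of that host outranks the page from `b.example`.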

12. Finally, a user-friendly, well-laid-out results page that clearly presents the organic query results and the ads is returned to the user's browser.

All of these steps are completed in less than one second. 300 million clicks per day bring Google an annual revenue of more than $20 billion.

Article Source: http://www.coolinfographics.com/blog/2010/6/30/googlegraphic-how-google-works.html
