Poster: Wu Jun, Google Engineer
You may have heard that Google's revolutionary invention is its "Page Rank" web page ranking algorithm, which thoroughly solved the problem of ordering search results. In fact, Google was not the first to try to rank the many websites on the Internet. Yahoo! was the first company to use a classified directory to let users retrieve information over the Internet. However, because of the limits on computer capacity and speed at that time, Yahoo! and the other search engines of the same era shared a common problem: they covered too few web pages, and they could index only the actual words associated with the common content of those pages. Back then, it was hard for users to find truly relevant information. I remember that before 1999, looking for a single paper meant switching between several search engines. Later, DEC developed the AltaVista search engine. It ran on a single Alpha server, yet it covered more web pages than any earlier engine and indexed every word in them. Although AltaVista returned a large number of results, most of them had little to do with the query, and sometimes you had to flip through several pages of results to find the page you wanted. So the early AltaVista solved the coverage problem to some extent, but it could not rank the results well.
So what is Google's "Page Rank" (web page ranking)? Put simply, it is a democratic vote. For example, suppose we are looking for Dr. Lee Kai-Fu, and one hundred people raise their hands and say they are Lee Kai-Fu. Which one is the real one? Maybe several of them are real, but even so, which one is everybody actually looking for? :-) If everyone says that the one at Google is the real one, then he is the real one.
On the Internet, if a web page is linked to by many other web pages, it is widely recognized and trusted, so its ranking is high. This is the core idea of Page Rank. Of course, Google's Page Rank algorithm is actually much more complicated. For example, links from different web pages are treated differently: links from pages that themselves rank highly are more reliable, so those links are given greater weight. Page Rank takes this factor into account, but that raises a new problem: computing the ranking of the web pages in the search results requires the rankings of the web pages themselves. Isn't this a chicken-and-egg problem?
Google's two founders, Larry Page and Sergey Brin, turned this into a two-dimensional matrix multiplication problem and solved it by iteration. They first assume that all web pages have the same ranking. From this initial value they calculate the first-iteration ranking of each web page, and then from the first-iteration ranking they calculate the second. The two of them proved theoretically that, no matter how the initial values are chosen, this algorithm guarantees that the estimated rankings converge to their true values. It is worth mentioning that the algorithm involves no human intervention at all.
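The iteration described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Google's implementation: the four-page `links` graph, the function name `pagerank`, the tolerance and the `damping` term are all hypothetical choices made for the example. The damping term follows the commonly published form of the algorithm, which the text does not discuss; it is included here because it keeps the iteration convergent on small graphs. The sketch starts once from equal rankings and once from a deliberately skewed guess, to show both runs settling on the same values.

```python
def pagerank(links, initial=None, damping=0.85, tol=1e-10, max_iter=1000):
    """Iteratively estimate ranks; links maps each page to the pages it links to."""
    pages = sorted(links)
    n = len(pages)
    # Start from equal ranks unless the caller supplies another starting guess.
    rank = dict(initial) if initial is not None else {p: 1.0 / n for p in pages}
    for _ in range(max_iter):
        # Assumed damping term: every page receives a small baseline share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            # A page with no outgoing links is treated as linking to every page.
            targets = links[page] or pages
            # A page splits its current rank evenly among the pages it links to.
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        change = max(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if change < tol:
            break
    return rank


# Hypothetical four-page web: C is linked to by A, B and D.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}

from_equal = pagerank(links)
from_skewed = pagerank(links, initial={"A": 1.0, "B": 0.0, "C": 0.0, "D": 0.0})

for page in sorted(links):
    print(page, round(from_equal[page], 4), round(from_skewed[page], 4))
```

In this toy graph, page C, which three other pages link to, ends up with the highest rank, and the two starting guesses agree to within rounding.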
With the theoretical problem solved, a practical problem arose. Because the number of web pages on the Internet is enormous, the two-dimensional matrix mentioned above theoretically has as many elements as the square of the number of web pages. If we assume there are one billion web pages, this matrix has 10^18 (a billion billion) elements. Multiplying matrices that large takes a huge amount of computation. Using sparse matrix computing techniques, Larry and Sergey greatly reduced the amount of computation and implemented the web page ranking algorithm. Today, Google engineers have ported this algorithm to parallel computers, further shortening the computing time and making the update cycle for web pages much shorter than before.
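The saving from sparse storage can be illustrated with a small sketch. The text does not say which sparse technique was actually used, so this is only a generic example: it stores the link matrix once as a full table and once as a list of its nonzero entries, then checks that one multiplication step gives the same result either way. The 200-page ring graph and all function names are hypothetical.

```python
def build_dense(links, n):
    """Full n x n table: cell [target][source] holds the weight of that link."""
    matrix = [[0.0] * n for _ in range(n)]
    for source, outgoing in links.items():
        for target in outgoing:
            matrix[target][source] = 1.0 / len(outgoing)
    return matrix


def build_sparse(links):
    """Only the nonzero cells, stored as (target, source, weight) triples."""
    return [(target, source, 1.0 / len(outgoing))
            for source, outgoing in links.items()
            for target in outgoing]


def dense_step(matrix, vector):
    # Touches every cell, including all the zeros.
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def sparse_step(entries, vector):
    # Touches only the cells that correspond to real links.
    result = [0.0] * len(vector)
    for target, source, weight in entries:
        result[target] += weight * vector[source]
    return result


# Hypothetical web of 200 pages, each linking to the next two pages in a ring.
n = 200
links = {p: [(p + 1) % n, (p + 2) % n] for p in range(n)}

dense = build_dense(links, n)
sparse = build_sparse(links)
start = [1.0 / n] * n

dense_result = dense_step(dense, start)
sparse_result = sparse_step(sparse, start)

print("dense cells:", n * n, "sparse entries:", len(sparse))
```

The dense table grows with the square of the number of pages, while the sparse list grows only with the number of links, which is why the same multiplication stays tractable when most pages link to only a handful of others.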
After I joined Google, Larry told several of us new employees how he and Sergey came up with the web page ranking algorithm. He said: "At that time, we felt that the entire Internet was like a big graph. Every website was like a node, and every link between web pages was like an arc. I thought the Internet could be described by a graph or a matrix, and that I might be able to turn this discovery into a doctoral thesis." That is how he and Sergey invented the Page Rank algorithm.
The clever thing about web page ranking is that it treats the entire Internet as a whole. Without intending to, it conforms to the viewpoint of systems theory. In contrast, most earlier information retrieval treated each web page as an independent individual. Many people paid attention only to the relevance between a web page's content and the query, and ignored the relationships between web pages.
Today, Google's search engine is much more complex and refined than it originally was. However, web page ranking is still crucial among all of Google's algorithms. In academic circles, this algorithm is recognized as one of the greatest contributions to literature retrieval, and many universities have included it in their Information Retrieval courses.