I won't go into Google's PageRank in detail. It is an algorithm that measures the importance of a web page, and it essentially works as mutual voting among pages: a link from one page to another counts as a vote. Because of this, SEO practitioners use sitemaps to let search engines browse as much of a site's content as possible, or build more external (inbound) links to raise the site's PR value.
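To make the "mutual voting" idea concrete, here is a minimal power-iteration sketch of PageRank. The link graph, the function name `pagerank`, and the damping factor of 0.85 (the value usually quoted for the original algorithm) are illustrative assumptions; Google's production ranking is far more complex and is not public.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Power-iteration PageRank; links maps each page to the pages it links to."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start with a uniform distribution
    for _ in range(iterations):
        # Rank held by pages with no outgoing links is spread evenly over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1 - damping) / n + damping * dangling / n for p in pages}
        for page, targets in links.items():
            if targets:
                # Each outgoing link is a "vote" carrying an equal share of the page's rank.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(web)
print(ranks)
```

Page `d` has no inbound links, so it keeps only the baseline share `(1 - damping) / n`, while `c`, which three pages link to, ends up ranked highest.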
Most search engines on the market use methods similar to PageRank. For the sake of fairness, they all rely purely on machines, traversing websites with web crawlers, and this leads to some interesting problems:
1. A web page may have excellent content but too few inbound links, so crawlers cannot reach it within their preset depth threshold, and it becomes "dark content" that few people ever see.
2. Some sites rank well even though their content is reposted or low-value, simply because they have a high PR value. Even technology-leading search engines that use semantic networks to identify high-quality content still do not handle this well enough.
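Problem 1 can be illustrated with a depth-limited breadth-first crawl. This is a simplified sketch: the in-memory `links` dictionary stands in for fetching and parsing real pages, and the function name `crawl`, the page names, and the `max_depth` value are hypothetical.

```python
from collections import deque


def crawl(links, seeds, max_depth):
    """Breadth-first crawl that does not follow links beyond max_depth."""
    seen = set(seeds)
    queue = deque((seed, 0) for seed in seeds)
    while queue:
        page, depth = queue.popleft()
        if depth == max_depth:
            continue  # depth threshold reached: this page's links are never followed
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return seen


site = {
    "portal": ["news", "forum"],
    "forum": ["thread"],
    "thread": ["great-article"],  # the only link pointing at the good page
}
print(sorted(crawl(site, ["portal"], max_depth=2)))
```

With `max_depth=2`, the crawler reaches `thread` but never expands it, so `great-article`, three links away from the seed, stays dark no matter how good its content is.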
To avoid these problems, one research direction is to bring in user data to judge the importance and quality of web content. How could this be done?
Hypothesis: browsing behavior is the best signal of a page's quality, since it is equivalent to users labeling pages. With data at large scale, it should work better than judgment by machines alone.
Principle:
1. Use a browser or other client software, ideally a firewall or other security software, to collect users' browsing logs and upload them to the search engine's crawler database.
2. The crawler matches the logged URLs against the existing index, finds content that has not been indexed yet, and crawls it.
3. The user logs act as votes for each web page, weighted so that longer browsing time counts more, and the page's rank is calculated from these votes.
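Steps 2 and 3 could be sketched as follows, assuming each uploaded log entry is a `(url, dwell_seconds)` pair. The log format, the function names `find_unindexed` and `people_rank`, and the 600-second cap on a single visit are hypothetical choices for illustration; the post only specifies that longer browsing time means higher weight.

```python
from collections import defaultdict


def find_unindexed(log, index):
    """Step 2: URLs that users visited but that the index does not contain yet."""
    return {url for url, _ in log} - set(index)


def people_rank(log, max_dwell=600.0):
    """Step 3: each visit is a vote weighted by its dwell time in seconds.

    max_dwell (an assumed value) caps a single visit so that a tab left open
    for hours cannot dominate. Scores are normalized to sum to 1.
    """
    scores = defaultdict(float)
    for url, dwell in log:
        scores[url] += min(max(dwell, 0.0), max_dwell)
    total = sum(scores.values())
    if total == 0:
        return {url: 0.0 for url in scores}
    return {url: s / total for url, s in scores.items()}


log = [
    ("a.com/post", 300),
    ("b.com/repost", 5),
    ("a.com/post", 120),
    ("c.com/deep", 200),
]
index = {"a.com/post", "b.com/repost"}
print(find_unindexed(log, index))
print(people_rank(log))
```

Here `c.com/deep` is reported as unindexed and handed to the crawler, and `a.com/post`, with two long visits, outranks the briefly viewed `b.com/repost`.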
Drawbacks:
1. It depends on client software
2. User privacy issues
Workarounds:
1. Offer cloud antivirus, cloud defense, and cloud security services, and have users agree to upload their browsing records.
2. Upload the browsing records covertly: encrypt and split them (the same could be done with other files), then recombine and restore them on the server.
Now, let's give it a grand, profound-sounding name: PeopleRank.
Last of all, I really am serious about the technology.
By sluke (Lu Weiqing). Original post: http://luplusplus.com/peoplerank-modle