On the voting model in Web search ranking


A few days ago I read a book on the dilemmas of elections. One chapter starts from the American electoral system, describes its shortcomings, and then walks through a series of proposed improvements, each of which turns out to have problems of its own. The progression is interesting.

First, the American system. The U.S. presidential election is "winner take all": each state has a number of "state votes" (electoral votes) that depends on its population. Voters in each state vote for a presidential candidate, and whoever gets the most votes in a state receives all of that state's state votes. The state votes are then totaled across states, and the candidate with the most state votes wins.

The problem with this system is obvious. Suppose there are only two states: state A has 5 people and state B has 4, so they carry 5 and 4 state votes respectively. Candidate X wins state A 3:2, and candidate Y wins state B 4:0. Counted nationally, Y has 6 votes and X has only the 3 from state A. Under "winner take all", however, X receives all 5 state votes of state A while Y receives only the 4 state votes of state B, so X wins the election with the support of just 1/3 of the country.
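The two-state example can be checked with a few lines of Python. This is only a sketch of the toy example above, not of the real electoral system; the data layout and function names are made up for illustration.

```python
from collections import Counter

# Toy data from the example: state -> (state votes, {candidate: ballots}).
states = {
    "A": (5, {"X": 3, "Y": 2}),
    "B": (4, {"X": 0, "Y": 4}),
}

def popular_vote(states):
    """Add up individual ballots across all states."""
    total = Counter()
    for _, ballots in states.values():
        total.update(ballots)
    return total

def state_vote(states):
    """Winner take all: the leader in a state receives all of its state votes."""
    total = Counter()
    for weight, ballots in states.values():
        winner = max(ballots, key=ballots.get)
        total[winner] += weight
    return total

print(popular_vote(states))  # individual ballots: Y 6, X 3
print(state_vote(states))    # state votes: X 5, Y 4
```

The two tallies disagree on the winner, which is exactly the reversal described in the example.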

This is what happened in the 2000 U.S. presidential election: George W. Bush led Gore in state votes, while more voters nationwide supported Gore than Bush. There was another reason Gore lost to Bush, which I will come back to below.

Put in algorithmic terms, the problem is this: to estimate a result R (the most suitable candidate for the presidency), we collect a feature A (each citizen's vote). But R is not computed from feature A directly; it is computed from a feature B (state votes) that is derived from A, and the derivation from A to B loses information (the differing levels of support within each state).

I won't go into the historical reasons behind "winner take all"; interested readers can consult the book. The most direct fix is to replace "winner take all" with direct election: one person, one vote, with the votes counted directly. But this runs into a series of problems of its own.

Before getting to those problems, let's state abstractly the problem we want to solve:

There are n candidates and every voter votes on them. In the end we want to pick, from the n candidates, the one who is the most suitable, best reflects public opinion, and is the most logically defensible choice.

Scheme 1: Single-vote system. Each person casts one vote for their favorite candidate, the votes are counted, and the candidate with the most votes is elected.

The problem here is what the book's author calls the "snipe trap". Suppose there are three candidates A, B and C. B and C have similar politics: most supporters of B also support C, and vice versa, and the people who like B or C make up the majority of the population. A's politics are the opposite of B's and C's, and A's supporters are a minority. The result is that the votes for B and C are split, while A collects a more concentrated vote and wins. If one of B and C had stayed out of the race, the votes would have concentrated on the other, and the candidate backed by the majority of voters would have been elected. This is the other reason for Gore's defeat that I set aside earlier: a candidate whose politics were similar to Gore's also ran and drew away some of Gore's votes.
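A small Python sketch makes the vote splitting concrete. The electorate of 10 and the way C's supporters fall back to B are assumptions chosen for illustration, not figures from the book.

```python
from collections import Counter

def plurality(first_choices):
    """Return the candidate named on the most ballots."""
    return Counter(first_choices).most_common(1)[0][0]

# Assumed electorate of 10: 4 back A, the other 6 split evenly between B and C.
with_both = ["A"] * 4 + ["B"] * 3 + ["C"] * 3

# Same voters if C stays out and C's supporters vote for B instead.
without_c = ["A"] * 4 + ["B"] * 6

print(plurality(with_both))  # A
print(plurality(without_c))  # B
```

With both B and C running, A wins on 4 votes out of 10; with only one of them running, the majority camp wins.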

One scheme that improves on this is the "two-round system".

Scheme 2: Two-round system. Each person casts one vote. If no candidate gets more than 50% of the votes, the two candidates with the most votes go on to a second round, and whoever gets more votes in that round wins.

The French presidential election uses this two-round system, but it only mitigates the "snipe trap" and cannot fully remove it. A similar situation arose in the 2002 French presidential election. At the time, voters with left-wing views were in the majority, yet the two candidates who reached the second round were one from the right and one from the far right. The reason was that there were 16 presidential candidates, many of them from different left-wing factions, which badly fragmented the left's vote.
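The two-round rule, and the way a fragmented camp can miss the second round, can be sketched as follows. The 100 ballots are invented to mimic the situation described (three left candidates L1 to L3, a right candidate R and a far-right candidate F); they are not the actual 2002 results. Each ballot is assumed to be a complete ranking so that the second round can be simulated.

```python
from collections import Counter

def two_round(ballots):
    """Two-round system on ranked ballots (most preferred candidate first)."""
    first = Counter(b[0] for b in ballots)
    leader, votes = first.most_common(1)[0]
    if votes * 2 > len(ballots):
        return leader
    finalists = [c for c, _ in first.most_common(2)]
    # Second round: each voter backs whichever finalist they rank higher.
    second = Counter(min(finalists, key=b.index) for b in ballots)
    return second.most_common(1)[0][0]

# Hypothetical electorate: 55 left-leaning voters split three ways.
ballots = (
    [("L1", "L2", "L3", "R", "F")] * 19
    + [("L2", "L3", "L1", "R", "F")] * 18
    + [("L3", "L1", "L2", "R", "F")] * 18
    + [("R", "F", "L1", "L2", "L3")] * 25
    + [("F", "R", "L1", "L2", "L3")] * 20
)

print(two_round(ballots))  # R
```

Here 55 of 100 voters put a left candidate first, but no left candidate reaches 20 first-round votes, so the second round is between R and F.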

Scheme 3: N-round system. Each person casts one vote. If no candidate has more than 50% support, the candidate with the fewest votes is removed and another round of voting is held. This repeats, removing the candidate with the fewest votes each time, until someone has more than 50% support.

In 2001 the IOC used this kind of system when it chose Beijing as the host city of the 2008 Olympic Games: Osaka was eliminated in the first round of voting, and Beijing won more than half of the votes in the second round.

The problem with the N-round system is practicality. It is workable when only a hundred or so people vote, as on the Olympic committee. But in something like the French presidential election above, with 16 candidates, the whole country might have to vote as many as 15 times, which is far too expensive.

Scheme 4: Instant-runoff system. Each voter ranks the candidates. If a candidate is the first choice on more than 50% of the ballots, that candidate wins outright. Otherwise the candidate with the fewest votes is eliminated, and each of that candidate's ballots is transferred to the second choice marked on it. If someone now has more than 50%, they are elected; otherwise the lowest candidate is eliminated again and each of their ballots goes to its highest-ranked candidate who has not been eliminated, and so on.

The Irish presidential election and the London mayoral election use schemes of this kind. This scheme has a problem too. Suppose there are 11 voters: the centrist candidate is the first choice of 3, and the left and right candidates are each the first choice of 4. Left-wing voters naturally detest the right-wing candidate and right-wing voters detest the left-wing candidate, while both sides find the centrist acceptable. Under either instant runoff or the N-round system, the centrist is eliminated in the first round, even though the centrist is the one person acceptable to everyone and the one best placed to reconcile the factions.

The root of the problem is that although each voter can rank the candidates, only first choices are considered in the first round; voters' second and third choices are ignored.
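Scheme 4 and the centrist example can be sketched in Python. This is a minimal sketch, not the procedure of any real election: ballots are assumed to be complete rankings, ties for last place are broken alphabetically, and the way the 3 centrist voters split their second choices (2 to the left, 1 to the right) is made up for illustration.

```python
from collections import Counter

def instant_runoff(ballots):
    """Return (winner, elimination order) for complete ranked ballots."""
    remaining = {c for b in ballots for c in b}
    eliminated = []
    while True:
        # Each ballot counts for its highest-ranked candidate still in the race.
        tally = Counter(next(c for c in b if c in remaining) for b in ballots)
        leader, votes = tally.most_common(1)[0]
        if votes * 2 > len(ballots):
            return leader, eliminated
        # Drop the candidate with the fewest current votes (ties: alphabetical).
        loser = min(sorted(remaining), key=lambda c: tally[c])
        remaining.remove(loser)
        eliminated.append(loser)

# L = left, C = centrist, R = right; first choices are 4, 3 and 4.
ballots = (
    [("L", "C", "R")] * 4
    + [("R", "C", "L")] * 4
    + [("C", "L", "R")] * 2
    + [("C", "R", "L")]
)

print(instant_runoff(ballots))  # ('L', ['C'])
```

The centrist is eliminated first even though, on these ballots, 7 of the 11 voters rank the centrist above the left candidate and 7 rank the centrist above the right candidate.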

Scheme 5: Reverse runoff system. This is similar to scheme 4, except that what gets eliminated in each round is not the candidate with the least support but the candidate with the most opposition (the one ranked last on the most ballots).

In the situation described above, the centrist is nobody's last choice, so the first round eliminates the left or the right candidate, and the centrist wins in the second round.

Scheme 5 has a problem of its own as well. Consider a race with only two candidates, A and B, and 9 voters, of whom 6 prefer A to B and 3 prefer B to A. Whatever the method, A wins. Now add two more candidates, C and D. The 3 voters who favor B all rank A last, while the 6 who favor A split their last choices evenly among B, C and D, 2 votes each. A then has the most last-place votes and is eliminated in the first round, and with a carefully constructed example B can end up elected. The outcome between A and B is reversed merely because C and D entered the race.
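Both scheme 5 examples can be reproduced with a short sketch. The rule implemented here resembles what the voting literature calls Coombs' method; the function name and all ballots are my own construction. In the 9-voter example, two of the six voters who prefer A to B are assumed to rank C first, so that A has no outright majority of first choices in round one.

```python
from collections import Counter

def most_opposed_runoff(ballots):
    """Repeatedly drop the candidate ranked last on the most ballots."""
    remaining = {c for b in ballots for c in b}
    while True:
        # Restrict every ballot to the candidates still in the race.
        active = [[c for c in b if c in remaining] for b in ballots]
        leader, votes = Counter(b[0] for b in active).most_common(1)[0]
        if votes * 2 > len(ballots):
            return leader
        most_opposed = Counter(b[-1] for b in active).most_common(1)[0][0]
        remaining.remove(most_opposed)

# Centrist example: first choices 4 left, 4 right, 3 centrist (second choices assumed).
centrist = (
    [("L", "C", "R")] * 4 + [("R", "C", "L")] * 4
    + [("C", "L", "R")] * 2 + [("C", "R", "L")]
)

# Constructed 9-voter example: 6 rank A above B, 3 rank B first and A last.
four_way = (
    [("C", "A", "D", "B")] * 2
    + [("A", "D", "B", "C")] * 2
    + [("A", "C", "B", "D")] * 2
    + [("B", "C", "D", "A")] * 3
)
# The same voters when only A and B run.
two_way = [[c for c in b if c in "AB"] for b in four_way]

print(most_opposed_runoff(centrist))  # C
print(most_opposed_runoff(two_way))   # A
print(most_opposed_runoff(four_way))  # B
```

No voter changes their relative order of A and B between `two_way` and `four_way`, yet the winner switches from A to B.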

This scheme has seen little actual use. The only similar case is Athens in 507 BC, where people did not vote for the person they supported but against the person they opposed, and the person with the most votes against was voted out.

Scheme 6: Round-robin system. Voters rank the candidates, and the candidates are then compared head-to-head in pairs: for each pair, every ballot is checked to see whether candidate A is ranked ahead of candidate B or B ahead of A. The candidate who wins the most pairwise contests wins the election.

The problem is that this can produce a cycle. Suppose there are three candidates A, B and C and three voters whose ballots are ABC, BCA and CAB. Between A and B, A wins twice, so A > B; between B and C, B wins twice, so B > C; between A and C, C wins twice, so C > A. That forms the cycle A > B > C > A. Isn't this a little like a football league table? When points are level, football falls back on goal difference, goals scored and so on. The author does not go in that direction, though, and instead introduces another method: the Borda system.
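The pairwise counting and the cycle can be verified with a short sketch. The function names are illustrative, and ballots are assumed to be complete rankings written as strings.

```python
from itertools import combinations

def pairwise_wins(ballots):
    """Count how many head-to-head contests each candidate wins."""
    candidates = sorted(ballots[0])
    wins = {c: 0 for c in candidates}
    for x, y in combinations(candidates, 2):
        # Number of ballots that rank x ahead of y.
        x_over_y = sum(b.index(x) < b.index(y) for b in ballots)
        if x_over_y * 2 > len(ballots):
            wins[x] += 1
        elif x_over_y * 2 < len(ballots):
            wins[y] += 1
    return wins

def round_robin_winner(ballots):
    """Return the candidate who beats every rival, or None if there is none."""
    wins = pairwise_wins(ballots)
    for candidate, count in wins.items():
        if count == len(wins) - 1:
            return candidate
    return None

ballots = ["ABC", "BCA", "CAB"]
print(pairwise_wins(ballots))       # {'A': 1, 'B': 1, 'C': 1}
print(round_robin_winner(ballots))  # None
```

Every candidate wins exactly one contest, so the pairwise comparison alone cannot name a winner.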

Scheme 7: Borda system. Voters rank the candidates. With n candidates, the first choice on a ballot gets n points, the second choice n-1 points, and so on. Each candidate's points are then totaled, and the candidate with the most points wins.
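The scoring rule is short enough to write out directly. As a sketch, it is applied here to a left/centrist/right electorate like the one discussed under scheme 4; the exact ballots, including how the centrist voters order the other two candidates, are assumed for illustration.

```python
from collections import Counter

def borda(ballots):
    """Give n points to a ballot's first choice, n-1 to the second, and so on."""
    scores = Counter()
    for ballot in ballots:
        n = len(ballot)
        for rank, candidate in enumerate(ballot):
            scores[candidate] += n - rank
    return scores

# L = left, C = centrist, R = right (hypothetical ballots).
ballots = (
    [("L", "C", "R")] * 4
    + [("R", "C", "L")] * 4
    + [("C", "L", "R")] * 2
    + [("C", "R", "L")]
)

scores = borda(ballots)
print(scores["C"], scores["L"], scores["R"])  # 25 21 20
```

Because second choices earn points, the centrist comes out on top here despite having the fewest first-choice votes.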

One criticism of the Borda system is that voters can game it by casting a "tactical vote". The strongest supporters of B truly prefer B > A > C, but because A is B's main rival, they want to push A down in order to lift B, so they cast their ballots as B > C > A. Borda's response to this criticism was that his system is meant only for honest voters.

The author of the book thinks the "tactical vote" problem of the Borda system is not that serious. If you cannot predict public opinion accurately and cannot precisely control how many tactical votes are cast, you may push too hard: A is pulled down, but C's points go up as well, so the very people who support B most end up electing C, the candidate they dislike most. A similar scene played out on IMDb:
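The backfire can be shown with a small constructed electorate. The 9 ballots below are hypothetical; the only change between the two runs is that the three voters whose true preference is B > A > C cast B > C > A instead.

```python
from collections import Counter

def borda_winner(ballots):
    """Borda count: n points for first place, n-1 for second, and so on."""
    scores = Counter()
    for ballot in ballots:
        for rank, candidate in enumerate(ballot):
            scores[candidate] += len(ballot) - rank
    return scores.most_common(1)[0][0], scores

others = [("A", "C", "B")] * 4 + [("C", "B", "A")] * 2

honest = others + [("B", "A", "C")] * 3    # true preferences of B's supporters
tactical = others + [("B", "C", "A")] * 3  # the same voters burying A

print(borda_winner(honest)[0])    # A
print(borda_winner(tactical)[0])  # C
```

Burying A takes 3 points from A but hands the same 3 points to C, while B's own total does not move, so the tacticians replace their second choice with their last choice.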

After "Batman 6" was released, Batman fans thought the movie was great and wanted to push it to first place on IMDb. So they gave "Batman 6" very high scores and, at the same time, gave low scores to "The Godfather", which was first on IMDb at the time. They overdid it: "The Godfather" fell to third place, and "The Shawshank Redemption" (TSR), originally second, stayed second (before, it ranked behind "The Godfather"; now it ranked behind "Batman 6"). Later, as the fans' enthusiasm subsided and more measured opinion prevailed, "Batman 6" dropped to 10th place. "The Godfather", however, has stayed behind "The Shawshank Redemption" for a long time since.

Does the Borda system have any other problems?

The above is only my notes on chapter 14 of the book, and it covers only the "multiple candidates, single position" case. The book goes on to discuss "multiple candidates, multiple positions", that is, how to determine a final ranking of the candidates from each person's ranking of them.

Now back to search engines, where this progression of schemes offers some inspiration. First, recall the problem abstracted earlier:

There are n candidates and every voter votes on them. In the end we want to pick, from the n candidates, the one who is the most suitable, best reflects public opinion, and is the most logically defensible choice.

This is very similar to the problem a search engine solves:

There are n pages in the system and m features (page quality, content richness, hyperlinks, text relevance, etc.), each of which rates the n pages differently. How do we use the "votes" of these features to choose the page best suited to first place?

From the election examples we can draw a few lessons:

1. When designing the algorithm, avoid the information loss of "winner take all".

2. Do not push a page to the top just because a few features are especially good, and do not discard a page just because a few features are especially bad.

3. The page best suited to first place is not necessarily the best on every feature; it should be the one with the best overall combination across all features.

4. Users' clicks on search results can be seen as "votes" on those results. When using this "vote" information, we also need to consider whether it brings in the various irrationalities of the election process.

The electoral schemes above only address the "multiple candidates, single position" case, whereas the problem search engines face is closer to "multi-candidate ranking", namely:

There are n pages in the system and m features (page quality, content richness, hyperlinks, text relevance, etc.), each of which rates the n pages differently. How do we use the "votes" of these features to determine the order of the n pages?
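As one illustration of treating features as voters, the sketch below lets each feature rank the pages by its own score and then combines the rankings with Borda-style points. This is just one possible aggregation and is not presented as how any search engine actually ranks; the page names, feature names and scores are invented.

```python
def borda_rank(feature_scores):
    """feature_scores: {feature: {page: score}} -> pages ordered by total points."""
    totals = {}
    for scores in feature_scores.values():
        # Each feature "votes" by ranking the pages from best to worst.
        ordered = sorted(scores, key=scores.get, reverse=True)
        for rank, page in enumerate(ordered):
            totals[page] = totals.get(page, 0) + len(ordered) - rank
    return sorted(totals, key=totals.get, reverse=True)

# Hypothetical scores for three pages on four features.
features = {
    "quality":   {"p1": 0.90, "p2": 0.70, "p3": 0.20},
    "richness":  {"p1": 0.10, "p2": 0.60, "p3": 0.80},
    "links":     {"p1": 0.95, "p2": 0.60, "p3": 0.30},
    "relevance": {"p1": 0.20, "p2": 0.90, "p3": 0.80},
}

print(borda_rank(features))  # ['p2', 'p1', 'p3']
```

Page p2 is first on only one feature but never last, so it ends up ahead of p1, which is first on two features and last on the other two. Note that converting scores to ranks discards the score magnitudes, which is itself a form of the information loss discussed earlier.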

For "multi-candidate ranking" there is a theorem of "impossible democracy" (Arrow's impossibility theorem), which says roughly that a "reasonable" democracy should satisfy 3 conditions:

1. If every voter thinks A is better than B, the final result should rank A above B.

2. There is no "dictator": no single person whose ranking the final result always matches, no matter how everyone else ranks the candidates.

3. Independence of irrelevant alternatives: suppose A finishes ahead of B after a first ballot, and a second ballot is then held. If no voter changes the relative order of A and B on their ballot, the final result should still place A ahead of B.

A mathematical proof then gives the conclusion: if a voting method satisfies conditions 1 and 3, it cannot satisfy condition 2; that is, a "dictator" must exist. For the proof, see this blog: http://roba.rushcj.com/?p=509

Putting the "impossible democracy" theorem together with search engines, it seems that a search engine can hardly produce a reasonable ranking of pages. But search engines are not quite the same as voting, and there are two angles from which the problem can be attacked:

1. Regard condition 3 as too strong and weaken it.

2. Perhaps in web page ranking there really is such a "dictator feature". From today's point of view, the most fitting candidate for it is "user satisfaction": ranking pages by user satisfaction is the most reasonable ranking. How do we measure "user satisfaction"? That is what we have been working on.

by Liangaili


