Ranking Algorithms Based on User Voting (II): Reddit's Algorithm


Reposted from: http://www.kuqin.com/algorithm/20120307/318639.html

Last time, I introduced the Hacker News ranking algorithm. Its defining feature is that users can only vote in favor of a post. Many sites, however, also let users vote against: besides praising an article, you can also give it a negative rating.

Reddit is the largest online community in the United States. Each post has an up arrow and a down arrow, meaning "for" and "against" respectively. Users vote by clicking, and Reddit computes the latest "hot" list from the voting results.

How can up and down votes be combined to find the most popular articles over a given period? If article A has 100 up votes and 5 down votes, and article B has 1000 up votes and 950 down votes, which should rank higher?

Reddit's source code is open and written in Python, including the code for its ranking algorithm.
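A minimal sketch of that ranking code follows. It is modeled on the commonly quoted version of Reddit's open-source `hot` function, not a verbatim copy of Reddit's current code. Assumptions: posting times are passed as timezone-aware `datetime` objects, the reference time 2005-12-08 07:46:43 is taken to be UTC (Unix timestamp 1134028003), and the constant name `REDDIT_EPOCH` is our own.

```python
from datetime import datetime, timezone
from math import log10

# Reference time 2005-12-08 07:46:43, assumed UTC, as a Unix timestamp.
REDDIT_EPOCH = 1134028003


def epoch_seconds(date: datetime) -> float:
    """Seconds since the Unix epoch; pass a timezone-aware datetime,
    because naive datetimes are interpreted in local time."""
    return date.timestamp()


def score(ups: int, downs: int) -> int:
    """Net votes x = up votes - down votes."""
    return ups - downs


def hot(ups: int, downs: int, date: datetime) -> float:
    """Hot score = log10(z) + y * t / 45000."""
    s = score(ups, downs)
    order = log10(max(abs(s), 1))                 # log10(z), z = max(|x|, 1)
    sign = 1 if s > 0 else -1 if s < 0 else 0     # y
    seconds = epoch_seconds(date) - REDDIT_EPOCH  # t
    return round(order + sign * seconds / 45000, 7)


if __name__ == "__main__":
    posted = datetime(2012, 3, 7, 12, 0, 0, tzinfo=timezone.utc)
    print(hot(100, 5, posted))     # article A from the question above
    print(hot(1000, 950, posted))  # article B
```

Posted at the same moment, article A (net +95) outranks article B (net +50), because only the difference between up and down votes enters the score.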

The ranking code takes into account the following factors:

(1) Recency of the post, t

$$t = \text{posting time} - \text{December 8, 2005, 07:46:43}$$

t is measured in seconds and computed from Unix timestamps. Once a post is published, its t is fixed and does not change over time; the newer the post, the larger its t. December 8, 2005 is presumably when Reddit was established.

(2) Net votes x

$$x = \text{up votes} - \text{down votes}$$

(3) Vote direction y

$$y = \begin{cases} 1, & x > 0 \\ 0, & x = 0 \\ -1, & x < 0 \end{cases}$$

y is a sign variable that expresses the overall verdict on the post. If up votes are in the majority, y = 1; if down votes are in the majority, y = -1; if the two are equal, y = 0.

(4) Degree of approval z

$$z = \begin{cases} |x|, & |x| \ge 1 \\ 1, & |x| < 1 \end{cases}$$

z is the absolute value of the difference between up and down votes: the more one-sided the voting on a post, the larger z. If up votes equal down votes, z equals 1.
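The four inputs can be computed directly from a post's vote counts and posting time. The sketch below assumes the reference time is in UTC; the function names are illustrative, not Reddit's. It also confirms that the reference time corresponds to Unix timestamp 1134028003.

```python
from datetime import datetime, timezone

# Assumption: the reference time 2005-12-08 07:46:43 is in UTC.
REFERENCE = datetime(2005, 12, 8, 7, 46, 43, tzinfo=timezone.utc)


def post_age_t(posted: datetime) -> float:
    """t: seconds from the reference time to the (timezone-aware) posting time."""
    return (posted - REFERENCE).total_seconds()


def net_votes_x(ups: int, downs: int) -> int:
    """x: up votes minus down votes."""
    return ups - downs


def direction_y(x: int) -> int:
    """y: +1, 0 or -1 depending on the sign of x."""
    return (x > 0) - (x < 0)


def approval_z(x: int) -> int:
    """z: |x|, except that z = 1 when x = 0 (so log10(z) = 0)."""
    return max(abs(x), 1)


if __name__ == "__main__":
    print(int(REFERENCE.timestamp()))  # 1134028003
    for ups, downs in [(100, 5), (5, 100), (7, 7)]:
        x = net_votes_x(ups, downs)
        print(ups, downs, x, direction_y(x), approval_z(x))
```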

Combining the variables above, Reddit's final score is computed as:

$$\text{Score} = \log_{10} z + \frac{y\,t}{45000}$$

The formula can be examined in two parts:

One

$$\log_{10} z$$

This part says that the larger the gap between up and down votes, the higher the score.

Note that this is a base-10 logarithm: z = 10 yields 1 point and z = 100 yields 2 points. In other words, the first 10 net votes carry the same weight as the next 90 (or the 900 after those). The more popular a post already is, the less each additional vote affects its score.

When up votes equal down votes, z = 1, so this part equals 0 and contributes nothing to the score.
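The diminishing returns of the logarithm are easy to check numerically. This sketch isolates part one as a hypothetical helper `vote_part`, using z = max(|x|, 1) as defined above.

```python
from math import log10


def vote_part(ups: int, downs: int) -> float:
    """Part one of the score: log10(z), where z = max(|ups - downs|, 1)."""
    return log10(max(abs(ups - downs), 1))


# Each additional point requires ten times as many net votes.
for net in (1, 10, 100, 1000):
    print(net, vote_part(net, 0))

# 900 extra net votes on top of 100 add only about one point.
print(vote_part(1000, 0) - vote_part(100, 0))

# Equal up and down votes: z = 1, so this part contributes nothing.
print(vote_part(42, 42))
```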

Two

$$\frac{y\,t}{45000}$$

For a post with net positive votes (y = 1), this part says that the larger t is, the higher the score, so newer posts score higher than older ones. It automatically pushes older posts down the ranking.

The denominator, 45,000 seconds, equals 12.5 hours. A post published one day later therefore gets 86,400 / 45,000 ≈ 1.92, or about 2, more points than a post published the day before. Combined with part one, this means that for yesterday's post to keep its rank the next day, its net up votes would have to grow roughly a hundredfold.
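The arithmetic behind "about 2 points a day" and "roughly 100 times" can be worked out in a few lines. Since part one is log10(z), closing a gap of g points requires multiplying z by 10 ** g; for exactly one day that factor comes out at about 83, which the text rounds to 100.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86400
DECAY = 45000                   # denominator of the time part, in seconds

print(DECAY / 3600)             # 12.5 hours

# Extra time-part points a post gets for being published one day later.
daily_gain = SECONDS_PER_DAY / DECAY
print(daily_gain)               # 1.92

# Net-vote multiple an older post needs to make up that gap:
# log10(k) = daily_gain  =>  k = 10 ** daily_gain
print(round(10 ** daily_gain))  # 83, i.e. on the order of 100x
```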

y gives the time term its sign. When up votes exceed down votes, the score is positive; when down votes exceed up votes, the score is negative; and when the two are equal, the score is 0. This ensures that posts with many net up votes sit near the top of the list, while posts with many net down votes sink to the bottom.
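To see the effect of y, this sketch scores three posts of the same hypothetical age t = 200,000,000 seconds (about six years after the reference time). The value of t and the helper name `hot_score` are illustrative assumptions; the formula is the one given above.

```python
from math import log10


def hot_score(ups: int, downs: int, t: float) -> float:
    """Score = log10(z) + y * t / 45000, with t in seconds since the reference time."""
    x = ups - downs
    y = (x > 0) - (x < 0)
    z = max(abs(x), 1)
    return log10(z) + y * t / 45000


t = 200_000_000.0  # hypothetical post age in seconds
for ups, downs in [(60, 10), (10, 10), (10, 60)]:
    # Net positive -> large positive score; tie -> 0; net negative -> large negative score.
    print(ups, downs, hot_score(ups, downs, t))
```

Because t / 45000 is in the thousands while log10(z) is rarely above 5 or 6, the sign of the time term dominates the ordering.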

Three

One problem with this algorithm is that controversial posts, whose up and down votes are nearly equal, can hardly reach the top. Suppose two posts are published at the same moment: post A has 1 up vote (cast by its author) and 0 down votes, while post B has 1000 up votes and 1000 down votes. A will then rank above B, which is clearly unreasonable.
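The controversial-post example can be reproduced with the score formula. The shared posting age t below is a hypothetical value; any positive t gives the same ordering, since post B's net votes are zero and its score is therefore exactly 0.

```python
from math import log10


def hot_score(ups: int, downs: int, t: float) -> float:
    """Score = log10(z) + y * t / 45000 (hypothetical helper for illustration)."""
    x = ups - downs
    y = (x > 0) - (x < 0)
    return log10(max(abs(x), 1)) + y * t / 45000


t = 200_000_000.0  # hypothetical: both posts published at the same moment
a = hot_score(1, 0, t)        # post A: only the author's own up vote
b = hot_score(1000, 1000, t)  # post B: heavily voted on, evenly split
print(a, b)  # A scores t / 45000; B scores 0
```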

The conclusion is that Reddit's ranking is largely determined by posting time. The most popular posts appear near the top, while moderately popular and controversial posts rarely rank high. This makes Reddit a community of mainstream taste rather than a place that showcases radical or minority views.

