Site Content Scoring Model

Source: Internet
Author: User

Many sites now rate their content, whether they are e-commerce, information-sharing, or content-download sites. Content scores fall into two main categories. The first is ratings given by users, which reflect user experience, such as product ratings on e-commerce sites or content ratings on content-sharing sites. This is the most common scoring mode, and the overall score is simple to compute: it is usually the mean of all user ratings. The second is the site's own rating of its content, derived from users' historical behavior data, for example evaluating how popular a piece of content is from how users access it.

The scoring model introduced here addresses the second type. The score scale is fixed (100 points, 10 points, or 5 points), while the volume of behavior data each piece of content generates varies widely: it may be in the thousands, the tens of thousands, or even the millions. The key problem is how to transform this data into a standard scoring scale with a reasonable, useful score distribution, so that genuinely high-quality content earns a higher rating and can be recommended to users.

Content Scoring Example

Before walking through the example, here are the setting and the requirements. Suppose a content-sharing site needs to rate its content on a 5-point scale, so each item's score can only be one of the five values 1 to 5. The goal is to show how popular each item is on the site and give users a reference when choosing what to read.

This is the simplest application of content scoring. Both the purpose of the score (content popularity) and the final presentation (a 5-point scale) are clearly defined. With such a clear requirement, we can select a metric, build a model, and output the results.

1. Select the metric

Evaluating popularity seems simple: why not use page views (PV) directly as the metric? PV is indeed a reasonable choice and the simplest one, but there are better options: the number of visits (Visits) and the number of unique visitors (UV). Both remove the effect of the same user refreshing the same content repeatedly within a short period, so we choose UV as the evaluation metric.
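The difference between PV and UV can be illustrated with a minimal sketch. The record layout (pairs of `user_id` and `content_id`) and the function name `pv_and_uv` are assumptions made for illustration; the article does not describe how the access data is stored.

```python
from collections import defaultdict

def pv_and_uv(events):
    """Count page views (PV) and unique visitors (UV) per content item.

    `events` is assumed to be an iterable of (user_id, content_id) pairs,
    one pair per page view. This layout is hypothetical.
    """
    pv = defaultdict(int)     # content_id -> number of views
    users = defaultdict(set)  # content_id -> set of distinct users
    for user_id, content_id in events:
        pv[content_id] += 1
        users[content_id].add(user_id)
    uv = {content_id: len(seen) for content_id, seen in users.items()}
    return dict(pv), uv

# User u1 refreshes item "a" three times; PV counts every view,
# UV counts u1 only once.
events = [("u1", "a"), ("u1", "a"), ("u1", "a"), ("u2", "a"), ("u3", "b")]
pv, uv = pv_and_uv(events)
print(pv)
print(uv)
```

In this sample, repeated refreshes by one user inflate the PV of item "a" but leave its UV unaffected, which is the reason given above for preferring UV.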

2. Build the scoring model

This is the core of the article. To rate content popularity, we first need to eliminate the metric's unit of measure and then map the values into the required score range, 1 to 5.

Eliminating the unit of measure may already suggest the answer: data standardization. It comes up in many places and is a basic step in much data analysis and data mining work.

Min-max normalized scoring

Min-max normalization is the most commonly used data normalization method (see the article on data standardization for details). It maps the data into the interval [0,1], so all that remains is to convert values in 0-1 into the five scores 1-5. This is simple: multiply by 4 so the data falls in [0,4], round to the nearest integer to get the five values 0-4, then add 1 to get the result we want. An example of the resulting distribution of content across the scores:

[Figure: distribution of content by score under min-max normalized scoring]

The distribution above shows that, with min-max scoring, the amount of content at each score is not controllable. It depends directly on the proportion of popular to unpopular content on the site. Suppose popular content makes up only 20% of the site's content but draws unusually high traffic, 80% of all site traffic, which is the familiar 80/20 rule. Then most content scores will concentrate at 1 point, a few at 5 points, and very little content will land on the middle scores 2, 3, and 4. The figure above already leans in this direction. In many cases, however, we would like the distribution to be closer to normal: most content on the middle scores, with relatively little at either end. This leads to the second scoring model, described next.
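The min-max scoring steps described in this subsection can be sketched as follows. The function name `min_max_scores`, the sample UV values, and the handling of the case where all values are equal are assumptions for illustration, not part of the original article.

```python
import math

def min_max_scores(uv_counts, levels=5):
    """Map raw UV counts to integer scores 1..levels via min-max normalization."""
    lo, hi = min(uv_counts), max(uv_counts)
    if hi == lo:
        # Assumption: when every item has the same UV, min-max is undefined,
        # so this sketch simply returns the middle score for all items.
        return [(levels + 1) // 2] * len(uv_counts)
    scores = []
    for value in uv_counts:
        normalized = (value - lo) / (hi - lo)  # in [0, 1]
        scaled = normalized * (levels - 1)     # in [0, 4] for 5 levels
        # Round half up explicitly; Python's round() rounds halves to even.
        scores.append(math.floor(scaled + 0.5) + 1)
    return scores

# Evenly spread UV values use the whole scale.
print(min_max_scores([10, 20, 30, 40, 50]))
# One very popular item pushes everything else down to 1 point.
print(min_max_scores([1, 2, 3, 4, 1000]))
```

The second call illustrates the skew discussed above: a single item with very high traffic takes the top score and compresses all other items into the lowest one.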

Z-score standardized scoring

If your site has a lot of content, you can use Z-score standardization (see the article on data standardization for details; they are not repeated here). After standardization the values have an overall mean of 0 and a standard deviation of 1, and when the underlying data is roughly normally distributed they approximate the standard normal distribution N(0,1), which is exactly the shape we want. We then need a way to turn the standardized values into the five scores 1-5. When the standardized value is:

Less than or equal to -1.5 (that is, -1.5σ): 1 point

Greater than -1.5 (that is, -1.5σ) and less than or equal to -0.5 (that is, -0.5σ): 2 points

Greater than -0.5 (that is, -0.5σ) and less than 0.5 (that is, 0.5σ): 3 points

Greater than or equal to 0.5 (that is, 0.5σ) and less than 1.5 (that is, 1.5σ): 4 points

Greater than or equal to 1.5 (that is, 1.5σ): 5 points
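The thresholds listed above can be sketched in code as follows. The function names, the use of the population standard deviation, and the handling of identical values are assumptions made for illustration.

```python
import statistics

def z_to_points(z):
    """Convert one standardized value to a 1-5 score using the cut points
    -1.5, -0.5, 0.5 and 1.5 listed above."""
    if z <= -1.5:
        return 1
    if z <= -0.5:
        return 2
    if z < 0.5:
        return 3
    if z < 1.5:
        return 4
    return 5

def z_standardized_scores(uv_counts):
    """Standardize raw UV counts and map each one to a 1-5 score."""
    mean = statistics.fmean(uv_counts)
    # Assumption: the population standard deviation is used, treating the
    # site's content as the whole population rather than a sample.
    std = statistics.pstdev(uv_counts)
    if std == 0:
        # Assumption: identical values all receive the middle score.
        return [3] * len(uv_counts)
    return [z_to_points((value - mean) / std) for value in uv_counts]

print(z_standardized_scores([10, 20, 30, 40, 50]))
```

With only five evenly spaced values, no item lies 1.5 standard deviations from the mean, so scores 1 and 5 are unused. This matches the remark that the method suits sites with a lot of content.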

If the data follows the standard normal distribution, the share of content at each score is roughly 7% each for scores 1 and 5, 24% each for scores 2 and 4, and 38% for score 3. Here is the distribution obtained with this method:

[Figure: distribution of content by score under Z-score standardized scoring]

Did you see the results you wanted?
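The shares expected under an exactly standard normal distribution can be checked with the normal cumulative distribution function. This is a minimal sketch using only the standard library; the helper names `normal_cdf` and `band_shares` are hypothetical.

```python
import math

def normal_cdf(z):
    """Cumulative distribution function of the standard normal N(0,1)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def band_shares(cuts=(-1.5, -0.5, 0.5, 1.5)):
    """Probability mass of N(0,1) in each of the five score bands."""
    cdf = [0.0] + [normal_cdf(c) for c in cuts] + [1.0]
    return [cdf[i + 1] - cdf[i] for i in range(len(cdf) - 1)]

shares = band_shares()
for score, share in enumerate(shares, start=1):
    print(f"{score} points: {share:.1%}")
```

Real traffic data is rarely exactly normal, so the shares observed on an actual site will differ from these theoretical values.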

3. Output the results

There are many ways to display a content score. The original article shows screenshots of the rating displays of several sites, all of which work well.

The discussion above covers a scoring system based on a single metric. In many cases a content score is the result of several metrics, and it should then take all of these factors into account, with a corresponding model built to compute a comprehensive score. That is not covered here and may be introduced at a later opportunity.

When reprinting, please credit the source: Website Data Analysis » "The website content grading model"
