How to build a scoring model for website content

Source: Internet
Author: User
Keywords: website construction, content construction
Tags: .mall, access behavior, content, content construction, content sharing, data distribution

Many websites now score their content, whether they are e-commerce, information-sharing, or content-download sites. Content scores fall into two categories. The first is scoring by users, which mainly reflects the user experience, such as product ratings on e-commerce sites and content ratings on content-sharing sites. This is currently the most common scoring model, and the overall score is simple to calculate: in most cases it is the mean of all users' ratings. The second is scoring by the site itself, based mainly on users' historical behavior data, such as rating the popularity of content from users' visits to it.

The scoring model introduced here addresses the second type. The rating scale is relatively fixed (100 points, 10 points, or 5 points), whereas the behavior data for each piece of content vary widely: the values may differ by orders of magnitude, reaching the tens of thousands or even millions. The key problem to solve here is how to convert these data into a standard rating scale and make the final score distribution reasonable and effective, so that genuinely high-quality content receives higher scores and is recommended to users.

Content rating examples

Before walking through the example, we first need to describe the setting and the specific requirements. Suppose there is a content-sharing site whose content needs to be scored on a five-point scale, so that each piece of content can only receive one of the five scores from 1 to 5. The purpose is to show how popular each piece of content is on the site, as a reference for users when choosing what to read.

This is one of the simplest applications of content scoring. The purpose of the scoring (distinguishing content by popularity) and the final presentation of the data (a five-point scale) are both clearly defined. With such clear requirements, we can select the indicator, build the model, and output the result.

1. Select the indicator

Evaluating content popularity seems simple: why not use content page views (PV) directly as the scoring indicator? PV is indeed a reasonable choice and the simplest one, but there are better options: the number of visits (Visits) and the number of unique visitors (UV). Both indicators discount the case where the same user repeatedly refreshes the same content within a short time, so we choose the number of visiting users, UV, as the evaluation indicator.
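As a rough illustration of why UV is less sensitive to repeated refreshes than PV, the sketch below counts both from a list of access records. The record format of `(user_id, content_id)` pairs and the function name `count_pv_uv` are assumptions made for this example, not something defined in the article.

```python
from collections import defaultdict

def count_pv_uv(log):
    """Return (pv, uv) dicts keyed by content id.

    `log` is a hypothetical list of (user_id, content_id) access records.
    """
    pv = defaultdict(int)        # every view counts
    visitors = defaultdict(set)  # each user counts once per content
    for user_id, content_id in log:
        pv[content_id] += 1
        visitors[content_id].add(user_id)
    uv = {cid: len(users) for cid, users in visitors.items()}
    return dict(pv), uv

# User u1 refreshes content "a" three times; UV counts that user once.
log = [("u1", "a"), ("u1", "a"), ("u1", "a"), ("u2", "a"), ("u3", "b")]
pv, uv = count_pv_uv(log)
print(pv)  # {'a': 4, 'b': 1}
print(uv)  # {'a': 2, 'b': 1}
```

A real implementation would usually also restrict the count to a time window, which is left out here.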

2. Build the scoring model

This is the core of the article. To score content popularity, we must first eliminate the indicator's unit of measurement and confine the scores to the required range of 1-5 points.

How do we eliminate the unit of measurement? As you may have guessed, the answer is data standardization. This method has been used in many places and is a basic step in much of data analysis and data mining.

Min-Max normalized score

Min-Max is the most commonly used data normalization method (see the article on data standardization). The processed data fall in the range [0,1], so all that remains is to convert the 0-1 values into the five scores 1-5. This is simple: multiply by 4 so that the data fall in the interval [0,4], round to an integer so that only the five values 0-4 remain, and add 1 to get the result we want. The rounding should be to the nearest integer; rounding down would give 5 points only to the single item with the maximum value.

On some sites only about 20% of the content is exceptionally popular, yet that content accounts for 80% of all visits, which is what we usually call the 80/20 rule. In that case most content scores are likely to cluster at 1 point and a small part at 5 points, with very little content at 2, 3, and 4 points in between (the original figure shows some of this tendency). Often, however, we want the score distribution to be closer to normal, with most content in the middle scores and relatively little at either end. That leads to the second scoring model below.
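The Min-Max conversion can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `min_max_scores` and the sample UV counts are made up, the rounding is to the nearest integer (rounding down would give 5 points only to the maximum), and returning 3 when all values are equal is an arbitrary choice for a case the article does not discuss.

```python
import math

def min_max_scores(uv_counts):
    """Map raw UV counts to 1-5 scores via Min-Max normalization."""
    lo, hi = min(uv_counts), max(uv_counts)
    if hi == lo:
        # Assumption: with no spread in the data, give everything the middle score.
        return [3 for _ in uv_counts]
    scores = []
    for v in uv_counts:
        x = (v - lo) / (hi - lo)                     # normalized to [0, 1]
        scores.append(math.floor(x * 4 + 0.5) + 1)   # nearest integer in 0..4, then +1
    return scores

# Hypothetical, heavily skewed UV counts: one very popular item.
uv_counts = [10, 12, 15, 20, 30, 45, 80, 5000]
scores = min_max_scores(uv_counts)
print(scores)  # [1, 1, 1, 1, 1, 1, 1, 5]
```

With evenly spread input such as `[0, 25, 50, 75, 100]` the same function returns `[1, 2, 3, 4, 5]`, but on skewed data the scores pile up at the two ends.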

Z-score standardized score

If your site has a large amount of content, you can use Z-score standardization (described in more detail in the data standardization article and not repeated here). The benefit of Z-score standardization is that the standardized values have an overall mean of 0 and a standard deviation of 1, and when the underlying data are roughly bell-shaped they tend toward the standard normal distribution N(0,1), which is exactly what we want. We then map the standardized values to the five scores 1-5 as follows:

Less than or equal to -1.5 (i.e. -1.5σ): 1 point

Greater than -1.5 (i.e. -1.5σ) and less than or equal to -0.5 (i.e. -0.5σ): 2 points

Greater than -0.5 (i.e. -0.5σ) and less than 0.5 (i.e. 0.5σ): 3 points

Greater than or equal to 0.5 (i.e. 0.5σ) and less than 1.5 (i.e. 1.5σ): 4 points

Greater than or equal to 1.5 (i.e. 1.5σ): 5 points
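The thresholds above can be applied as in the following sketch. The function name `z_score_points` and the sample values are hypothetical, the population standard deviation is used as an assumption (the article does not say which form to use), and the handling of a zero standard deviation is an arbitrary choice.

```python
import statistics

def z_score_points(values):
    """Map raw values to 1-5 scores using Z-score cut points at ±0.5 and ±1.5."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)  # assumption: population standard deviation
    if sd == 0:
        # Assumption: identical values all receive the middle score.
        return [3 for _ in values]
    points = []
    for v in values:
        z = (v - mean) / sd
        if z <= -1.5:
            points.append(1)
        elif z <= -0.5:
            points.append(2)
        elif z < 0.5:
            points.append(3)
        elif z < 1.5:
            points.append(4)
        else:
            points.append(5)
    return points

# Hypothetical UV counts with mean 5 and standard deviation 2.
values = [2, 4, 4, 4, 5, 5, 7, 9]
points = z_score_points(values)
print(points)  # [1, 2, 2, 2, 3, 3, 4, 5]
```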

If the data follow the standard normal distribution, the share of each score is roughly as follows: 1 point and 5 points each account for about 7%, 2 points and 4 points each account for about 24%, and 3 points accounts for about 38%.
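These approximate shares can be checked from the standard normal cumulative distribution function. The sketch below uses only Python's standard library; the helper name `normal_cdf` and the variable names are chosen for this example.

```python
import math

def normal_cdf(z):
    """Cumulative distribution function of the standard normal N(0,1)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

cuts = [-1.5, -0.5, 0.5, 1.5]                       # score boundaries in units of σ
edges = [0.0] + [normal_cdf(c) for c in cuts] + [1.0]
shares = [edges[i + 1] - edges[i] for i in range(5)]

for score, share in zip(range(1, 6), shares):
    print(f"{score} points: {share:.1%}")
# 1 points: 6.7%
# 2 points: 24.2%
# 3 points: 38.3%
# 4 points: 24.2%
# 5 points: 6.7%
```

Real visit data are rarely exactly normal, so the observed shares on a site will differ from these theoretical values.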

3. Output the result

There are of course many ways to display the content score, and any of them works.

In practice, a content score is often the combined result of many indicators. The score should then take all of these factors into account, and a corresponding model should be built to compute a composite score for the content. That topic is not covered here and may be introduced another time.
