A Cross-Validation Mode for What's Popular


Zheng Yi 20090919

Many machine-intelligence systems draw on the wisdom of the crowd to surface what's popular. For example:

    • Reading:

      • Techmeme
      • Tweetmeme
      • StumbleUpon
      • iGoogle What's Popular gadget
      • Delicious Popular Bookmarks
      • Wanju SR
    • Events:
      • ?
    • Trends:
      • Search engine keyword popularity
      • Google Trends: Hot Trends (updated daily)
      • Twitter's own Trending Topics sidebar and many third-party applications, such as twopular
      • Wanju RT and Wanju PP
    • Headlines:
      • Popurls
      • Alltop

 

1. A single measurement dimension

Zheng Yi believes that most of these machine-intelligence systems measure popularity using only the internal data of a single system.

The Digg ranking described in my article "Four Modes of Social Media Sorting Algorithms" has many elaborate rules, but every parameter it uses, whether voting speed, voter level, number of comments and score, number of burys, or the user's popular ratio, comes from data inside the Digg system.

Tweetmeme efficiently tracks the hottest links on Twitter worldwide, but its evaluation still revolves around Twitter's own data. Only its thoughtful link classification (content categories such as technology, entertainment, and games; link types such as news, pictures, and video) is computed from external data.

Rssmeme relies only on how many times an item is shared in Google Reader. StumbleUpon and Delicious Popular Bookmarks rely on how many times an item is saved as a favorite in their own systems.

These systems are large enough to fully reflect what's popular on a mature Internet.

In China, however, social media such as microblogs and RSS readers do not have enough users. Even where there are enough users, they tend to converge into a group with one particular temperament: consider the distinct user cultures of Maopu (Mop), the Tianya community, the former Xiaonei campus network, kaixin.com, and Chinese Twitter users.

This raises the following question:

 

2. Domestic data is scarce or insufficient for answering what's popular

There are, I should stress, some exceptions, such as Baidu Tieba (Post Bar) data: mined well, it can fully reflect China's hot trends and what's popular.

However, because of problems inherent to the environment in China, social media data here is fragmented: users have to seek out several official and unofficial websites to get enough information and reading material, whereas a user in North America can rely on Digg alone, and a technology-focused user only needs to watch Techmeme.

For example, if we built an application modeled on Rssmeme or Tweetmeme, the data would clearly be heavily skewed: either toward technology (and only part of technology news at that) or toward politics (with a strongly partisan slant), neither of which matters much to the general public.

3. Adding other measurement dimensions for cross-validation

(The following discussion focuses on a reading filter in the memetracker style; it is not a general solution.)

To address this problem, we treat the internal data of each social medium as a separate measurement dimension and then cross-validate across those dimensions. This could be called a "mashup-based data mining mode".

A very simple example:

Using the method from my article "How to Measure the Sharing Activity of Google Reader Users", I traverse Chinese Google Reader users and compute each one's rank value, which serves as that user's sharing weight in the RSS reader. After discarding a small number of low-quality, low-rank users, what remains is a group of reliable reader users (the Reader A-list).
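The rank formula itself belongs to the earlier article and is not reproduced here. As a minimal sketch, assuming each user's rank has already been computed, the discard step reduces to keeping users above a cutoff. The function `select_a_list`, the `min_rank` cutoff, and the sample user IDs are all hypothetical.

```python
from typing import Dict, List


def select_a_list(ranks: Dict[str, float], min_rank: float) -> List[str]:
    """Keep users whose precomputed rank is at least min_rank.

    `ranks` maps a (hypothetical) user ID to a rank value; how that rank
    is computed is outside the scope of this sketch.
    """
    kept = [user for user, rank in ranks.items() if rank >= min_rank]
    # Highest rank first; ties broken by user ID for a stable order.
    return sorted(kept, key=lambda user: (-ranks[user], user))


# Hypothetical sample data: bob and dave fall below the cutoff.
reader_ranks = {"alice": 0.92, "bob": 0.15, "carol": 0.67, "dave": 0.05}
reader_a_list = select_a_list(reader_ranks, min_rank=0.5)
print(reader_a_list)  # ['alice', 'carol']
```

A fixed cutoff is only one option; keeping the top N users would serve the same purpose.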

Following my article "Come, Let's Build a Social Recommendation Engine", I traverse the follower/following relationships among core users of microblogs such as Twitter, compute a rank for Chinese microblog users, and designate a small subset of them as high-quality, reliable users (the miniblog A-list for short).
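The text does not say which rank algorithm is run over the follow graph. A common choice for this kind of graph is PageRank, so the sketch below swaps in plain power-iteration PageRank (an assumption, not necessarily the author's method), treating "u follows v" as a vote from u for v, and then keeps the top k users as the miniblog A-list. The graph, user names, damping factor, and k are illustrative.

```python
from typing import Dict, List


def pagerank(follows: Dict[str, List[str]], damping: float = 0.85,
             iterations: int = 50) -> Dict[str, float]:
    """Power-iteration PageRank; follows[u] lists the users u follows."""
    users = set(follows)
    for targets in follows.values():
        users.update(targets)
    n = len(users)
    rank = {u: 1.0 / n for u in users}
    for _ in range(iterations):
        new_rank = {u: (1.0 - damping) / n for u in users}
        dangling = 0.0
        for u in users:
            targets = follows.get(u, [])
            if targets:
                # Split u's rank evenly among the users it follows.
                share = damping * rank[u] / len(targets)
                for v in targets:
                    new_rank[v] += share
            else:
                dangling += damping * rank[u]
        # Users who follow nobody spread their rank evenly over everyone.
        for u in users:
            new_rank[u] += dangling / n
        rank = new_rank
    return rank


def top_k(rank: Dict[str, float], k: int) -> List[str]:
    return sorted(rank, key=lambda u: (-rank[u], u))[:k]


# Hypothetical follow graph: three users follow "zhang".
graph = {"li": ["zhang"], "wang": ["zhang", "li"],
         "zhao": ["zhang"], "zhang": ["li"]}
scores = pagerank(graph)
miniblog_a_list = top_k(scores, 2)
print(miniblog_a_list)  # ['zhang', 'li']
```

PageRank rewards being followed by well-followed users rather than raw follower counts, which suits the goal of picking reliable core users.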

I also scan Twitter for links shared by Chinese users, called srcbacks links for short.
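Scanning for srcbacks links amounts to pulling URLs out of tweets and counting how many distinct users shared each one. A minimal sketch, assuming tweets have already been fetched as (user, text) pairs and filtered to Chinese users upstream; the function name `count_srcbacks` and the simple URL regex are hypothetical and will not catch every URL form.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Deliberately simple URL pattern; real tweets may need a sturdier matcher.
URL_PATTERN = re.compile(r"https?://[^\s]+")


def count_srcbacks(tweets: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Map each shared URL to the number of distinct users who posted it."""
    sharers: Dict[str, Set[str]] = defaultdict(set)
    for user, text in tweets:
        for url in URL_PATTERN.findall(text):
            # Drop trailing punctuation picked up from the sentence.
            sharers[url.rstrip(".,!?")].add(user)
    return {url: len(users) for url, users in sharers.items()}


tweets = [
    ("li", "worth reading http://example.com/a"),
    ("wang", "RT @li worth reading http://example.com/a"),
    ("li", "again http://example.com/a and http://example.com/b."),
]
print(count_srcbacks(tweets))
# {'http://example.com/a': 2, 'http://example.com/b': 1}
```

Counting distinct sharers rather than raw mentions keeps one user who reposts a link many times from inflating it.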

Data from other social media will also be fed into the recommendation sources, but since there is little of it, I won't go into detail here.

With this data, we hope the SR homepage (currently under maintenance) can reflect what's popular in the Chinese-speaking world, whether readings worth reading, viral jokes, or time-sensitive breaking news.

However, the RSS reader is not timely enough, while microblogs, although fast, make posting links so easy that low-quality links run rampant. To balance these factors, we can perform the following cross-validation:

    • A recommendation from the Reader A-list or the miniblog A-list counts as one vote, with both lists weighted equally; the voting A-list user's rank is also taken into account.
    • A link with enough votes goes onto the SR homepage.
    • If a link has some A-list votes but not enough, it can still be pushed to the homepage when the srcbacks links data shows it was shared widely enough on microblogs. Because srcbacks links alone are easily gamed by spammers, A-list votes are still required to guarantee quality.
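The three rules above can be sketched as a small decision function. This is an illustration under stated assumptions, not the SR implementation: each A-list vote is weighted by the voter's rank (one possible reading of "taking rank into account"), and the thresholds `VOTE_THRESHOLD`, `MIN_A_LIST_VOTES`, and `SRCBACKS_THRESHOLD`, the `Link` type, and the sample data are all hypothetical, since the article gives no concrete values.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical thresholds; the article does not give concrete values.
VOTE_THRESHOLD = 1.5     # rank-weighted A-list votes needed on their own
MIN_A_LIST_VOTES = 1     # A-list votes still required on the srcbacks path
SRCBACKS_THRESHOLD = 20  # distinct microblog sharers needed on that path


@dataclass
class Link:
    url: str
    reader_voters: List[str] = field(default_factory=list)    # Reader A-list
    miniblog_voters: List[str] = field(default_factory=list)  # miniblog A-list
    srcbacks: int = 0  # distinct microblog users who shared the link


def weighted_votes(link: Link, ranks: Dict[str, float]) -> float:
    # Both A-lists carry equal weight; each vote is scaled by the voter's
    # rank (an assumed weighting scheme).
    voters = link.reader_voters + link.miniblog_voters
    return sum(ranks.get(user, 0.0) for user in voters)


def goes_to_homepage(link: Link, ranks: Dict[str, float]) -> bool:
    # Rule 1: enough A-list votes on their own.
    if weighted_votes(link, ranks) >= VOTE_THRESHOLD:
        return True
    # Rule 2: too few A-list votes, but wide microblog sharing. At least one
    # A-list vote is still required, so spam-driven srcbacks alone never pass.
    a_list_votes = len(link.reader_voters) + len(link.miniblog_voters)
    return a_list_votes >= MIN_A_LIST_VOTES and link.srcbacks >= SRCBACKS_THRESHOLD


ranks = {"alice": 0.9, "carol": 0.7, "zhang": 0.5}
candidates = [
    Link("http://example.com/a", ["alice", "carol"], ["zhang"]),  # rule 1
    Link("http://example.com/b", ["carol"], [], srcbacks=50),     # rule 2
    Link("http://example.com/c", [], [], srcbacks=500),           # spam guard
]
print([link.url for link in candidates if goes_to_homepage(link, ranks)])
# ['http://example.com/a', 'http://example.com/b']
```

The third candidate shows the point of the cross-validation: however widely a link spreads on microblogs, it never reaches the homepage without at least one vote from a trusted A-list user.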

 

In this way, the information filter gains the real-time nature of microblogs while curbing their excessive link proliferation. Because it also draws on the user activity and popularity indicators of each original system, it effectively screens out low-quality social media users; and because the algorithms are not complex, it achieves good filtering quality and efficiency.

 

Zheng Yi, Beijing, 20090919

 

You may also want to read my related articles:

  • How to Measure the Sharing Activity of Google Reader Users (20090919)

  • How Can I Find Microblog Images That Are Being Uploaded? (20090907)

  • Four Modes of Social Media Value-Added Development (20090831)

  • Analysts' Network Trajectories and Fragmentation Modes (20090830)

  • Five Techniques for Finding Experts in Massive Volumes of Social Media Data (20090903)

  • Four Modes of Social Media Sorting Algorithms (20090905)
