How to measure the sharing activity of Google Reader users

Zheng Yi 20090918

1. Background Knowledge

Google Reader users can share articles from their subscriptions. Given a user's ID, that user's pages can be accessed at URLs of the following form:

    • Shared items feed:

      • Bytes
    • Shared items homepage:
      • https://www.google.com/reader/shared/$userid$
    • Profile homepage:
      • http://www.google.com/profiles/$profile_id$?hl=en
    • Profile Widget:
      • http://www.google.com/s2/widgets/ProfileCard?uid=$profile_id$

(Note, 0919: the userid is not the profile_id. The profile ID can be obtained from the HTML code.)
Users can also "like" an article.

By collecting share and like behavior data and constructing a simple formula, we can roughly measure the sharing activity of Google Reader users, or gruserrank for short.

Uses of gruserrank:

    • It serves as a reference indicator when estimating a user's contribution to the recommendation of a popular article;
    • It effectively differentiates users, separating active users, users with low sharing quality, and dormant users, which helps optimize the program;
    • It is also a reference indicator for social features.

 

2. How to traverse

As Xlvector mentioned when discussing Google Reader data collection, each shared items feed exposes the IDs of the users who liked its items. (For the details, see my article "How to obtain likes operation data of Google Reader?".) Therefore, starting from the shared items feeds of a seed group of users, a breadth-first search can capture the user data of the whole of Google Reader. This dataset is rich, including time and content information, and I believe a lot of work can be built on it.
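The breadth-first traversal described above can be sketched as follows. This is a minimal sketch, not the author's actual crawler: `fetch_liking_users` is a hypothetical callback that takes a user ID and returns the IDs of the users found in that user's shared items feed (in practice it would download and parse the feed), and `max_users` is an assumed safety limit.

```python
from collections import deque


def crawl_users(seed_ids, fetch_liking_users, max_users=10000):
    """Breadth-first traversal over users, starting from seed_ids.

    fetch_liking_users is a hypothetical callback: given a user ID, it
    returns an iterable of user IDs found in that user's shared items feed.
    """
    seen = set(seed_ids)
    queue = deque(seed_ids)
    order = []  # users in the order they were visited
    while queue and len(order) < max_users:
        user_id = queue.popleft()
        order.append(user_id)
        for liker in fetch_liking_users(user_id):
            if liker not in seen:  # enqueue each user only once
                seen.add(liker)
                queue.append(liker)
    return order


# Toy in-memory graph standing in for real feeds (made-up IDs).
toy_graph = {"a": ["b", "c"], "b": ["c", "d"], "c": [], "d": ["a"]}
visited = crawl_users(["a"], lambda uid: toy_graph.get(uid, []))
print(visited)
```

The `seen` set is what keeps the crawl from looping when users like each other's items.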

 

3. Formula for Calculating the sharing activity

This idea can be used to calculate gruserrank.

Step 1,

We start from the playlist Sr collection of GR user IDs (almost all of whom are Chinese users) and scan each user's shared items feed with the regular expression:

<gr:likinguser>([0-9a-z_!~*'()-]+)</gr:likinguser>

This yields the IDs of all liking users. Store them in a global dictionary, which keeps each ID unique, and count the number of likes each user has made recently.
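Step 1 can be sketched as below. This is an illustration under assumptions: the feed fragments are made up, and the expression is matched case-insensitively because the exact casing of the tag in real feeds is not confirmed here.

```python
import re
from collections import Counter

# Tag name follows the expression in the text; matching is
# case-insensitive because the exact casing in real feeds is an assumption.
LIKING_USER_RE = re.compile(
    r"<gr:likinguser>([0-9a-z_!~*'()-]+)</gr:likinguser>", re.IGNORECASE
)


def count_likes(feed_texts):
    """Count how many likes each liking user has across the given feeds."""
    likes = Counter()  # dictionary keys keep each user ID unique
    for text in feed_texts:
        likes.update(LIKING_USER_RE.findall(text))
    return likes


# Made-up feed fragments for illustration only.
sample_feeds = [
    "<entry><gr:likingUser>111</gr:likingUser>"
    "<gr:likingUser>222</gr:likingUser></entry>",
    "<entry><gr:likingUser>111</gr:likingUser></entry>",
]
likes = count_likes(sample_feeds)
print(likes.most_common())
```

`Counter.most_common()` returns the users sorted by like count, which is the order Step 2 needs.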

 

Step 2,

This gives us a large set of Google Reader user IDs (likingusers for short). Because these users mainly liked Chinese articles, they are also mostly Chinese users. Of course, this collection has limits:

    • It cannot cover all Chinese Google Reader users;
    • Not everyone makes their shared items public;
    • Only a few people have created a Google profile with their own avatar.

Next, we traverse the likingusers set in descending order of like count, that is, users who like often are processed first.

For each user, obtain the following values:

    • Whether the user has shared any article within the last 30 days: if not, the user is dormant;
    • How many articles the user has shared in the last four days: shares for short;
    • Freshness of the last three articles shared: freshmeats for short. Subtract a benchmark time from the posting time of each article (I use the date four days earlier as the benchmark; for example, if today is September 22, the benchmark time is September 18), and then take the average.
    • Whether the titles of the three most recently shared articles contain Chinese characters: if none of them do, the user may not be a Chinese user and can be disabled.
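The four per-user values can be computed roughly as follows. This is a sketch under assumptions not stated in the text: each user's shared articles arrive as `(published, title)` pairs, freshmeats is measured in seconds (to match baseseconds in Step 3), a user with no items gets a freshmeats of 0, and "Chinese characters" means the CJK Unified Ideographs block.

```python
from datetime import datetime, timedelta


def has_chinese(text):
    """True if text contains a character in the CJK Unified Ideographs block."""
    return any("\u4e00" <= ch <= "\u9fff" for ch in text)


def user_values(items, now):
    """Compute the per-user values from (published, title) pairs.

    items is a list of (datetime, str) tuples for one user's shared
    articles; this input shape is an assumption of the sketch.
    """
    items = sorted(items, key=lambda item: item[0], reverse=True)  # newest first
    benchmark = now - timedelta(days=4)  # benchmark time: four days ago
    last_three = items[:3]
    dormant = not any(now - published <= timedelta(days=30) for published, _ in items)
    shares = sum(1 for published, _ in items if published >= benchmark)
    # freshmeats: mean of (posting time - benchmark), in seconds, over the last three
    if last_three:
        freshmeats = sum(
            (published - benchmark).total_seconds() for published, _ in last_three
        ) / len(last_three)
    else:
        freshmeats = 0.0  # assumption: no items means no freshness contribution
    chinese = any(has_chinese(title) for _, title in last_three)
    return {"dormant": dormant, "shares": shares,
            "freshmeats": freshmeats, "chinese": chinese}


now = datetime(2009, 9, 18, 12, 0, 0)
items = [
    (datetime(2009, 9, 18, 0, 0, 0), "社会化媒体"),
    (datetime(2009, 9, 16, 12, 0, 0), "Hello"),
    (datetime(2009, 9, 1, 12, 0, 0), "Old post"),
]
values = user_values(items, now)
print(values)
```

Note that freshmeats becomes negative when the recent articles are older than the benchmark, which is how a rank can drop below zero.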

 

Step 3,

Following my article "The four modes of the social media sorting algorithm", we also need to specify a time base:

baseseconds: the total number of seconds in a 12.5-hour cycle, that is, 45000 seconds.

 

The formula is:
gruserrank = log10(likes * factor A + shares * factor B) + freshmeats / baseseconds

Factor A and factor B should be tuned to your own needs; I use 2 and 3, respectively.
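As a worked example, with likes = 10, shares = 5, and freshmeats = 45000 seconds, the formula gives log10(10 * 2 + 5 * 3) + 45000 / 45000 = log10(35) + 1, or about 2.544. The sketch below implements the same formula; the handling of a user with no likes and no shares (where log10 is undefined) is an assumption of this sketch, not something the text specifies.

```python
import math

BASESECONDS = 45000  # 12.5 hours * 3600 seconds


def gruserrank(likes, shares, freshmeats, factor_a=2, factor_b=3):
    """gruserrank = log10(likes*A + shares*B) + freshmeats / baseseconds."""
    weighted = likes * factor_a + shares * factor_b
    if weighted <= 0:
        # log10 is undefined here; returning -inf is this sketch's assumption.
        return float("-inf")
    return math.log10(weighted) + freshmeats / BASESECONDS


rank = gruserrank(likes=10, shares=5, freshmeats=45000)
print(round(rank, 4))
```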

 

P.S.:

  • Profile Widget:

    • http://www.google.com/s2/widgets/ProfileCard?uid=$userid$

    Apply a regular expression to the HTML code to obtain the address of the user's avatar.

     

4. Summary

In this way, the Chinese users of Google Reader can be traversed, and users with poor sharing activity are identified very effectively. The rank value roughly tells us the following:

  • rank < 0: the user's activity is very low. If rank < -8, the user's behavior can be ignored outright, and there is no need to subscribe to the user's shared items through PubSubHubbub;

  • rank > 0: the user is reasonably active. The larger the rank, the more active the user and the higher the quality of the shared articles.
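These thresholds can be expressed as a small helper. It is only a sketch: the label names are made up for illustration, and treating a rank of exactly 0 as active is an assumption, since the text does not cover that case.

```python
def classify_rank(rank):
    """Map a gruserrank value to the coarse categories from the summary."""
    if rank < -8:
        return "ignore"   # behavior can be ignored; no PubSubHubbub subscription
    if rank < 0:
        return "low"      # very low activity
    return "active"       # rank >= 0; treating exactly 0 as active is an assumption


labels = [classify_rank(r) for r in (-9.5, -1.2, 0.0, 2.5)]
print(labels)
```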

     

The following parameters could be added to the formula:

1. Frequency of the user's article sharing: too high a frequency suggests the user does not read patiently or has poor taste; too low a frequency means the user's behavior can be ignored;

2. Diversity of the sources of the user's shared articles: a single source means the user's reading is very limited, which usually indicates a lack of appreciation and curiosity;

3. Whether the user is a "follower" or a "discoverer" when sharing an article: a user who is among the first batch to share an article may be an expert within the group. (See "Five techniques for finding experts from the massive volumes of social media data".)

     

     

    Zheng, Beijing, 20090918

We also recommend my recent articles:

  • Handheld devices: lazy smart synchronization of Internet music/player, 20090917

  • How can I find Weibo images that are being uploaded?, 20090907

  • Four modes of social media value-added development, 20090831

  • The network trajectory and fragmentation modes of analysts, 20090830

  • Five techniques for finding experts from the massive volumes of social media data, 20090903

  • The four modes of the social media sorting algorithm, 20090905
