Building a social recommendation engine

Source: Internet
Author: User

What is social recommendation (SR)?

For now, we define the following types of social media sharing used by the IT community:

Microblogs; (RSS) reader sharing; online bookmarking.

Memetrackers for social media include rssmeme for Google Reader (including feedzshare made by readburner and kuber in China), and tweetmeme and twemes for Twitter (twemes requires the # symbol to form topic aggregations). FriendFeed can even automatically summarize the shared links imported through lifestreams within your circle of friends, combining different types of sharing sources.

FriendFeed merges links (tinyurl links can be resolved) within the scope of your subscriptions. rssmeme and tweetmeme aggregate links for a single site without distinguishing between users. playpoly SD and techmeme aggregate social conversations (including link-based and text-association computation) based on selected high-quality information sources.

So, can we combine these three types to derive a new service: social recommendation? To motivate this idea, read the pongba discussion "an integrated reading and sharing solution" from December 5. It looks for a way to share valuable information that allows some breadth of perspective, reduces information overload as much as possible, and takes the authority of the collectors into account.

So social recommendation can be defined as follows:

Select social media sharing sources for a group of IT industry members, such as Cao zenghui's and Feng dahui's googlereader, white crow's and sleepy animal's Twitter, Zhang Liang's meal, and jiewei de and delicious. The links shared by these information sources are aggregated. Each recommendation from an information source counts as one vote, and several factors are combined: the number of votes, the weight of the information source, the time of the recommendation, and the type of information source. The result is a cross-platform social recommendation engine similar to xiaguo popular text, Digg, or Reddit. Semantic association technology can then be introduced so that the system evolves into an automated combination of collaborative filtering and semantic filtering.
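The "one vote per information source" rule can be sketched roughly as below. This is only an illustration: the `(source_id, source_type, url)` tuple layout and the type labels `"microblog"`, `"reader"`, and `"bookmark"` are assumptions made for the example, not the format the real system uses.

```python
from collections import defaultdict


def tally_votes(shares):
    """Count one vote per information source per URL, split by source type.

    shares: iterable of (source_id, source_type, url) tuples (assumed layout).
    """
    seen = set()
    counts = defaultdict(lambda: {"microblog": 0, "reader": 0, "bookmark": 0})
    for source_id, source_type, url in shares:
        if (source_id, url) in seen:
            continue  # the same source sharing a link twice still counts once
        seen.add((source_id, url))
        counts[url][source_type] += 1
    return dict(counts)
```

The per-type counts are kept separate so that each type can later be given its own weight.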

After all, this is easy for a technician to implement. After two weekends of work, I finally put a test version online: playpoly SR. Firefox is recommended for browsing.

Calculation Formula of SR rank

The many popular shared links need a rank value as the basis for sorting. We define it as SR rank.

On December 17, Guwendong described Reddit's ranking algorithm. Based on its formula, we derived the formula for playpoly SR as follows:

A is the specific release time of an article, accurate to the second; B is a fixed time constant, 00:00:00. TS is the number of seconds between A and B:

TS = A - B

M is the number of microblog recommendations for an article, R is the number of reader shares, and D is the number of online bookmarks. The variable Z is computed by applying a different weighting factor to each:

Z = M * 3 + R * 1 + D * 0.8

Finally, the SRrank formula is defined as:

SRrank = log10(Z) + TS / 45000

The parameters are interpreted in basically the same way as in Reddit's formula. The difference is that there are no negative votes:

1) Time point B, 00:00:00, is a fixed value. TS reflects the freshness of the article. Introducing B is a very elegant technique that makes the freshness measurement independent of the system's current time.

2) 45000 is the number of seconds in a 12.5-hour cycle. Combined with TS, it means that over time the scores of new articles gradually surpass those of old articles with many votes, so the list updates itself automatically.

3) Using log10 is another technique; it gives early votes greater weight. For example, the first 10 votes carry the same weight as votes 11 through 100, since log10(10) = 1 and log10(100) = 2.
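A minimal sketch of the formula in Python follows. The date of B is not given in the text, so `EPOCH_B` below is a placeholder assumption; the function name and argument names are also just for illustration.

```python
import math
from datetime import datetime, timezone

# Weights from the formula: microblog, reader share, online bookmark.
W_MICROBLOG, W_READER, W_BOOKMARK = 3.0, 1.0, 0.8
PERIOD_SECONDS = 45000  # 12.5 hours

# ASSUMPTION: the actual date of B is not stated; this value is a placeholder.
EPOCH_B = datetime(2008, 1, 1, tzinfo=timezone.utc)


def sr_rank(published, m, r, d, epoch=EPOCH_B):
    """Return SRrank = log10(Z) + TS / 45000 for one article."""
    z = m * W_MICROBLOG + r * W_READER + d * W_BOOKMARK
    if z <= 0:
        raise ValueError("an article needs at least one recommendation")
    ts = (published - epoch).total_seconds()
    return math.log10(z) + ts / PERIOD_SECONDS
```

With these definitions, an article with a weighted vote total of 1 that is published 45000 seconds (12.5 hours) after an article with a total of 10 gets the same score, which is how newer articles catch up with older, more heavily voted ones. The choice of epoch shifts every score by the same amount, so it does not affect the ordering.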

How to select information sources:


Select a dozen or a few dozen opinion leaders in IT, the Internet, and design from microblogs or FriendFeed, and then traverse their friend lists to obtain a large number of active social media users. Statistically speaking, this group's interests are close to IT and technology. (This works even if you are not a fan or a hacker.)

(So you can see the role of GFC. At the very least, it lets you traverse all active Internet IDs starting from opinion leaders.)
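The friend-list traversal can be sketched as a breadth-first walk. Here `get_friends` is a hypothetical callable standing in for whatever a given service offers to list a user's friends; it is not a real API.

```python
from collections import deque


def collect_candidates(seeds, get_friends, max_depth=1):
    """Breadth-first walk of friend lists starting from seed opinion leaders.

    get_friends: hypothetical callable, user id -> iterable of friend ids.
    max_depth: how many friend hops away from a seed to go.
    """
    seen = set(seeds)
    queue = deque((seed, 0) for seed in seeds)
    while queue:
        user, depth = queue.popleft()
        if depth >= max_depth:
            continue  # do not expand users at the depth limit
        for friend in get_friends(user):
            if friend not in seen:
                seen.add(friend)
                queue.append((friend, depth + 1))
    return seen
```

Keeping `max_depth` small is a deliberate choice in this sketch: the further the walk goes from the seed opinion leaders, the weaker the statistical link to their interests is likely to be.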

For greader, there is no such friend-relationship traversal. Sources can only be retrieved from feedzshare.

After a large number of social media users have been collected automatically, the following users need to be filtered out:

1. No new activity in the last two months.

2. Posts in English (most of the shared pages are not in Chinese). Many opinion leaders have a large number of top foreign bloggers among their friends, and these must be excluded.

3. Never shares a link.

4. Has protected updates ("This person has protected their updates").

5. Shares only links to their own blog or website.

6. Official Twitter accounts, such as bbcchinese.

7. Their link-sharing actions are actually triggered by other services, such as a reader or FriendFeed.

8. Twitter accounts in Traditional Chinese or Japanese, whose discussion topics are basically unrelated.

9. Basically never mentions IT.

10. Always makes sensitive remarks. No: the GFW has powerful semantics.

11. In principle, the same (real) person keeps only one information source. For example, if both their Twitter and jiwai accounts are crawled, one of them is disabled to avoid counting a vote twice.

Is that a lot of rules? All of them are checked and scored by hand, so the workload is huge. However, just as when playpoly SD began collecting and reviewing high-quality blog sources, the work is heavy at the beginning and easy afterwards, when most of it is done by machines.
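Some of the rules can be checked mechanically once the account data is available. The sketch below covers only rules 1, 3, 4, and 5; the `Candidate` record and its fields are hypothetical, and "two months" is approximated as 60 days.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from urllib.parse import urlparse


@dataclass
class Candidate:
    # Hypothetical record for one collected account; not a real API type.
    name: str
    last_active: datetime
    protected: bool = False
    own_site: str = ""  # host name of the user's own blog, if known
    shared_links: list = field(default_factory=list)


def keep(c, now, max_idle=timedelta(days=60)):
    """Return True if the candidate passes rules 1, 3, 4 and 5."""
    if now - c.last_active > max_idle:  # rule 1: inactive for two months
        return False
    if c.protected:  # rule 4: protected updates
        return False
    if not c.shared_links:  # rule 3: never shares a link
        return False
    hosts = {urlparse(url).netloc.lower() for url in c.shared_links}
    if c.own_site and hosts == {c.own_site.lower()}:  # rule 5: only own site
        return False
    return True
```

The remaining rules (language, topic, official accounts, duplicate identities) depend on judgment and are left to manual review here.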

Shortened URLs must be resolved

Many of the links shared on Twitter and other microblogs are URLs from link-shortening services, which must be matched using the following regular expression:

import re

# Hosts of link-shortening and feed-proxy services whose URLs must be resolved
metaurlpattern = re.compile(
    r"(feedproxy\.google\.com|item\.feedsky\.com|tinyurl\.com|snurl\.com|ff\.im|bit\.ly|tr\.im|zuosa\.net)",
    re.IGNORECASE)


Netizens use many such services, so they can only be handled one at a time as they are found :D. For these shortened links, you need to send a HEAD request to detect which address the 302 redirect points to, because the real shared address is what must be counted.
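The HEAD-request step can be sketched with the standard library as follows. This is an illustration under assumptions, not the code the site runs: the names `head_location` and `resolve`, the hop limit, and the handling of redirect codes other than 302 are all choices made for the example.

```python
import http.client
import re
from urllib.parse import urljoin, urlsplit

SHORTENER = re.compile(
    r"(feedproxy\.google\.com|item\.feedsky\.com|tinyurl\.com|snurl\.com"
    r"|ff\.im|bit\.ly|tr\.im|zuosa\.net)",
    re.IGNORECASE,
)


def head_location(url, timeout=10):
    """Send one HEAD request; return the redirect target, or None."""
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        resp = conn.getresponse()
        if resp.status in (301, 302, 303, 307, 308):
            location = resp.getheader("Location")
            return urljoin(url, location) if location else None
        return None
    finally:
        conn.close()


def resolve(url, fetch=head_location, max_hops=5):
    """Follow redirects while the URL still belongs to a known shortener."""
    for _ in range(max_hops):
        if not SHORTENER.search(urlsplit(url).netloc):
            break
        target = fetch(url)
        if target is None:
            break
        url = target
    return url
```

Passing `fetch` as a parameter keeps the network call replaceable, so the loop can be exercised with a plain dictionary lookup instead of real requests. Votes should then be counted against the value returned by `resolve`, not the shortened address.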

Data sources must be broad, or the results are prone to distortion

By default, most of the popular-articles list is sorted sensibly, but a few entries look odd. My guess is that the high weight given to microblogs inflates their SRrank value.

I computed a userrank for microblog and other information sources in advance. This is the weight of a recommender, and it has not yet been included in the formula. If this factor were taken into account, a microblog recommendation from an opinion leader would greatly increase the score. Therefore, more monitored user sources must be added; otherwise, when the number of recommendations is small, the amplification is too large and the ranking is distorted.
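To make the distortion concrete, here is one possible way userrank could enter Z, multiplying each recommendation's type weight by the recommender's weight. This is purely a hypothetical variant: the formula in use does not include userrank, and the userrank scale (1.0 for an ordinary user, 5.0 for an opinion leader) is invented for the example.

```python
# Type weights taken from the Z formula: microblog 3, reader 1, bookmark 0.8.
TYPE_WEIGHT = {"microblog": 3.0, "reader": 1.0, "bookmark": 0.8}


def weighted_z(recommendations, userrank, default_rank=1.0):
    """Sum type weight * recommender weight over all recommendations.

    recommendations: iterable of (user, source_type) pairs.
    userrank: dict of user -> weight (hypothetical scale, 1.0 = ordinary user).
    """
    return sum(TYPE_WEIGHT[source_type] * userrank.get(user, default_rank)
               for user, source_type in recommendations)
```

Under these assumed numbers, a single microblog recommendation from a user with userrank 5.0 contributes 15, while three reader shares and one bookmark from ordinary users contribute only 3.8 in total. With few recommenders, one amplified vote dominates the list, which is why more sources are needed before such a factor is introduced.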

That's all for now. If you are interested in social recommendation, you are welcome to try the beta version of SR.


I will wait for you on the other side.

Zheng @ playmates Sr 20081220
