HP Social Computing Lab research report on influence and passivity in social media


Zheng Yi 20100806

Hewlett-Packard has a Social Computing Lab (SCL) that studies social network data mining, headed by Dr. Bernardo Huberman.

They recently published a study titled "Influence and Passivity in Social Media", based on 22 million tweets from 2.5 million users. Two of its conclusions:

"The correlation between popularity and influence is weaker than it might be expected." The link between fame and influence is much weaker than people expect.

"High numbers of followers does not equal influence because those followers do not re-tweet." Fame and influence are two different things, and most people are not influential. What matters is how many people are willing to forward your messages.

The easiest way to measure the relationship between your fame and your influence is to post a link through a URL shortener that counts clicks, and see how many people click through from your tweet. However many tens of thousands of followers you have, the question is whether you have enough influence to make people click a link.

Companies doing PR or advertising should take note: if you want microbloggers to help with your marketing, don't just look at their follower counts. That number alone is of little use; you need to measure their real influence accurately.

HP's research does not specifically target Twitter, so its conclusion also applies to other social networks.

 

The study adds a passivity dimension.

Most people are only passive recipients of information and do not forward things to their networks. To be influential, a person must not only attract the attention of others (become famous) but also get users to overcome their passivity.

Using the passivity of people in the social network, the paper builds a general model of influence. It also develops an algorithm, similar to the HITS algorithm, that quantifies the influence of every user in the network. The algorithm takes into account both the structural properties of the network and the propagation behavior between users.

A user's influence depends not only on the size of the audience she influences, but also on their passivity.

Previous influence measures were based mainly on per-individual statistics, such as the number of followers or the number of retweets.

The algorithm has good predictive power; for example, it can predict the maximum number of clicks that a posted link will get.

The SCL also found that most highly passive nodes are spammers or robot users. (Note: I don't know how the @rtmeme robot would be rated.)

 

Implementation

Like the Referer list, the SCL used the Twitter Search API to query tweets containing the keyword "http", in order to collect tweets that mention links (hereinafter "link tweets"). Over 300 hours, 22 million such tweets were collected, of which 15 million contained validly formatted links. By their estimate, these 22 million were only about 1/15 of all Twitter messages in that period.
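The link-validity check described above might look roughly like the following. This is only a minimal sketch: the helper name `extract_valid_links`, the loose regular expression, and the rule "http(s) scheme plus a dotted host" are all assumptions for illustration, not the SCL's actual filter, and no Twitter API call is shown.

```python
import re
from urllib.parse import urlparse

# Hypothetical, deliberately loose pattern for "http..." tokens in a tweet.
URL_PATTERN = re.compile(r"https?://\S+")


def extract_valid_links(tweet_text):
    """Return the links in a tweet that parse as http(s) URLs with a host."""
    links = []
    for candidate in URL_PATTERN.findall(tweet_text):
        candidate = candidate.rstrip(".,;:!?)\"'")  # drop trailing punctuation
        parsed = urlparse(candidate)
        # Assumed validity rule: http/https scheme and a dotted host name.
        if parsed.scheme in ("http", "https") and "." in parsed.netloc:
            links.append(candidate)
    return links


# Made-up sample tweets; only the first one carries a usable link.
tweets = [
    "Interesting paper http://example.com/paper.pdf",
    "broken link http:// oops",
    "no link here, just the word http",
]
link_tweets = [t for t in tweets if extract_valid_links(t)]
print(len(link_tweets), "of", len(tweets), "tweets contain a valid link")
```

Searching for the bare keyword "http" catches many tweets without a usable link, which is why a second validation pass is needed.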

Then the users in the set were queried one by one through the Twitter API, in particular for the number of followers/followings.

In this way, a timestamped set of URLs was obtained, along with the complete social graph of the corresponding users.

 

User retweeting rate = number of URLs that user a decided to forward / number of URLs that user a received from his followings (the people he follows).

Audience retweeting rate = number of URLs posted by user a that were forwarded by a's followers / number of URLs that a's followers received from user a.
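As a small worked example of the two ratios above, here is a sketch with made-up numbers. The function name `retweeting_rate` is hypothetical, and counting the audience rate per (follower, URL) delivery is an assumption about how the counts are taken.

```python
def retweeting_rate(retweeted, received):
    """Ratio of retweeted URLs to received URLs; 0.0 if nothing was received."""
    return retweeted / received if received else 0.0


# Made-up data for one user "a".
received_by_a = {"u1", "u2", "u3", "u4"}  # URLs posted by a's followings
retweeted_by_a = {"u1", "u3"}             # the subset that a forwarded
user_rate = retweeting_rate(len(retweeted_by_a), len(received_by_a))

# Audience side (assumption: counted per delivery to a follower).
# a posted 2 URLs and has 5 followers, so there were 10 deliveries,
# and 3 of those deliveries were retweeted.
deliveries_from_a = 2 * 5
retweets_of_a = 3
audience_rate = retweeting_rate(retweets_of_a, deliveries_from_a)

print(user_rate, audience_rate)
```

The first ratio describes how readily the user passes things on; the second describes how readily the user's audience does.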

 

It is easy to calculate the pairwise influence between users. For example, on Twitter, to calculate the influence of user A on user B, you only need to count how many times B retweeted A. However, it is hard to use this pairwise information to calculate the influence of one user (such as @zhengyun) on the entire network.

The model introduces the IP (Influence-Passivity) algorithm. Each user has an influence score and a passivity score. A user's passivity score measures how hard it is for others to influence him.

The algorithm has the following assumptions:

1. A user's influence score depends on the number of people she can influence and the passivity of these people.

2. A user's influence score also depends on how much attention the people she influences pay to her.

3. A user's passivity score depends on the influence of the people whose messages she receives but is not affected by.

4. A user's passivity score also depends on how much she rejects other users' influence compared to everyone else.

 

The algorithm computes the passivity and influence scores iteratively, somewhat like the authority pages in the HITS algorithm and the hub pages that point to them.

Given a weighted directed graph G = (N, E, W), N is the set of all nodes, E is the set of arcs, and W is the set of arc weights. The weight w_ij of an arc e = (i, j) represents a ratio: the influence that i exerts on j / the total influence that i tried to exert on j.
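To make the HITS-like iteration on this weighted graph more concrete, here is a minimal sketch. The post does not spell out the update rules, so the details are assumptions on my part: each arc's weight is normalised by the total influence its target accepted (for influence) and by the total influence its source had rejected (for passivity), scores are rescaled to sum to 1 after every round, and the iteration count is fixed. The names `influence_passivity`, `normalise` and the toy graph are hypothetical.

```python
from collections import defaultdict


def normalise(scores):
    """Scale scores so they sum to 1 (left unchanged if they sum to 0)."""
    total = sum(scores.values())
    return {n: s / total for n, s in scores.items()} if total else dict(scores)


def influence_passivity(weights, iterations=50):
    """weights maps each arc (i, j) to w_ij in [0, 1].

    Returns two dicts: influence score and passivity score per node.
    """
    nodes = {n for arc in weights for n in arc}
    accepted = defaultdict(float)  # sum of w_kj over all arcs into j
    rejected = defaultdict(float)  # sum of (1 - w_ik) over all arcs out of i
    for (i, j), w in weights.items():
        accepted[j] += w
        rejected[i] += 1.0 - w

    influence = dict.fromkeys(nodes, 1.0)
    passivity = dict.fromkeys(nodes, 1.0)
    for _ in range(iterations):
        # Passivity of j grows with the influence of the sources it rejected.
        new_p = dict.fromkeys(nodes, 0.0)
        for (i, j), w in weights.items():
            if rejected[i] > 0:
                new_p[j] += (1.0 - w) / rejected[i] * influence[i]
        # Influence of i grows with the passivity of the targets it moved.
        new_i = dict.fromkeys(nodes, 0.0)
        for (i, j), w in weights.items():
            if accepted[j] > 0:
                new_i[i] += w / accepted[j] * new_p[j]
        passivity = normalise(new_p)
        influence = normalise(new_i)
    return influence, passivity


# Made-up toy graph: "a" is retweeted often, "b" rarely, "c" posts nothing.
toy = {("a", "b"): 0.9, ("a", "c"): 0.8, ("b", "c"): 0.1, ("b", "a"): 0.1}
influence, passivity = influence_passivity(toy)
for node in sorted(influence, key=influence.get, reverse=True):
    print(node, round(influence[node], 3), round(passivity[node], 3))
```

On this toy graph, "a" ends up with the highest influence score. "c", which receives links but never has any of its own retweeted, gets zero influence and the highest passivity.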

The IP algorithm takes this graph as input. The SCL's graph was constructed as follows:

Nodes are users who have posted more than three links.

If user j has retweeted at least one link posted by user i, then the arc (i, j) exists. The weight of this arc is w_ij = S_ij / Q_i, where S_ij is the number of links posted by user i that user j retweeted, and Q_i is the number of links that user i posted.
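The arc-weight formula above can be sketched from two simple logs. The input layout (a list of `(user, url)` posts and a list of `(poster, retweeter, url)` retweets) and the function name `arc_weights` are assumptions for illustration. The node filter on users with more than three links is left out to keep the example short.

```python
from collections import Counter


def arc_weights(posts, retweets):
    """Return {(i, j): S_ij / Q_i} from post and retweet logs."""
    # Q_i: number of links that user i posted.
    q = Counter(user for user, _url in posts)
    # S_ij: number of i's links that j retweeted (duplicates counted once).
    s = Counter((poster, retweeter) for poster, retweeter, _url in set(retweets))
    return {(i, j): s_ij / q[i] for (i, j), s_ij in s.items() if q[i]}


# Made-up logs: "a" posted four links, "b" posted one.
posts = [("a", "u1"), ("a", "u2"), ("a", "u3"), ("a", "u4"), ("b", "u5")]
retweets = [("a", "b", "u1"), ("a", "b", "u2"), ("a", "c", "u1")]

weights = arc_weights(posts, retweets)
print(weights)
```

Here "b" retweeted two of the four links posted by "a", so the arc (a, b) gets weight 0.5. "c" retweeted one, so the arc (a, c) gets weight 0.25.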

In the final computation, the SCL's graph has 450,000 nodes and 1 million arcs, with an average arc weight of 0.07.

From this graph data, the SCL computed PageRank, the influence and passivity scores, and the Hirsch index (H-index).

(Note: the H-index is a measure of academic achievement. A researcher has an H-index of h if h of his papers have each been cited at least h times. The H-index is meant to reflect a person's academic achievement: the higher a person's H-index, the greater the impact of his papers. For example, an H-index of 20 means that 20 of his published papers have each been cited at least 20 times.)

On Twitter, a user has an H-index of h if h of his link tweets have each been retweeted at least h times.
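The H-index is easy to compute from a list of per-item counts, whether those are citations per paper or retweets per link tweet. A minimal sketch, with a hypothetical function name and made-up counts:

```python
def h_index(counts):
    """Largest h such that h items each have a count of at least h."""
    h = 0
    # Walk the counts from largest to smallest; rank is 1-based.
    for rank, count in enumerate(sorted(counts, reverse=True), start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


# Made-up retweet counts for one user's link tweets.
retweets_per_link = [10, 8, 5, 4, 3]
print(h_index(retweets_per_link))
```

For these counts, four link tweets were each retweeted at least four times, but there are not five with at least five retweets, so the H-index is 4.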

(To be continued)
