Source: Internet
Author: User


In the microblogging environment, automatically mining a user's social circles or interest circles is a basic and important problem. Accurately identifying a user's social relationships on Weibo benefits many applications: it supports better mining of the user's interests, recommending members of a social circle whom the user does not yet follow, and building a more accurate personalized user model from the user's circles, which in turn serves recommendation and ad targeting built on that model.

We proposed the hiphop algorithm during microblog-related research and development work. It aims to automatically mine a user's different social circles from the interaction behavior of microblog users. When we began designing the algorithm, we wanted the circle mining algorithm to meet the following conditions:

1. For a microblog user A, the algorithm should find multiple social circles, such as a circle of colleagues and a circle of professional interest.

2. Another user B may belong to several of user A's social circles. For example, if B is both a university classmate and a company colleague of A, then B should appear in two of user A's circles.

3. To protect user privacy, the algorithm should not use private data. It should rely only on users' public behavior and information, so the hiphop algorithm uses only publicly visible interaction information.

4. Each social circle should be interpretable: its nature or characteristics should be describable in a concise way. At present, circles are distinguished by assigning each one a different label.

The hiphop social circle mining algorithm was designed and developed under these guidelines and satisfies the constraints above. In the currently public literature, we have rarely seen social circle mining algorithms that meet all of these conditions.

Common algorithms for social circle mining

Social circle mining is a typical and popular task in current social network research, where it is usually called "community discovery". Many algorithms have been proposed in academia to solve it. Broadly, they fall into two categories: "single-community" methods and "multi-community" methods. In a single-community method, each node in the network belongs to exactly one community; a node cannot belong to several communities. A multi-community method allows a user to belong to multiple communities at the same time. Below we briefly introduce the GN algorithm and the maximal clique algorithm as representatives of these two categories.

GN algorithm

The GN algorithm is a very commonly used algorithm for automatic community discovery in graph structures. It was proposed by Girvan and Newman in 2002 and has been widely used because of its effectiveness.

The basic idea of the GN algorithm is as follows. First compute the "betweenness" of every edge in the graph, then remove the edge with the largest betweenness. Repeat this loop, each time removing the edge whose betweenness is currently largest; the pieces that remain form the discovered communities. The betweenness of an edge is the number of shortest paths between pairs of nodes in the graph that pass through that edge. The larger an edge's betweenness, the more likely it is a bridge connecting two or more communities or circles, so repeatedly removing high-betweenness edges separates the communities.
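
The loop can be sketched in Python using only the standard library. This is an illustration under stated assumptions, not a reference implementation: the graph representation (a dict mapping each node to a set of neighbours, with no self-loops), the function names, and the stopping rule (stop once a requested number of communities has appeared) are all choices made here, since the description above does not say when the loop stops.

```python
from collections import deque


def edge_betweenness(adj):
    """Edge betweenness of an unweighted, undirected graph (Brandes-style)."""
    score = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        # Breadth-first search from s: count shortest paths and predecessors.
        dist = {s: 0}
        sigma = {v: 0.0 for v in adj}
        sigma[s] = 1.0
        preds = {v: [] for v in adj}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, pushing path credit onto edges.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                credit = sigma[v] / sigma[w] * (1.0 + delta[w])
                score[frozenset((v, w))] += credit
                delta[v] += credit
    # Every unordered pair of nodes was counted once from each endpoint.
    return {edge: total / 2.0 for edge, total in score.items()}


def components(adj):
    """Connected components of the graph, as a list of node sets."""
    seen, parts = set(), []
    for s in adj:
        if s in seen:
            continue
        part, stack = set(), [s]
        while stack:
            v = stack.pop()
            if v not in part:
                part.add(v)
                stack.extend(adj[v] - part)
        seen |= part
        parts.append(part)
    return parts


def girvan_newman(adj, target):
    """Remove the highest-betweenness edge until `target` communities exist.

    `target` is an assumed stopping rule, not part of the description above.
    """
    adj = {v: set(ns) for v, ns in adj.items()}  # work on a copy
    while len(components(adj)) < target and any(adj.values()):
        scores = edge_betweenness(adj)  # recomputed after every removal
        u, v = max(scores, key=scores.get)
        adj[u].discard(v)
        adj[v].discard(u)
    return components(adj)
```

For two triangles joined by a single edge, the joining edge lies on every shortest path between the two triangles, so it has the highest betweenness and is removed first, leaving the two triangles as communities.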

The GN algorithm is effective, but it is a "single-community" discovery method: each node in the graph can belong to only one community and cannot belong to several at once. This differs significantly from the needs of real application scenarios and is the main limitation of the algorithm.

"Maximal clique" algorithm

The maximal clique algorithm is a popular method for "multi-community" discovery, in which a node in the graph may belong to several different communities.

The algorithm analyzes the topology of the graph to find its maximal cliques, that is, fully connected subgraphs that cannot be enlarged by adding another node. Each maximal clique is a discovered community.

Because the maximal clique algorithm lets a node belong to multiple communities, it fits practical application scenarios better than single-community methods. It has its own limitation, however: a maximal clique must be a fully connected subgraph, meaning every pair of nodes in the subgraph is joined by an edge, which is a very strong constraint. Real-world graphs contain few structures that satisfy such a strong constraint, so many nodes in the graph cannot be assigned to any community.

The hiphop algorithm borrows the idea of maximal cliques in one of its steps, but relaxes this constraint and thereby improves the results.

Using hiphop algorithm to discover social circles in Weibo

The hiphop algorithm uses the interaction relationships among microblog users to automatically mine a user's different social circles. "Interaction" here is a general term covering reposting a microblog, commenting on a microblog, @-mentioning another user, and so on. If any of these behaviors occurs between user A and user B, we consider that an interaction relationship exists between them, and the edge between them can be given a weight according to the frequency of interaction, representing the social closeness of the two users.
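
As an illustration of this graph construction, the sketch below builds a weighted, undirected interaction graph from a list of interaction records. The record layout `(actor, target, kind)`, the function name, and the choice of weight (a plain count of interactions, with every kind counted equally) are assumptions; the text only says that the weight depends on interaction frequency.

```python
from collections import defaultdict


def build_interaction_graph(events):
    """Return weights[u][v] = number of interactions between u and v.

    `events` is an iterable of (actor, target, kind) tuples, where kind is a
    label such as "repost", "comment" or "mention". The kind is ignored here:
    treating all kinds equally is an assumption of this sketch.
    """
    weights = defaultdict(lambda: defaultdict(int))
    for actor, target, _kind in events:
        if actor == target:
            continue  # ignore self-interactions
        # The graph is undirected, so record the edge in both directions.
        weights[actor][target] += 1
        weights[target][actor] += 1
    return {u: dict(nbrs) for u, nbrs in weights.items()}


events = [
    ("A", "B", "repost"),
    ("B", "A", "comment"),
    ("A", "C", "mention"),
]
graph = build_interaction_graph(events)
```

A real system would probably weight the interaction kinds differently or normalize by activity level; the count is used here only to keep the example small.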

We mine social circles from interaction relationships based on a basic assumption: the people who interact with a microblog user fall into different small groups, members of the same group interact closely with one another, and members of different groups interact much less. For example, your college classmates interact a lot with each other on Weibo but rarely interact with your co-workers (see Figure 1). Although this is only a hypothesis, the actual mining results show that it holds in most cases.

The hiphop algorithm proceeds in three sequential steps:

Step one: Find the maximal cliques among the users who interact directly with the user

First, for a Weibo user A, all users who have interacted directly with A on Weibo form the direct interaction set S. This step tries to find several maximal cliques within S; these are the core members of several small groups.

A graph G can be constructed over the nodes in S from the interactions among them, and the maximal cliques in G are then mined. A "clique" is any fully connected subgraph of G: for example, three nodes {a, b, c} in G form a three-node clique if every two of them have an interaction relationship. A "maximal clique" is a clique T for which no other node n in G can be added to T to form a larger clique. Continuing the example, if there is a node d that interacts with a, b, and c, then {a, b, c, d} forms a four-node clique; if no node in G interacts with all of a, b, and c, then {a, b, c} is a three-node maximal clique.

A clique is a very strong constraint, because it requires every two nodes in it to have an interaction relationship. The practical meaning of the maximal cliques found for user A in step one is: among the users who interact closely with A, there are small groups whose members are also closely linked to one another.
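
Step one can be illustrated with a short Python sketch that enumerates maximal cliques. The article does not say which enumeration method hiphop uses; the sketch uses the classical Bron-Kerbosch algorithm with pivoting as a stand-in, and the graph representation (a dict mapping each node to the set of its neighbours, with no self-loops) and the function name are assumptions.

```python
def maximal_cliques(adj):
    """Yield every maximal clique of an undirected graph as a frozenset.

    Bron-Kerbosch with pivoting: r is the clique built so far, p holds the
    nodes that could still extend it, and x holds nodes already explored.
    """
    def expand(r, p, x):
        if not p and not x:
            yield frozenset(r)  # nothing can extend r, so it is maximal
            return
        # Choose the pivot with the most neighbours in p to prune branches.
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            yield from expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    yield from expand(set(), set(adj), set())


# The example from the text, plus a node e that interacts only with a.
adj = {
    "a": {"b", "c", "d", "e"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"a", "b", "c"},
    "e": {"a"},
}
cliques = set(maximal_cliques(adj))
```

Here `cliques` contains {a, b, c, d} and {a, e}. Node a appears in both, which is how a maximal clique method lets one user belong to more than one circle.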

Step two: Expand each maximal clique within the direct interaction set

Step one finds the maximal cliques within set S, the users who interact directly with user A. On this basis, step two expands each discovered maximal clique within S to find more users who belong to it. The expansion works as follows:

Consider a specific maximal clique T, which contains a number of users. First find the other users who have interacted with members of T and who also belong to set S; we call this set U. For each user W in U, we need to decide whether W should be added to T. The decision is made with a utility function, defined as a ratio.

Let G be the new graph formed by merging user W into the maximal clique T. The numerator of the utility function is the sum of the weights of all internal edges, that is, edges between nodes of G. The denominator is the sum of the weights of all edges between nodes of G and nodes outside G. If utility(G) is larger than utility(T), the utility of the original structure T without W, then adding W to T is considered reasonable; otherwise W should not be added. Using this function as the criterion, we can decide which users in U should be added to T and which should be discarded.
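
Written out, the criterion described above might take the following form. This is a reconstruction from the verbal description, not the authors' original notation: the symbol w(u, v) for the weight of the edge between u and v, and V for the set of all users, are assumptions introduced here.

```latex
\mathrm{utility}(G) =
  \frac{\sum_{\{u,v\} \subseteq G} w(u,v)}
       {\sum_{u \in G,\; x \in V \setminus G} w(u,x)},
\qquad
\text{add } W \text{ to } T
  \iff \mathrm{utility}(T \cup \{W\}) > \mathrm{utility}(T)
```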

Using this formula as the criterion rests on the following assumption: members of a social circle interact closely with each other, while interaction between members and people outside the circle is less close. The formula embodies this assumption: the numerator measures how close the relationships are among circle members, and the denominator measures how close the relationships are between circle members and outsiders. As the formula shows, the more the members interact with each other and the less they interact with outsiders, the larger the utility function, which means a tighter circle.

Once this decision has been made for every candidate in set U, one round of expansion of T is complete and the expanded set T' is formed. T' can then be expanded again in the same way. The expansion terminates when no user in the current set U is accepted: the boundary has been reached, and the current set is the final expansion result.
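
The expansion rounds of step two can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `weights` is assumed to be a symmetric dict-of-dicts of interaction strengths, the function names are hypothetical, and a circle with no edges to outsiders is given infinite utility, a convention the article does not specify.

```python
def utility(members, weights):
    """Weight of edges inside the circle divided by weight crossing its border."""
    inner = outer = 0.0
    for u in members:
        for v, w in weights.get(u, {}).items():
            if v in members:
                inner += w / 2.0  # each inner edge is seen from both endpoints
            else:
                outer += w
    if outer == 0:
        # Assumed convention for a circle with no outside interaction.
        return float("inf") if inner > 0 else 0.0
    return inner / outer


def expand_circle(seed, pool, weights):
    """Grow `seed` (a maximal clique) inside `pool` (set S) round by round."""
    circle = set(seed)
    while True:
        # U: users in the pool that interact with some member of the circle.
        candidates = {
            v
            for u in circle
            for v in weights.get(u, {})
            if v in pool and v not in circle
        }
        base = utility(circle, weights)
        accepted = {
            w for w in candidates if utility(circle | {w}, weights) > base
        }
        if not accepted:
            return circle  # no candidate improves the utility: boundary reached
        circle |= accepted  # this is T'; the next round starts from it
```

Each candidate is tested against the circle as it stood at the start of the round, which follows the round-based description above. Testing candidates one at a time against the growing circle would also be possible and can give a different result.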

Once every maximal clique found in step one has been expanded in this way, step two is complete. As the process above shows, step two is an expansion phase built on step one.

Step three: Expand within the set of users who have a "second-level interaction" relationship with the user

The "second-level interaction" set of user A is defined as follows: the users who interact directly with A form set S, and all other users who have interacted with any user in S form the second-level interaction set.
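
A small Python sketch of this definition follows. The function name and the input format (a dict mapping each user to the set of users they have interacted with) are assumptions, as is the decision to return the enlarged candidate pool as the union of S and the second-level users, with user A excluded.

```python
def interaction_sets(user, neighbours):
    """Return (S, pool) for `user`.

    S is the direct interaction set. pool is S plus every other user who has
    interacted with some member of S, excluding `user` itself.
    """
    direct = set(neighbours.get(user, ())) - {user}
    second = set()
    for s in direct:
        second |= set(neighbours.get(s, ()))
    second -= direct | {user}  # keep only users that are new at level two
    return direct, direct | second


neighbours = {
    "A": {"B", "C"},
    "B": {"A", "C", "D"},
    "C": {"A", "B", "E"},
    "D": {"B"},
    "E": {"C"},
}
S, pool = interaction_sets("A", neighbours)
```

Here S is {B, C} and the enlarged pool is {B, C, D, E}: D and E never interacted with A directly but are reachable through B and C.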

Step two completes the expansion of the maximal cliques and yields different social circles within the direct interaction set. Step three first enlarges the direct interaction set S into the second-level interaction set. It then continues expanding outward in a way similar to step two, producing the final result of the hiphop algorithm: a number of different social circles for user A, in which any other user B may belong to several of A's circles.

With these three steps, we can automatically mine a user's social circles from microblog interactions. For the very large number of Weibo users, applying the steps to each user yields the final results, and because each user is handled separately, large-scale parallel computing can be used to do this quickly.

Here we use a concrete example to illustrate the hiphop algorithm, taking "Kai-Fu Lee" as the user and showing the steps above with their intermediate outputs.

In step one, we first find the microblog users who interact with "Kai-Fu Lee" to form set S, and then mine the maximal cliques in S. This yields the 5 initial maximal cliques:

Maximal clique 1 (Innovation Works): Wang Huihui/Cai/Zhou Yuan/Zhang/Lei Ryan

Maximal clique 2 (Internet media related): Keso has been xx/Nurichong/Jinlei

Maximal clique 3 (Financial and investment related): Xiaoping/Patriot Feng/Pan Shiyi/Yang Lan

Maximal clique 4 (Innovation Works): Long Chunhui/Rochuan/Shangcong iw/application Sinks

Maximal clique 5 (Entrepreneur related): Chao/Jason/Wu bruno/Xipei

In step two, the 5 initial maximal cliques are expanded within set S. Each of them grows to a different degree, with the number of newly added members ranging from 3 to 10.

Step three first enlarges the direct interaction set S into the second-level interaction set, that is, a new and larger set of microblog users who have interacted with members of S. Using the expansion method described above, the 5 initial maximal cliques are expanded further, resulting in different social circles of 48 to 150 members.

Manual evaluation shows that the social circles found by the hiphop algorithm are strongly cohesive and satisfy the constraints set at the start of the algorithm design, so the algorithm is highly practical. In addition, by analyzing a large number of examples, we found that the social relationships formed on microblogs differ considerably from those formed in IM: for most microblog users, social relationships are mainly with colleagues and people who share their interests, whereas relationships in IM are mainly offline ones such as relatives and friends, colleagues, and classmates. This may reflect the difference between social media and traditional networks.

