Reference: http://blog.csdn.net/cleverlzc/article/details/39494957
Gephi is a visualization tool for network analysis that can be used for data analysis, link analysis, social network analysis, and so on.
The Label Propagation Algorithm (LPA) was first proposed as a solution to the community detection problem.
Its main advantages are near-linear time complexity and no need to know the number of communities beforehand.
Main algorithm flow: first assign a unique label to every node, then update the nodes iteratively. For each node, count the labels of its neighbors and update the node with the label that occurs most often; if more than one label has the maximum count, pick one of them at random. Repeat until convergence.
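The flow above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name label_propagation, the adjacency-dict input format, and the max_iter and seed parameters are assumptions made for the example. It also assumes one common convergence convention, namely that a node keeps its current label when that label is already among the most frequent ones.

```python
import random
from collections import Counter


def label_propagation(adj, max_iter=100, seed=0):
    """Asynchronous LPA on an undirected graph given as {node: [neighbors]}."""
    rng = random.Random(seed)
    labels = {node: node for node in adj}  # every node starts with a unique label
    nodes = list(adj)
    for _ in range(max_iter):
        rng.shuffle(nodes)  # visit the nodes in a random order
        changed = False
        for node in nodes:
            if not adj[node]:
                continue  # an isolated node keeps its own label
            counts = Counter(labels[nb] for nb in adj[node])
            best = max(counts.values())
            candidates = [lab for lab, c in counts.items() if c == best]
            if labels[node] in candidates:
                continue  # assumption: keep a label that is already a winner
            labels[node] = rng.choice(candidates)  # break ties at random
            changed = True
        if not changed:
            break  # no label changed in a full sweep: converged
    return labels


# Two disjoint triangles: each triangle ends up sharing a single label.
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [4, 5], 4: [3, 5], 5: [3, 4]}
print(label_propagation(graph))
```

Which label wins inside each triangle depends on the random seed; only the grouping is stable.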
There are two main strategies for updating node labels in label propagation: synchronous update and asynchronous update.
Synchronous update: the update in iteration t depends only on the label set produced by iteration t-1.
Asynchronous update: this strategy is concerned with the order in which nodes are updated, so the update order is chosen at random in each iteration. When a node is updated in iteration t, it depends both on the labels of neighbors that have already been updated in iteration t and on the iteration t-1 labels of neighbors that have not yet been updated in iteration t.
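The difference between the two strategies is easiest to see on a tiny graph. The sketch below is only an illustration: the function names are hypothetical, and ties are broken by taking the smallest label (instead of a random one) so that the result is reproducible. On a single edge, the synchronous update swaps the two labels forever, while the asynchronous update converges after one pass.

```python
from collections import Counter


def most_frequent_label(node, adj, labels):
    """Most frequent neighbor label; ties go to the smallest label (assumption)."""
    counts = Counter(labels[nb] for nb in adj[node])
    best = max(counts.values())
    return min(lab for lab, c in counts.items() if c == best)


def synchronous_step(adj, labels):
    """Every node reads only the labels produced by iteration t-1."""
    return {node: most_frequent_label(node, adj, labels) for node in adj}


def asynchronous_step(adj, labels, order):
    """Nodes are updated in `order`; later nodes already see this iteration's updates."""
    labels = dict(labels)  # work on a copy, updated in place as we go
    for node in order:
        labels[node] = most_frequent_label(node, adj, labels)
    return labels


edge = {0: [1], 1: [0]}
start = {0: 0, 1: 1}
print(synchronous_step(edge, start))           # {0: 1, 1: 0}, the labels swap
print(asynchronous_step(edge, start, [0, 1]))  # {0: 1, 1: 1}, converged
```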
LPA is suited to non-overlapping community detection; the COPRA (Community Overlap PRopagation Algorithm) algorithm was proposed for detecting overlapping communities. It allows every node to belong to up to v communities at the same time, where v is a single global parameter. The choice of v clearly has a direct effect on the quality of the result and requires sufficient prior knowledge; in real community networks a good value of v is hard to pick.
The SLPA (Speaker-listener Label Propagation Algorithm) algorithm introduces two figurative roles, listener and speaker. During a label update, the node being updated is called the listener and its neighbor nodes are called speakers, because the listener's final label is determined by these speakers.
In LPA, a node is assigned the label that occurs most often among its neighbors. That is only one possible update rule; SLPA allows several different update rules.
The basic flow is the same as LPA, with one difference: SLPA keeps a history label queue for each node, which records the label the node received in every iteration t. When the iterations end, the label frequencies in each node's queue are counted, and every label whose frequency reaches the given threshold ρ names one of the communities the node belongs to.
SLPA therefore has one important parameter, ρ: if ρ is set to 1, SLPA degenerates into a non-overlapping community detection algorithm.
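A minimal sketch of the history queue and the final thresholding might look as follows. The function name slpa, its parameters, and the specific speaker and listener rules are assumptions for illustration: each speaker sends a label drawn at random from its own history, and the listener accepts the most popular label it hears. A label is kept when its frequency is at least the threshold.

```python
import random
from collections import Counter


def slpa(adj, iterations=20, threshold=0.3, seed=0):
    """Sketch of SLPA; returns {node: set of community labels}."""
    rng = random.Random(seed)
    memory = {node: [node] for node in adj}  # history label queue of each node
    nodes = list(adj)
    for _ in range(iterations):
        rng.shuffle(nodes)
        for listener in nodes:
            if not adj[listener]:
                continue  # no speakers, nothing to record
            # Assumed speaker rule: send one label drawn from the speaker's history.
            heard = [rng.choice(memory[speaker]) for speaker in adj[listener]]
            counts = Counter(heard)
            best = max(counts.values())
            # Assumed listener rule: accept the most popular label, ties at random.
            winners = [lab for lab, c in counts.items() if c == best]
            memory[listener].append(rng.choice(winners))
    communities = {}
    for node, history in memory.items():
        freq = Counter(history)
        # Keep every label whose relative frequency reaches the threshold.
        communities[node] = {
            lab for lab, c in freq.items() if c / len(history) >= threshold
        }
    return communities


graph = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [4, 5], 4: [3, 5], 5: [3, 4]}
print(slpa(graph, threshold=0.3))
```

With a threshold above 0.5 at most one label per node can pass the filter, so the output is necessarily non-overlapping; lower thresholds let a node keep several labels.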
HANP (Hop Attenuation & Node Preference) algorithm. The basic ideas are:
① Each label gets a score that measures its ability to propagate; the score decreases as the propagation distance increases.
Score attenuation rule:
where δ is the attenuation factor (hop attenuation); a label whose score has dropped to 0 stops propagating.
② The rule a node uses to update its label from its neighbors is richer: it jointly considers the label's propagation score, the label's frequency of occurrence, the degree of the nodes carrying the label, and so on.
Label update rule:
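The formulas for the two rules are not reproduced in this text, so the following is only a sketch under stated assumptions, not the rules themselves. It assumes that each neighbor votes for its label with weight score × degree^m, and that the winning label's new score is the best score among the neighbors carrying it, minus δ. The function name hanp_update and the parameters delta and m are hypothetical.

```python
from collections import defaultdict


def hanp_update(node, adj, labels, scores, delta=0.1, m=0.1):
    """One HANP-style update of `node`; returns (new_label, new_score).

    Assumed rules (not taken from the text): a neighbor votes for its label
    with weight score * degree**m, and the winning label's score becomes the
    best neighbor score for that label minus delta.
    """
    votes = defaultdict(float)
    best_score = defaultdict(float)
    for nb in adj[node]:
        if scores[nb] <= 0:
            continue  # assumption: a label with no score left does not spread
        lab = labels[nb]
        votes[lab] += scores[nb] * len(adj[nb]) ** m
        best_score[lab] = max(best_score[lab], scores[nb])
    if not votes:
        return labels[node], scores[node]  # no label can reach this node
    new_label = max(votes, key=votes.get)  # ties go to the first label seen
    return new_label, best_score[new_label] - delta


path = {0: [1], 1: [0, 2], 2: [1]}
labels = {0: "a", 1: "b", 2: "c"}
scores = {0: 1.0, 1: 1.0, 2: 0.5}
print(hanp_update(1, path, labels, scores))  # "a" wins: its carrier has the higher score
```

Because the score shrinks by delta on every hop, a label can travel only a bounded number of hops from its origin, which is what limits the size of the communities.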
BMLPA algorithm, main idea:
It introduces the concept of a balanced belonging coefficient: the number of communities a node may belong to is not limited; instead, the belonging coefficients of the labels held by the same node are required to be balanced.
Label update: each node's label is a sequence of two-tuples, in which the first item is the label (community) identifier and the second item is the node's degree of membership in that community. For each node, the membership degrees sum to 1. A node is updated from the labels of its neighbors as follows:
① Aggregate: sum the membership degrees of all neighbor nodes by label, yielding every label in the neighborhood together with its total membership degree.
② Normalization A: divide every membership degree in the label-membership sequence by the largest one, yielding a sequence whose maximum membership degree is 1.
③ Filter: remove every label whose membership degree is below the threshold ρ.
④ Label update (normalization B): to make the membership degrees in the final sequence sum to 1, divide each one by the sum of all remaining membership degrees. The resulting label-membership tuples are then assigned to the node being updated.
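The four steps map directly onto a short function. This is a sketch for a single node, assuming positive membership degrees and a threshold ρ in (0, 1]; the function name bmlpa_update, the dict-based data layout, and the default value of rho are choices made for the example.

```python
from collections import defaultdict


def bmlpa_update(node, adj, belonging, rho=0.75):
    """Return the new {label: membership} map for `node`.

    `belonging` maps every node to its current {label: membership} map.
    """
    # Step 1: sum the neighbors' membership degrees per label.
    totals = defaultdict(float)
    for nb in adj[node]:
        for lab, coeff in belonging[nb].items():
            totals[lab] += coeff
    if not totals:
        return dict(belonging[node])  # isolated node: nothing to update
    # Step 2 (normalization A): scale so that the largest value becomes 1.
    peak = max(totals.values())
    scaled = {lab: value / peak for lab, value in totals.items()}
    # Step 3: drop labels below the threshold; the strongest label always survives.
    kept = {lab: value for lab, value in scaled.items() if value >= rho}
    # Step 4 (normalization B): rescale so that the memberships sum to 1.
    total = sum(kept.values())
    return {lab: value / total for lab, value in kept.items()}


adj = {0: [1, 2, 3]}
belonging = {
    0: {"x": 1.0},
    1: {"a": 1.0},
    2: {"a": 0.5, "b": 0.5},
    3: {"b": 1.0},
}
print(bmlpa_update(0, adj, belonging))  # {'a': 0.5, 'b': 0.5}
```

Here labels a and b are equally strong in the neighborhood, so node 0 keeps both with balanced coefficients; no fixed limit on the number of labels is needed.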
Fast Unfolding algorithm
This algorithm is a heuristic method based on modularity optimization.
It consists of two main phases:
① First assign every node to its own community. Then visit the nodes one by one: tentatively move the node into the community of each of its neighbors, compute the resulting modularity gain, and finally move it into the neighbor community that gives the largest gain. Iterate until no move of any node can improve the modularity.
② Treat every community found in the first phase as a single "node" of a new network. Because each new "node" contains several original nodes, the edge between two new "nodes" needs a weight, which is the sum of the weights of the edges between the two communities.
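The sketch below does not implement the greedy search of the first phase; it only illustrates the two building blocks the phases rely on: computing the modularity of a partition and collapsing communities into weighted "nodes". The function names and the edge-list format (u, v, weight) are assumptions. Edges inside a community become a self-loop on the new "node", which keeps the modularity unchanged by the aggregation.

```python
from collections import defaultdict


def modularity(edges, community):
    """Modularity Q of a partition; `edges` is a list of undirected (u, v, weight)."""
    m = sum(w for _, _, w in edges)  # total edge weight
    inside = defaultdict(float)      # edge weight inside each community
    degree = defaultdict(float)      # total weighted degree of each community
    for u, v, w in edges:
        degree[community[u]] += w
        degree[community[v]] += w
        if community[u] == community[v]:
            inside[community[u]] += w
    return sum(inside[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


def aggregate(edges, community):
    """Phase 2: collapse each community into one "node" and sum the edge weights."""
    merged = defaultdict(float)
    for u, v, w in edges:
        a, b = sorted((community[u], community[v]))
        merged[(a, b)] += w  # a == b is a self-loop holding the internal weight
    return [(a, b, w) for (a, b), w in merged.items()]


# Two triangles joined by one bridge edge (2, 3).
edges = [(0, 1, 1), (1, 2, 1), (0, 2, 1), (3, 4, 1), (4, 5, 1), (3, 5, 1), (2, 3, 1)]
community = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
print(modularity(edges, community))  # 5/14, about 0.357
print(aggregate(edges, community))
```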
The DCLP (Distance-Control Label Propagation) algorithm is a simplification of HANP: it considers only the attenuation factor during label propagation and uses a distance limit dis_allowed in place of δ, which effectively bounds how far a label can spread.
The AM-DCLP algorithm first runs DCLP on the original graph; when some of the resulting communities are too large, DCLP is run again on the corresponding subgraphs.
Two control parameters:
Maxc_allowed: the maximum allowed community size
Break_down_allowed: the number of times DCLP is allowed to be invoked
SDCLP algorithm: after each DCLP iteration all communities are checked; if a community has become large enough the iteration terminates, otherwise it continues.
Pros: terminating in time effectively prevents oversized communities from forming, while communities that are too small can still grow through further iterations.