Social network mining: Louvain, a community discovery algorithm for large-scale networks
Algorithm source
The algorithm comes from the paper "Fast unfolding of communities in large networks" and is known as the Louvain method.
Algorithm principle
Louvain is a community discovery algorithm based on modularity. It performs well in both speed and quality, it can reveal hierarchical community structure, and its goal is to maximize the modularity of the community partition of the whole graph. Two core ideas need to be understood: (a) modularity Q, a value that describes how tightly connected the nodes inside communities are, compared with what would be expected if the edges were placed at random; (b) the modularity gain ΔQ, the change in modularity when an isolated node is moved into a community C. To compute it, first take the modularity contribution of the isolated node and that of community C, then the modularity of the new community formed by merging them; the modularity of the new community minus the two earlier values is ΔQ.
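To make the definition of Q concrete, here is a minimal sketch that scores a given partition. It assumes an undirected, weighted graph passed as (u, v, w) tuples and a dict mapping each node to a community label; the function name `modularity` and this input format are illustrative choices, not taken from the original. It uses the standard per-community form of modularity, Σ_in/(2m) − (Σ_tot/(2m))² summed over communities, where m is the total edge weight.

```python
from collections import defaultdict


def modularity(edges, community):
    """Modularity Q of a partition of an undirected, weighted graph.

    edges: (u, v, w) tuples, each undirected edge listed once.
    community: dict mapping every node to a community label.
    Q = sum over communities c of Sigma_in/(2m) - (Sigma_tot/(2m))**2, the
    per-community form of (1/2m) * sum_ij [A_ij - k_i*k_j/(2m)] * delta(c_i, c_j).
    """
    m = sum(w for _, _, w in edges)               # total edge weight
    sigma_in = defaultdict(float)                 # internal weight, each edge counted twice
    sigma_tot = defaultdict(float)                # sum of weighted degrees in the community
    for u, v, w in edges:
        sigma_tot[community[u]] += w
        sigma_tot[community[v]] += w
        if community[u] == community[v]:
            sigma_in[community[u]] += 2 * w
    return sum(sigma_in[c] / (2 * m) - (sigma_tot[c] / (2 * m)) ** 2 for c in sigma_tot)


# Two triangles {0,1,2} and {3,4,5} joined by the edge 2-3, all weights 1.
edges = [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0), (2, 3, 1.0),
         (3, 4, 1.0), (3, 5, 1.0), (4, 5, 1.0)]
print(modularity(edges, {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}))
print(modularity(edges, {n: "all" for n in range(6)}))
```

Splitting the graph into its two triangles gives Q ≈ 0.357 (exactly 5/14), while putting every node into one community gives 0.0, which matches the intuition that the triangle split is the "tighter" partition.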
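Write $m$ for the total edge weight of the graph, $k_i$ for the weighted degree of node $i$, $k_{i,in}$ for the total weight of the edges between $i$ and the nodes of C, $\Sigma_{in}$ for the total weight of the edges inside C (each such edge counted twice, once from each endpoint) and $\Sigma_{tot}$ for the sum of the weighted degrees of the nodes in C. The gain is then

$$\Delta Q = \left[\frac{\Sigma_{in} + 2k_{i,in}}{2m} - \left(\frac{\Sigma_{tot} + k_i}{2m}\right)^2\right] - \left[\frac{\Sigma_{in}}{2m} - \left(\frac{\Sigma_{tot}}{2m}\right)^2 - \left(\frac{k_i}{2m}\right)^2\right]$$

which expands to $\Delta Q = \frac{k_{i,in}}{m} - \frac{\Sigma_{tot}}{2m}\cdot\frac{k_i}{m}$. Here $k_{i,in}/m$ measures the effect on the whole network's modularity of the edges that join the isolated node to community C, while $\frac{\Sigma_{tot}}{2m}$ and $\frac{k_i}{m}$ reflect the separate weights of community C and of the isolated node, so their product is the share of such edges expected by chance. The difference therefore reflects the change in the whole network's modularity before and after the isolated node is put into community C. The algorithm proceeds as follows: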
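To check the expansion of ΔQ numerically, the sketch below computes the gain in two ways: directly, as "merged community minus the two separate parts" using the per-community modularity term Σ_in/(2m) − (Σ_tot/(2m))², and with the simplified form k_{i,in}/m − Σ_tot·k_i/(2m²). The function names and the example graph are illustrative assumptions, and the isolated node is assumed to have no self-loop.

```python
def community_term(sigma_in, sigma_tot, m):
    """One community's contribution to Q: Sigma_in/(2m) - (Sigma_tot/(2m))**2."""
    return sigma_in / (2 * m) - (sigma_tot / (2 * m)) ** 2


def delta_q_by_definition(sigma_in, sigma_tot, k_i, k_i_in, m):
    """Q(merged community) - Q(C alone) - Q(i alone); an isolated i has Sigma_in = 0."""
    # Joining C adds 2 * k_i_in to the internal weight (each new edge seen from both ends).
    merged = community_term(sigma_in + 2 * k_i_in, sigma_tot + k_i, m)
    separate = community_term(sigma_in, sigma_tot, m) + community_term(0.0, k_i, m)
    return merged - separate


def delta_q(sigma_tot, k_i, k_i_in, m):
    """Simplified form: k_i_in / m - Sigma_tot * k_i / (2 * m**2)."""
    return k_i_in / m - sigma_tot * k_i / (2 * m ** 2)


# Two triangles {0,1,2} and {3,4,5} joined by edge 2-3, unit weights, so m = 7.
# Move node 3 (k_i = 3) into C = {4, 5}: Sigma_in = 2, Sigma_tot = 4, k_i_in = 2.
print(delta_q_by_definition(2, 4, 3, 2, 7), delta_q(4, 3, 2, 7))
# Move node 3 into C = {0, 1, 2} instead: Sigma_in = 6, Sigma_tot = 7, k_i_in = 1.
print(delta_q_by_definition(6, 7, 3, 1, 7), delta_q(7, 3, 1, 7))
```

Both ways agree up to floating-point rounding: about 0.163 (= 8/49) for joining {4, 5}, and about −0.071 (= −1/14) for joining {0, 1, 2}. The negative value shows why node 3 is not pulled into the other triangle even though it has an edge to it.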
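a. Start with every node as its own community. For each node, consider the communities of its neighbors, compute the ΔQ of moving the node into each of them, and move it into the community with the largest positive ΔQ (if no ΔQ is positive, the node stays where it is). Repeat these sweeps over all nodes until no move changes the partition; then this phase ends.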
One issue is that the order in which nodes are visited can change the result. In practice this order has been found to have little effect on the final result, but it can affect the computation time to some extent.
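b. Treat each community found in step a as a single new node and repeat step a on the resulting network; the two steps alternate until no further improvement is possible. This raises the question of how to set the weights between the new nodes. The answer: the weight between two new nodes is the sum of the weights of all edges between nodes of the two corresponding communities, and the edges inside a community become a self-loop on its new node, weighted by the sum of those internal edges.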
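The two phases described above (local moving, then aggregation) can be sketched in Python as below. This is a minimal single-machine sketch, assuming an undirected weighted graph given as (u, v, w) tuples with at least one positive-weight edge; the function names `one_level`, `aggregate` and `louvain`, the dict-of-dicts representation and the `seed` parameter controlling the visiting order are illustrative choices, not the paper's code or the Spark implementation's API. Gains are compared as m·ΔQ, which ranks candidates the same way as ΔQ.

```python
import random
from collections import defaultdict


def one_level(adj, seed=0):
    """Step a: move nodes between communities until no move increases modularity.

    adj: symmetric dict-of-dicts, adj[u][v] = weight; adj[u][u] is a self-loop.
    Returns (community of each node, whether any node moved).
    """
    nodes = list(adj)
    random.Random(seed).shuffle(nodes)           # visiting order (affects speed more than result)
    # Weighted degree; a self-loop counts twice, so sum(degree) == 2m.
    degree = {u: sum(adj[u].values()) + adj[u].get(u, 0.0) for u in adj}
    two_m = sum(degree.values())
    comm = {u: u for u in adj}                   # every node starts in its own community
    tot = dict(degree)                           # Sigma_tot of each community
    improved, moved = False, True
    while moved:
        moved = False
        for u in nodes:
            old = comm[u]
            links = defaultdict(float)           # k_{u,in} towards each neighbouring community
            for v, w in adj[u].items():
                if v != u:
                    links[comm[v]] += w
            tot[old] -= degree[u]                # take u out of its community
            # m * Delta Q = k_{u,in} - Sigma_tot * k_u / (2m); staying put is the baseline.
            best = old
            best_gain = links.get(old, 0.0) - tot[old] * degree[u] / two_m
            for c, k_in in links.items():
                gain = k_in - tot[c] * degree[u] / two_m
                if gain > best_gain + 1e-12:     # strictly better, so sweeps terminate
                    best, best_gain = c, gain
            tot[best] += degree[u]               # insert u where the gain is largest
            comm[u] = best
            if best != old:
                moved = improved = True
    return comm, improved


def aggregate(adj, comm):
    """Step b: collapse each community into one node with summed edge weights."""
    new = defaultdict(lambda: defaultdict(float))
    for u in adj:
        for v, w in adj[u].items():
            cu, cv = comm[u], comm[v]
            if u == v:
                new[cu][cu] += w                 # an old self-loop stays a self-loop
            elif cu == cv:
                new[cu][cu] += w / 2             # internal edge is visited from both ends
            else:
                new[cu][cv] += w                 # weight between two different communities
    return new


def louvain(edges, seed=0):
    """Return one partition per pass (original node -> community id), coarsest last."""
    adj = defaultdict(lambda: defaultdict(float))
    for u, v, w in edges:
        adj[u][v] += w
        if u != v:
            adj[v][u] += w
    membership = {u: u for u in adj}             # original node -> current super-node
    levels = []
    while True:
        comm, improved = one_level(adj, seed)
        if not improved:
            return levels
        labels = {}                              # renumber communities 0, 1, 2, ...
        for u in adj:
            labels.setdefault(comm[u], len(labels))
        comm = {u: labels[comm[u]] for u in adj}
        membership = {n: comm[s] for n, s in membership.items()}
        levels.append(membership)
        adj = aggregate(adj, comm)


# Two triangles {0,1,2} and {3,4,5} joined by the edge 2-3.
edges = [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0), (2, 3, 1.0),
         (3, 4, 1.0), (3, 5, 1.0), (4, 5, 1.0)]
for level, partition in enumerate(louvain(edges)):
    print(level, partition)
```

On this small graph the first pass should group nodes 0, 1, 2 and nodes 3, 4, 5 into two communities; the second pass finds no profitable move between the two super-nodes, so a single level is returned. On larger graphs each extra pass adds one coarser level, which is the hierarchy the algorithm is known for.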
The algorithm has three advantages: (a) it is easy to understand; (b) it is unsupervised; and (c) it is fast to compute. Its final output is a hierarchical community structure.
Spark implementation
https://github.com/Sotera/spark-distributed-louvain-modularity
[Figure: Louvain result schematic]
Improvement of the algorithm
An accelerated implementation is described in the paper "A New randomized algorithm for the Community Detection in Large Networks". Its approach is more direct: it decides merges by looking at a number of the points around each point. Under Spark, this can be done in a similar, multi-merge fashion.
Other references:
http://www.cnblogs.com/allanspark/p/4197980.html
https://www.quora.com/Is-there-a-simple-explanation-of-the-louvain-method-of-community-detection