First, the definition of a community
Newman gives the definition of modularity in his 2004 paper "Fast algorithm for detecting community structure in networks", which uses a quantitative formula to evaluate a division of a network into communities.
First, let's see how Newman describes a community: the vertices in networks are often found to cluster into tightly knit groups with a high density of within-group edges and a lower density of between-group edges.
In plain English: as many edges as possible inside each community, and as few edges as possible between communities.
(Some definitions): i, j refer to community i and community j;
n is the number of nodes in the network;
m is the number of edges in the network. Each edge joins two nodes, so 2m is the sum of the degrees of all nodes in the network.
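The relation between m and the node degrees can be checked on a small example. This is a minimal sketch on a hypothetical toy graph (two triangles joined by one edge); the names `edges`, `deg` and `total_degree` are illustrative only.

```matlab
% Hypothetical toy graph: triangles {1,2,3} and {4,5,6} joined by edge 3-4.
edges = [1 2; 1 3; 2 3; 4 5; 4 6; 5 6; 3 4];
n = 6;                                   % number of nodes
A = zeros(n);                            % adjacency matrix
for t = 1:size(edges, 1)
    A(edges(t,1), edges(t,2)) = 1;
    A(edges(t,2), edges(t,1)) = 1;       % undirected: store both directions
end
m = size(edges, 1);                      % number of edges
deg = sum(A, 2);                         % degree of each node
total_degree = sum(deg);                 % every edge is counted at both ends
```

Each edge adds one to the degree of both of its endpoints, which is why `total_degree` comes out as twice `m`.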
Second, how to quantify modularity?
We first let $e_{ij}$ denote the fraction of all edges in the network that connect community $i$ and community $j$, and $e_{ii}$ the fraction of all edges that lie inside community $i$. It might seem that we only need to make $\sum_i e_{ii}$ as large as possible. But a problem arises: its largest value is 1, reached by putting all nodes into a single community, which is obviously meaningless.
So he proposed taking the fraction of edges that join nodes of the same community (the within-community fraction $e_{ii}$) and subtracting the expected value of that fraction in a network with the same community division but with edges placed at random between nodes. This gives the modularity:
$$Q = \sum_i \left( e_{ii} - a_i^2 \right)$$
where $a_i = \sum_j e_{ij}$ is the fraction of all edge ends that are attached to nodes in community $i$. If the fraction of within-community edges is no greater than the value expected from random connections, then $Q = 0$; the upper bound of $Q$ is 1. Generally speaking, the division with the largest value of $Q$ is taken as the community structure of the network.
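To make the formula concrete, here is a minimal sketch that computes the e matrix, the a vector and Q for a hypothetical toy graph (two triangles joined by one edge). The helper names `indicator`, `efrac` and `modularity` are illustrative, not from the paper. The sketch assumes the same convention as the code later in this post: e is the adjacency matrix divided by the sum of its entries, so an edge between two communities is split equally between e(i,j) and e(j,i).

```matlab
% Hypothetical toy graph: triangles {1,2,3} and {4,5,6} joined by edge 3-4.
A = [0 1 1 0 0 0;
     1 0 1 0 0 0;
     1 1 0 1 0 0;
     0 0 1 0 1 1;
     0 0 0 1 0 1;
     0 0 0 1 1 0];

% n-by-c membership matrix: entry (v,i) is 1 when node v is in community i
indicator = @(label) double(bsxfun(@eq, label(:), 1:max(label)));
% e(i,j): fraction of edge ends that join community i to community j
efrac = @(A, label) (indicator(label)' * A * indicator(label)) / sum(A(:));
% Q = sum_i (e_ii - a_i^2), with a_i = sum_j e_ij
modularity = @(e) sum(diag(e) - sum(e, 2).^2);

e_two = efrac(A, [1 1 1 2 2 2]);   % one community per triangle
Q_two = modularity(e_two);         % 5/14, about 0.357

e_one = efrac(A, ones(1, 6));      % all nodes in a single community
Q_one = modularity(e_one);         % 0, although trace(e_one) is 1
```

The single-community division maximizes the sum of the $e_{ii}$ but scores $Q = 0$, while the two-triangle division scores higher, which is the behaviour the subtraction of $a_i^2$ is meant to produce.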
Third, how to make this workable as an algorithm?
So we only need to optimize Q. But in how many ways can n nodes be divided into communities, and how many nodes go into each community? The author points out that the number of possible divisions is at least $2^{n-1}$, so an exhaustive optimization of Q cannot be extended to networks of more than about 20 nodes. To reduce the time complexity, the author proposes a greedy strategy.
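The size of the search space can be checked numerically. The sketch below counts the divisions of n nodes into non-empty groups with Stirling numbers of the second kind; this counting method and the variable names are assumptions of the example, not something stated in this post.

```matlab
n = 20;
S = zeros(n + 1, n + 1);        % S(r+1, k+1) holds the Stirling number S(r, k)
S(1, 1) = 1;                    % S(0, 0) = 1
for r = 1:n
    for k = 1:r
        % recurrence: S(r, k) = k * S(r-1, k) + S(r-1, k-1)
        S(r + 1, k + 1) = k * S(r, k + 1) + S(r, k);
    end
end
num_partitions = sum(S(n + 1, :));          % all divisions of n nodes
lower_bound = S(n + 1, 2) + S(n + 1, 3);    % divisions into 1 or 2 groups
```

For n = 20 the divisions into one or two groups alone already number $2^{19}$, and the total is about $5.2 \times 10^{13}$, which is why exhaustive search is impractical beyond a few tens of nodes.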
FN: (1) Initially, treat each node in the network as its own community
(2) Compute the change in Q for merging each pair of communities, and merge the pair that gives the largest increase (or the smallest decrease) in Q
(3) Stop when all communities have merged into one large community, then find the largest Q reached during the merging process; the division at that point is the community partitioning result
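At this point, Newman noticed that when two communities $i$ and $j$ are merged, the change in modularity is $\Delta Q = e_{ij} + e_{ji} - 2 a_i a_j = 2 (e_{ij} - a_i a_j)$, where the second equality uses $e_{ij} = e_{ji}$. Only pairs of communities joined by at least one edge need to be considered.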
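The increment follows from the definition of Q: after merging, the new community has internal fraction $e_{ii} + e_{jj} + e_{ij} + e_{ji}$ and edge-end fraction $a_i + a_j$, and subtracting the two old terms leaves $e_{ij} + e_{ji} - 2 a_i a_j$. The following minimal sketch checks this numerically on a hypothetical toy graph (two triangles joined by one edge), starting with every node in its own community; the variable names are illustrative.

```matlab
% Hypothetical toy graph: triangles {1,2,3} and {4,5,6} joined by edge 3-4.
A = [0 1 1 0 0 0;
     1 0 1 0 0 0;
     1 1 0 1 0 0;
     0 0 1 0 1 1;
     0 0 0 1 0 1;
     0 0 0 1 1 0];
e = A / sum(A(:));                    % every node is its own community
a = sum(e, 2);
Q_before = sum(diag(e) - a.^2);

i = 1; j = 2;                         % merge communities 1 and 2
predicted = 2 * (e(i,j) - a(i) * a(j));

e2 = e;
e2(i,:) = e2(i,:) + e2(j,:);          % fold row j into row i
e2(:,i) = e2(:,i) + e2(:,j);          % fold column j into column i
e2(j,:) = [];                         % drop the merged-away community
e2(:,j) = [];
a2 = sum(e2, 2);
Q_after = sum(diag(e2) - a2.^2);

delta_Q = Q_after - Q_before;         % agrees with predicted up to rounding
```

Because the increment depends only on `e(i,j)`, `a(i)` and `a(j)`, each candidate merge can be scored without recomputing Q from scratch.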
Fourth, the code
clear all
close all
clc
% load preprocess.mat
% e = E;
load('Dolphin.mat');             % assumption: the file provides the adjacency matrix as A
e = A;
% e(find(e > 0)) = 1;            % build a 0/1 adjacency matrix
tic;
e(e == 1) = 1 / sum(e(:));       % each edge now contributes 1/(2m) to e(i,j) and to e(j,i)
a = sum(e);                      % a(i): fraction of edge ends attached to community i
n = size(a, 2);
b = num2cell(1:n);               % b{i}: the nodes in community i
c = {};                          % c(k,:): the division after the k-th merge step
k = 1;
while length(e) > 1
    lg = length(e);
    detaQ = -(10^9) * ones(n-k+1);   % change in Q per pair; -10^9 marks unconnected pairs
    for i = 1:lg-1
        for j = i+1:lg
            if e(i,j) ~= 0
                detaQ(i,j) = 2 * (e(i,j) - a(i) * a(j));   % change in Q if i and j merge
            end
        end
    end
    if sum(detaQ + (10^9)) == 0      % no connected pair of communities is left
        break
    end
    % Q(k) = max(detaQ(:));          % largest change in Q at step k
    % ----- find the communities with the largest change in Q, merge them, update e
    [I, J] = find(detaQ == max(detaQ(:)));
    for ii = 1:length(I)
        e(J(ii),:) = e(I(ii),:) + e(J(ii),:);
        e(I(ii),:) = 0;
        e(:,J(ii)) = e(:,I(ii)) + e(:,J(ii));
        e(:,I(ii)) = 0;
        % e(I,I) = e(I,I) / 2;
        % ----- record the merged community and its elements
        b{J(ii)} = [b{I(ii)} b{J(ii)}];
        b{I(ii)} = 0;
    end
    e(I,:) = [];
    e(:,I) = [];
    b(I) = [];
    c(k,:) = num2cell(zeros(1, n));
    c(k,1:length(b)) = b;
    for kk = 1:length(b)             % strip the zero placeholders left by merging
        c2 = cell2mat(c(k,kk));
        c2(c2 == 0) = [];
        c{k,kk} = c2;
        c2 = [];
    end
    a = sum(e);
    k = k + 1;
    tmp = 0;
    for jj = 1:length(e)
        tmp = tmp + (e(jj,jj) - a(jj) * a(jj));
    end
    Q(k) = tmp;                      % modularity after this merge step
end
max_k = find(Q == max(Q(:)), 1) - 1; % first merge step with the largest Q (ties: take the first)
ll = 0;
for i = 1:length(c(max_k,:))
    if sum(c{max_k,i}) ~= 0
        ll = ll + 1;
        c{max_k,i} = c{max_k,i}(c{max_k,i} ~= 0);
    end
end
c_newman = c(max_k,1:ll);            % communities of the best division
label = zeros(n, 1);
for i = 1:ll
    label(c{max_k,i}') = i;          % community label of each node
end