[Network analysis] Summary of complex network analyses

Source: Internet
Author: User

In real life, many complex systems can be modeled as complex networks for analysis, such as power grids, aviation networks, transportation networks, computer networks, and social networks. A complex network is not only a form of data but also a tool for scientific research. Complex networks have attracted extensive research attention, and with the flourishing of online social platforms, research on online social networks has become an especially active topic. During graduate school my research focused on complex networks. Now that I am about to graduate, I am writing this post to briefly introduce the features of complex networks and some of the main research topics in the field. I hope interested readers will join the discussion so we can learn together.

[Figure: An aviation network and the Facebook global friendship map.]

1. Features of complex networks

Qian Xuesen gave a strict definition of a complex network: a network that has some or all of the properties of self-organization, self-similarity, attractors, the small-world property, and scale-freeness is called a complex network. In other words, a complex network is a network of high complexity, and its characteristics are mainly reflected in the following aspects:

1.1 Small-world characteristics

Small-world theory is also known as the theory of six degrees of separation. The small-world property states that, in a social network, any two people can be linked through a chain of at most about six intermediaries.

[Figure: Six degrees of separation.]

Two quantities are commonly used to characterize a network:

Characteristic path length: for any two nodes, the path length between them is the minimum number of edges needed to connect them. The average of the path lengths over all pairs of nodes is defined as the characteristic path length of the network. It is a global property of the network.

Clustering coefficient: suppose a node has $k$ edges connecting it to $k$ neighbors. At most $k(k-1)/2$ edges can exist among these neighbors. The number of edges that actually exist among them divided by this maximum is the clustering coefficient of the node, and the mean over all nodes is the clustering coefficient of the network. The clustering coefficient is a local property of the network. It reflects how much the circles of friends of neighboring people overlap, that is, how likely it is that two friends of a node are also friends with each other.

For a regular network, the characteristic path length between two nodes (individuals) is long (many intermediaries are needed to link them), but the clustering coefficient is high (your friends are very likely to be friends with each other). For a random network, the characteristic path length is short, but the clustering coefficient is low. A small-world network has a short characteristic path length, close to that of a random network, while its clustering coefficient stays high, close to that of a regular network. The small-world property of complex networks is closely related to how information spreads through them.
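As a concrete illustration of both definitions, here is a minimal pure-Python sketch. It assumes an undirected, unweighted, connected graph stored as a dict that maps each node to a set of neighbors; this representation and the helper `ring_lattice` (a regular ring network used only for trying the functions out) are illustrative choices, not part of the original text.

```python
from collections import deque

def shortest_path_lengths(adj, source):
    """BFS distances (number of edges) from source in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def characteristic_path_length(adj):
    """Average shortest-path length over all ordered pairs of distinct nodes."""
    total, pairs = 0, 0
    for s in adj:
        for t, d in shortest_path_lengths(adj, s).items():
            if t != s:
                total += d
                pairs += 1
    return total / pairs

def clustering_coefficient(adj):
    """Mean over nodes of (edges among neighbors) / (k(k-1)/2)."""
    coeffs = []
    for u, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            coeffs.append(0.0)  # convention: nodes with fewer than 2 neighbors get 0
            continue
        nbrs = list(nbrs)
        links = sum(1 for i in range(k) for j in range(i + 1, k)
                    if nbrs[j] in adj[nbrs[i]])
        coeffs.append(links / (k * (k - 1) / 2))
    return sum(coeffs) / len(coeffs)

def ring_lattice(n, k):
    """Regular ring: each node is linked to its k nearest neighbors on each side."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(1, k + 1):
            adj[i].add((i + j) % n)
            adj[(i + j) % n].add(i)
    return adj
```

For `ring_lattice(20, 2)`, each node's four neighbors share 3 of their 6 possible links, so the clustering coefficient is 0.5, while the characteristic path length is 55/19 (about 2.89) and keeps growing as the ring gets longer: the "long paths, high clustering" regime of a regular network.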
Real social, ecological, and other networks are small-world networks. In such systems information spreads quickly, and changing only a few connections can drastically change the performance of the network. For example, when adjusting an existing cellular telephone network, changing just a few links can significantly improve its performance.

1.2 Scale-free characteristics

Most real-world networks are not random networks: a small number of nodes have a large number of connections, while most nodes have very few. The node degree distribution follows a power law, and this property is called the scale-free property of a network. A complex network whose degree distribution follows a power law is called a scale-free network.

[Figure: Degree distribution of a BA scale-free network with 100,000 nodes.]

The scale-free property reflects the strong heterogeneity of complex networks: the number of connections (degree) is highly unevenly distributed across nodes. A few nodes in the network, called hubs, have a very large number of connections, while most nodes have very few. These few hubs play a dominant role in the operation of scale-free networks. Broadly speaking, scale-freeness is an intrinsic property that describes the highly uneven overall distribution found in a large number of complex systems.
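The BA (Barabási-Albert) model referred to above generates scale-free networks by preferential attachment: each new node links to existing nodes with probability proportional to their degree. The sketch below is a minimal version under stated assumptions: an edge-list representation, a complete graph on m + 1 nodes as the starting seed (one common choice), and illustrative function names. The original post does not say how its 100,000-node example was generated.

```python
import random
from collections import Counter

def barabasi_albert(n, m, seed=None):
    """Grow a BA network: each new node attaches to m distinct existing nodes,
    chosen with probability proportional to their current degree."""
    rng = random.Random(seed)
    # seed graph (an assumption): a complete graph on m + 1 nodes
    edges = [(i, j) for i in range(m + 1) for j in range(i + 1, m + 1)]
    # each node appears once per edge end, so a uniform draw from this
    # list is a draw proportional to degree
    targets = [v for e in edges for v in e]
    for new in range(m + 1, n):
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(targets))
        for t in chosen:
            edges.append((new, t))
            targets.extend((new, t))
    return edges

def degree_distribution(edges):
    """Return {degree: number of nodes with that degree}, sorted by degree."""
    deg = Counter(v for e in edges for v in e)
    return dict(sorted(Counter(deg.values()).items()))
```

Plotting the counts from `degree_distribution(barabasi_albert(100_000, 2))` on log-log axes should give an approximately straight line, the signature of a power law, with a handful of hubs far above the typical degree.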

In fact, the scale-free property is closely related to the robustness of a network. The power-law distribution greatly increases the probability that high-degree nodes exist, so scale-free networks are robust against random failures but vulnerable to deliberate attacks. This combination of robustness and vulnerability strongly affects a network's fault tolerance and resistance to attack. Research shows that scale-free networks are highly fault tolerant, but their resistance to selective attacks that target nodes by degree is quite poor: the presence of high-degree nodes greatly weakens robustness in this case, and an attacker who simply targets a small number of hub nodes can quickly paralyze the network.
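The contrast between random failures and targeted attacks can be simulated by deleting nodes and measuring the size of the largest connected component that remains. A minimal sketch, assuming a dict-of-sets undirected graph; the function names are illustrative, and `attack` here removes nodes in order of their initial degree (a static attack), whereas recomputing degrees after each removal is a common, more aggressive variant.

```python
import random

def largest_component_fraction(adj, removed):
    """Fraction of the original nodes that lie in the largest connected
    component after deleting the nodes in `removed`."""
    seen, best = set(removed), 0  # removed nodes are treated as already visited
    for start in adj:
        if start in seen:
            continue
        size, stack = 0, [start]
        seen.add(start)
        while stack:  # iterative DFS over one component
            u = stack.pop()
            size += 1
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        best = max(best, size)
    return best / len(adj)

def attack(adj, fraction, targeted, seed=None):
    """Remove a fraction of the nodes, highest initial degree first (targeted)
    or uniformly at random (random failure); return the surviving
    largest-component fraction."""
    count = int(fraction * len(adj))
    if targeted:
        order = sorted(adj, key=lambda v: len(adj[v]), reverse=True)
    else:
        order = list(adj)
        random.Random(seed).shuffle(order)
    return largest_component_fraction(adj, set(order[:count]))
```

In the extreme case of a star graph, removing the single hub leaves only isolated nodes, while removing any one leaf barely changes connectivity. On scale-free graphs the same comparison typically shows the giant component shrinking much faster under targeted removal than under random removal.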

1.3 Community structure characteristics

As the saying goes, "birds of a feather flock together." Nodes in complex networks also often exhibit clustering. For example, a social network always contains circles of acquaintances or friends in which every member knows every other member. The clustering degree measures how collectivized a network is, that is, its tendency toward cohesion. The concept of communities (connected groups) reflects how small networks are distributed within, and connected to, a larger network; for example, it can reflect how one circle of friends relates to another.

[Figure: Illustration of clustering in a network.]

2. Community Detection

Community detection, also known as community discovery, is a technique for uncovering clustering behavior in networks. It is essentially a network clustering method. "Community" has no strict definition in the literature; we can think of it as a set of nodes that share similar characteristics. Community detection has developed rapidly in recent years, mainly because Newman introduced the concept of modularity to complex network research, which provides a definite index for evaluating the community structure of a network. Different community partitions of a network correspond to different modularity values: the larger the modularity, the more reasonable the partition; the smaller the modularity, the more blurred the community structure.

[Figure: Community structure in a network.]

Newman's modularity is computed as follows:

$Q = \frac{1}{2m}\sum_{ij}\left(A_{ij} - \frac{k_i k_j}{2m}\right)\delta(c_i, c_j)$

where $m$ is the total number of edges in the network and $A$ is its adjacency matrix: $A_{ij} = 1$ if nodes $i$ and $j$ are connected by an edge, and $A_{ij} = 0$ otherwise. $k_i$ is the degree of node $i$, $c_i$ is the label of the community to which node $i$ belongs, and $\delta(c_i, c_j) = 1$ if and only if $c_i = c_j$ (otherwise it is 0).

This definition is easy to understand through a null model of the network. Imagine cutting every edge in half, so that each node $i$ keeps $k_i$ loose edge ends ($2m$ in total), and then reconnecting the ends at random, so that any node may link to any other node in the graph. We can then compute the probability that nodes $i$ and $j$ are connected. A randomly chosen edge end belongs to node $i$ with probability $p_i = k_i/(2m)$ and to node $j$ with probability $p_j = k_j/(2m)$, so the probability that one end of an edge lands on node $i$ and the other on node $j$ is $p_i p_j = k_i k_j/(4m^2)$. Counting both orientations of each of the $m$ edges, the expected number of edges between them is $P_{ij} = 2m p_i p_j = k_i k_j/(2m)$. Modularity is therefore the difference between the actual network and this random network in terms of the edges that fall inside communities. Because a random network has no community structure, a larger difference indicates a better community partition.
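The formula can be evaluated directly. The sketch below is a literal, O(n^2) transcription of $Q$ for a simple undirected graph given as an edge list, with a dict assigning each node a community label; both representations are assumptions for illustration.

```python
def modularity(edges, community):
    """Q = 1/(2m) * sum_ij (A_ij - k_i k_j / (2m)) * delta(c_i, c_j)
    for a simple undirected graph; `community` maps node -> label."""
    m = len(edges)
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    edge_set = {frozenset(e) for e in edges}
    nodes = list(deg)
    q = 0.0
    for i in nodes:
        for j in nodes:
            if community[i] != community[j]:
                continue  # delta(c_i, c_j) = 0, term vanishes
            a_ij = 1 if i != j and frozenset((i, j)) in edge_set else 0
            q += a_ij - deg[i] * deg[j] / (2 * m)
    return q / (2 * m)
```

For two triangles joined by a single edge, putting each triangle in its own community gives $Q = 5/14 \approx 0.357$, while putting all six nodes in one community gives $Q = 0$.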

The modularity proposed by Newman is significant in two respects:

(1) Modularity has become a common index for evaluating community detection results; it is a quantitative measure of the quality of a network's community partition.

(2) It has greatly promoted the application of various optimization algorithms to community detection. Many algorithms use modularity as the objective function and search for the partition that maximizes it, which yields good community partitions.

Of course, modularity is not perfect; it has drawbacks such as the resolution limit. Later, Chinese scholars built on modularity to propose the concept of modularity density, which addresses these shortcomings well. It is not described in detail here.

Common community detection methods include the following:

(1) Methods based on graph partitioning, such as the Kernighan-Lin algorithm and spectral bisection;

(2) Methods based on hierarchical clustering, such as the GN (Girvan-Newman) algorithm and Newman's fast algorithm;

(3) Methods based on modularity optimization, such as greedy algorithms, simulated annealing, memetic algorithms, particle swarm optimization (PSO), and evolutionary multi-objective optimization algorithms.
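As an example of the modularity-optimization family, the sketch below implements a simple agglomerative greedy scheme in the spirit of Newman's fast algorithm: start from singleton communities and repeatedly merge the connected pair whose merger increases $Q$ the most, using $\Delta Q = 2(e_{ab} - a_a a_b)$ with $e_{ab} = E_{ab}/(2m)$ ($E_{ab}$ the number of edges between communities $a$ and $b$) and $a_c = d_c/(2m)$ ($d_c$ the total degree of community $c$). It is an unoptimized illustration, assuming a simple undirected graph without self-loops and with sortable node ids, not the heap-based implementation used in practice.

```python
def greedy_modularity(edges):
    """Agglomerative greedy modularity maximization. Merges the connected pair
    of communities with the largest gain until no edges remain between
    communities, and returns (best Q, best partition) seen along the way."""
    m = len(edges)
    dsum, between = {}, {}  # community id -> total degree; {a, b} -> edge count
    for u, v in edges:
        dsum[u] = dsum.get(u, 0) + 1
        dsum[v] = dsum.get(v, 0) + 1
        key = frozenset((u, v))
        between[key] = between.get(key, 0) + 1
    comm = {v: {v} for v in dsum}  # community id -> member nodes
    # modularity of the all-singletons partition: -sum_i (k_i / 2m)^2
    q = -sum((d / (2 * m)) ** 2 for d in dsum.values())
    best_q, best = q, [set(s) for s in comm.values()]

    def gain(key):
        # dQ = 2 * (E_ab / 2m - (d_a / 2m) * (d_b / 2m))
        a, b = tuple(key)
        return 2 * (between[key] / (2 * m) - dsum[a] * dsum[b] / (2 * m) ** 2)

    while between:
        key = max(between, key=gain)
        q += gain(key)
        a, b = sorted(key)
        del between[key]
        comm[a] |= comm.pop(b)  # merge community b into a
        dsum[a] += dsum.pop(b)
        # edges from b to other communities now belong to a
        for k in [k for k in between if b in k]:
            (c,) = k - {b}
            count = between.pop(k)
            nk = frozenset((a, c))
            between[nk] = between.get(nk, 0) + count
        if q > best_q:
            best_q, best = q, [set(s) for s in comm.values()]
    return best_q, best
```

On two triangles joined by one edge, it recovers the two triangles as communities, with $Q = 5/14$.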

3. Structural Balance

Structural balance was proposed mainly for the study of social networks. It originated from the balance theory proposed by the social psychologist Heider.

3.1 Development of network balance theory

Network balance is sometimes called social balance. Its development can be divided into three stages.
3.1.1 Origin of network balance theory

The concept of network balance was first proposed by Heider, based on research in social psychology. In his 1946 article "Attitudes and cognitive organization", Heider introduced the concept of balance and put forward the first balance theory:
(1) The friend of a friend is a friend;
(2) The enemy of a friend is an enemy;
(3) The friend of an enemy is an enemy;
(4) The enemy of an enemy is a friend.
Heider's theory can be represented using triads, i.e., groups of three mutually connected nodes.

[Figure: Triad configurations corresponding to Heider's theory.]

This is the earliest theory of network balance; it was later called the strong balance theory.

In 1956, Cartwright and Harary generalized Heider's balance theory and applied it to graph theory ("Structural balance: a generalization of Heider's theory" [4]). They showed that a complete signed network is balanced if and only if all of its triads are balanced; more generally, a signed network is balanced if and only if every cycle it contains is balanced, i.e., contains an even number of "-" edges. In the same article they also proposed the well-known structure theorem: if a signed network is balanced, its nodes can be divided into two sub-networks such that all edges inside each sub-network are positive and all edges between the two sub-networks are negative.

The main focus of this stage was to build psychological and sociological models of network balance.

3.1.2 Mathematical models of network balance

After the foundational work of Heider and others, research on network balance turned mainly to building mathematical models, for example of network dynamics: how the connections of a network change over time, how friendly and hostile relationships in the network evolve, and so on.

3.1.3 Applications of network balance

The most recent research on network balance studies online networks, for example by analyzing the attributes of a website's users. Moreover, in the era of big data the networks we study have become large or even very large, and in this context, how to efficiently compute whether a network is balanced has become a major topic in the field.

3.2 Basic theory of network balance

(1) Heider's theory (the strong balance theory, SBT).

(2) Structural balance theorem: a complete signed network is balanced if and only if all of its triads (3-cycles) are balanced.

Corollary: a complete signed network is balanced if and only if its nodes can be divided into two parts X and Y such that all edges within X and within Y are positive and all edges between X and Y are negative.

(3) Weak balance theory (a weaker form of structural balance, WSBT): if a complete signed network contains no triad with two positive edges and one negative edge, the network is called weakly balanced.

Under weak balance theory, a triad with three negative edges is also considered balanced. That is, of the four possible triad types, three are balanced and only one (two positive edges and one negative edge) is unbalanced.

Corollary for weak balance: if a network is weakly balanced, its nodes can be divided into several parts such that all edges within each part are positive and all edges between parts are negative.
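Heider's four rules and the weak-balance relaxation can be stated compactly in terms of the number of negative edges in a triad: zero or two negatives are balanced in both senses, three negatives are balanced only under weak balance, and one negative is unbalanced in both. The sketch below encodes this and counts triangle types in a small signed graph; the dict keyed by `frozenset` node pairs is an assumed representation.

```python
from itertools import combinations

def triad_status(s1, s2, s3):
    """Classify a triad from the signs (+1 or -1) of its three edges.
    Returns (strongly_balanced, weakly_balanced)."""
    negatives = [s1, s2, s3].count(-1)
    strong = negatives % 2 == 0  # 0 or 2 negative edges (Heider's rules)
    weak = negatives != 1        # weak balance also accepts three negatives
    return strong, weak

def count_triads(nodes, sign):
    """Count closed triads and how many are strongly / weakly balanced.
    `sign` maps frozenset({u, v}) -> +1 or -1 for every existing edge."""
    total = strong = weak = 0
    for u, v, w in combinations(nodes, 3):
        pairs = [frozenset((u, v)), frozenset((v, w)), frozenset((u, w))]
        if all(p in sign for p in pairs):
            total += 1
            s, wk = triad_status(*(sign[p] for p in pairs))
            strong += s
            weak += wk
    return total, strong, weak
```

An all-negative triangle is the only case where the two notions disagree, which is exactly why weak balance allows more than two mutually hostile groups.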

(4) Definition of balance for an arbitrary network.

1) An arbitrary signed network is balanced if its missing edges can be filled in so that it becomes a balanced complete signed network;
2) An arbitrary signed network is balanced if its nodes can be divided into two parts such that all edges within each part are positive (solid lines) and all edges between the parts are negative (dashed lines).
The two definitions above are equivalent.
A signed network is balanced if and only if it contains no cycle with an odd number of negative edges.
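The two-part characterization also gives a linear-time balance test: walk the network breadth-first, keeping a node on its neighbor's side across a positive edge and moving it to the other side across a negative edge. A conflict means some cycle has an odd number of negative edges, so the network is unbalanced. A minimal sketch, assuming signed edges given as `(u, v, sign)` triples with sign +1 or -1 and every endpoint listed in `nodes`:

```python
from collections import deque

def balanced_partition(nodes, signed_edges):
    """Split a signed network into two groups with positive edges inside groups
    and negative edges between them. Returns {node: 0 or 1} if the network is
    balanced, or None if it is not."""
    adj = {v: [] for v in nodes}
    for u, v, s in signed_edges:
        adj[u].append((v, s))
        adj[v].append((u, s))
    side = {}
    for start in nodes:
        if start in side:
            continue
        side[start] = 0  # each connected component gets its own split
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v, s in adj[u]:
                # a positive edge keeps v on u's side; a negative edge flips it
                want = side[u] if s > 0 else 1 - side[u]
                if v not in side:
                    side[v] = want
                    queue.append(v)
                elif side[v] != want:
                    return None  # a cycle with an odd number of negative edges
    return side
```

For example, a triangle with signs (+, +, -) is rejected, while two positive pairs joined by negative edges are split into the two expected groups.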

(5) Approximately balanced networks (omitted).

3.3 Computing network balance (A spectral algorithm for computing social balance)
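Proposition 1: the number of triads (triangles) that node $i$ participates in is $T_i = \frac{1}{2}(G^3)_{ii}$,

where $A$ is the signed adjacency matrix, with entries in $\{1, -1, 0\}$, and $G$ is the unsigned adjacency matrix, with entries in $\{0, 1\}$ (so $G_{ij} = |A_{ij}|$).

Proposition 2: for node $i$, the number $B_i$ of balanced triads it participates in and the number $U_i$ of unbalanced triads it participates in are $B_i = \frac{1}{4}\left((G^3)_{ii} + (A^3)_{ii}\right)$ and $U_i = \frac{1}{4}\left((G^3)_{ii} - (A^3)_{ii}\right)$. (Each triangle through $i$ contributes two closed walks of length 3 to $(G^3)_{ii}$, and contributes $+2$ to $(A^3)_{ii}$ if it is balanced and $-2$ if it is not.)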

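Theorem 1: for a complete signed graph with $n$ nodes, the proportion of balanced triads is

$\frac{1}{2}\left(1 + \frac{\sum_i \lambda_i^3}{n(n-1)(n-2)}\right)$,

where $\lambda_i$ are the eigenvalues of $A$ (using $\mathrm{tr}(A^3) = \sum_i \lambda_i^3$).

Theorem 2: for an arbitrary signed network, the proportion of balanced triads is

$\frac{1}{2}\left(1 + \frac{\sum_i \lambda_i^3}{\sum_i \mu_i^3}\right)$,

where $\mu_i$ are the eigenvalues of $G$.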
Note: in the two formulas above, the eigenvalues can be sorted by magnitude and only the first few largest ones used, similar to PCA; this greatly reduces the computational cost.
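The theorems translate almost directly into code. The sketch below uses NumPy's `eigvalsh` for the symmetric matrices $A$ and $G = |A|$; the optional `k` keeps only the $k$ eigenvalues of largest magnitude, as suggested in the note. It computes a full eigendecomposition, so it illustrates the formula rather than the cost savings, which in practice would come from a sparse solver that returns only the top eigenvalues (for example SciPy's `scipy.sparse.linalg.eigsh`). The function name is illustrative.

```python
import numpy as np

def balanced_triangle_fraction(A, k=None):
    """Fraction of balanced triangles in a signed, undirected network:
    (1 + sum(lambda_i^3) / sum(mu_i^3)) / 2, where lambda_i are the eigenvalues
    of the signed adjacency matrix A and mu_i those of G = |A|.
    If k is given, only the k eigenvalues of largest magnitude are used."""
    A = np.asarray(A, dtype=float)
    G = np.abs(A)
    lam = np.linalg.eigvalsh(A)  # symmetric matrix -> real eigenvalues
    mu = np.linalg.eigvalsh(G)
    if k is not None:
        lam = lam[np.argsort(-np.abs(lam))[:k]]
        mu = mu[np.argsort(-np.abs(mu))[:k]]
    # the sum of cubed eigenvalues equals the trace of the cubed matrix
    return 0.5 * (1.0 + np.sum(lam ** 3) / np.sum(mu ** 3))
```

For a triangle with signs (+, +, -) it returns 0, and for a complete four-node graph with a single negative edge it returns 0.5, since the two triangles containing that edge are unbalanced and the other two are balanced.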

4. Influence Maximization

With the development of online social platforms, services such as QQ, Weibo, and WeChat Moments have become not only places where users communicate but also a major medium for generating and spreading social information. Influence maximization, like structural balance, was proposed for the study of social networks; it originates from marketing research in economics. In 2001, Domingos et al. first formulated influence maximization as an algorithmic problem. The problem attracted wide attention after Kempe et al. published their paper on influence maximization at KDD 2003 [6], after which many influence maximization algorithms were quickly proposed. Over the past ten-plus years a large number of related papers have appeared, and the problem remains worth studying.

The influence maximization problem can be described as follows: how can a company use a social platform (such as Sina Weibo) to promote its new products or services, and, with a limited budget, which microbloggers should it hire so that the promotion reaches as many people as possible?

We now give a general definition of influence maximization:

Given a network $G$ and an integer $k$ (typically less than 50), find a set $S$ of $k$ nodes in $G$ such that the influence spread $\sigma(S)$ of $S$ is maximized.

From this definition, influence maximization is a combinatorial optimization problem. The most commonly used propagation models are the independent cascade model (ICM) and the linear threshold model (LTM).
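To make $\sigma(S)$ concrete, the sketch below estimates it under the independent cascade model by Monte Carlo simulation: each newly activated node gets one chance to activate each inactive out-neighbor. It assumes a directed graph stored as a dict of adjacency lists and a single uniform activation probability `p` for every edge, a simplification often used in experiments; real studies may assign per-edge probabilities. Function names are illustrative.

```python
import random

def independent_cascade(adj, seeds, p, rng):
    """One IC run: each newly activated node tries once to activate each
    inactive out-neighbor, succeeding with probability p.
    Returns the number of active nodes when the cascade stops."""
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        new = []
        for u in frontier:
            for v in adj.get(u, ()):
                if v not in active and rng.random() < p:
                    active.add(v)
                    new.append(v)
        frontier = new
    return len(active)

def estimate_spread(adj, seeds, p=0.1, runs=1000, seed=0):
    """Monte Carlo estimate of sigma(S): mean number of activated nodes."""
    rng = random.Random(seed)
    return sum(independent_cascade(adj, seeds, p, rng) for _ in range(runs)) / runs
```

Because each run is random, the estimate only converges as `runs` grows; hundreds to thousands of runs per evaluation are common, which is what makes naive greedy selection expensive.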

The main influence maximization algorithms fall into the following categories:

(1) Heuristics based on network centrality, such as the maximum-degree method, the shortest-average-distance method, and PageRank;

(2) Greedy methods based on submodularity, such as the classic greedy algorithm, CELF, and later NewGreedy and CELF++;

(3) Methods based on community structure, such as the CGA and CIM algorithms;

(4) Methods based on objective-function optimization, such as simulated annealing.
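The greedy family works because $\sigma(S)$ is monotone and submodular under the IC and LT models (Kempe et al. [6]), so a node's marginal gain can only shrink as the seed set grows. CELF exploits this by re-evaluating a node's gain only when it reaches the top of a priority queue. A minimal sketch, assuming `spread` is any monotone submodular set function called with a list of nodes; in influence maximization it would be a Monte Carlo estimate of $\sigma(S)$, and a simple set-coverage function is used below for illustration.

```python
import heapq

def celf(candidates, spread, k):
    """CELF-style lazy greedy selection of k seeds for a monotone submodular
    set function `spread`. Returns (seeds, spread value of the seeds)."""
    seeds = []
    current = float(spread([]))
    # min-heap on negated gains = max-heap on marginal gains;
    # each entry remembers the seed-set size its gain was computed for
    heap = [(-(spread([v]) - current), v, 0) for v in candidates]
    heapq.heapify(heap)
    while heap and len(seeds) < k:
        neg_gain, v, computed_at = heapq.heappop(heap)
        if computed_at == len(seeds):
            # gain is fresh; by submodularity no stale entry can beat it
            seeds.append(v)
            current += -neg_gain
        else:
            gain = spread(seeds + [v]) - current  # re-evaluate lazily
            heapq.heappush(heap, (-gain, v, len(seeds)))
    return seeds, current
```

For instance, with `cover = {0: {"a", "b", "c"}, 1: {"c", "d"}, 2: {"d", "e"}, 3: {"a"}}` and `spread = lambda S: len(set().union(*(cover[v] for v in S)))`, `celf(list(cover), spread, 2)` picks node 0 and then node 2, covering all five items while skipping re-evaluation of node 3.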

5. Network Propagation

The field of network propagation covers many topics, such as ranking node importance, network robustness analysis, and optimizing the outbreak threshold for information spreading. These areas are very interesting, and interested readers are encouraged to explore them.

6. Supplement

6.1 Network Visualization Tools

First, I recommend my two favorite network visualization tools: Pajek and Gephi.

[Figure: A network topology shown in the Pajek visualization window.]

[Figure: A network visualization rendered in Gephi.]

6.2 Network Datasets

Some common public datasets:

Pajek (visualization tool) datasets: Http://vladowiki.fmf.uni-lj.si/doku.php?id=pajek:data:index;

Datasets collected by Newman (a leading researcher in network science): http://www-personal.umich.edu/~mejn/netdata/

Stanford University large-scale network datasets: http://snap.stanford.edu/data/

Network datasets compiled by Fudan University: Http://gdm.fudan.edu.cn/GDMWiki/Wiki.jsp?page=Network%20DataSet

KONECT dataset collection: http://konect.uni-koblenz.de/

7. References

[1] Girvan and Newman. Community structure in social and biological networks. PNAS, 2002.

[2] Newman and Girvan. Finding and evaluating community structure in networks. PRE, 2004.

[3] Newman. Networks: An Introduction.

[4] Cartwright and Harary. Structural balance: a generalization of Heider's theory. 1956.

[5] Facchetti et al. Computing global structural balance in large-scale signed social networks.

[6] Kempe et al. Maximizing the spread of influence through a social network. 2003.

[7] Chen et al. Efficient influence maximization in social networks.

[8] Ginchoron, Lü. A review of ranking methods for important nodes in networks. 2014.
