Relational-network data in the real world is now very large: even the public datasets at https://snap.stanford.edu/data/ run to hundreds of thousands of nodes and millions of edges. Yet the patterns hidden behind these big graphs are precisely where the value of big-data mining lies. This post picks up the community-detection problem from the previous article and continues with mining experiments and tests on large-scale relationship networks.
The experiments use three main datasets:
(1) https://snap.stanford.edu/data/com-DBLP.html
| Name | Type | Nodes | Edges | Communities | Description |
|------|------|-------|-------|-------------|-------------|
| com-DBLP | undirected, communities | 317,080 | 1,049,866 | 13,477 | DBLP collaboration network |
Tested on this graph of more than 300,000 nodes and more than 1,000,000 edges, the computation ran very quickly.
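The post does not say which community-detection algorithm was run on com-DBLP. As a minimal sketch of the kind of computation involved, here is a plain-Python label-propagation pass over a SNAP-style tab-separated edge list; the SNAP file format (one `u<TAB>v` pair per line, `#` comments) is real, while the function names, iteration cap, and random seed are assumptions of this sketch:

```python
import random
from collections import Counter, defaultdict

def load_edges(path):
    """Parse a SNAP-style edge list: one 'u<TAB>v' pair per line, '#' comments."""
    adj = defaultdict(set)
    with open(path) as f:
        for line in f:
            if line.startswith('#'):
                continue
            u, v = line.split()
            adj[u].add(v)
            adj[v].add(u)  # treat the graph as undirected, like com-DBLP
    return adj

def label_propagation(adj, max_iter=20, seed=42):
    """Asynchronous label propagation: each node repeatedly adopts the
    most frequent label among its neighbours until labels stabilise."""
    rng = random.Random(seed)
    labels = {n: n for n in adj}          # every node starts as its own community
    nodes = list(adj)
    for _ in range(max_iter):
        rng.shuffle(nodes)                # visit nodes in random order
        changed = False
        for n in nodes:
            counts = Counter(labels[m] for m in adj[n])
            best = max(counts.values())
            # break ties randomly among the most frequent neighbour labels
            new = rng.choice([l for l, c in counts.items() if c == best])
            if new != labels[n]:
                labels[n] = new
                changed = True
        if not changed:                   # converged
            break
    return labels

# Usage on the real dataset would be, e.g.:
#   adj = load_edges("com-dblp.ungraph.txt")
#   labels = label_propagation(adj)
```

Label propagation is attractive at this scale because each sweep is linear in the number of edges, which is consistent with the fast runtimes reported above; a production run would more likely use an optimized library implementation.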
(2) https://snap.stanford.edu/data/com-Youtube.html
| Name | Type | Nodes | Edges | Communities | Description |
|------|------|-------|-------|-------------|-------------|
| com-Youtube | undirected, communities | 1,134,890 | 2,987,624 | 8,385 | Youtube online social network |
The test covered more than 1.1 million nodes and roughly 3 million edges.
(3) On the following dataset, however, with more than ten million edges, we ran out of memory, and there was no way around it: an ordinary PC is simply not up to computing on such a super-large graph. To compute it at all, the data has to be preprocessed first.
| Name | Type | Nodes | Edges | Description |
|------|------|-------|-------|-------------|
| cit-Patents | directed, temporal, labeled | 3,774,768 | 16,518,948 | Citation network among US patents |
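The post mentions preprocessing but not which kind. One memory-light option is to stream the edge list and drop edges whose endpoints fall below a degree threshold, shrinking the graph before any in-memory algorithm runs; two passes keep memory proportional to the node count rather than the edge count. A sketch under those assumptions (the file paths, function name, and `min_degree` threshold are all hypothetical):

```python
from collections import Counter

def filter_low_degree_edges(in_path, out_path, min_degree=2):
    """Two streaming passes over a SNAP-style edge list so the full graph
    never has to sit in memory: pass 1 counts node degrees, pass 2 writes
    only edges whose endpoints both meet the degree threshold."""
    degree = Counter()
    with open(in_path) as f:            # pass 1: O(nodes) memory, not O(edges)
        for line in f:
            if line.startswith('#'):
                continue
            u, v = line.split()
            degree[u] += 1
            degree[v] += 1
    kept = 0
    with open(in_path) as f, open(out_path, 'w') as out:
        for line in f:                  # pass 2: stream, filter, and rewrite
            if line.startswith('#'):
                continue
            u, v = line.split()
            if degree[u] >= min_degree and degree[v] >= min_degree:
                out.write(f"{u}\t{v}\n")
                kept += 1
    return kept

# Hypothetical usage on cit-Patents:
#   filter_low_degree_edges("cit-Patents.txt", "cit-Patents.filtered.txt")
```

Other common preprocessing choices for graphs this size include sampling, partitioning, or converting node ids to dense integers stored in a compact binary format; which one fits depends on the algorithm being run afterward.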