First, a brief word on what a row-compressed graph is. Strictly speaking, it should be called a row-compressed matrix. A matrix is normally stored as a two-dimensional array, but for a sparse matrix, where most entries are zero, that wastes a lot of space. Various storage schemes exist to save space, and triple storage is one of them.
What is a triple? A triple is (row, col, value). Stored this way, all the non-zero values form a vector of triples, which takes far less space than the two-dimensional array. It can be compressed further, though: the row (or col) field is repeated in every triple, when it would be enough to record each row or column once. That is the idea behind row-compressed storage.
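To make triple storage concrete, here is a minimal sketch. The class and method names (TripleStorageSketch, Triple, toTriples) are made up for illustration and do not come from the implementation discussed later.

```java
import java.util.ArrayList;
import java.util.List;

final class TripleStorageSketch {
    /** One non-zero entry of the matrix: (row, col, value). */
    static final class Triple {
        final int row;
        final int col;
        final float value;

        Triple(int row, int col, float value) {
            this.row = row;
            this.col = col;
            this.value = value;
        }
    }

    /** Collects the non-zero entries of a dense matrix in row-major order. */
    static List<Triple> toTriples(float[][] dense) {
        List<Triple> triples = new ArrayList<>();
        for (int r = 0; r < dense.length; ++r) {
            for (int c = 0; c < dense[r].length; ++c) {
                if (dense[r][c] != 0f) {
                    triples.add(new Triple(r, c, dense[r][c]));
                }
            }
        }
        return triples;
    }

    public static void main(String[] args) {
        float[][] dense = {
            {0f, 5f, 0f, 0f},
            {0f, 0f, 0f, 0f},
            {3f, 0f, 0f, 7f},
        };
        // 12 dense cells shrink to 3 triples
        for (Triple t : toTriples(dense)) {
            System.out.println("(" + t.row + ", " + t.col + ", " + t.value + ")");
        }
    }
}
```

Note that the row number 2 is stored twice here, once per entry in that row. This is the repetition that row-compressed storage removes.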
What is row-compressed storage? The idea is to collect all non-zero values into one vector in row order, and to store the column subscript of each non-zero value in a second vector. Both vectors have as many elements as the sparse matrix has non-zero values. To be able to access a row, we also store, for each row, the position in the second vector where that row's first non-zero column subscript sits. Some people call this the pointer. With these three vectors you can access the matrix efficiently by row. Row-compressed storage beats triple storage not only in space but also in row access: if the triples are sorted, you can binary search for a row, whereas with row-compressed storage locating a row takes constant time.
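The three vectors are easier to follow on a small example. The sketch below uses the textbook layout with plain arrays, where the pointer array has one extra slot at the end holding the total number of non-zero values. The names (CsrSketch, values, colIndex, rowPtr, columnsOfRow) are chosen for illustration only.

```java
import java.util.Arrays;

final class CsrSketch {
    final float[] values;  // non-zero values in row order
    final int[] colIndex;  // column subscript of each stored value
    final int[] rowPtr;    // row r occupies positions rowPtr[r] .. rowPtr[r + 1] - 1

    CsrSketch(float[][] dense) {
        int rows = dense.length;
        int nonZero = 0;
        for (float[] row : dense) {
            for (float v : row) {
                if (v != 0f) ++nonZero;
            }
        }
        values = new float[nonZero];
        colIndex = new int[nonZero];
        rowPtr = new int[rows + 1];
        int k = 0;
        for (int r = 0; r < rows; ++r) {
            rowPtr[r] = k;  // where row r starts in values/colIndex
            for (int c = 0; c < dense[r].length; ++c) {
                if (dense[r][c] != 0f) {
                    values[k] = dense[r][c];
                    colIndex[k] = c;
                    ++k;
                }
            }
        }
        rowPtr[rows] = k;  // one past the last stored value
    }

    /** Column subscripts of the non-zero entries in row r, found without searching. */
    int[] columnsOfRow(int r) {
        return Arrays.copyOfRange(colIndex, rowPtr[r], rowPtr[r + 1]);
    }

    public static void main(String[] args) {
        CsrSketch m = new CsrSketch(new float[][] {
            {0f, 5f, 0f, 0f},
            {0f, 0f, 0f, 0f},
            {3f, 0f, 0f, 7f},
        });
        System.out.println("values   = " + Arrays.toString(m.values));
        System.out.println("colIndex = " + Arrays.toString(m.colIndex));
        System.out.println("rowPtr   = " + Arrays.toString(m.rowPtr));
    }
}
```

For this matrix the vectors come out as values = [5, 3, 7], colIndex = [1, 0, 3] and rowPtr = [0, 1, 1, 3]. Finding where a row starts and ends is two array reads, which is where the constant-time row access comes from.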
You may be wondering why I have gone on about row-compressed matrices when what I implemented is a row-compressed graph. In fact, graphs and matrices are equivalent: a row of the matrix holds the outbound edges of a node, and a column holds its inbound edges. Two conditions must be met, of course. First, the node numbers must be consecutive values starting from 0 or 1 (this can be arranged by mapping the graph's nodes onto such numbers). Second, the graph must be at least weakly connected (a disconnected graph can be split into connected subgraphs). Efficient storage and access for sparse matrices therefore gives efficient storage and access for graphs.
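The first condition is usually easy to arrange. As a rough sketch, assuming the nodes arrive with arbitrary string labels, a map can hand out consecutive numbers starting from 0 in order of first appearance. The class NodeRelabelSketch and its methods are hypothetical and are not part of the implementation below.

```java
import java.util.HashMap;
import java.util.Map;

final class NodeRelabelSketch {
    private final Map<String, Integer> idOf = new HashMap<>();

    /** Returns the number for a label, assigning the next free number on first sight. */
    int idFor(String label) {
        Integer id = idOf.get(label);
        if (id == null) {
            id = idOf.size();  // numbers are handed out as 0, 1, 2, ...
            idOf.put(label, id);
        }
        return id;
    }

    /** Number of distinct labels seen so far. */
    int nodeCount() {
        return idOf.size();
    }

    public static void main(String[] args) {
        String[][] labelledEdges = {{"x42", "a7"}, {"a7", "x42"}, {"a7", "q9"}};
        NodeRelabelSketch relabel = new NodeRelabelSketch();
        for (String[] e : labelledEdges) {
            System.out.println(relabel.idFor(e[0]) + " -> " + relabel.idFor(e[1]));
        }
        System.out.println("nodeCount = " + relabel.nodeCount());
    }
}
```

After relabelling, node number i is simply row i and column i of the matrix.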
Next, my implementation. It differs from the classic row-compressed matrix in two ways. First, the classic row-compressed matrix does not consider rows that are entirely zero; I handle this case (not because I was bored, but because I needed it). Second, the classic row-compressed graph is slow for column access (relative to row access, of course): accessing it by column takes linear time. I handle this case as well.
Here is a brief outline of my approach:
First, I set every pointer to -1 by default, meaning the row may be entirely zero, and set a pointer to its real value only when the row has a non-zero value. Access needs corresponding handling, of course.
The second problem is solved by also storing the matrix compressed by column: for each column I store the row subscripts of its non-zero entries, along with the position where each column's row subscripts start. This adds two vectors to my implementation, which costs storage space but makes column access efficient.
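Both ideas can be seen on a tiny graph before reading the full code. The following is only a sketch with plain arrays and made-up names (SentinelPointerSketch, buildPtr, endOf); it assumes edges are given as {source, target} pairs and is a simplification, not the Vector-based code that follows.

```java
import java.util.Arrays;
import java.util.Comparator;

final class SentinelPointerSketch {
    /**
     * Builds a pointer array of length nodeCount + 1 for edges already sorted by
     * the chosen endpoint (key 0 = source, key 1 = target). Nodes without any
     * edge keep -1; the last slot holds the total edge count.
     */
    static int[] buildPtr(int[][] sortedEdges, int key, int nodeCount) {
        int[] ptr = new int[nodeCount + 1];
        Arrays.fill(ptr, -1);
        ptr[nodeCount] = sortedEdges.length;
        // Walk backwards so the last write for each node is its smallest position
        for (int i = sortedEdges.length - 1; i >= 0; --i) {
            ptr[sortedEdges[i][key]] = i;
        }
        return ptr;
    }

    /** End (exclusive) of a node's slice: the next pointer that is not -1. */
    static int endOf(int[] ptr, int node) {
        int next = node + 1;
        while (ptr[next] == -1) {
            ++next;  // stops at the last slot at the latest, which is never -1
        }
        return ptr[next];
    }

    public static void main(String[] args) {
        int nodeCount = 4;
        int[][] edges = {{0, 1}, {0, 2}, {2, 0}, {2, 1}};  // already sorted by source

        int[] rowPtr = buildPtr(edges, 0, nodeCount);
        System.out.println("rowPtr = " + Arrays.toString(rowPtr));

        // Second copy of the edges, sorted by target, for column access
        int[][] byTarget = edges.clone();
        Arrays.sort(byTarget, Comparator.comparingInt((int[] e) -> e[1]));
        int[] colPtr = buildPtr(byTarget, 1, nodeCount);
        System.out.println("colPtr = " + Arrays.toString(colPtr));
    }
}
```

Here nodes 1 and 3 have no outbound edges, so rowPtr is [0, -1, 2, -1, 4], and node 3 has no inbound edges, so colPtr is [0, 1, 3, -1, 4]. The price of the -1 marker is that finding the end of a slice may have to skip over several empty nodes.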
Okay, talk is cheap, show me the code. Here is my code (it may contain mistakes; I have only tested it lightly).
Building the compressed graph from an edge vector
/*
 * buildGraph constructs the compressed graph from a vector of edges.
 * The edges are sorted first by their first vertex and then by their second vertex.
 * The row and column indices and pointers are built from the row-compressed
 * and column-compressed orderings respectively.
 * For an all-zero row or column, the pointer stays at -1.
 */
private void buildGraph(Vector<Edge> edges) {
    int edgeSize = edges.size();
    weight = new Vector<Float>(edgeSize);
    rowIndex = new Vector<Integer>(edgeSize);
    rowPtr = new Vector<Integer>(nodeCount + 1);
    colIndex = new Vector<Integer>(edgeSize);
    colPtr = new Vector<Integer>(nodeCount + 1);
    // Default every pointer to -1 (no edges); the last slot holds the edge count
    for (int i = 0; i < nodeCount; ++i) {
        rowPtr.add(-1);
        colPtr.add(-1);
    }
    rowPtr.add(edgeSize);
    colPtr.add(edgeSize);
    // An empty graph has nothing to index; elementAt(0) below would throw
    if (edgeSize == 0) {
        return;
    }
    // Sort the edges by first node
    EdgeBasedOnFirstNodeComparator cmp = new EdgeBasedOnFirstNodeComparator();
    Collections.sort(edges, cmp);
    // Build row index and pointer
    int curNode = edges.elementAt(0).getFirstNode();
    int curPtr = 0;
    for (int i = 0; i < edgeSize; ++i) {
        Edge e = edges.elementAt(i);
        // System.out.println("curNode" + curNode + "firstNode:"
        //         + e.getFirstNode());
        weight.add(e.getWeight());
        rowIndex.add(e.getSecondNode());
        if (curNode != e.getFirstNode()) {
            // A new row begins: record where the previous row started
            rowPtr.set(curNode, curPtr);
            curNode = e.getFirstNode();
            curPtr = i;
        }
    }
    rowPtr.set(curNode, curPtr);
    // Sort the edges by second node
    EdgeBasedOnSecondNodeComparator cmp2 = new EdgeBasedOnSecondNodeComparator();
    Collections.sort(edges, cmp2);
    // Build column index and pointer
    curNode = edges.elementAt(0).getSecondNode();
    curPtr = 0;
    for (int i = 0; i < edgeSize; ++i) {
        Edge e = edges.elementAt(i);
        colIndex.add(e.getFirstNode());
        if (curNode != e.getSecondNode()) {
            // A new column begins: record where the previous column started
            colPtr.set(curNode, curPtr);
            curNode = e.getSecondNode();
            curPtr = i;
        }
    }
    colPtr.set(curNode, curPtr);
}
Get the outbound edges of a node
/*
 * getOutEdges returns all outbound edges of the node (that is, all edges leaving the node).
 *
 * @param node the node to look up
 *
 * @return a vector of all outbound edges of the node, or null if it has none
 */
@Override
public Vector<Edge> getOutEdges(int node) {
    Vector<Edge> res = new Vector<Edge>();
    int startIndex = getStartIndex(node, true);
    if (startIndex == -1) {
        // Vertex with no outbound edge
        return null;
    }
    int endIndex = getEndIndex(node, true);
    float value;
    Edge e;
    int outNode;
    for (int index = startIndex; index < endIndex; ++index) {
        value = weight.elementAt(index);
        outNode = rowIndex.elementAt(index);
        e = new Edge(node, outNode, value);
        res.add(e);
    }
    return res;
}
Get the inbound edges of a node
/*
 * getInEdges returns all inbound edges of the node (that is, all edges pointing to the node).
 *
 * @param node the node to look up
 *
 * @return a vector of all inbound edges of the node, or null if it has none
 */
@Override
public Vector<Edge> getInEdges(int node) {
    Vector<Edge> res = new Vector<Edge>();
    int startIndex = getStartIndex(node, false);
    // Vertex without an inbound edge
    if (startIndex == -1) {
        return null;
    }
    int endIndex = getEndIndex(node, false);
    float value;
    Edge e;
    int inNode;
    for (int index = startIndex; index < endIndex; ++index) {
        inNode = colIndex.elementAt(index);
        // Weights are stored in row order, so look the value up through the source node's row
        value = getWeight(inNode, node);
        e = new Edge(inNode, node, value);
        res.add(e);
    }
    return res;
}
Column access works differently from row access. For row access, the values can be read directly from the weight vector, because the weight vector is stored in row-access order. For column access, my solution is to take the source node of each inbound edge and then look up the corresponding value through row access on that node. For a sparse graph, where each row holds only a few entries, this lookup is essentially constant time. Here is the getWeight code.
/*
 * getWeight returns the weight of a specific edge, or 0 if the edge does not exist.
 */
private float getWeight(int row, int col) {
    int startIndex = getStartIndex(row, true);
    if (startIndex == -1)
        return 0;
    int endIndex = getEndIndex(row, true);
    // Scan the row's slice for the requested column
    for (int i = startIndex; i < endIndex; ++i) {
        if (rowIndex.elementAt(i) == col)
            return weight.elementAt(i);
    }
    return 0;
}
Finally, the special handling for all-zero rows and columns. It lives in the functions that read the start and end positions from the pointer vectors.
/*
 * getStartIndex returns the start index of a vertex's edges, or -1 if it has none.
 */
private int getStartIndex(int node, boolean direction) {
    // true: outbound edges; false: inbound edges
    if (direction)
        return rowPtr.elementAt(node);
    else
        return colPtr.elementAt(node);
}

/*
 * getEndIndex returns the end index (exclusive) of a vertex's edges.
 */
private int getEndIndex(int node, boolean direction) {
    // true: outbound edges; false: inbound edges
    Vector<Integer> ptr = direction ? rowPtr : colPtr;
    // Skip the following vertices that have no edges (pointer -1)
    for (int next = node + 1; next < nodeCount; ++next) {
        if (ptr.elementAt(next) != -1)
            return ptr.elementAt(next);
    }
    // No later vertex has edges: the last slot holds the total edge count
    return ptr.elementAt(nodeCount);
}
Here I implement only two simple functions, for getting the inbound and outbound edges. For one thing, these two are enough for my work; for another, other graph operations can be built on top of them.
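As an illustration of that last point, a breadth-first traversal needs nothing beyond the outbound edges of each node. The sketch below is self-contained, so it does not call the class above; instead, a hypothetical outNeighbors function stands in for getOutEdges and returns the target nodes of a node's outbound edges, or null if there are none.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.IntFunction;

final class TraversalSketch {
    /** Nodes reachable from start, in breadth-first order. */
    static List<Integer> bfs(int start, int nodeCount, IntFunction<int[]> outNeighbors) {
        boolean[] seen = new boolean[nodeCount];
        List<Integer> order = new ArrayList<>();
        Deque<Integer> queue = new ArrayDeque<>();
        seen[start] = true;
        queue.add(start);
        while (!queue.isEmpty()) {
            int node = queue.remove();
            order.add(node);
            int[] next = outNeighbors.apply(node);
            if (next == null) {
                continue;  // mirrors getOutEdges returning null for a node without edges
            }
            for (int n : next) {
                if (!seen[n]) {
                    seen[n] = true;
                    queue.add(n);
                }
            }
        }
        return order;
    }

    public static void main(String[] args) {
        // Stand-in adjacency data: node 0 -> 1, 2 and node 2 -> 0, 3
        int[][] adjacency = {{1, 2}, null, {0, 3}, null};
        System.out.println(bfs(0, adjacency.length, node -> adjacency[node]));
    }
}
```

With the real class, the stand-in function would collect the second node of each edge returned by getOutEdges.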