Python sparse matrices: storage and conversion with scipy.sparse
Sparse matrices: scipy.sparse
from scipy import sparse
Storage formats of sparse matrices
Very large matrices often appear when solving linear models in science and engineering. Most elements of these matrices are 0, and such matrices are called sparse matrices. Storing one in a NumPy ndarray wastes memory. Because the matrix is sparse, you can store only the information about its non-zero elements and use far less memory. In addition, operations written specifically for this structure can also speed up matrix computations.
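To make the memory argument concrete, here is a minimal sketch that compares the bytes used by a dense ndarray with the bytes used by the same matrix in a sparse format. The choice of csr_matrix, the 1000x1000 size, and the three non-zero entries are assumptions made only for illustration.

```python
import numpy as np
from scipy import sparse

# A 1000 x 1000 matrix with only three non-zero entries.
dense = np.zeros((1000, 1000))
dense[2, 3] = 1.0
dense[10, 20] = 2.0
dense[999, 0] = 3.0

# CSR keeps only the non-zero values plus two index arrays.
sp = sparse.csr_matrix(dense)

dense_bytes = dense.nbytes
sparse_bytes = sp.data.nbytes + sp.indices.nbytes + sp.indptr.nbytes

print(sp.nnz)          # number of stored non-zero elements
print(dense_bytes)     # bytes used by the dense array
print(sparse_bytes)    # bytes used by the three CSR arrays
```

The exact sparse size depends on the index dtype SciPy picks, but it grows with the number of non-zero elements and rows rather than with rows times columns.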
The scipy.sparse library provides several sparse matrix formats, each suited to different uses. dok_matrix and lil_matrix are suitable for adding elements gradually.
dok_matrix is inherited from dict. It uses a dictionary to store the non-zero elements of the matrix. Each dictionary key is a (row, column) tuple giving the element's position, and the corresponding value is the element's value at that position. A sparse matrix in dictionary format is therefore well suited to adding, deleting, and accessing single elements. It is usually used to add non-zero elements gradually and is then converted to another format that supports fast operations.
a = sparse.dok_matrix((10, 5))
a[2:5, 3] = 1.0, 2.0, 3.0   # fill rows 2..4 of column 3
print(list(a.keys()))       # (row, column) keys
print(list(a.values()))     # stored values
[(2, 3), (3, 3), (4, 3)]
[1.0, 2.0, 3.0]
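The "add elements gradually, then convert" workflow described above can be sketched as follows. The target format (CSR, via tocsr()) and the column-sum operation are assumptions chosen for illustration; the text only says "other formats that support quick operations".

```python
from scipy import sparse

# Build the matrix one element at a time in DOK format.
d = sparse.dok_matrix((10, 5))
d[2, 3] = 1.0
d[3, 3] = 2.0
d[4, 3] = 3.0

# Convert to CSR before doing arithmetic on the whole matrix.
d_csr = d.tocsr()
col_sums = d_csr.sum(axis=0)   # 1 x 5 result holding the sum of each column

print(d_csr.nnz)
print(col_sums)
```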
lil_matrix uses two lists to store the non-zero elements: data holds the non-zero values of each row, and rows holds the column indices of those values. This format is also suitable for adding elements one by one, and it gives quick access to the data of a row.
b = sparse.lil_matrix((10, 5))
b[2, 3] = 1.0
b[3, 4] = 2.0
b[3, 2] = 3.0
print(b.data)   # non-zero values of each row
print(b.rows)   # column indices of each row
[[] [] [1.0] [3.0, 2.0] [] [] [] [] [] []]
[[] [] [3] [2, 4] [] [] [] [] [] []]
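As a small sketch of the row-oriented access mentioned above, the snippet below reads one row directly from the rows and data lists and also extracts it as a sparse row with getrow. The variable names are chosen only for this illustration.

```python
from scipy import sparse

m = sparse.lil_matrix((10, 5))
m[3, 4] = 2.0
m[3, 2] = 3.0

# Each row keeps its column indices sorted, with values in matching order.
row_cols = m.rows[3]
row_vals = m.data[3]

# getrow returns row 3 as a 1 x 5 sparse matrix.
r = m.getrow(3)

print(row_cols, row_vals)
print(r.toarray())
```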
coo_matrix uses three arrays, row, col, and data, to store the non-zero elements. The three arrays have the same length: row holds each element's row index, col holds its column index, and data holds its value. coo_matrix does not support accessing, adding, or deleting individual elements. Once created, it is mostly useful for conversion to other formats; few operations are worth performing on it directly.
coo_matrix supports repeated elements, that is, the same row and column coordinates can appear more than once. When the matrix is converted to another format, the values that share the same coordinates are summed. In the following example, (2, 3) corresponds to two values, 1 and 10. When the matrix is converted to an ndarray, the two values are added, so the value at coordinate (2, 3) in the final matrix is 11.
Much sparse matrix data is stored in files in this format. For example, a CSV file may have three columns: user ID, product ID, and rating. After reading the data with numpy.loadtxt or pandas.read_csv, you can quickly convert it to a sparse matrix with coo_matrix: each row of the matrix corresponds to one user, each column corresponds to one product, and the element value is the user's rating of that product.
row = [2, 3, 3, 2]
col = [3, 4, 2, 3]
data = [1, 2, 3, 10]
c = sparse.coo_matrix((data, (row, col)), shape=(5, 6))
print(c.col, c.row, c.data)
print(c.toarray())   # the two (2, 3) entries are summed
[3 4 2 3] [2 3 3 2] [ 1  2  3 10]
[[ 0  0  0  0  0  0]
 [ 0  0  0  0  0  0]
 [ 0  0  0 11  0  0]
 [ 0  0  3  0  2  0]
 [ 0  0  0  0  0  0]]
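The CSV case described above (user ID, product ID, rating) might look like the following sketch. The file contents are made up for illustration and read from an in-memory string instead of a real file. It also assumes the IDs are zero-based integers that can be used directly as row and column indices; real IDs would usually need to be mapped to indices first.

```python
import io
import numpy as np
from scipy import sparse

# Hypothetical ratings data: user ID, product ID, rating (no header line).
csv_text = """0,1,5.0
0,3,3.0
2,1,4.0
3,0,1.0
"""

ratings = np.loadtxt(io.StringIO(csv_text), delimiter=",")
users = ratings[:, 0].astype(int)
items = ratings[:, 1].astype(int)
values = ratings[:, 2]

# Rows are users, columns are products.
n_users = users.max() + 1
n_items = items.max() + 1
r_coo = sparse.coo_matrix((values, (users, items)), shape=(n_users, n_items))

# Convert to CSR for element access and arithmetic.
r_csr = r_coo.tocsr()
print(r_csr.toarray())
```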
In my own work I chose coo_matrix because I needed sparse matrix operations. However, if the matrix is not converted to another storage format, the time and space cost is too high: about 2 hours for a 1000x1000 matrix, which is terrible. That made me think of the triple-based data input format used by the Pajek software.
So I want to process my data into similar triples!
That is: "matrix" -> "triples" -> "sparseMatrix2tuple" -> "scipy.sparse"
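A rough sketch of this pipeline is shown below. The author's own sparseMatrix2tuple code is not shown in the post, so the helper names matrix_to_triples and triples_to_sparse are hypothetical, and the final conversion to CSR is an assumption about what "scipy.sparse" means at the end of the chain.

```python
import numpy as np
from scipy import sparse

def matrix_to_triples(mat):
    """Return (row, col, value) triples for the non-zero entries of a dense matrix."""
    rows, cols = np.nonzero(mat)
    return list(zip(rows.tolist(), cols.tolist(), mat[rows, cols].tolist()))

def triples_to_sparse(triples, shape):
    """Build a CSR matrix from a non-empty list of triples via coo_matrix."""
    rows, cols, vals = zip(*triples)
    return sparse.coo_matrix((vals, (rows, cols)), shape=shape).tocsr()

dense = np.array([[0, 0, 1],
                  [0, 2, 0],
                  [3, 0, 0]])

triples = matrix_to_triples(dense)
sp = triples_to_sparse(triples, dense.shape)

print(triples)
print(sp.toarray())
```

Passing the shape explicitly matters here: without it, coo_matrix infers the size from the largest indices, which would drop trailing all-zero rows or columns.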