This article walks through sample code for working with sparse matrices in Python. It is shared here as a reference for anyone who needs it.
In engineering practice, large matrices are usually sparse, so knowing how to handle sparse matrices matters a great deal. Using Python as the example, this article first discusses how a sparse matrix is stored and represented.
1.The sparse module
Python's SciPy library has a module called sparse that is designed specifically for sparse matrices. Most of this article is based on that module.
The first step is naturally to import the sparse module:
>>> from scipy import sparse
Then call help on it and take a look around:
>>> help(sparse)
Find out what we care about most directly:
Usage information
=================

There are seven available sparse matrix types:

    1. csc_matrix: Compressed Sparse Column format
    2. csr_matrix: Compressed Sparse Row format
    3. bsr_matrix: Block Sparse Row format
    4. lil_matrix: List of Lists format
    5. dok_matrix: Dictionary of Keys format
    6. coo_matrix: COOrdinate format (aka IJV, triplet format)
    7. dia_matrix: DIAgonal format

To construct a matrix efficiently, use either dok_matrix or lil_matrix.
The lil_matrix class supports basic slicing and fancy indexing with a
similar syntax to NumPy arrays. As illustrated below, the COO format
may also be used to efficiently construct matrices.

To perform manipulations such as multiplication or inversion, first
convert the matrix to either CSC or CSR format. The lil_matrix format is
row-based, so conversion to CSR is efficient, whereas conversion to CSC
is less so.

All conversions among the CSR, CSC, and COO formats are efficient,
linear-time operations.
This description gives a general picture of the sparse module: it offers seven ways to store a sparse matrix. The sections below introduce each of them in turn.
2.coo_matrix
coo_matrix is the simplest storage format. It uses three arrays, row, col, and data, to hold the nonzero elements. The three arrays have the same length: row holds each element's row index, col holds its column index, and data holds its value. In general, coo_matrix is mainly used to create a matrix. It does not support reading or modifying individual elements, so once the matrix has been created it is usually converted to one of the other formats.
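>>> row = [2, 2, 3, 2]
>>> col = [3, 4, 2, 3]
>>> data = [1, 2, 3, 4]  # assumed: not in the original listing; chosen to match the output below
>>> C = sparse.coo_matrix((data, (row, col)), shape=(5, 6))
>>> print(C.toarray())
[[0 0 0 0 0 0]
 [0 0 0 0 0 0]
 [0 0 0 5 2 0]
 [0 0 3 0 0 0]
 [0 0 0 0 0 0]]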
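One thing to note is that when you create a matrix with coo_matrix, the same (row, column) coordinate can appear more than once. When the matrix is materialized, for example with toarray() or by converting it to CSR, the values at a repeated coordinate are added together. In the example above, coordinate (2, 3) appears twice, and its two values sum to the 5 shown in the output.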
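The summing of repeated coordinates can be checked with a small sketch. The coordinates and values here are hypothetical, picked only so that one position appears twice; they are not from the original listing.

```python
import numpy as np
from scipy import sparse

# Hypothetical triplets: coordinate (0, 1) appears twice.
row = np.array([0, 0, 1])
col = np.array([1, 1, 2])
data = np.array([10, 5, 7])

C = sparse.coo_matrix((data, (row, col)), shape=(2, 3))

# The COO object itself still stores all three triplets as given.
stored_entries = C.nnz

# Converting to a dense array or to CSR sums the duplicates.
dense = C.toarray()
csr = C.tocsr()

print(stored_entries)
print(dense)
print(csr[0, 1])
```

Because the duplicates are only merged on conversion, COO works well as an accumulation format, for instance when contributions to the same entry arrive from several places.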
3.dok_matrix and lil_matrix
dok_matrix and lil_matrix are suited to building a matrix incrementally, one element at a time. dok_matrix's strategy is to use a dictionary to record the nonzero elements of the matrix. The dictionary key is a tuple holding the element's (row, column) position, and the value is the element's value.
>>> import numpy as np
>>> from scipy.sparse import dok_matrix
>>> S = dok_matrix((5, 5), dtype=np.float32)
>>> for i in range(5):
...     for j in range(5):
...         S[i, j] = i + j
...
>>> print(S.toarray())
[[0. 1. 2. 3. 4.]
 [1. 2. 3. 4. 5.]
 [2. 3. 4. 5. 6.]
 [3. 4. 5. 6. 7.]
 [4. 5. 6. 7. 8.]]
lil_matrix uses two lists to store the nonzero elements: data holds the nonzero values of each row, and rows holds the column indices of those values. This format is also a good fit for adding elements one by one, and it gives fast access to the data of a row.
>>> from scipy.sparse import lil_matrix
>>> L = lil_matrix((6, 5))
>>> L[2, 3] = 1
>>> L[3, 4] = 2
>>> L[3, 2] = 3
>>> print(L.toarray())
[[0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]
 [0. 0. 0. 1. 0.]
 [0. 0. 3. 0. 2.]
 [0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]]
>>> print(L.data)
[[] [] [1.0] [3.0, 2.0] [] []]
>>> print(L.rows)
[[] [] [3] [2, 4] [] []]
As the analysis above shows, these two formats are generally used to build a matrix by adding nonzero elements step by step. The result is then converted to another storage format that supports fast computation.
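A minimal sketch of that build-then-convert workflow follows. The matrix size and the entries are made up for illustration.

```python
import numpy as np
from scipy.sparse import lil_matrix

# Build incrementally in LIL format (hypothetical entries).
L = lil_matrix((4, 4))
L[0, 1] = 2.0
L[2, 3] = 5.0
L[3, 0] = 1.0

# Convert to CSR once construction is finished.
A = L.tocsr()

# Multiplying by a vector of ones gives the row sums.
v = np.ones(4)
product = A.dot(v)
print(product)
```

Doing the conversion once, after all the assignments, avoids repeatedly changing the sparsity structure of a CSR matrix, which the SciPy documentation describes as expensive.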
4.dia_matrix
This format stores the matrix by its diagonals. In the stored table, each column represents one diagonal and each row corresponds to a row of the matrix. A diagonal whose elements are all 0 is omitted.
If the nonzero elements of the original matrix sit on only a few diagonals, the compression ratio is very high.
A diagram found online makes it easy to see how this works.
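As a rough sketch of the same idea in SciPy: dia_matrix takes a data array plus an offsets array. Note that SciPy lays the data out with one row per stored diagonal, and offsets says which diagonal each row is (0 is the main diagonal, positive values lie above it). The numbers below are made up for illustration.

```python
import numpy as np
from scipy.sparse import dia_matrix

# One row of `data` per stored diagonal (hypothetical values).
data = np.array([[1, 2, 3, 4],    # main diagonal, offset 0
                 [5, 6, 7, 8]])   # first superdiagonal, offset 1
offsets = np.array([0, 1])

D = dia_matrix((data, offsets), shape=(4, 4))
dense = D.toarray()
print(dense)
```

In this layout an entry data[k, j] lands in column j of the matrix, so the leading 5 of the superdiagonal row falls outside the matrix and is not used.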
5.csr_matrix and csc_matrix
csr_matrix, short for Compressed Sparse Row, compresses the matrix row by row. CSR needs three arrays: values, column indices, and row offsets. The values and column indices mean the same as in COO. Each row offset gives the position in the values array at which the first element of that row starts. In SciPy these three arrays are called data, indices, and indptr.
Again, a diagram found online illustrates the principle well.
See how it's used in Python:
>>> import numpy as np
>>> from scipy.sparse import csr_matrix
>>> indptr = np.array([0, 2, 3, 6])
>>> indices = np.array([0, 2, 2, 0, 1, 2])
>>> data = np.array([1, 2, 3, 4, 5, 6])
>>> csr_matrix((data, indices, indptr), shape=(3, 3)).toarray()
array([[1, 0, 2],
       [0, 0, 3],
       [4, 5, 6]])
Not so hard to understand, is it?
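To make the role of the row offsets concrete, the three arrays from the session above can be decoded by hand. This is only an illustrative sketch; decode_row is a hypothetical helper, not part of SciPy.

```python
import numpy as np
from scipy.sparse import csr_matrix

indptr = np.array([0, 2, 3, 6])
indices = np.array([0, 2, 2, 0, 1, 2])
data = np.array([1, 2, 3, 4, 5, 6])

def decode_row(i):
    """Hypothetical helper: return (column, value) pairs stored for row i."""
    start, end = indptr[i], indptr[i + 1]
    return list(zip(indices[start:end].tolist(), data[start:end].tolist()))

n_rows = len(indptr) - 1
rows = [decode_row(i) for i in range(n_rows)]

# Rebuild a dense matrix from the decoded rows.
rebuilt = np.zeros((n_rows, 3), dtype=data.dtype)
for i, pairs in enumerate(rows):
    for j, value in pairs:
        rebuilt[i, j] = value

reference = csr_matrix((data, indices, indptr), shape=(3, 3)).toarray()
print(rows)
print(np.array_equal(rebuilt, reference))
```

Row i owns the slice indptr[i]:indptr[i + 1] of both indices and data, which is why indptr has one more entry than the matrix has rows.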
Let's see what the documentation says:
|  Notes
|  -----
|
|  Sparse matrices can be used in arithmetic operations: they support
|  addition, subtraction, multiplication, division, and matrix power.
|
|  Advantages of the CSR format
|    - efficient arithmetic operations CSR + CSR, CSR * CSR, etc.
|    - efficient row slicing
|    - fast matrix vector products
|
|  Disadvantages of the CSR format
|    - slow column slicing operations (consider CSC)
|    - changes to the sparsity structure are expensive (consider LIL or DOK)
It is easy to see that csr_matrix is well suited to actual matrix computation.
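The operations listed in those notes can be tried in a short sketch. It reuses the 3x3 matrix from the CSR example; the vector v is an arbitrary choice for illustration.

```python
import numpy as np
from scipy.sparse import csr_matrix

A = csr_matrix(np.array([[1, 0, 2],
                         [0, 0, 3],
                         [4, 5, 6]]))
v = np.array([1, 1, 1])

row_sums = A.dot(v)            # matrix-vector product
doubled = (A + A).toarray()    # CSR + CSR
squared = A.dot(A).toarray()   # CSR times CSR (matrix product)
second_row = A[1].toarray()    # row slicing

print(row_sums)
print(squared)
print(second_row)
```

The sketch uses dot() for the products, which gives a true matrix product with csr_matrix.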
csc_matrix is similar to csr_matrix, except that it compresses the matrix by column, so it is not described separately here.
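For comparison, here is a brief sketch that stores the same 3x3 matrix in CSC form. In this layout indptr holds column offsets and indices holds row indices.

```python
import numpy as np
from scipy.sparse import csc_matrix

dense = np.array([[1, 0, 2],
                  [0, 0, 3],
                  [4, 5, 6]])
A_csc = csc_matrix(dense)

# Values are stored column by column.
print(A_csc.data)
print(A_csc.indices)   # row index of each stored value
print(A_csc.indptr)    # where each column starts in data

# Column slicing is the operation CSC is suited to.
last_column = A_csc[:, 2].toarray()
print(last_column)
```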
6.bsr_matrix
Block Sparse Row format, as the name implies, compresses the matrix block by block.
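A minimal sketch of the block idea follows. It assumes a made-up 4x4 matrix with two nonzero 2x2 blocks and an explicit blocksize of (2, 2).

```python
import numpy as np
from scipy.sparse import bsr_matrix

# Hypothetical matrix with two dense 2x2 blocks on the diagonal.
dense = np.array([[1, 2, 0, 0],
                  [3, 4, 0, 0],
                  [0, 0, 5, 6],
                  [0, 0, 7, 8]])

B = bsr_matrix(dense, blocksize=(2, 2))

# data holds whole blocks; indices and indptr work as in CSR,
# but they count block columns and block rows.
print(B.data.shape)
print(B.indices)
print(B.indptr)
```

Only the two nonzero blocks are stored, so the format pays off when the nonzero elements cluster into dense sub-blocks.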