Sparse Matrices in Python: Storage Formats and Conversion

Source: Internet
Author: User
This article describes how sparse matrices are stored in Python and how to convert between storage formats. Readers who need this material may find it a useful reference.

Sparse matrices: scipy.sparse

from scipy import sparse

The storage form of sparse matrix

Solving linear models in science and engineering often involves very large matrices in which most elements are 0; these are called sparse matrices. Storing such a matrix in a NumPy ndarray wastes memory. Because the matrix is sparse, memory can be saved by storing only the non-zero elements. In addition, operations written specifically for this structure can run faster than their dense counterparts.
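As a rough illustration of the memory argument above, the sketch below compares the bytes used by a dense ndarray with the bytes used by the three arrays of a coo_matrix holding the same values. The matrix size and the number of non-zero entries are arbitrary choices for this example, not figures from the article.

```python
import numpy as np
from scipy import sparse

# A 1000 x 1000 matrix with only 100 non-zero entries
# (size and density are arbitrary, chosen for illustration).
dense = np.zeros((1000, 1000), dtype=np.float64)
idx = np.arange(100)
dense[idx, idx] = 1.0

# COO keeps only the non-zero values plus their row and column indices.
coo = sparse.coo_matrix(dense)

dense_bytes = dense.nbytes
sparse_bytes = coo.data.nbytes + coo.row.nbytes + coo.col.nbytes

print("dense :", dense_bytes, "bytes")
print("sparse:", sparse_bytes, "bytes")
```

The exact sparse figure depends on the index dtype SciPy picks, but it scales with the number of non-zero elements rather than with the full matrix size.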

The scipy.sparse library provides several formats for representing sparse matrices, each suited to different uses. Of these, dok_matrix and lil_matrix are suitable for adding elements incrementally.

dok_matrix inherits from dict and uses a dictionary to store the non-zero elements of the matrix: each key is a (row, column) tuple giving the position of an element, and the corresponding value is the element at that position. This dictionary-based format is therefore well suited to adding, deleting, and accessing individual elements. It is typically used to add non-zero elements one at a time, after which the matrix is converted to another format that supports fast arithmetic.

a = sparse.dok_matrix((10, 5))  # 10 x 5 matrix (shape assumed)
a[2:5, 3] = 1.0, 2.0, 3.0       # fill rows 2..4 of column 3
print(list(a.keys()))
print(list(a.values()))

[(2, 3), (3, 3), (4, 3)]
[1.0, 2.0, 3.0]
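The "build incrementally, then convert" workflow described for dok_matrix can be sketched as follows. The shape and values are made up for illustration, and CSR is used here only as one example of a format that supports fast arithmetic; the article does not say which target format the author had in mind.

```python
from scipy import sparse

# Fill a DOK matrix one element at a time (shape and values are illustrative).
a = sparse.dok_matrix((4, 4))
for i in range(4):
    a[i, i] = float(i + 1)   # diagonal entries 1.0 .. 4.0
a[0, 3] = 5.0

# Individual elements can be read back; positions never set read as 0.
print(a[0, 3], a[1, 0])

# Convert to CSR once the matrix is complete, then do arithmetic on it.
a_csr = a.tocsr()
row_sums = a_csr.sum(axis=1)
print(row_sums)
```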

lil_matrix uses two lists to store the non-zero elements: data holds the non-zero values of each row, and rows holds the column indices of those values. This format is also well suited to adding elements one at a time, and it gives fast access to the data of a given row.

b = sparse.lil_matrix((10, 5))  # 10 rows, matching the output below
b[2, 3] = 1.0
b[3, 4] = 2.0
b[3, 2] = 3.0
print(b.data)
print(b.rows)

[[] [] [1.0] [3.0, 2.0] [] [] [] [] [] []]
[[] [] [3] [2, 4] [] [] [] [] [] []]
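To make the per-row layout of lil_matrix more concrete, here is a small sketch that inspects a single row through the rows and data attributes and then extracts that row as a matrix. The shape and values are invented for the example.

```python
from scipy import sparse

b = sparse.lil_matrix((4, 6))   # illustrative shape
b[1, 4] = 2.0
b[1, 0] = 7.0                   # inserted out of column order on purpose
b[3, 2] = 1.5

# Row 1 as stored internally: column indices and the matching values.
# Entries are kept ordered by column, regardless of insertion order.
print(b.rows[1])
print(b.data[1])

# Slicing out a whole row is cheap in this format.
row1 = b[1, :]
print(row1.toarray())

# Convert once the matrix is complete.
b_csr = b.tocsr()
```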

coo_matrix stores the non-zero elements in three arrays: row, col, and data. The three arrays have the same length: row holds each element's row index, col holds its column index, and data holds its value. coo_matrix does not support reading, adding, or deleting individual elements. Once created, it supports very few operations directly, so it is normally converted to another format before further use.

coo_matrix allows duplicate entries: the same (row, column) coordinates may appear more than once, and when the matrix is converted to another format, the values that share a coordinate are summed. In the following example, (2, 3) has two values, 1 and 10. They are added together when the matrix is converted to an ndarray, so the final value at (2, 3) is 11.

Sparse matrix data is often stored in files in this form. For example, a CSV file may have three columns: user ID, product ID, and rating. After reading the data with numpy.loadtxt or pandas.read_csv, you can quickly turn it into a sparse matrix with coo_matrix: each row of the matrix corresponds to a user, each column to a product, and each element is the user's rating of that product.

row = [2, 3, 3, 2]
col = [3, 4, 2, 3]
data = [1, 2, 3, 10]
c = sparse.coo_matrix((data, (row, col)), shape=(5, 6))
print(c.col, c.row, c.data)
print(c.toarray())

[3 4 2 3] [2 3 3 2] [ 1  2  3 10]
[[ 0  0  0  0  0  0]
 [ 0  0  0  0  0  0]
 [ 0  0  0 11  0  0]
 [ 0  0  3  0  2  0]
 [ 0  0  0  0  0  0]]
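The CSV workflow mentioned above (user ID, product ID, rating) might look roughly like this. The CSV content is hypothetical and is read from an in-memory string so the example is self-contained; IDs are assumed to be 0-based integers that can be used directly as row and column indices.

```python
import io
import numpy as np
from scipy import sparse

# Hypothetical CSV content: user ID, product ID, rating (0-based IDs).
csv_text = """0,1,5.0
0,3,3.0
2,1,4.0
3,0,1.0
"""

table = np.loadtxt(io.StringIO(csv_text), delimiter=",")
users = table[:, 0].astype(int)
items = table[:, 1].astype(int)
ratings = table[:, 2]

# One row per user, one column per product.
n_users = users.max() + 1
n_items = items.max() + 1
m = sparse.coo_matrix((ratings, (users, items)), shape=(n_users, n_items))

# COO does not support element access, so convert before indexing.
m_csr = m.tocsr()
print(m_csr[0, 3])
print(m.toarray())
```

With real data the IDs are often not contiguous integers, in which case they would first need to be mapped to indices; that step is left out here.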

In my own work I chose coo_matrix because the task involved sparse matrix operations. Without a sparse storage format the cost in time and space was far too high: a 1000*1000 matrix took about 2 hours, which was unworkable. That led me to the triplet format that the Pajek software uses for its input data.

So I decided to convert my own data into similar triplets!
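One possible way to turn a matrix into triplets and back is sketched below, using the row, col, and data arrays of coo_matrix. The input matrix is made up, and this is only an assumption about what such a conversion could look like, not the author's actual code.

```python
import numpy as np
from scipy import sparse

# A small dense matrix standing in for real data (values are illustrative).
dense = np.array([[0, 0, 4],
                  [1, 0, 0],
                  [0, 2, 0]])

# Matrix -> list of (row, column, value) triplets.
coo = sparse.coo_matrix(dense)
triplets = [(int(r), int(c), int(v))
            for r, c, v in zip(coo.row, coo.col, coo.data)]
print(triplets)

# Triplets -> matrix: unzip the list and hand the parts back to coo_matrix.
rows, cols, vals = zip(*triplets)
rebuilt = sparse.coo_matrix((vals, (rows, cols)), shape=dense.shape)
print(rebuilt.toarray())
```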

"Matrix", "tuple ternary group", "Sparsematrix2tuple", "Scipy.sparse"
