Sparse Matrix Storage Format Summary + Storage Efficiency Comparison: COO, CSR, DIA, ELL, HYB

Source: Internet
Author: User
Tags: intel mkl


Original: http://www.cnblogs.com/xbinworld/p/4273506.html?utm_source=tuicool&utm_medium=referral

A sparse matrix is a matrix in which most of the elements are zero. In practice, the large matrices that arise in real problems are almost always sparse, often with 90% or even more than 99% of the entries equal to zero, so an efficient sparse matrix storage format is essential. This article summarizes several typical formats: COO, CSR, DIA, ELL, and HYB.

(1) Coordinate (COO)

This is the simplest format: each non-zero element is recorded as a triple (row index, column index, value). Throughout this article the running example is the following 4x4 matrix:

    1 7 0 0
    0 2 8 0
    5 0 3 9
    0 6 0 4

COO is easy to build and each triple can be located independently, but because every entry repeats both its row and column index, it stores more index information than strictly necessary, so the space usage is not optimal.
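As a minimal sketch (plain Python, no sparse library assumed; the variable names are illustrative), the COO triples for the example matrix can be built like this:

    # Build COO storage: one (row, col, value) triple per non-zero element.
    A = [[1, 7, 0, 0],
         [0, 2, 8, 0],
         [5, 0, 3, 9],
         [0, 6, 0, 4]]

    rows, cols, vals = [], [], []
    for i, row in enumerate(A):
        for j, v in enumerate(row):
            if v != 0:                 # store only the non-zero entries
                rows.append(i)
                cols.append(j)
                vals.append(v)

    print(rows)  # [0, 0, 1, 1, 2, 2, 2, 3, 3]
    print(cols)  # [0, 1, 1, 2, 0, 2, 3, 1, 3]
    print(vals)  # [1, 7, 2, 8, 5, 3, 9, 6, 4]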

(2) Compressed Sparse Row (CSR)

CSR is a fairly standard format. It needs three arrays: values, column indices, and row offsets. CSR is not a per-element triple scheme but an encoding of the whole matrix. The values and column-index arrays are the same as in COO: one entry per non-zero element together with its column index. The row-offset array stores, for each row, the offset of that row's first element within values. For the example matrix, the first element of row 1 (the value 1) is at offset 0, the first element of row 2 (the value 2) is at offset 2, the first element of row 3 (the value 5) is at offset 4, and the first element of row 4 (the value 6) is at offset 7. The last entry of the row-offset array is the total number of non-zero elements, 9 in this case, so the row offsets are [0 2 4 7 9].

CSC (Compressed Sparse Column) is the column-wise counterpart of CSR: the matrix is compressed by columns instead of rows.

Taking the same example matrix, its CSC representation is:

Values: [1 5 7 2 6 8 3 9 4]

Row indices: [0 2 0 1 3 1 2 2 3]

Column offsets:[0 2 5 7 9]

Another worked CSR example can be found in [4].
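A minimal sketch of building the three CSR arrays for the example matrix (plain Python; the array names are illustrative, not a library API):

    # Build CSR storage: values, column indices, and per-row offsets into values.
    A = [[1, 7, 0, 0],
         [0, 2, 8, 0],
         [5, 0, 3, 9],
         [0, 6, 0, 4]]

    values, col_indices, row_offsets = [], [], [0]
    for row in A:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_indices.append(j)
        row_offsets.append(len(values))   # offset where the next row starts

    print(values)       # [1, 7, 2, 8, 5, 3, 9, 6, 4]
    print(col_indices)  # [0, 1, 1, 2, 0, 2, 3, 1, 3]
    print(row_offsets)  # [0, 2, 4, 7, 9]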

(3) ELLPACK (ELL)

ELL stores the matrix in two auxiliary matrices that have the same number of rows as the original: one holds the column indices and the other holds the values. Row indices are not stored, because each row of the auxiliary matrices corresponds to the same row of the original matrix. Within each row the non-zero entries are packed from the left, and unused positions are padded with a marker such as *. For the example matrix, the column-index matrix and the value matrix are:

    Column indices:
    0 1 *
    1 2 *
    0 2 3
    1 3 *

    Values:
    1 7 *
    2 8 *
    5 3 9
    6 4 *

Note: if one row has many more non-zeros than the others, the two auxiliary matrices become very wide and the other rows end with many wasted * entries. The two matrices can also be flattened row by row into a single array each, for example:

0 1 * 1 2 * 0 2 3 1 3 *

1 7 * 2 8 * 5 3 9 6 4 *

but then it is no longer convenient to access a single row directly.
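A minimal sketch of the ELL layout for the example matrix (plain Python; here None and 0 stand in for the * padding marker):

    # Build ELL storage: pad every row to the width of the longest row.
    A = [[1, 7, 0, 0],
         [0, 2, 8, 0],
         [5, 0, 3, 9],
         [0, 6, 0, 4]]

    rows = [[(j, v) for j, v in enumerate(row) if v != 0] for row in A]
    width = max(len(r) for r in rows)          # longest row has 3 non-zeros

    ell_cols = [[c for c, _ in r] + [None] * (width - len(r)) for r in rows]
    ell_vals = [[v for _, v in r] + [0]    * (width - len(r)) for r in rows]

    print(ell_cols)  # [[0, 1, None], [1, 2, None], [0, 2, 3], [1, 3, None]]
    print(ell_vals)  # [[1, 7, 0], [2, 8, 0], [5, 3, 9], [6, 4, 0]]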

(4) Diagonal (DIA)

DIA stores the matrix by diagonals: each column of the stored value matrix holds one diagonal of the original matrix, and each row still corresponds to a row of the original matrix. Diagonals that are entirely zero are omitted. Scanning the example matrix from the bottom-left diagonal to the top-right one: the first diagonal is all zero and is ignored, the second diagonal is 5, 6, the third diagonal is all zero and is ignored, the fourth (main) diagonal is 1, 2, 3, 4, the fifth diagonal is 7, 8, 9, and the sixth and seventh diagonals are all zero and are ignored. [3]

Because each row of the stored matrix corresponds to a row of the original matrix, the entries 5 and 6 of the second kept diagonal land in the third and fourth rows, and the positions before them are filled with the padding marker *. Zeros that occur in the middle of a kept diagonal must also be stored. Consequently, if the original matrix has a clean diagonal structure the compression ratio is very high, but for a matrix with randomly scattered non-zeros the efficiency is very poor.
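A minimal DIA sketch for the example matrix (plain Python; 0 is used here as the padding value in place of *):

    # Build DIA storage: keep only diagonals that contain at least one non-zero,
    # indexed per row, with 0 as padding where a diagonal does not reach a row.
    A = [[1, 7, 0, 0],
         [0, 2, 8, 0],
         [5, 0, 3, 9],
         [0, 6, 0, 4]]
    n = len(A)

    data = {}                                  # diagonal offset -> per-row values
    for i in range(n):
        for j in range(n):
            if A[i][j] != 0:
                data.setdefault(j - i, [0] * n)[i] = A[i][j]

    offsets = sorted(data)
    dia = [data[k] for k in offsets]
    print(offsets)  # [-2, 0, 1]
    print(dia)      # [[0, 0, 5, 6], [1, 2, 3, 4], [7, 8, 9, 0]]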

(5) Hybrid (HYB) ELL + COO

HYB addresses the weakness of ELL noted in (3): if one row is exceptionally long, every other row is padded and space is wasted. The excess elements (in the example, the 9 in the third row, since every other row has at most 2 non-zeros) are stored separately in COO form, while the regular part stays in ELL form.
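A minimal HYB sketch for the example matrix, keeping at most two entries per row in the ELL part and spilling the rest to COO (plain Python; the split threshold K is an illustrative choice):

    # Split into an ELL part of width K and a COO part for the overflow.
    A = [[1, 7, 0, 0],
         [0, 2, 8, 0],
         [5, 0, 3, 9],
         [0, 6, 0, 4]]
    K = 2

    ell_cols, ell_vals = [], []
    coo = []                                   # (row, col, value) overflow triples
    for i, row in enumerate(A):
        nz = [(j, v) for j, v in enumerate(row) if v != 0]
        kept, spill = nz[:K], nz[K:]
        ell_cols.append([c for c, _ in kept] + [None] * (K - len(kept)))
        ell_vals.append([v for _, v in kept] + [0]    * (K - len(kept)))
        coo.extend((i, j, v) for j, v in spill)

    print(ell_vals)  # [[1, 7], [2, 8], [5, 3], [6, 4]]
    print(coo)       # [(2, 3, 9)]  -- the third row's extra element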

Some practical experience in choosing a sparse matrix storage format [2]:

    1. The DIA and ELL formats are the most efficient for sparse matrix-vector products, so they are the fastest formats for solving sparse linear systems with iterative methods such as the conjugate gradient method (a CSR matrix-vector product kernel is sketched after this list);
    2. The COO and CSR formats are more flexible and easier to manipulate than DIA and ELL;
    3. ELL's advantage is speed and COO's advantage is flexibility; their combination, the HYB format, is a good general-purpose sparse matrix representation;
    4. According to Nathan Bell's work, the CSR format has the most stable average storage cost per non-zero element (bytes per non-zero entry: about 8.5 for float, about 12.5 for double), while the cost of the DIA format depends strongly on the matrix type: it suits sparse matrices with a structured-mesh pattern (about 4.05 bytes per non-zero for float, about 8.10 for double), but for unstructured meshes and random matrices DIA can use more than 10 times as many bytes as CSR;
    5. In the linear algebra libraries I have used, the COO format is typically used for reading and writing sparse matrices from files (for example, Matrix Market uses COO), while the CSR format is typically used for the actual sparse matrix computations after the data has been read.
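As referenced in item 1, here is a minimal sketch of a sparse matrix-vector product y = A*x in CSR form, using the arrays built earlier for the example matrix (plain Python, not a tuned kernel):

    # CSR sparse matrix-vector product: y[i] = sum of values[k] * x[col_indices[k]]
    # over the slice row_offsets[i] .. row_offsets[i+1] of row i.
    def csr_matvec(values, col_indices, row_offsets, x):
        y = [0.0] * (len(row_offsets) - 1)
        for i in range(len(y)):
            for k in range(row_offsets[i], row_offsets[i + 1]):
                y[i] += values[k] * x[col_indices[k]]
        return y

    values      = [1, 7, 2, 8, 5, 3, 9, 6, 4]
    col_indices = [0, 1, 1, 2, 0, 2, 3, 1, 3]
    row_offsets = [0, 2, 4, 7, 9]
    print(csr_matvec(values, col_indices, row_offsets, [1, 1, 1, 1]))  # [8.0, 10.0, 17.0, 10.0]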

The storage efficiency on some special types of matrices (the fewer bytes per non-zero entry, the higher the compression rate and the higher the storage efficiency) is compared in charts for four matrix classes:

Structured Mesh

Unstructured Mesh

Random Matrix

Power-law Graph

Format applicability summary:

The following two formats are excerpted from [2].

6. Skyline Storage Format

The Skyline storage format is important for direct sparse solvers, and it is well suited for Cholesky or LU decomposition when no pivoting is required.

The Skyline storage format accepted in Intel MKL can store only a triangular matrix or the triangular part of a matrix. The format is specified by two arrays: values and pointers. These arrays are described below:

values

A scalar array. For a lower triangular matrix it contains the set of elements from each row of the matrix, starting from the first non-zero element up to and including the diagonal element. For an upper triangular matrix it contains the set of elements from each column of the matrix, starting with the first non-zero element down to and including the diagonal element. Zero elements encountered along the way are included in these sets.

pointers

An integer array of dimension (m+1), where m is the number of rows for a lower triangle (columns for an upper triangle). pointers(i) - pointers(1) + 1 gives the index in values of the first non-zero element of row (column) i. The value of pointers(m+1) is set to nnz + pointers(1), where nnz is the number of elements in the array values.
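As an illustration of the layout only (not MKL code, and using zero-based indexing for readability, whereas the MKL description above is one-based), the skyline arrays for the lower triangular part of the example matrix might be built like this:

    # Skyline storage of a lower triangular matrix: for each row, store from the
    # first non-zero element through the diagonal, keeping zeros in between.
    L = [[1, 0, 0, 0],
         [0, 2, 0, 0],
         [5, 0, 3, 0],
         [0, 6, 0, 4]]

    values, pointers = [], [0]
    for i, row in enumerate(L):
        first = next(j for j in range(i + 1) if row[j] != 0 or j == i)
        values.extend(row[first:i + 1])        # first non-zero .. diagonal inclusive
        pointers.append(len(values))           # start of the next row in values

    print(values)    # [1, 2, 5, 0, 3, 6, 0, 4]
    print(pointers)  # [0, 1, 2, 5, 8]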

7. Block Compressed Sparse Row Format (BSR)

The Intel MKL block compressed sparse row (BSR) format for sparse matrices is specified by four arrays: values, columns, pointerB, and pointerE. These arrays are described below.

values

A real array that contains the elements of the non-zero blocks of a sparse matrix. The elements are stored block by block in row-major order. A non-zero block is a block that contains at least one non-zero element. All elements of a non-zero block are stored, even if some of them are equal to zero. Within each non-zero block, elements are stored in column-major order in the case of one-based indexing, and in row-major order in the case of zero-based indexing.

columns

Element i of this integer array is the number of the column in the block matrix that contains the i-th non-zero block.

pointerB

Element j of this integer array gives the index into the columns array of the first non-zero block in row j of the block matrix.

pointerE

Element j of this integer array gives the index into the columns array of the last non-zero block in row j of the block matrix, plus 1.
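Again as an illustration of the layout only (zero-based indexing, 2x2 blocks, not an MKL call; the variable names merely mirror the MKL array names), the BSR arrays for the example matrix could be built like this:

    # BSR storage with 2x2 blocks: store whole blocks that contain a non-zero,
    # plus the block-column index of each stored block and per-block-row ranges.
    A = [[1, 7, 0, 0],
         [0, 2, 8, 0],
         [5, 0, 3, 9],
         [0, 6, 0, 4]]
    b = 2                                      # block size
    nb = len(A) // b                           # number of block rows/columns

    values, columns, pointerB, pointerE = [], [], [], []
    for bi in range(nb):
        pointerB.append(len(columns))
        for bj in range(nb):
            block = [A[bi * b + i][bj * b + j] for i in range(b) for j in range(b)]
            if any(block):                     # keep only blocks with a non-zero
                values.extend(block)           # whole block stored, zeros included
                columns.append(bj)
        pointerE.append(len(columns))

    print(columns)     # [0, 1, 0, 1]
    print(pointerB)    # [0, 2]
    print(pointerE)    # [2, 4]
    print(values[:8])  # [1, 7, 0, 2, 0, 0, 8, 0]  (first two blocks, row-major within block)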

[1] Sparse Matrix Representations & Iterative Solvers, Lesson 1, by Nathan Bell. http://www.bu.edu/pasi/files/2011/01/NathanBell1-10-1000.pdf

[2] http://blog.csdn.net/anshan1984/article/details/8580952

[3] http://zhangjunhd.github.io/2014/09/29/sparse-matrix.html

[4] http://www.360doc.com/content/09/0204/17/96202_2458312.shtml

[5] Implementing Sparse Matrix-Vector Multiplication on Throughput-Oriented Processors, Nathan Bell and Michael Garland, Proceedings of Supercomputing '09

[6] Efficient Sparse Matrix-Vector Multiplication on CUDA, Nathan Bell and Michael Garland, NVIDIA Technical Report NVR-2008-004, December 2008

