Machine Learning on Spark, Section II: Basic Data Structures (II)

Source: Internet
Author: User

The main contents of this section
    1. IndexedRowMatrix
    2. BlockMatrix

1. Use of IndexedRowMatrix

IndexedRowMatrix, as the name implies, is a RowMatrix whose rows carry an index. Each row is represented by the case class IndexedRow(index: Long, vector: Vector), where index is the row index and vector holds the contents of that row. It is used as follows:

    package cn.ml.datastruct

    import scala.language.implicitConversions

    import org.apache.spark.SparkConf
    import org.apache.spark.SparkContext
    import org.apache.spark.mllib.linalg.Matrix
    import org.apache.spark.mllib.linalg.Vectors
    import org.apache.spark.mllib.linalg.distributed.IndexedRow
    import org.apache.spark.mllib.linalg.distributed.IndexedRowMatrix
    import org.apache.spark.mllib.linalg.distributed.RowMatrix

    object IndexRowMatrixDemo extends App {
      val sparkConf = new SparkConf()
        .setAppName("IndexRowMatrixDemo")
        .setMaster("spark://sparkmaster:7077")
      val sc = new SparkContext(sparkConf)

      // Define an implicit conversion from Double to Long
      implicit def double2long(x: Double): Long = x.toLong

      // The first element of each array becomes the index of the IndexedRow,
      // and the remaining elements become the vector.
      // f.take(1)(0) gets the first element, which is implicitly converted to Long.
      val rdd1 = sc.parallelize(
        Array(
          Array(1.0, 2.0, 3.0, 4.0),
          Array(2.0, 3.0, 4.0, 5.0),
          Array(3.0, 4.0, 5.0, 6.0)
        )
      ).map(f => IndexedRow(f.take(1)(0), Vectors.dense(f.drop(1))))

      val indexRowMatrix = new IndexedRowMatrix(rdd1)

      // Compute the Gramian matrix
      val gramianMatrix: Matrix = indexRowMatrix.computeGramianMatrix()

      // Convert to a RowMatrix (the row indices are dropped)
      val rowMatrix: RowMatrix = indexRowMatrix.toRowMatrix()

      // Other methods, such as computeSVD (singular value decomposition) and
      // multiply (matrix multiplication), are used the same way as on RowMatrix.
    }
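To make the result of computeGramianMatrix() concrete, here is a minimal sketch in plain Scala (no Spark) that computes the Gramian G = AᵀA for the vector parts of the three rows above. The names GramianSketch, rows, and gramian are hypothetical and introduced only for this illustration; this is not how Spark computes the matrix internally.

```scala
object GramianSketch {
  // Vector parts of the three demo rows (the leading index column is dropped).
  val rows: Array[Array[Double]] = Array(
    Array(2.0, 3.0, 4.0),
    Array(3.0, 4.0, 5.0),
    Array(4.0, 5.0, 6.0)
  )

  // Gramian G = A^T * A, i.e. G(i)(j) = sum over all rows r of r(i) * r(j).
  def gramian(a: Array[Array[Double]]): Array[Array[Double]] = {
    val n = a.head.length
    Array.tabulate(n, n)((i, j) => a.map(r => r(i) * r(j)).sum)
  }

  def main(args: Array[String]): Unit = {
    // Prints:
    // 29.0 38.0 47.0
    // 38.0 50.0 62.0
    // 47.0 62.0 77.0
    gramian(rows).foreach(r => println(r.mkString(" ")))
  }
}
```

The result is an n x n matrix, where n is the number of columns, so it is small enough to hold locally even when the number of rows is large.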
2. Use of BlockMatrix

A block matrix is a matrix that has been partitioned into sub-matrices, called blocks. For example, a matrix P can be divided into four blocks, and P can then be written in terms of those blocks.
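The figures that illustrated this partition are not reproduced here. As a stand-in, the following sketch uses the standard 4 x 4 example from the Wikipedia article on block matrices; it is an assumption that the original figures showed this same example.

```latex
P = \begin{bmatrix}
1 & 1 & 2 & 2 \\
1 & 1 & 2 & 2 \\
3 & 3 & 4 & 4 \\
3 & 3 & 4 & 4
\end{bmatrix},
\qquad
P_{11} = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix},\;
P_{12} = \begin{bmatrix} 2 & 2 \\ 2 & 2 \end{bmatrix},\;
P_{21} = \begin{bmatrix} 3 & 3 \\ 3 & 3 \end{bmatrix},\;
P_{22} = \begin{bmatrix} 4 & 4 \\ 4 & 4 \end{bmatrix}

P = \begin{bmatrix}
P_{11} & P_{12} \\
P_{21} & P_{22}
\end{bmatrix}
```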

Further topics on block matrices include the transpose and the multiplication of block matrices. See https://en.wikipedia.org/wiki/Block_matrix
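For reference, the two operations can be written block by block. The sketch below assumes A and B are partitioned so that the column blocks of A line up with the row blocks of B; the symbols A, B, and the block indices i, j, k are introduced here for illustration only.

```latex
% Transpose: the block grid is transposed, and each block is transposed too.
\left(A^{T}\right)_{ij} = \left(A_{ji}\right)^{T},
\qquad
\begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}^{T}
=
\begin{bmatrix} A_{11}^{T} & A_{21}^{T} \\ A_{12}^{T} & A_{22}^{T} \end{bmatrix}

% Multiplication: the usual row-by-column rule, applied to blocks.
\left(AB\right)_{ij} = \sum_{k} A_{ik} B_{kj}
```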

    package cn.ml.datastruct

    import scala.language.implicitConversions

    import org.apache.spark.SparkConf
    import org.apache.spark.SparkContext
    import org.apache.spark.mllib.linalg.Vectors
    import org.apache.spark.mllib.linalg.distributed.BlockMatrix
    import org.apache.spark.mllib.linalg.distributed.IndexedRow
    import org.apache.spark.mllib.linalg.distributed.IndexedRowMatrix

    object BlockMatrixDemo extends App {
      // Connects to a standalone master; use setMaster("local[2]") instead
      // to run locally with 2 threads.
      val sparkConf = new SparkConf()
        .setAppName("BlockMatrixDemo")
        .setMaster("spark://sparkmaster:7077")
      val sc = new SparkContext(sparkConf)

      // Implicit conversion from Double to Long, used for the row index
      implicit def double2long(x: Double): Long = x.toLong

      val rdd1 = sc.parallelize(
        Array(
          Array(1.0, 20.0, 30.0, 40.0),
          Array(2.0, 50.0, 60.0, 70.0),
          Array(3.0, 80.0, 90.0, 100.0)
        )
      ).map(f => IndexedRow(f.take(1)(0), Vectors.dense(f.drop(1))))

      val indexRowMatrix = new IndexedRowMatrix(rdd1)

      // Convert the IndexedRowMatrix to a BlockMatrix, specifying the number
      // of rows and columns per block
      val blockMatrix: BlockMatrix = indexRowMatrix.toBlockMatrix(2, 2)

      // Printed output after execution:
      // Index: (0,0) MatrixContent: 2 x 2 CSCMatrix
      // (1,0) 20.0
      // (1,1) 30.0
      // Index: (1,1) MatrixContent: 2 x 1 CSCMatrix
      // (0,0) 70.0
      // (1,0) 100.0
      // Index: (1,0) MatrixContent: 2 x 2 CSCMatrix
      // (0,0) 50.0
      // (1,0) 80.0
      // (0,1) 60.0
      // (1,1) 90.0
      // Index: (0,1) MatrixContent: 2 x 1 CSCMatrix
      // (1,0) 40.0
      // As the output shows, each block is stored as a sparse matrix in CSC format.
      blockMatrix.blocks.foreach(f => println("Index: " + f._1 + " MatrixContent: " + f._2))

      // Converted to a local matrix:
      // 0.0   0.0   0.0
      // 20.0  30.0  40.0
      // 50.0  60.0  70.0
      // 80.0  90.0  100.0
      // Row 0 is all zeros because the row indices in rdd1 start at 1, so no
      // data exists for index 0 and it is filled with zeros. The blocks in the
      // last block column are 2 x 1 because 3 columns do not divide evenly by 2.
      blockMatrix.toLocalMatrix()

      // Block matrix addition
      blockMatrix.add(blockMatrix)

      // Block matrix multiplication: blockMatrix * blockMatrix^T (T means transpose)
      blockMatrix.multiply(blockMatrix.transpose)

      // Convert to a CoordinateMatrix
      blockMatrix.toCoordinateMatrix()

      // Convert to an IndexedRowMatrix
      blockMatrix.toIndexedRowMatrix()

      // Verify that the block matrix is valid
      blockMatrix.validate()
    }
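The block indices and sizes in the printed output follow from simple arithmetic on the matrix dimensions. The sketch below reproduces that arithmetic in plain Scala for a 4 x 3 matrix (row indices 0 to 3, three columns) cut into blocks of at most 2 x 2. BlockLayoutSketch and blockDims are hypothetical names for this illustration and are not part of the Spark API.

```scala
object BlockLayoutSketch {
  // Size of every block (i, j) when an m x n matrix is cut into blocks of at
  // most rowsPerBlock x colsPerBlock. Blocks on the bottom or right edge are
  // smaller when the dimensions do not divide evenly.
  def blockDims(m: Int, n: Int, rowsPerBlock: Int, colsPerBlock: Int): Map[(Int, Int), (Int, Int)] = {
    val blockRows = (m + rowsPerBlock - 1) / rowsPerBlock // ceiling division
    val blockCols = (n + colsPerBlock - 1) / colsPerBlock
    (for {
      i <- 0 until blockRows
      j <- 0 until blockCols
    } yield {
      val height = math.min(rowsPerBlock, m - i * rowsPerBlock)
      val width = math.min(colsPerBlock, n - j * colsPerBlock)
      (i, j) -> (height, width)
    }).toMap
  }

  def main(args: Array[String]): Unit = {
    // 4 rows (indices 0..3), 3 columns, blocks of at most 2 x 2
    blockDims(4, 3, 2, 2).toSeq.sortBy(_._1).foreach {
      case (index, (h, w)) => println(s"Index: $index size: $h x $w")
    }
  }
}
```

For these dimensions the grid is 2 x 2 blocks: blocks (0,0) and (1,0) are 2 x 2, while blocks (0,1) and (1,1) are 2 x 1, which matches the sizes in the printed output above.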

Copyright notice: This is an original article by the blogger and may not be reproduced without the blogger's permission.
