Implementing matrix multiplication with MapReduce


Matrix multiplication comes up often in big data computation, so knowing how to implement it with MapReduce is important basic knowledge. Below I try to describe the algorithm in plain language.

1. First, a review of matrix multiplication basics

Matrices A and B can be multiplied only if the number of columns of A equals the number of rows of B, because each element Cij of the product matrix C is the dot product of row i of A and column j of B: Cij = Ai1*B1j + Ai2*B2j + ... + AiJ*BJj, where J is the number of columns of A.
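As a small illustration of the rule above, here is a minimal Python sketch that computes a single element of C as a row-by-column dot product. The function name `c_element` and the 2x2 sample matrices are assumptions made for this example; they are not from the original article.

```python
def c_element(A, B, i, j):
    """Return C[i][j] for C = A x B, using 1-based i and j as in the text."""
    if len(A[0]) != len(B):
        raise ValueError("columns of A must equal rows of B")
    row = A[i - 1]                          # row i of A
    col = [b_row[j - 1] for b_row in B]     # column j of B
    return sum(a * b for a, b in zip(row, col))

# Hypothetical 2x2 inputs, chosen only for illustration.
A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
C = [[c_element(A, B, i, j) for j in (1, 2)] for i in (1, 2)]
print(C)  # [[19, 22], [43, 50]]
```

Each call to `c_element` reads only one row of A and one column of B, so no call depends on the result of another.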

2. Getting to the point

Now that the matrix multiplication rule is clear, we want to carry out this process with MapReduce, the distributed computing model.

An MR job runs simultaneously on multiple machines in a Hadoop cluster, so the work MR performs must consist of independent computations that do not depend on one another. Analyzing the matrix multiplication process above, we find that every element of C is computed independently of the others; for example, the computations of C11 and C21 do not affect each other and can run at the same time.

So our goal is to compute every element Cij of matrix C with MR.


With this goal in mind, we analyze further. Cij is the dot product of row i of A and column j of B, so we can compute Cij as long as all the elements involved in it (row i of A and column j of B) end up in one group. Combining the MR model with the matrix multiplication rule, "ending up in one group" means that map outputs these elements under the same key, namely the coordinates of the C element. Shuffle then delivers all values with the same key to the same reduce call, and reduce performs the dot product to obtain the final value of that C element.

With that idea understood, we return to the input data, the two matrices A and B. We only need to process each element of the matrices (this happens in map): based on which Cij computations the element takes part in, we tag it with the corresponding (i,j) coordinates. Shuffle then routes the element into the input group of each target Cij.

As a concrete example with 2x2 matrices, A12 takes part in computing C11 and C12, and B22 takes part in computing C12 and C22. So from the coordinates of an element of A or B, we know exactly which C elements it will contribute to. Note that this is one-to-many: each element of A or B takes part in computing several C elements. If this is unclear, review the matrix multiplication rule in section 1.
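The one-to-many relationship can be sketched in a few lines of Python. The helper names `targets_of_a` and `targets_of_b` and their parameters are hypothetical, introduced only for this illustration; coordinates are 1-based as in the text.

```python
def targets_of_a(i, num_cols_c):
    """C coordinates that any element in row i of A contributes to."""
    return [(i, n) for n in range(1, num_cols_c + 1)]

def targets_of_b(k, num_rows_c):
    """C coordinates that any element in column k of B contributes to."""
    return [(m, k) for m in range(1, num_rows_c + 1)]

# Assuming 2x2 matrices, as in the example above.
print(targets_of_a(1, num_cols_c=2))  # A12 is in row 1    -> [(1, 1), (1, 2)]
print(targets_of_b(2, num_rows_c=2))  # B22 is in column 2 -> [(1, 2), (2, 2)]
```

Only the row index matters for an element of A and only the column index for an element of B; the remaining index decides where the element sits inside the dot product, not which C elements it feeds.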

Following the analysis above, for an I-row, J-column matrix A multiplied by a J-row, K-column matrix B:

We emit each element Aij in the following format:

key = (i, n) (n = 1, 2, 3 ... K)   value = ('A', j, Aij)

We emit each element Bjk in the following format:

key = (m, k) (m = 1, 2, 3 ... I)   value = ('B', j, Bjk)

The format above may look painful to many readers, so let me belabor it a little, taking A12 as the example.

A12 eventually takes part in computing C11 and C12, so in map we turn A12 into two {key, value} pairs:

{(1,1), ('A', 2, 2)} /* (1,1) is the coordinate of C11, one of the elements A12 helps compute; 'A' marks the data as coming from matrix A, a flag that is needed because elements of A must be paired with elements of B; the first 2 is the position of A12 within the row vector of A used for C11, which is needed because the dot product multiplies the first element of the A vector by the first element of the B vector, the second by the second, and so on; the last 2 is the value of the current element. */

{(1,2), ('A', 2, 2)} /* see the description above */
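The map step can be sketched as two small Python generators. This is only an illustration, not Hadoop code: the names `map_a` and `map_b` are hypothetical, and the value layout for B elements, ('B', j, Bjk), is assumed to mirror the layout used for A.

```python
def map_a(i, j, value, K):
    """Emit one (key, value) pair for every C element that Aij contributes to."""
    for n in range(1, K + 1):
        yield (i, n), ("A", j, value)

def map_b(j, k, value, I):
    """Emit one (key, value) pair for every C element that Bjk contributes to."""
    for m in range(1, I + 1):
        yield (m, k), ("B", j, value)

# A12 with value 2 and a C matrix with K = 2 columns, as in the example above.
print(list(map_a(1, 2, 2, K=2)))
# [((1, 1), ('A', 2, 2)), ((1, 2), ('A', 2, 2))]
```

The two pairs printed for A12 match the two {key, value} pairs written out by hand above.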

If all this explanation still does not make sense, go face the wall and think it over!

OK. At the end of the map phase, all the A and B elements involved in a given Cij have been shuffled to the same reduce call. The reduce algorithm is simple: use the flag to tell the data sources (A or B) apart, build one array for each, and take the dot product of the two arrays.
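A minimal in-memory sketch of the shuffle and reduce steps follows. It is plain Python rather than Hadoop code; the names `shuffle` and `reduce_cell` and the sample pairs are assumptions made for illustration, with the B values assumed to use the layout ('B', j, Bjk).

```python
from collections import defaultdict

def shuffle(pairs):
    """Group values by key, imitating what the shuffle phase does."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_cell(values):
    """Split values by source flag, then take the dot product over index j."""
    a_vec, b_vec = {}, {}
    for flag, j, x in values:
        (a_vec if flag == "A" else b_vec)[j] = x
    # An index missing from either side contributes nothing to the sum.
    return sum(a_vec[j] * b_vec[j] for j in a_vec if j in b_vec)

# Hypothetical mapper output for the keys (1,1) and (1,2) only.
pairs = [
    ((1, 1), ("A", 1, 1)), ((1, 2), ("A", 1, 1)),   # from A11 = 1
    ((1, 1), ("A", 2, 2)), ((1, 2), ("A", 2, 2)),   # from A12 = 2
    ((1, 1), ("B", 1, 5)), ((1, 1), ("B", 2, 7)),   # from B11 = 5, B21 = 7
    ((1, 2), ("B", 1, 6)), ((1, 2), ("B", 2, 8)),   # from B12 = 6, B22 = 8
]
result = {key: reduce_cell(values) for key, values in shuffle(pairs).items()}
print(result)  # {(1, 1): 19, (1, 2): 22}
```

Dictionaries keyed by j are used instead of fixed-size arrays so that the order in which values arrive at reduce does not matter.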

Author: csdn Blog u010967382
