MapReduce Algorithm for Matrix Multiplication

Source: Internet
Author: User

Matrix multiplication comes up often in big data computing, so knowing how to implement it in MapReduce is important basic knowledge. I will describe the algorithm in language as plain as possible.

1. First, review the basics of matrix multiplication

Matrices A and B can be multiplied only if the number of columns of A equals the number of rows of B, because every element Cij of the result matrix C is the dot product of row i of A and column j of B: Cij = Ai1*B1j + Ai2*B2j + ... + Ain*Bnj, where n is the number of columns of A (and the number of rows of B).
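The rule above can be sketched in plain Python as a single-machine reference, before any MapReduce is involved. The matrix values here are made up for illustration (the article's original figure is not available); the only value taken from the text is A12 = 2. The function names `dot` and `matmul` are assumptions of this sketch.

```python
def dot(row, col):
    """Dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(row, col))


def matmul(A, B):
    """Multiply A (m x n) by B (n x p); both are lists of rows."""
    if len(A[0]) != len(B):
        raise ValueError("columns of A must equal rows of B")
    cols_of_B = list(zip(*B))  # column j of B, as a tuple
    # C[i][j] is the dot product of row i of A and column j of B.
    return [[dot(row, col) for col in cols_of_B] for row in A]


A = [[1, 2], [3, 4]]  # hypothetical values; A12 = 2 as in the article
B = [[5, 6], [7, 8]]  # hypothetical values
print(matmul(A, B))   # [[19, 22], [43, 50]]
```

Each entry of the result depends only on one row of A and one column of B, which is the property the MapReduce version relies on.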

2. On to the main topic

Now that we know the rule for matrix multiplication, we will use the distributed computing model MapReduce (MR) to carry it out.

An MR job runs on multiple machines in the Hadoop cluster at the same time, so the units of work in an MR computation must be independent of one another. Looking at the matrix multiplication process above, we find that each element of matrix C is in fact computed independently of the others. For example, computing C11 and computing C21 do not affect each other, so they can run simultaneously.

Therefore, our goal is to compute every element Cij of matrix C through MR.

As analyzed above, Cij is the dot product of row i of matrix A and column j of matrix B. So as long as all the elements involved in computing Cij (row i of matrix A and column j of matrix B) are gathered into one group, Cij can be computed. Combined with the MR model and the matrix multiplication rule, "gathering into one group" means that Map outputs these elements under the same key, namely the coordinates (i, j) of the element in matrix C. Shuffle then delivers all elements with the same key to the same Reduce call, and Reduce performs the dot product to obtain the final value of that C element.
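The grouping step can be imitated on one machine with a dictionary. This is only a rough sketch of what Shuffle does conceptually, not Hadoop's actual implementation; the function name `shuffle` and the string labels such as "A11" are placeholders invented for this illustration.

```python
from collections import defaultdict


def shuffle(pairs):
    """Group (key, value) pairs by key, as the shuffle phase does conceptually."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


# Hypothetical map output: each key is the coordinate of a C element,
# each value is a label standing in for one element of A or B.
pairs = [
    ((1, 1), "A11"), ((1, 2), "A11"),
    ((1, 1), "A12"), ((1, 2), "A12"),
    ((1, 1), "B11"), ((2, 1), "B11"),
    ((1, 1), "B21"), ((2, 1), "B21"),
]
groups = shuffle(pairs)
print(groups[(1, 1)])  # ['A11', 'A12', 'B11', 'B21']
```

After grouping, the entry for key (1, 1) holds exactly row 1 of A and column 1 of B, which is everything needed to compute C11.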

With that idea in place, return to the input data, the two matrices A and B. We only need to process each matrix element (this happens in Map), tagging it with the coordinates (i, j) of every Cij it contributes to. Shuffle then routes these elements to the group of input data for the target Cij.

For example, A12 is involved in computing C11 and C12, and B22 is involved in computing C12 and C22. So the coordinates of an A or B element tell us the coordinates of the C elements it contributes to. Note that this is a one-to-many relationship: each A or B element takes part in computing several C elements. If this is unclear, review the matrix multiplication rule in section 1.
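The one-to-many relationship can be written down as two small helper functions. This is a sketch using 1-based indices to match the article's notation; the names `targets_of_a` and `targets_of_b` and their parameters are invented here.

```python
def targets_of_a(i, num_cols_b):
    """C coordinates that an element in row i of A contributes to.

    An element of row i of A is used for every element in row i of C.
    """
    return [(i, n) for n in range(1, num_cols_b + 1)]


def targets_of_b(k, num_rows_a):
    """C coordinates that an element in column k of B contributes to.

    An element of column k of B is used for every element in column k of C.
    """
    return [(m, k) for m in range(1, num_rows_a + 1)]


# With 2 x 2 matrices, as in the article's example:
print(targets_of_a(1, num_cols_b=2))  # A12 -> [(1, 1), (1, 2)]
print(targets_of_b(2, num_rows_a=2))  # B22 -> [(1, 2), (2, 2)]
```

Note that the column index of an A element and the row index of a B element play no part in choosing the targets; they only determine the position inside the dot product.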

  • From the analysis above, for multiplying a matrix A with i rows and j columns by a matrix B with j rows and k columns:
  • We process each element Aij into the following format:
  • Key = (i, n) for n = 1, 2, 3, ..., k; value = ('A', j, aij)
  • We process each element Bjk into the following format:
  • Key = (m, k) for m = 1, 2, 3, ..., i; value = ('B', j, bjk)
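The key/value format in the list above might be sketched as two generator functions. This is an illustration outside Hadoop, not the author's code: the names `map_a` and `map_b` are invented, the matrix sizes are passed in explicitly as parameters, and the B value 8 is made up.

```python
def map_a(i, j, value, num_cols_b):
    """Emit one (key, value) pair per C element in row i (1-based indices)."""
    for n in range(1, num_cols_b + 1):
        # Key: coordinate of the target C element.
        # Value: source flag, position within the dot product, element value.
        yield (i, n), ("A", j, value)


def map_b(j, k, value, num_rows_a):
    """Emit one (key, value) pair per C element in column k (1-based indices)."""
    for m in range(1, num_rows_a + 1):
        yield (m, k), ("B", j, value)


# A12 = 2, with B having 2 columns:
print(list(map_a(1, 2, 2, num_cols_b=2)))
# [((1, 1), ('A', 2, 2)), ((1, 2), ('A', 2, 2))]

# B22 = 8 (hypothetical value), with A having 2 rows:
print(list(map_b(2, 2, 8, num_rows_a=2)))
# [((1, 2), ('B', 2, 8)), ((2, 2), ('B', 2, 8))]
```

In a real job the mapper would have to learn the matrix sizes from somewhere, for example the job configuration; the article does not say how, so this sketch simply takes them as arguments.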

The key/value format above may be hard to follow, so let me put it another way, taking A12 as an example.

A12 will eventually participate in the calculation of C11 and C12, so we need to process A12 as two {key, value} pairs:

{(1, 1), ('A', 2, 2)}/* (1, 1) is the coordinate of C11, which A12 helps compute. 'A' indicates that the data comes from matrix A; because elements of A and B must be multiplied together, a flag is needed to tell them apart. The first 2 is the position of A12 within the row vector of A that contributes to C11; we need it to know which element of the A vector is multiplied by which element of the B vector. The last 2 is the value of the current element */

{(1, 2), ('A', 2, 2)}/* refer to the preceding description */

If this explanation is still unclear, reread the earlier sections.

OK, the Map phase is done. All the A and B elements involved in Cij are shuffled to the same Reduce call. The Reduce algorithm is simple: use the flag to tell which matrix each value came from (A or B), build one array per matrix, and then take the dot product of the two arrays.
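A reducer along these lines could look like the sketch below. It assumes values of the form (flag, position, value) as described earlier; the function name `reduce_cij` and the sample numbers (other than A12 = 2) are invented for illustration. Instead of plain arrays it uses dictionaries keyed by position, on the assumption that values may arrive in any order.

```python
def reduce_cij(key, values):
    """Compute one C element from the tagged values grouped under `key`."""
    a_vec, b_vec = {}, {}
    for flag, position, value in values:
        # The flag says which matrix the value came from.
        target = a_vec if flag == "A" else b_vec
        target[position] = value
    # Multiply entries that share a position, then sum: the dot product.
    shared = a_vec.keys() & b_vec.keys()
    return key, sum(a_vec[p] * b_vec[p] for p in shared)


# Values grouped under key (1, 1), deliberately out of order.
values = [("B", 2, 7), ("A", 1, 1), ("A", 2, 2), ("B", 1, 5)]
print(reduce_cij((1, 1), values))  # ((1, 1), 19)
```

Here C11 = 1*5 + 2*7 = 19. Matching by position rather than by arrival order is the reason the position index is carried in every value.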
