A mapreduce implementation of matrix multiplication

Source: Internet
Author: User
Tags: hadoop, fs

For matrices M and N, if the number of columns of matrix M equals the number of rows of matrix N, then the product P = M × N is defined. Let m_ik denote the element in row i and column k of matrix M, and n_kj the element in row k and column j of matrix N. The element in row i and column j of matrix P is then given by equation (1-1):

p_ij = (M × N)_ij = Σ_k m_ik × n_kj = m_i1 × n_1j + m_i2 × n_2j + ...... + m_ik × n_kj    (equation 1-1)
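Before turning to MapReduce, equation (1-1) can be illustrated with a plain, single-machine Java sketch (the class and method names here are illustrative, not part of the article's Hadoop code):

```java
import java.util.Arrays;

public class MatMulDemo {

    // Straightforward implementation of equation (1-1):
    // p[i][j] = sum over k of m[i][k] * n[k][j]
    public static int[][] multiply(int[][] m, int[][] n) {
        int rows = m.length, cols = n[0].length, inner = n.length;
        int[][] p = new int[rows][cols];
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                for (int k = 0; k < inner; k++) {
                    p[i][j] += m[i][k] * n[k][j];
                }
            }
        }
        return p;
    }

    public static void main(String[] args) {
        int[][] m = {{1, 2}, {3, 4}};
        int[][] n = {{5, 6}, {7, 8}};
        // prints [[19, 22], [43, 50]]
        System.out.println(Arrays.deepToString(multiply(m, n)));
    }
}
```

The MapReduce version below computes exactly these sums, but distributes the k-indexed products across mappers and reducers.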

As equation (1-1) shows, each p_ij is fully determined by the index pair (i, j), so (i, j) can serve as the reducer's input key. To compute p_ij, every m_ik and n_kj is needed. For m_ik, the required attributes are: which matrix it belongs to (M), its column number k, and its value m_ik, keyed by row i. Likewise, for n_kj the required attributes are: its matrix (N), its row number k, and its value n_kj, keyed by column j. These attributes can be emitted by the mapper.

Map function: for each element m_ik of matrix M, produce a series of key-value pairs <(i, j), (M, k, m_ik)>, where j = 1, 2, ... up to the total number of columns of matrix N. For each element n_kj of matrix N, produce a series of key-value pairs <(i, j), (N, k, n_kj)>, where i = 1, 2, ... up to the total number of rows of matrix M.

Reduce function: for each key (i, j), place the associated values (M, k, m_ik) and (N, k, n_kj) into two separate arrays indexed by k. Then multiply the pairs of entries that share the same k and accumulate the products; the total is the value of p_ij.
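The map and reduce steps above can be sketched in memory, without Hadoop, to show the key-value flow (class and method names here are hypothetical; the grouping that Hadoop's shuffle performs is simulated with a map of lists):

```java
import java.util.*;

public class MapReduceSketch {

    // Map step: emit ("i,j", "M,k,m_ik") for each element of M (for every column j of N),
    // and ("i,j", "N,k,n_kj") for each element of N (for every row i of M).
    // The TreeMap plays the role of the shuffle: values are grouped by key.
    public static Map<String, List<String>> mapPhase(int[][] m, int[][] n) {
        Map<String, List<String>> grouped = new TreeMap<>();
        int rowM = m.length, columnN = n[0].length;
        for (int i = 1; i <= m.length; i++)
            for (int k = 1; k <= m[0].length; k++)
                for (int j = 1; j <= columnN; j++)
                    grouped.computeIfAbsent(i + "," + j, x -> new ArrayList<>())
                           .add("M," + k + "," + m[i - 1][k - 1]);
        for (int k = 1; k <= n.length; k++)
            for (int j = 1; j <= columnN; j++)
                for (int i = 1; i <= rowM; i++)
                    grouped.computeIfAbsent(i + "," + j, x -> new ArrayList<>())
                           .add("N," + k + "," + n[k - 1][j - 1]);
        return grouped;
    }

    // Reduce step: for one key "i,j", split the values into two k-indexed arrays,
    // multiply matching entries, and accumulate the sum (equation 1-1).
    public static int reducePhase(List<String> values, int columnM) {
        int[] mv = new int[columnM + 1];
        int[] nv = new int[columnM + 1];
        for (String val : values) {
            String[] t = val.split(",");
            if ("M".equals(t[0])) mv[Integer.parseInt(t[1])] = Integer.parseInt(t[2]);
            else nv[Integer.parseInt(t[1])] = Integer.parseInt(t[2]);
        }
        int sum = 0;
        for (int k = 1; k <= columnM; k++) sum += mv[k] * nv[k];
        return sum;
    }

    public static void main(String[] args) {
        int[][] m = {{1, 2}, {3, 4}};
        int[][] n = {{5, 6}, {7, 8}};
        Map<String, List<String>> grouped = mapPhase(m, n);
        // prints one "i,j <tab> p_ij" line per key: 1,1 19 / 1,2 22 / 2,1 43 / 2,2 50
        for (Map.Entry<String, List<String>> e : grouped.entrySet())
            System.out.println(e.getKey() + "\t" + reducePhase(e.getValue(), m[0].length));
    }
}
```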

Two files, m and n, hold the two matrices. Each line of a file has the form "row number,column number\telement value". In this example, a shell script (code 1-2) is used to generate the data.

Code 1-2

[Email protected]:/data# cat matrix
#!/bin/bash
for i in `seq 1 $1`
do
        for j in `seq 1 $2`
        do
                s=$(($RANDOM % 100))
                echo -e "$i,$j\t$s" >> m_$1_$2
        done
done
for i in `seq 1 $2`
do
        for j in `seq 1 $3`
        do
                s=$(($RANDOM % 100))
                echo -e "$i,$j\t$s" >> n_$2_$3
        done
done

Code 1-3 runs the matrix script to generate a 2-row, 3-column matrix and a 3-row, 3-column matrix, creates a new data folder on HDFS, and puts the two matrix files into it.

Code 1-3

[Email protected]:/data# ./matrix 2 3 3
[Email protected]:/data# cat m_2_3
1,1	6
1,2	84
1,3	40
2,1	51
2,2	37
2,3	97
[Email protected]:/data# cat n_3_3
1,1	97
1,2	34
1,3	95
2,1	93
2,2	10
2,3	70
3,1	71
3,2	24
3,3	47
[Email protected]:/data# hadoop fs -mkdir /data
[Email protected]:/data# hadoop fs -put /data/m_2_3 /data/
[Email protected]:/data# hadoop fs -put /data/n_3_3 /data/
[Email protected]:/data# hadoop fs -ls -R /data
-rw-r--r--   1 root supergroup            2017-01-07 11:57 /data/m_2_3
-rw-r--r--   1 root supergroup         63 2017-01-07 11:57 /data/n_3_3

The Mapper class for matrix multiplication is shown in code 1-4.

Code 1-4

package com.hadoop.mapreduce;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

public class MatrixMapper extends Mapper<LongWritable, Text, Text, Text> {

    private int columnN = 0;
    private int rowM = 0;
    private Text mapKey = new Text();
    private Text mapValue = new Text();

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        Configuration conf = context.getConfiguration();
        columnN = Integer.parseInt(conf.get("columnN"));
        rowM = Integer.parseInt(conf.get("rowM"));
    }

    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        FileSplit file = (FileSplit) context.getInputSplit();
        String fileName = file.getPath().getName();
        String line = value.toString();
        // Each line has the form "row,column\tvalue"
        String[] tuple = line.split(",");
        if (tuple.length != 2) {
            throw new RuntimeException("MatrixMapper tuple error");
        }
        int row = Integer.parseInt(tuple[0]);
        String[] tuples = tuple[1].split("\t");
        if (tuples.length != 2) {
            throw new RuntimeException("MatrixMapper tuples error");
        }
        if (fileName.startsWith("m")) {
            matrixM(row, Integer.parseInt(tuples[0]), Integer.parseInt(tuples[1]), context);
        } else {
            matrixN(row, Integer.parseInt(tuples[0]), Integer.parseInt(tuples[1]), context);
        }
    }

    // For element m_ik, emit <(i, j), (M, k, m_ik)> for every column j of N.
    private void matrixM(int row, int column, int value, Context context) throws IOException, InterruptedException {
        for (int j = 1; j <= columnN; j++) {
            mapKey.set(row + "," + j);
            mapValue.set("M," + column + "," + value);
            context.write(mapKey, mapValue);
        }
    }

    // For element n_kj, emit <(i, j), (N, k, n_kj)> for every row i of M.
    private void matrixN(int row, int column, int value, Context context) throws IOException, InterruptedException {
        for (int i = 1; i <= rowM; i++) {
            mapKey.set(i + "," + column);
            mapValue.set("N," + row + "," + value);
            context.write(mapKey, mapValue);
        }
    }
}

The Reducer class for matrix multiplication is shown in code 1-5.

Code 1-5

package com.hadoop.mapreduce;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class MatrixReducer extends Reducer<Text, Text, Text, Text> {

    private int columnM = 0;

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        Configuration conf = context.getConfiguration();
        columnM = Integer.parseInt(conf.get("columnM"));
    }

    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
        int sum = 0;
        // Split the values for this (i, j) key into two k-indexed arrays.
        int[] m = new int[columnM + 1];
        int[] n = new int[columnM + 1];
        for (Text val : values) {
            String[] tuple = val.toString().split(",");
            if (tuple.length != 3) {
                throw new RuntimeException("MatrixReducer tuple error");
            }
            if ("M".equals(tuple[0])) {
                m[Integer.parseInt(tuple[1])] = Integer.parseInt(tuple[2]);
            } else {
                n[Integer.parseInt(tuple[1])] = Integer.parseInt(tuple[2]);
            }
        }
        // Multiply matching k entries and accumulate to get p_ij.
        for (int i = 1; i <= columnM; i++) {
            sum += m[i] * n[i];
        }
        context.write(key, new Text(sum + ""));
    }
}

The main function for the matrix multiplication job is shown below.

package com.hadoop.mapreduce;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class Matrix {

    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        if (args == null || args.length != 5) {
            throw new RuntimeException(
                    "Usage: <input path> <output path> <rows of matrix M> <columns of matrix M> <columns of matrix N>");
        }
        Configuration conf = new Configuration();
        conf.set("rowM", args[2]);
        conf.set("columnM", args[3]);
        conf.set("columnN", args[4]);

        Job job = Job.getInstance(conf);
        job.setJobName("Matrix");
        job.setJarByClass(Matrix.class);
        job.setMapperClass(MatrixMapper.class);
        job.setReducerClass(MatrixReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPaths(job, args[0]);
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Package the code in 1-4 and 1-5, together with the main function, into Matrix.jar and run it. The results are shown in code 1-6 (note: part of the MapReduce execution output is omitted).

Code 1-6

[Email protected]:/data# hadoop jar Matrix.jar com.hadoop.mapreduce.Matrix /data/ /output/ 2 3 3
......
[Email protected]:/data# hadoop fs -ls -R /output
-rw-r--r--   1 root supergroup          0 2017-01-07 12:04 /output/_SUCCESS
-rw-r--r--   1 root supergroup            2017-01-07 12:04 /output/part-r-00000
[Email protected]:/data# hadoop fs -cat /output/part-r-00000
1,1	11234
1,2	2004
1,3	8330
2,1	15275
2,2	4432
2,3	11994
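The job output can be cross-checked by multiplying the two sample matrices from code 1-3 directly in plain Java (this verification sketch is an addition, not part of the original job code):

```java
public class VerifyResult {

    // Sample matrices generated in code 1-3 (m_2_3 and n_3_3).
    static final int[][] M = {{6, 84, 40}, {51, 37, 97}};
    static final int[][] N = {{97, 34, 95}, {93, 10, 70}, {71, 24, 47}};

    // Compute P = M x N directly, per equation (1-1).
    public static int[][] product() {
        int[][] p = new int[2][3];
        for (int i = 0; i < 2; i++)
            for (int j = 0; j < 3; j++)
                for (int k = 0; k < 3; k++)
                    p[i][j] += M[i][k] * N[k][j];
        return p;
    }

    public static void main(String[] args) {
        int[][] p = product();
        // Prints the same "i,j <tab> p_ij" lines as /output/part-r-00000:
        // 1,1 11234 / 1,2 2004 / 1,3 8330 / 2,1 15275 / 2,2 4432 / 2,3 11994
        for (int i = 0; i < 2; i++)
            for (int j = 0; j < 3; j++)
                System.out.println((i + 1) + "," + (j + 1) + "\t" + p[i][j]);
    }
}
```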
