A mapreduce implementation of matrix multiplication

Source: Internet
Author: User
Tags: hadoop, fs

For matrices M and N, if the number of columns of matrix M equals the number of rows of matrix N, then the product P = M × N is defined. Let m_ik denote the element in row i and column k of matrix M, and n_kj the element in row k and column j of matrix N. The element in row i and column j of matrix P is then given by equation (1-1):

p_ij = (M × N)_ij = Σ_k m_ik × n_kj = m_i1 × n_1j + m_i2 × n_2j + ...... + m_ik × n_kj    (equation 1-1)
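Before turning to MapReduce, equation (1-1) can be illustrated with a plain, single-machine Java sketch (the class and method names here are illustrative, not part of the article's Hadoop code):

```java
import java.util.Arrays;

public class MatMulDemo {

    // Straightforward implementation of equation (1-1):
    // p[i][j] = sum over k of m[i][k] * n[k][j]
    public static int[][] multiply(int[][] m, int[][] n) {
        int rows = m.length, cols = n[0].length, inner = n.length;
        int[][] p = new int[rows][cols];
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                for (int k = 0; k < inner; k++) {
                    p[i][j] += m[i][k] * n[k][j];
                }
            }
        }
        return p;
    }

    public static void main(String[] args) {
        int[][] m = {{1, 2}, {3, 4}};
        int[][] n = {{5, 6}, {7, 8}};
        // prints [[19, 22], [43, 50]]
        System.out.println(Arrays.deepToString(multiply(m, n)));
    }
}
```

The MapReduce version below computes exactly these sums, but distributes the k-indexed products across mappers and reducers.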

As equation (1-1) shows, each p_ij is fully determined by the index pair (i, j), so (i, j) can serve as the reducer's input key. To compute p_ij, every m_ik and n_kj is needed. For m_ik, the required attributes are: which matrix it belongs to (M), its column number k, and its value m_ik, keyed by row i. Likewise, for n_kj the required attributes are: its matrix (N), its row number k, and its value n_kj, keyed by column j. These attributes can be emitted by the mapper.

Map function: for each element m_ik of matrix M, produce a series of key-value pairs <(i, j), (M, k, m_ik)>, where j = 1, 2, ... up to the total number of columns of matrix N. For each element n_kj of matrix N, produce a series of key-value pairs <(i, j), (N, k, n_kj)>, where i = 1, 2, ... up to the total number of rows of matrix M.

Reduce function: for each key (i, j), place the associated values (M, k, m_ik) and (N, k, n_kj) into two separate arrays indexed by k. Then multiply the pairs of entries that share the same k and accumulate the products; the total is the value of p_ij.
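The map and reduce steps above can be sketched in memory, without Hadoop, to show the key-value flow (class and method names here are hypothetical; the grouping that Hadoop's shuffle performs is simulated with a map of lists):

```java
import java.util.*;

public class MapReduceSketch {

    // Map step: emit ("i,j", "M,k,m_ik") for each element of M (for every column j of N),
    // and ("i,j", "N,k,n_kj") for each element of N (for every row i of M).
    // The TreeMap plays the role of the shuffle: values are grouped by key.
    public static Map<String, List<String>> mapPhase(int[][] m, int[][] n) {
        Map<String, List<String>> grouped = new TreeMap<>();
        int rowM = m.length, columnN = n[0].length;
        for (int i = 1; i <= m.length; i++)
            for (int k = 1; k <= m[0].length; k++)
                for (int j = 1; j <= columnN; j++)
                    grouped.computeIfAbsent(i + "," + j, x -> new ArrayList<>())
                           .add("M," + k + "," + m[i - 1][k - 1]);
        for (int k = 1; k <= n.length; k++)
            for (int j = 1; j <= columnN; j++)
                for (int i = 1; i <= rowM; i++)
                    grouped.computeIfAbsent(i + "," + j, x -> new ArrayList<>())
                           .add("N," + k + "," + n[k - 1][j - 1]);
        return grouped;
    }

    // Reduce step: for one key "i,j", split the values into two k-indexed arrays,
    // multiply matching entries, and accumulate the sum (equation 1-1).
    public static int reducePhase(List<String> values, int columnM) {
        int[] mv = new int[columnM + 1];
        int[] nv = new int[columnM + 1];
        for (String val : values) {
            String[] t = val.split(",");
            if ("M".equals(t[0])) mv[Integer.parseInt(t[1])] = Integer.parseInt(t[2]);
            else nv[Integer.parseInt(t[1])] = Integer.parseInt(t[2]);
        }
        int sum = 0;
        for (int k = 1; k <= columnM; k++) sum += mv[k] * nv[k];
        return sum;
    }

    public static void main(String[] args) {
        int[][] m = {{1, 2}, {3, 4}};
        int[][] n = {{5, 6}, {7, 8}};
        Map<String, List<String>> grouped = mapPhase(m, n);
        // prints one "i,j <tab> p_ij" line per key: 1,1 19 / 1,2 22 / 2,1 43 / 2,2 50
        for (Map.Entry<String, List<String>> e : grouped.entrySet())
            System.out.println(e.getKey() + "\t" + reducePhase(e.getValue(), m[0].length));
    }
}
```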

Two files, m and n, hold the two matrices. Each line of a file has the form "row number,column number\telement value". In this example, a shell script (code 1-2) is used to generate the data.

Code 1-2

[Email protected]:/data# cat matrix
#!/bin/bash
for i in `seq 1 $1`
do
        for j in `seq 1 $2`
        do
                s=$(($RANDOM % 100))
                echo -e "$i,$j\t$s" >> m_$1_$2
        done
done
for i in `seq 1 $2`
do
        for j in `seq 1 $3`
        do
                s=$(($RANDOM % 100))
                echo -e "$i,$j\t$s" >> n_$2_$3
        done
done

Code 1-3 runs the matrix script to generate a 2-row, 3-column matrix and a 3-row, 3-column matrix, creates a new data folder on HDFS, and puts the two matrix files into it.

Code 1-3

[Email protected]:/data# ./matrix 2 3 3
[Email protected]:/data# cat m_2_3
1,1	6
1,2	84
1,3	40
2,1	51
2,2	37
2,3	97
[Email protected]:/data# cat n_3_3
1,1	97
1,2	34
1,3	95
2,1	93
2,2	10
2,3	70
3,1	71
3,2	24
3,3	47
[Email protected]:/data# hadoop fs -mkdir /data
[Email protected]:/data# hadoop fs -put /data/m_2_3 /data/
[Email protected]:/data# hadoop fs -put /data/n_3_3 /data/
[Email protected]:/data# hadoop fs -ls -R /data
-rw-r--r--   1 root supergroup            2017-01-07 11:57 /data/m_2_3
-rw-r--r--   1 root supergroup         63 2017-01-07 11:57 /data/n_3_3

The Mapper class for matrix multiplication is shown in code 1-4.

Code 1-4

package com.hadoop.mapreduce;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

public class MatrixMapper extends Mapper<LongWritable, Text, Text, Text> {

    private int columnN = 0;
    private int rowM = 0;
    private Text mapKey = new Text();
    private Text mapValue = new Text();

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        Configuration conf = context.getConfiguration();
        columnN = Integer.parseInt(conf.get("columnN"));
        rowM = Integer.parseInt(conf.get("rowM"));
    }

    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        FileSplit file = (FileSplit) context.getInputSplit();
        String fileName = file.getPath().getName();
        String line = value.toString();
        // Each line has the form "row,column\tvalue"
        String[] tuple = line.split(",");
        if (tuple.length != 2) {
            throw new RuntimeException("MatrixMapper tuple error");
        }
        int row = Integer.parseInt(tuple[0]);
        String[] tuples = tuple[1].split("\t");
        if (tuples.length != 2) {
            throw new RuntimeException("MatrixMapper tuples error");
        }
        if (fileName.startsWith("m")) {
            matrixM(row, Integer.parseInt(tuples[0]), Integer.parseInt(tuples[1]), context);
        } else {
            matrixN(row, Integer.parseInt(tuples[0]), Integer.parseInt(tuples[1]), context);
        }
    }

    // For element m_ik, emit <(i, j), (M, k, m_ik)> for every column j of N.
    private void matrixM(int row, int column, int value, Context context) throws IOException, InterruptedException {
        for (int j = 1; j <= columnN; j++) {
            mapKey.set(row + "," + j);
            mapValue.set("M," + column + "," + value);
            context.write(mapKey, mapValue);
        }
    }

    // For element n_kj, emit <(i, j), (N, k, n_kj)> for every row i of M.
    private void matrixN(int row, int column, int value, Context context) throws IOException, InterruptedException {
        for (int i = 1; i <= rowM; i++) {
            mapKey.set(i + "," + column);
            mapValue.set("N," + row + "," + value);
            context.write(mapKey, mapValue);
        }
    }
}

The Reducer class for matrix multiplication is shown in code 1-5.

Code 1-5

package com.hadoop.mapreduce;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class MatrixReducer extends Reducer<Text, Text, Text, Text> {

    private int columnM = 0;

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        Configuration conf = context.getConfiguration();
        columnM = Integer.parseInt(conf.get("columnM"));
    }

    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
        int sum = 0;
        // Split the values for this (i, j) key into two k-indexed arrays.
        int[] m = new int[columnM + 1];
        int[] n = new int[columnM + 1];
        for (Text val : values) {
            String[] tuple = val.toString().split(",");
            if (tuple.length != 3) {
                throw new RuntimeException("MatrixReducer tuple error");
            }
            if ("M".equals(tuple[0])) {
                m[Integer.parseInt(tuple[1])] = Integer.parseInt(tuple[2]);
            } else {
                n[Integer.parseInt(tuple[1])] = Integer.parseInt(tuple[2]);
            }
        }
        // Multiply matching k entries and accumulate to get p_ij.
        for (int i = 1; i <= columnM; i++) {
            sum += m[i] * n[i];
        }
        context.write(key, new Text(sum + ""));
    }
}

The main function for the matrix multiplication job is shown below.

package com.hadoop.mapreduce;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class Matrix {

    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        if (args == null || args.length != 5) {
            throw new RuntimeException(
                    "Usage: <input path> <output path> <rows of matrix M> <columns of matrix M> <columns of matrix N>");
        }
        Configuration conf = new Configuration();
        conf.set("rowM", args[2]);
        conf.set("columnM", args[3]);
        conf.set("columnN", args[4]);

        Job job = Job.getInstance(conf);
        job.setJobName("Matrix");
        job.setJarByClass(Matrix.class);
        job.setMapperClass(MatrixMapper.class);
        job.setReducerClass(MatrixReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPaths(job, args[0]);
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Package the code in 1-4 and 1-5, together with the main function, into Matrix.jar and run it. The results are shown in code 1-6 (note: part of the MapReduce execution output is omitted).

Code 1-6

[Email protected]:/data# hadoop jar Matrix.jar com.hadoop.mapreduce.Matrix /data/ /output/ 2 3 3
......
[Email protected]:/data# hadoop fs -ls -R /output
-rw-r--r--   1 root supergroup          0 2017-01-07 12:04 /output/_SUCCESS
-rw-r--r--   1 root supergroup            2017-01-07 12:04 /output/part-r-00000
[Email protected]:/data# hadoop fs -cat /output/part-r-00000
1,1	11234
1,2	2004
1,3	8330
2,1	15275
2,2	4432
2,3	11994
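The job output can be cross-checked by multiplying the two sample matrices from code 1-3 directly in plain Java (this verification sketch is an addition, not part of the original job code):

```java
public class VerifyResult {

    // Sample matrices generated in code 1-3 (m_2_3 and n_3_3).
    static final int[][] M = {{6, 84, 40}, {51, 37, 97}};
    static final int[][] N = {{97, 34, 95}, {93, 10, 70}, {71, 24, 47}};

    // Compute P = M x N directly, per equation (1-1).
    public static int[][] product() {
        int[][] p = new int[2][3];
        for (int i = 0; i < 2; i++)
            for (int j = 0; j < 3; j++)
                for (int k = 0; k < 3; k++)
                    p[i][j] += M[i][k] * N[k][j];
        return p;
    }

    public static void main(String[] args) {
        int[][] p = product();
        // Prints the same "i,j <tab> p_ij" lines as /output/part-r-00000:
        // 1,1 11234 / 1,2 2004 / 1,3 8330 / 2,1 15275 / 2,2 4432 / 2,3 11994
        for (int i = 0; i < 2; i++)
            for (int j = 0; j < 3; j++)
                System.out.println((i + 1) + "," + (j + 1) + "\t" + p[i][j]);
    }
}
```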
