Mahout Source Code Analysis: MeanShiftCanopyDriver (3) — MeanShiftCanopyReducer Data Logic Flow


First, here is the imitation code for MeanShiftCanopyReducer:

package mahout.fansy.meanshift;

import java.io.IOException;
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.mahout.clustering.iterator.ClusterWritable;
import org.apache.mahout.clustering.meanshift.MeanShiftCanopy;
import org.apache.mahout.clustering.meanshift.MeanShiftCanopyClusterer;
import org.apache.mahout.clustering.meanshift.MeanShiftCanopyConfigKeys;

import com.google.common.collect.Lists;

/**
 * MeanShiftCanopyReducer imitation code
 * @author fansy
 */
public class MeanShiftCanopyReducerFollow {

  private static int convergedClusters = 0;
  private static boolean allConverged = true;

  public static void main(String[] args) {
    // cleanup();  // debug the cleanup function
    reduce();      // debug the reduce function
  }

  /**
   * Imitation of the reduce operation
   */
  public static Map<Text, Collection<ClusterWritable>> reduce() {
    Collection<MeanShiftCanopy> canopies = Lists.newArrayList();
    // obtain the map output
    Collection<ClusterWritable> values =
        MeanShiftCanopyMapperFollow.cleanup().get(new Text("0"));
    MeanShiftCanopyClusterer clusterer = setup();
    Collection<ClusterWritable> v = Lists.newArrayList();
    for (ClusterWritable clusterWritable : values) {
      MeanShiftCanopy canopy = (MeanShiftCanopy) clusterWritable.getValue();
      clusterer.mergeCanopy(canopy.shallowCopy(), canopies);
    }
    Map<Text, Collection<ClusterWritable>> map =
        new HashMap<Text, Collection<ClusterWritable>>();
    for (MeanShiftCanopy canopy : canopies) {
      boolean converged = clusterer.shiftToMean(canopy);
      if (converged) {
        // System.out.println("Clustering converged clusters: " + convergedClusters++);
      }
      allConverged = converged && allConverged;
      ClusterWritable clusterWritable = new ClusterWritable();
      clusterWritable.setValue(canopy);
      v.add(clusterWritable);
      map.put(new Text(canopy.getIdentifier()), v);
      // System.out.println("key: " + canopy.getIdentifier()
      //     + ", value: " + clusterWritable.getValue().toString());
    }
    return map;
  }

  /**
   * Imitation of the setup function; directly calls the MapperFollow method
   * @return the MeanShiftCanopyClusterer
   */
  public static MeanShiftCanopyClusterer setup() {
    return MeanShiftCanopyMapperFollow.setup();
  }

  /**
   * Imitation of the cleanup function
   * @throws IOException
   */
  public static void cleanup() throws IOException {
    // int numReducers = 1;  // set it yourself; here it is 1
    Configuration conf = getConf();
    // determine whether all canopies meet the convergence criterion;
    // if so, create the control file
    if (allConverged) {
      Path path = new Path(conf.get(MeanShiftCanopyConfigKeys.CONTROL_PATH_KEY));
      FileSystem.get(path.toUri(), conf).createNewFile(path);
    }
  }

  /**
   * Build the Configuration used above
   * @return conf
   */
  public static Configuration getConf() {
    String measureClassName = "org.apache.mahout.common.distance.EuclideanDistanceMeasure";
    String kernelProfileClassName = "org.apache.mahout.common.kernel.TriangularKernelProfile";
    double convergenceDelta = 0.5;
    double t1 = 47.6;
    double t2 = 1;
    boolean runClustering = true;
    Configuration conf = new Configuration();
    conf.set(MeanShiftCanopyConfigKeys.DISTANCE_MEASURE_KEY, measureClassName);
    conf.set(MeanShiftCanopyConfigKeys.KERNEL_PROFILE_KEY, kernelProfileClassName);
    conf.set(MeanShiftCanopyConfigKeys.CLUSTER_CONVERGENCE_KEY, String.valueOf(convergenceDelta));
    conf.set(MeanShiftCanopyConfigKeys.T1_KEY, String.valueOf(t1));
    conf.set(MeanShiftCanopyConfigKeys.T2_KEY, String.valueOf(t2));
    conf.set(MeanShiftCanopyConfigKeys.CLUSTER_POINTS_KEY, String.valueOf(runClustering));
    return conf;
  }

  /**
   * Get the map output data, i.e. the canopies
   * @return Map<Text, Collection<ClusterWritable>> canopies
   */
  public static Map<Text, Collection<ClusterWritable>> getMapData() {
    return MeanShiftCanopyMapperFollow.cleanup();
  }
}

The setup function is the same as in the mapper, and the cleanup function only creates a control file once the convergence criterion is met, so we won't dwell on them. The focus of this post is the reduce function (whose main body is essentially the mapper's map + cleanup logic combined).

The first three records output by map are as follows:

MSC-0{n=100 c=[29.942, 30.443, 30.325, 30.018, 29.887, 29.777, 29.855, 29.883, 30.128, 29.984, 29.796, 29.845, 30.436, 29.729, 29.890, 29.518, 29.546, 30.052, 30.077, 30.001, 29.837, 29.928, 30.288, 30.347, 29.785, 29.799, 29.651, 30.008, 29.938, 30.104, 29.997, 29.684, 29.949, 29.754, 30.272, 30.106, 29.883, 30.221, 29.847, 29.848, 29.843, 30.577, 29.870, 29.785, 29.923, 29.864, 30.184, 29.977, 30.321, 30.068, 30.570, 30.224, 30.240, 29.969, 30.246, 30.544, 29.862, 30.099, 29.907, 30.169] r=[3.384, 3.383, 3.494, 3.523, 3.308, 3.605, 3.315, 3.518, 3.472, 3.519, 3.350, 3.444, 3.273, 3.274, 3.400, 3.443, 3.426, 3.499, 3.154, 3.506, 3.509, 3.436, 3.484, 3.475, 3.360, 3.164, 3.460, 3.491, 3.608, 3.484, 3.477, 3.748, 3.628, 3.378, 3.327, 3.600, 3.455, 3.562, 3.534, 3.566, 3.213, 3.645, 3.615, 3.274, 3.197, 3.373, 3.595, 3.452, 3.609, 3.518, 3.262, 3.477, 3.755, 3.830, 3.494, 3.676, 3.423, 3.491, 3.641, 3.374]}
MSC-1{n=101 c=[29.890, 30.422, 30.280, 30.046, 29.891, 29.805, 29.828, 29.875, 30.133, 30.035, 29.773, 29.900, 30.441, 29.751, 29.906, 29.490, 29.508, 30.013, 30.082, 30.049, 29.815, 29.934, 30.286, 30.294, 29.828, 29.831, 29.712, 30.005, 29.977, 30.128, 30.015, 29.675, 29.963, 29.766, 30.259, 30.095, 29.855, 30.139, 29.704, 29.797, 29.808, 30.530, 29.743, 29.745, 29.883, 29.741, 30.140, 29.935, 30.271, 29.934, 30.437, 30.184, 30.180, 29.823, 30.146, 30.494, 29.767, 30.061, 29.854, 30.130] r=[3.407, 3.373, 3.506, 3.517, 3.292, 3.598, 3.310, 3.502, 3.455, 3.538, 3.341, 3.471, 3.257, 3.265, 3.387, 3.437, 3.430, 3.504, 3.139, 3.522, 3.499, 3.419, 3.466, 3.497, 3.371, 3.165, 3.496, 3.474, 3.610, 3.475, 3.464, 3.730, 3.613, 3.363, 3.313, 3.584, 3.449, 3.639, 3.797, 3.585, 3.215, 3.658, 3.818, 3.282, 3.205, 3.573, 3.605, 3.460, 3.626, 3.748, 3.507, 3.482, 3.784, 4.079, 3.616, 3.692, 3.535, 3.495, 3.663, 3.380]}
MSC-2{n=100 c=[29.942, 30.443, 30.325, 30.018, 29.887, 29.777, 29.855, 29.883, 30.128, 29.984, 29.796, 29.845, 30.436, 29.729, 29.890, 29.518, 29.546, 30.052, 30.077, 30.001, 29.837, 29.928, 30.288, 30.347, 29.785, 29.799, 29.651, 30.008, 29.938, 30.104, 29.997, 29.684, 29.949, 29.754, 30.272, 30.106, 29.883, 30.221, 29.847, 29.848, 29.843, 30.577, 29.870, 29.785, 29.923, 29.864, 30.184, 29.977, 30.321, 30.068, 30.570, 30.224, 30.240, 29.969, 30.246, 30.544, 29.862, 30.099, 29.907, 30.169] r=[3.384, 3.383, 3.494, 3.523, 3.308, 3.605, 3.315, 3.518, 3.472, 3.519, 3.350, 3.444, 3.273, 3.274, 3.400, 3.443, 3.426, 3.499, 3.154, 3.506, 3.509, 3.436, 3.484, 3.475, 3.360, 3.164, 3.460, 3.491, 3.608, 3.484, 3.477, 3.748, 3.628, 3.378, 3.327, 3.600, 3.455, 3.562, 3.534, 3.566, 3.213, 3.645, 3.615, 3.274, 3.197, 3.373, 3.595, 3.452, 3.609, 3.518, 3.262, 3.477, 3.755, 3.830, 3.494, 3.676, 3.423, 3.491, 3.641, 3.374]}

With the preparation done, execution goes straight to clusterer.mergeCanopy(canopy.shallowCopy(), canopies). The analysis here is the same as for the mapper: the first input record behaves identically, but the second differs. The norm between the second record and canopies.get(1) is 0.44, which is less than both T1 and T2, so execution enters:

if (norm < t2 && (closestCoveringCanopy == null || norm < closestNorm)) {
  closestNorm = norm;
  closestCoveringCanopy = canopy;
}

After the loop, closestCoveringCanopy is therefore non-null, so execution takes the else branch rather than the if branch below:

if (closestCoveringCanopy == null) {
  canopies.add(aCanopy);
} else {
  closestCoveringCanopy.merge(aCanopy, runClustering);
}
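Putting the two snippets above together, the decision logic of mergeCanopy can be sketched as follows. This is a simplified, one-dimensional stand-in for illustration only: the real Mahout code works on Vector centers with a pluggable DistanceMeasure, and the T1 branch (which calls touch, discussed below) is omitted here.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified sketch of MeanShiftCanopyClusterer.mergeCanopy's decision logic.
// A canopy here is just a 1-D center plus a mass counter (an assumption for
// brevity); the Canopy class and mergeCanopy signature are illustrative.
public class MergeCanopySketch {
  static final double T2 = 1.0;  // inner threshold, as in the driver's configuration

  static class Canopy {
    double center;
    int mass = 1;
    Canopy(double center) { this.center = center; }
  }

  // Returns the index of the canopy the new one was merged into, or -1 if added as new.
  static int mergeCanopy(Canopy aCanopy, List<Canopy> canopies) {
    Canopy closestCoveringCanopy = null;
    double closestNorm = Double.MAX_VALUE;
    for (Canopy canopy : canopies) {
      double norm = Math.abs(canopy.center - aCanopy.center); // stand-in for the distance measure
      if (norm < T2 && (closestCoveringCanopy == null || norm < closestNorm)) {
        closestNorm = norm;
        closestCoveringCanopy = canopy;
      }
    }
    if (closestCoveringCanopy == null) {
      canopies.add(aCanopy);   // no canopy within T2: start a new one
      return -1;
    }
    closestCoveringCanopy.mass += aCanopy.mass; // merge: combine mass (and boundPoints)
    return canopies.indexOf(closestCoveringCanopy);
  }
}
```

With centers 0.0 and 0.44 (norm 0.44 < T2, as in the second record above), the new canopy is merged; with a far-away center it is added as a new entry instead.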

Here, merge only updates the boundPoints and mass of the matching canopies.get(1) (the index happens to be 1 for all of the preceding data, though in general it could be any index). For example, if canopies.get(1) previously held three points, then after the merge its mass is 4 and its boundPoints is [0, 1, 2, 3]. Seen this way, the merge function is easy to understand.
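The merge described above can be sketched as follows. The class is a hypothetical stand-in (the real MeanShiftCanopy.merge also takes a runClustering flag governing whether bound points are carried over); only the two fields the post mentions are modeled.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch of what the merge combines: only boundPoints and mass,
// not the S0/S1/S2 running sums. Illustrative stand-in class, not Mahout's.
public class MergeSketch {
  List<Integer> boundPoints = new ArrayList<>();
  int mass;

  MergeSketch(List<Integer> boundPoints, int mass) {
    this.boundPoints.addAll(boundPoints);
    this.mass = mass;
  }

  void merge(MergeSketch aCanopy) {
    boundPoints.addAll(aCanopy.boundPoints); // absorb the merged-in canopy's bound points
    mass += aCanopy.mass;                    // and its mass
  }
}
```

Merging a canopy with boundPoints [3] and mass 1 into one with boundPoints [0, 1, 2] and mass 3 reproduces the example above: mass 4, boundPoints [0, 1, 2, 3].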

Now return to the touch function mentioned earlier, which I find less intuitive; its code is as follows:

void touch(MeanShiftCanopy canopy, double weight) {
  canopy.observe(getCenter(), weight * mass);
  observe(canopy.getCenter(), weight * canopy.mass);
}

It is called as aCanopy.touch(canopy, weight), where aCanopy is one of the input records and canopy is one of the elements of canopies.

Both statements in the code above update the running sums S0, S1, and S2. In the first statement, canopy's S0 is incremented by 1 (because aCanopy's mass is always 1), and canopy's S1 is incremented by aCanopy's center (S2 is computed similarly to S1, just with squared terms, which makes it slightly more involved). In the second statement, aCanopy (whose S0, S1, and S2 start out empty) gets its S1 set from canopy's center multiplied by canopy's mass, and its S0 set from canopy's mass.
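The running-sum bookkeeping above can be sketched in one dimension (the real Mahout code keeps S1 and S2 as Vectors; the class here is an illustrative stand-in). S0 is the weighted count, S1 the weighted sum of observed points, S2 the weighted sum of squares:

```java
// 1-D sketch of AbstractCluster.observe and MeanShiftCanopy.touch.
// Illustrative stand-in, assuming scalar centers instead of Vectors.
public class ObserveSketch {
  double s0, s1, s2;  // running sums: weighted count, sum, sum of squares
  double center;
  double mass = 1;    // an input record (aCanopy) always starts with mass 1

  ObserveSketch(double center) { this.center = center; }

  void observe(double x, double weight) {
    s0 += weight;
    s1 += x * weight;
    s2 += x * x * weight;
  }

  // aCanopy.touch(canopy, weight): the two canopies observe each other's
  // center, each weighted by the observed canopy's mass.
  void touch(ObserveSketch canopy, double weight) {
    canopy.observe(this.center, weight * this.mass);
    this.observe(canopy.center, weight * canopy.mass);
  }
}
```

For example, with aCanopy at center 2.0 (mass 1) touching a canopy at center 0.0 with mass 3 and weight 1: the canopy's sums become S0 = 1, S1 = 2, S2 = 4 (it observed aCanopy's center once), while aCanopy's sums become S0 = 3, S1 = 0 (it observed the canopy's center with weight 3).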

I understand why canopies.get(1)'s S0, S1, and S2 are updated, because they are used later. But why update aCanopy's S0, S1, and S2? That part I don't fully understand: the only place aCanopy is used afterwards is the merge function, which touches only its boundPoints and mass, not S0, S1, or S2. Well, on reflection, aCanopy is also used in the add branch, when closestCoveringCanopy is null, i.e. when a new element of canopies is created, and in that case the sums do matter.

In this way, the reduce output contains 479 values, which matches the number of reduce outputs obtained in the first loop in the first blog post.

This concludes the basic analysis of MeanShiftCanopyDriver.

Sharing, happiness, and growth

If reprinting, please cite the source: http://blog.csdn.net/fansy1990
