What exactly is a monoid? Definition:
A monoid (see reference 1 for the Chinese translation of the term; the English name is used below) is an associative binary operation (+) together with an identity element I (also called a unit element) such that for any x, x + I = I + x = x. Note that, unlike a group, a monoid has no inverse elements. Equivalently, a monoid is a semigroup with an identity element.
Well, that is not very helpful. Let's look at some examples and then revisit a simpler definition.
https://blog.safaribooksonline.com/2013/05/15/monoids-for-programmers-a-scala-example/
1. Integers and addition
Associativity: (a+b)+c == a+(b+c); identity: 0+n == n+0 == n
2. Integers and multiplication
Associativity: (a*b)*c == a*(b*c); identity: 1*n == n*1 == n
3. Lists and concatenation
Associativity: List(1,2) ++ (List(3,4) ++ List(5,6)) == (List(1,2) ++ List(3,4)) ++ List(5,6) == List(1,2,3,4,5,6); identity: List(1) ++ List() == List() ++ List(1) == List(1)
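The three laws above can be spot-checked in Scala. This is a minimal sketch: `checkLaws` is a hypothetical helper, and checking a handful of values only illustrates the laws, it does not prove them.

```scala
// Hypothetical helper: checks associativity and identity for one triple of sample values.
def checkLaws[A](op: (A, A) => A, unit: A, a: A, b: A, c: A): Boolean =
  op(op(a, b), c) == op(a, op(b, c)) &&   // associativity
    op(unit, a) == a && op(a, unit) == a  // identity on both sides

val intAddition       = checkLaws[Int](_ + _, 0, 1, 2, 3)
val intMultiplication = checkLaws[Int](_ * _, 1, 2, 3, 4)
val listConcat        = checkLaws[List[Int]](_ ++ _, Nil, List(1, 2), List(3, 4), List(5, 6))

println(Seq(intAddition, intMultiplication, listConcat))  // List(true, true, true)
```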
It might look as if every binary operation forms a monoid. Can we find some counterexamples?
The average is one:
avg(a, b) = (a + b) / 2
avg(10, avg(20, 30)) != avg(avg(10, 20), 30)
Subtraction is another!
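A quick sketch that evaluates both counterexamples in Scala, assuming `avg` means the arithmetic mean of two numbers:

```scala
// Arithmetic mean of two numbers (assumed meaning of avg).
def avg(a: Double, b: Double): Double = (a + b) / 2

val avgLeft  = avg(avg(10, 20), 30)   // avg(15, 30) = 22.5
val avgRight = avg(10, avg(20, 30))   // avg(10, 25) = 17.5

val subLeft  = (10 - 5) - 2           // 3
val subRight = 10 - (5 - 2)           // 7

println(s"avg: $avgLeft vs $avgRight, subtraction: $subLeft vs $subRight")
```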
A semigroup is simply a monoid without the requirement of an identity element, so it is the more general notion.
The key point is the associativity requirement on the binary operation: it means you can ignore how the calculation is grouped! It also means the computation is easy to run in parallel.
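To make the parallelism point concrete, here is a sketch that simulates splitting the work: the sequence is cut into chunks (standing in for threads or partitions, an assumption made for illustration), each chunk is reduced on its own, and the partial results are combined. With addition the result matches the sequential reduce; with subtraction it does not.

```scala
val numbers = (1 to 100).toList

// Addition: regrouping into chunks does not change the answer.
val sequential = numbers.reduce(_ + _)
val chunked    = numbers.grouped(7).map(_.reduce(_ + _)).toList.reduce(_ + _)

// Subtraction: the same chunking gives a different answer, because it is not associative.
val sequentialSub = numbers.reduce(_ - _)
val chunkedSub    = numbers.grouped(7).map(_.reduce(_ - _)).toList.reduce(_ - _)

println(s"sum: $sequential vs $chunked; difference: $sequentialSub vs $chunkedSub")
```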
A monoid is essentially a contract that a particular type obeys. Does that give us a clue about how to implement monoids/semigroups in Scala?
Implementing monoids with type classes
// It is up to the developer to enforce the associativity rule!
trait Semigroup[T] {
  def op(a: T, b: T): T
}
trait Monoid[T] extends Semigroup[T] {
  def zero: T
}
Now that we know how to define a monoid in Scala, can we implement an IntAdditionMonoid?
object Monoids {
  implicit object IntAdditionMonoid extends Monoid[Int] {
    def op(a: Int, b: Int): Int = a + b
    def zero: Int = 0
  }
}
Good. Now where can we use it? Let's see how it works with methods such as reduce:
trait Semigroup[T] { def op(a: T, b: T): T }
trait Monoid[T] extends Semigroup[T] { def zero: T }
implicit object IntAdditionMonoid extends Monoid[Int] {
  def op(a: Int, b: Int): Int = a + b
  def zero: Int = 0
}
val listA = List(1, 3, 5, 6)
// The implicit Monoid[Int] supplies the combining operation to reduce.
def reduceWithMonoid[A](seq: Seq[A])(implicit ev: Monoid[A]): A =
  seq.reduce(ev.op(_, _))
println(reduceWithMonoid(listA))  // 15
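One caveat with `reduce` is that it throws on an empty collection. The following sketch assumes the same `Semigroup`/`Monoid` traits and uses `foldLeft` seeded with the monoid's `zero` instead; `foldWithMonoid` is a hypothetical name, not part of the original code.

```scala
trait Semigroup[T] { def op(a: T, b: T): T }
trait Monoid[T] extends Semigroup[T] { def zero: T }

implicit object IntAdditionMonoid extends Monoid[Int] {
  def op(a: Int, b: Int): Int = a + b
  def zero: Int = 0
}

// Hypothetical variant of reduceWithMonoid: zero is the starting value,
// so an empty sequence simply returns zero instead of throwing.
def foldWithMonoid[A](seq: Seq[A])(implicit ev: Monoid[A]): A =
  seq.foldLeft(ev.zero)(ev.op)

val total = foldWithMonoid(List(1, 3, 5, 6))  // 15
val empty = foldWithMonoid(List.empty[Int])   // 0, whereas List.empty[Int].reduce(_ + _) throws
println(s"$total $empty")
```

This is where the identity element earns its keep: a semigroup alone can only reduce non-empty collections.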
Let's define more instances and see how they behave:
trait Semigroup[T] { def op(a: T, b: T): T }
trait Monoid[T] extends Semigroup[T] { def zero: T }
implicit object IntAdditionMonoid extends Monoid[Int] {
  def op(a: Int, b: Int): Int = a + b
  def zero: Int = 0
}
// We now have to use a class, because type parameters are required:
// tuples themselves are parameterized by their element types.
// Our goal is to define functionality for tuples whose elements are monoid-abiding types.
class Tuple2Semigroup[A, B](implicit sg1: Semigroup[A], sg2: Semigroup[B]) extends Semigroup[(A, B)] {
  def op(a: (A, B), b: (A, B)): (A, B) = (sg1.op(a._1, b._1), sg2.op(a._2, b._2))
}
// We cannot make the above an implicit class, because that actually does something different
// (more on this in the aside about the pimp-my-library pattern below).
// Instead we use another feature of implicits: an implicit def with implicit parameters.
// It takes Semigroups for the tuple's element types and returns a Semigroup of the tuple itself.
implicit def tuple2Semigroup[A, B](implicit sg1: Semigroup[A], sg2: Semigroup[B]): Semigroup[(A, B)] =
  new Tuple2Semigroup[A, B]()(sg1, sg2)
val listA = List((1, 2), (3, 4), (5, 2), (6, 9))
def reduceWithMonoid[A](seq: Seq[A])(implicit ev: Semigroup[A]): A =
  seq.reduce(ev.op(_, _))
println(reduceWithMonoid(listA))  // (15,17)
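The tuple semigroup also rescues the average from the counterexample earlier: instead of averaging as we go, we can carry `(sum, count)` pairs, combine them component-wise, and divide only at the end. This is a sketch under that assumption; `IntAdditionSemigroup` and `reduceWithSemigroup` are illustrative names.

```scala
trait Semigroup[T] { def op(a: T, b: T): T }

implicit object IntAdditionSemigroup extends Semigroup[Int] {
  def op(a: Int, b: Int): Int = a + b
}

// Same idea as tuple2Semigroup above, written as an anonymous instance.
implicit def tuple2Semigroup[A, B](implicit sg1: Semigroup[A], sg2: Semigroup[B]): Semigroup[(A, B)] =
  new Semigroup[(A, B)] {
    def op(a: (A, B), b: (A, B)): (A, B) = (sg1.op(a._1, b._1), sg2.op(a._2, b._2))
  }

def reduceWithSemigroup[A](seq: Seq[A])(implicit ev: Semigroup[A]): A = seq.reduce(ev.op(_, _))

// Each value becomes (value, 1); summing both components gives (total, count),
// and that pairwise addition is associative, unlike avg itself.
val (sum, count) = reduceWithSemigroup(List(10, 20, 30).map(x => (x, 1)))
val mean = sum.toDouble / count  // 20.0
```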
Notice how the aggregation logic now lives in the monoid definition. In effect we can redefine how the elements of a collection are combined, which makes the code highly reusable and scalable. Let's look at one more example, and then we'll turn to Spark.
Semigroups make it easy to merge two maps: keys are combined, and the values of matching keys are summed.
trait Semigroup[T] { def op(a: T, b: T): T }
trait Monoid[T] extends Semigroup[T] { def zero: T }
implicit object IntAdditionMonoid extends Monoid[Int] {
  def op(a: Int, b: Int): Int = a + b
  def zero: Int = 0
}
// Here we only need to assume that the values form a semigroup, as the keys are just being combined.
class MapSemigroup[K, V](implicit sg1: Semigroup[V]) extends Semigroup[Map[K, V]] {
  // We start from one of the maps and loop through the key-value pairs of the other one,
  // combining as we go. This way any keys that don't appear in the looping map are already
  // there, and every key that appears in both maps ends up with the combined value.
  def op(iteratingMap: Map[K, V], startingMap: Map[K, V]): Map[K, V] =
    iteratingMap.foldLeft(startingMap) { case (currentMap, (k, v)) =>
      val newValue: V = currentMap.get(k).map(existing => sg1.op(existing, v)).getOrElse(v)
      currentMap + (k -> newValue)
    }
}
// As before, we cannot make the above an implicit class, because that does something different
// (see the aside on the pimp-my-library pattern below). Instead an implicit def takes the
// Semigroup of the values and returns a Semigroup of the map itself.
implicit def mapSemigroup[K, V](implicit sg1: Semigroup[V]): Semigroup[Map[K, V]] =
  new MapSemigroup[K, V]()(sg1)
val mapA = Map("A" -> 1, "B" -> 2, "D" -> 5)
val mapB = Map("A" -> 3, "C" -> 3, "D" -> 1)
val mapC = Map("B" -> 10, "D" -> 3)
def reduceWithMonoid[A](seq: Seq[A])(implicit ev: Semigroup[A]): A =
  seq.reduce(ev.op(_, _))
println(reduceWithMonoid(List(mapA, mapB, mapC)))  // A -> 4, B -> 12, C -> 3, D -> 9 (entry order may vary)
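Since associativity is the whole point, it is worth spot-checking that the map semigroup satisfies it. The sketch below re-declares a compact, hypothetical `mapSemigroup` (same merging idea, written as an anonymous instance) and compares both groupings of `mapA`, `mapB` and `mapC`. A passing check on one example is an illustration, not a proof.

```scala
trait Semigroup[T] { def op(a: T, b: T): T }

implicit object IntAdditionSemigroup extends Semigroup[Int] {
  def op(a: Int, b: Int): Int = a + b
}

// Hypothetical compact version: fold one map into the other,
// combining the values of keys that appear in both.
implicit def mapSemigroup[K, V](implicit sg: Semigroup[V]): Semigroup[Map[K, V]] =
  new Semigroup[Map[K, V]] {
    def op(a: Map[K, V], b: Map[K, V]): Map[K, V] =
      a.foldLeft(b) { case (acc, (k, v)) =>
        acc + (k -> acc.get(k).map(existing => sg.op(existing, v)).getOrElse(v))
      }
  }

def combine[A](x: A, y: A)(implicit sg: Semigroup[A]): A = sg.op(x, y)

val mapA = Map("A" -> 1, "B" -> 2, "D" -> 5)
val mapB = Map("A" -> 3, "C" -> 3, "D" -> 1)
val mapC = Map("B" -> 10, "D" -> 3)

val left  = combine(combine(mapA, mapB), mapC)
val right = combine(mapA, combine(mapB, mapC))
println(left == right)  // true: grouping does not matter
```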
Pimp My Library example, and why we don't use implicit classes
An implicit class takes a single constructor argument: the value of the type to be "pimped". You can then define methods that become "available" on that type as though they were native functionality!
implicit class PimpedString(s: String) {
  def pimpMyString(): String = s + " is pimped"
}
println("My String".pimpMyString())  // My String is pimped
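The two features fit together: implicit objects and defs supply the instances, while an implicit class can add syntax on top of them. A minimal sketch, assuming the same Semigroup trait; the `|+|` operator name follows the convention used by libraries such as Cats and Scalaz, and `SemigroupOps` and `StringConcatSemigroup` are hypothetical names.

```scala
trait Semigroup[T] { def op(a: T, b: T): T }

implicit object IntAdditionSemigroup extends Semigroup[Int] {
  def op(a: Int, b: Int): Int = a + b
}
implicit object StringConcatSemigroup extends Semigroup[String] {
  def op(a: String, b: String): String = a + b
}

// The implicit class adds a |+| method to any A that has a Semigroup instance in scope.
implicit class SemigroupOps[A](self: A)(implicit sg: Semigroup[A]) {
  def |+|(other: A): A = sg.op(self, other)
}

val n = 1 |+| 2 |+| 3        // 6
val s = "semi" |+| "group"   // "semigroup"
```

The instance still does the work; the implicit class only changes how the call reads.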
Spark uses the Pimp My Library pattern to add methods that are available only on specific kinds of RDDs, such as key-value pair RDDs:
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala
/**
 * Extra functions available on RDDs of (key, value) pairs through an implicit conversion.
 */
class PairRDDFunctions[K, V](self: RDD[(K, V)])
    (implicit kt: ClassTag[K], vt: ClassTag[V], ord: Ordering[K] = null)
  extends Logging
  with SparkHadoopMapReduceUtil
  with Serializable {
  // ...
}
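The same mechanism can be sketched without Spark. Below, `MiniRDD`, `MiniPairFunctions` and `toPairFunctions` are hypothetical stand-ins for `RDD`, `PairRDDFunctions` and Spark's implicit conversion; the point is only that the extra methods appear when, and only when, the elements are pairs.

```scala
import scala.language.implicitConversions

// Hypothetical stand-in for an RDD: just a wrapper around a Seq.
final case class MiniRDD[T](data: Seq[T])

// Methods that only make sense for (key, value) pairs.
class MiniPairFunctions[K, V](self: MiniRDD[(K, V)]) {
  def reduceByKey(f: (V, V) => V): MiniRDD[(K, V)] =
    MiniRDD(self.data.groupBy(_._1).map { case (k, kvs) => (k, kvs.map(_._2).reduce(f)) }.toSeq)
}

object MiniRDD {
  // Lives in the companion, so it is in implicit scope; it only applies to MiniRDD of pairs.
  implicit def toPairFunctions[K, V](rdd: MiniRDD[(K, V)]): MiniPairFunctions[K, V] =
    new MiniPairFunctions(rdd)
}

val pairs  = MiniRDD(Seq("a" -> 1, "b" -> 2, "a" -> 3))
val summed = pairs.reduceByKey(_ + _).data.toMap  // Map(a -> 4, b -> 2)
// MiniRDD(Seq(1, 2, 3)).reduceByKey(_ + _) would not compile: the elements are not pairs.
```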
If you find yourself wanting to design your own monoid/semigroup library, don't worry: Twitter has already done it with Algebird and made it usable from Spark (which means everything is serializable). They have also written it so that it performs very well.
https://github.com/twitter/algebird
Putting monoids and Spark together
When aggregating RDDs in Spark we end up writing many functions that are nearly identical but hard to write generically. Monoids are one way to solve this. Here is a practical example:
// BananaTimestamp, checkResetState and the two DStreams below come from the author's own codebase.
// Monoid here provides zero and plus, as in Algebird's Monoid.

// This call, from an aggregation section, updates state with the HyperLogLog object
val stateUniques = makeModelUniquesTime.updateStateByKey(updateTotalCountState[HLL])
// This call, from an aggregation section, updates state with a Long
val statePV = makeModelCountReduceWithTime.updateStateByKey(updateTotalCountState[Long])

// This was originally implemented as two methods, one for HLL and one for Long.
// With monoids we can write a single method that takes care of both cases.
def updateTotalCountState[U](values: Seq[(BananaTimestamp, U)], state: Option[(BananaTimestamp, U)])
    (implicit monoid: Monoid[U], ct: ClassTag[U]): Option[(BananaTimestamp, U)] = {
  val defaultState = (null, monoid.zero)
  values match {
    // Seq() matches any empty Seq, not only an empty List
    case Seq() => Some(state.getOrElse(defaultState))
    case _ =>
      val hdt = values(0)._1
      // The reduction logic now lives in the monoid definitions rather than in these functions,
      // so this function is distilled down to what it takes to update state.
      val v = values.map { case (_, a) => a }.reduce(monoid.plus)
      val stateReceived = state.getOrElse(defaultState)
      if (checkResetState(stateReceived._1, hdt)) Some((hdt, v))
      else Some((hdt, monoid.plus(v, stateReceived._2)))
  }
}
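Because the Spark code depends on the author's own types (`BananaTimestamp`, `checkResetState` and the two DStreams), here is a self-contained sketch of the same idea without Spark. Assumptions: timestamps are plain `Long` milliseconds, the reset rule (start a new total when the hour changes) is invented for illustration, and an exact `Set` of ids stands in for Algebird's HyperLogLog. The `Monoid` trait below is a hypothetical minimal one with `zero` and `plus`. The single generic `updateTotalCountState` serves both a `Long` counter and a set of uniques.

```scala
trait Monoid[T] {
  def zero: T
  def plus(a: T, b: T): T
}

implicit object LongSumMonoid extends Monoid[Long] {
  def zero: Long = 0L
  def plus(a: Long, b: Long): Long = a + b
}
// Stand-in for HLL: an exact set of unique ids (hypothetical, not Algebird's HLL).
implicit object UniquesMonoid extends Monoid[Set[String]] {
  def zero: Set[String] = Set.empty
  def plus(a: Set[String], b: Set[String]): Set[String] = a ++ b
}

type Timestamp = Long

// Hypothetical reset rule: start a new total when the batch falls in a later hour.
def checkResetState(previous: Timestamp, current: Timestamp): Boolean =
  current / 3600000L > previous / 3600000L

def updateTotalCountState[U](values: Seq[(Timestamp, U)], state: Option[(Timestamp, U)])
                            (implicit monoid: Monoid[U]): Option[(Timestamp, U)] = {
  val defaultState: (Timestamp, U) = (0L, monoid.zero)
  if (values.isEmpty) Some(state.getOrElse(defaultState))
  else {
    val hdt = values.head._1                           // timestamp of the first value in the batch
    val v = values.map(_._2).reduce(monoid.plus(_, _))  // combine the batch with the monoid
    val received = state.getOrElse(defaultState)
    if (checkResetState(received._1, hdt)) Some((hdt, v))
    else Some((hdt, monoid.plus(v, received._2)))
  }
}

val counts  = updateTotalCountState(Seq((1000L, 2L), (1000L, 3L)), Some((500L, 10L)))
val uniques = updateTotalCountState(Seq((1000L, Set("u1", "u2"))), Some((500L, Set("u2", "u3"))))
```

Here `counts` accumulates 2 + 3 + 10, while `uniques` takes the union of the id sets; the update function itself knows nothing about either type.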
Original link: https://thewanderingmonad.wordpress.com/2015/05/17/introduction-to-monoids-and-semigroups-with-spark/
References
1. Monoid: http://hongjiang.info/semigroup-and-monoid/
2. Monoid (Chinese Wikipedia): https://zh.wikipedia.org/wiki/%E5%B9%BA%E5%8D%8A%E7%BE%A4
3. http://www.ituring.com.cn/article/195776
Introduction to Monoids and semigroups with Spark