1. Source data: take the top three scores from each class.
Class1 98
Class2 90
Class2 92
Class1 96
Class1 100
Class2 89
Class2 68
Class1 81
Class2 90
2. Implementation process
package basic

import org.apache.spark.{SparkConf, SparkContext}

/**
 * Created by TG on 10/25/16.
 */
object GroupTopN {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("GroupTopN").setMaster("local")
    val sc = new SparkContext(conf)
    // Read the data from HDFS
    val lines = sc.textFile("hdfs://tgmaster:9000/in/classinfo")
    /**
     * 1. The map operator forms (class, score) pairs.
     * 2. The groupByKey operator groups the pairs by the class key.
     * 3. Another map operator takes the top 3 of each group; the core code:
     *    val top3 = m._2.toArray.sortWith(_ > _).take(3)
     * 4. The foreach operator traverses and prints the result.
     */
    lines.map(m => {
      val info = m.split(" ")
      (info(0), info(1).toInt)
    }).groupByKey().map(m => {
      val className = m._1
      val top3 = m._2.toArray.sortWith(_ > _).take(3)
      (className, top3)
    }).foreach(item => {
      val className = item._1
      println("class: " + className + " the top 3 are: ")
      item._2.foreach(m => {
        println(m)
      })
    })
  }
}
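Note: groupByKey ships every score of a class to one place before sorting, which can be costly when groups are large. A minimal alternative sketch, using aggregateByKey to keep only a running top-3 per class (the object name GroupTopNAgg and the List-based accumulator are illustrative choices, not part of the original program):

package basic

import org.apache.spark.{SparkConf, SparkContext}

object GroupTopNAgg {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("GroupTopNAgg").setMaster("local")
    val sc = new SparkContext(conf)
    val lines = sc.textFile("hdfs://tgmaster:9000/in/classinfo")
    lines.map(m => {
      val info = m.split(" ")
      (info(0), info(1).toInt)
    }).aggregateByKey(List.empty[Int])(
      // Fold one score into a partition-local top-3 list.
      (top, score) => (score :: top).sorted(Ordering[Int].reverse).take(3),
      // Merge two partition-local top-3 lists into one.
      (a, b) => (a ++ b).sorted(Ordering[Int].reverse).take(3)
    ).foreach(item => {
      println("class: " + item._1 + " the top 3 are: ")
      item._2.foreach(println)
    })
  }
}

Because each partition carries at most three scores per class into the shuffle, this variant moves far less data than a full groupByKey while producing the same top-3 output.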
3. Run result:
class: Class1 the top 3 are:
100
98
96
class: Class2 the top 3 are:
92
90
90