Grouped top N in Spark (Scala version)

Source: Internet
Author: User

1. The source data is shown below; the goal is to extract the top three scores for each class.

Class1 98
Class2 90
Class2 92
Class1 96
Class1 100
Class2 89
Class2 68
Class1 81
Class2 90

2. Implementation

package basic

import org.apache.spark.{SparkConf, SparkContext}


/**
 * Created by TG on 10/25/16.
 */
object GroupTopN {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("GroupTopN").setMaster("local")
    val sc = new SparkContext(conf)


    // Read the data from HDFS
    val lines = sc.textFile("hdfs://tgmaster:9000/in/classinfo")


    /**
     * 1. Use the map operator to build (class, score) pairs
     * 2. Use the groupByKey operator to group the pairs by class
     * 3. Use the map operator to sort each group in descending order and keep the first 3;
     *    core code: val top3 = m._2.toArray.sortWith(_ > _).take(3)
     * 4. Use the foreach operator to traverse and print the result
     */
    lines.map(m => {
      // Each line has the form "<class> <score>", separated by a single space
      val info = m.split(" ")
      (info(0), info(1).toInt)
    }).groupByKey().map(m => {
      val className = m._1
      val top3 = m._2.toArray.sortWith(_ > _).take(3)
      (className, top3)
    }).foreach(item => {
      val className = item._1
      println("Class: " + className + ", the top 3 are:")
      item._2.foreach(m => {
        println(m)
      })
    })
  }
}
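
The groupByKey approach gathers every score of a class in one place before sorting. The Spark programming guide suggests preferring reduceByKey or aggregateByKey when the grouping only feeds an aggregation, so a variant that keeps at most N scores per key while aggregating may be worth considering for large classes. The following is a minimal sketch under stated assumptions, not the author's method: the object name GroupTopNAggregate and the helper keepTop are hypothetical, and the sample rows are inlined with sc.parallelize instead of being read from HDFS.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object GroupTopNAggregate {
  // Hypothetical helper (not from the article): sort descending and keep the n largest.
  def keepTop(scores: List[Int], n: Int): List[Int] =
    scores.sorted(Ordering[Int].reverse).take(n)

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("GroupTopNAggregate").setMaster("local")
    val sc = new SparkContext(conf)
    val n = 3

    // Assumption: inline sample rows stand in for the HDFS file.
    val lines = sc.parallelize(Seq(
      "Class1 98", "Class2 90", "Class2 92", "Class1 96", "Class1 100",
      "Class2 89", "Class2 68", "Class1 81", "Class2 90"))

    val topN = lines
      .map { line =>
        val info = line.split(" ")
        (info(0), info(1).toInt)
      }
      .aggregateByKey(List.empty[Int])(
        // Within a partition: add one score, then trim to the n largest.
        (acc, score) => keepTop(score :: acc, n),
        // Across partitions: merge two partial lists, then trim again.
        (a, b) => keepTop(a ++ b, n)
      )

    // Collect to the driver and sort by class name so the printed order is stable.
    topN.collect().sortBy(_._1).foreach { case (className, scores) =>
      println(className + ": " + scores.mkString(", "))
    }

    sc.stop()
  }
}
```

Because each partial list never holds more than n + 1 scores, the amount of data shuffled per class stays bounded regardless of class size. Re-sorting on every insert is acceptable for small n; a bounded priority queue would be the usual refinement for larger values.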

3. Result:

Class: Class1, the top 3 are:
100
98
96
Class: Class2, the top 3 are:
92
90
90
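
The result can be cross-checked without a cluster by applying the same map, group, sort, and take steps to plain Scala collections. This is only an illustrative sketch: the object LocalGroupTopN and the function topNPerClass are hypothetical names, and the sample rows are inlined with the class name spelled Class2 throughout.

```scala
object LocalGroupTopN {
  // Hypothetical helper: parse "<class> <score>" lines and keep the n highest scores per class.
  def topNPerClass(lines: Seq[String], n: Int): Map[String, Seq[Int]] =
    lines
      .map(_.trim.split("\\s+"))
      // Keep only well-formed rows with exactly two fields.
      .collect { case Array(cls, score) => (cls, score.toInt) }
      .groupBy(_._1)
      .map { case (cls, pairs) =>
        cls -> pairs.map(_._2).sorted(Ordering[Int].reverse).take(n)
      }

  def main(args: Array[String]): Unit = {
    val sample = Seq(
      "Class1 98", "Class2 90", "Class2 92", "Class1 96", "Class1 100",
      "Class2 89", "Class2 68", "Class1 81", "Class2 90")

    // Sort by class name so the printed order is deterministic.
    topNPerClass(sample, 3).toSeq.sortBy(_._1).foreach { case (cls, scores) =>
      println(s"$cls: ${scores.mkString(", ")}")
    }
  }
}
```

Note that ties are kept as separate entries, which is why Class2 lists 90 twice.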
