Getting Started with Big Data, Day 22 -- Spark (III): Custom Partitioning, Sorting, and Finding

I. Custom Partitioning
1. Overview
By default, Spark partitions data by hashing the key, much as Hadoop does. For a detailed description of partitioning, see: 68491115
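To make the default behavior concrete, here is a minimal sketch (assuming an existing SparkContext named sc, as in the examples below) that partitions a pair RDD with Spark's built-in HashPartitioner. Each key is routed by its hash code modulo the partition count (made non-negative), so equal keys always land in the same partition:

import org.apache.spark.HashPartitioner

val pairs = sc.parallelize(List(("a", 1), ("b", 2), ("c", 3), ("a", 4)))

// Route each key to a partition by its hash; equal keys share a partition
val hashed = pairs.partitionBy(new HashPartitioner(3))

// Print (partitionIndex, (key, value)) to see where each record landed
hashed.mapPartitionsWithIndex((idx, it) => it.map(kv => (idx, kv)))
  .collect()
  .foreach(println)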
2. Implementation
package cn.itcast.spark.day3

import java.net.URL

import org.apache.spark.{HashPartitioner, Partitioner, SparkConf, SparkContext}

import scala.collection.mutable

/**
 * Created by root on 2016/5/18.
 */
object UrlCountPartition {

  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("UrlCountPartition").setMaster("local[2]")
    val sc = new SparkContext(conf)

    // Split each line of the input and emit (url, 1) tuples
    val rdd1 = sc.textFile("c://itcast.log").map(line => {
      val f = line.split("\t")
      (f(1), 1)
    })
    val rdd2 = rdd1.reduceByKey(_ + _)
    // Re-key by host: (host, (url, count))
    val rdd3 = rdd2.map(t => {
      val url = t._1
      val host = new URL(url).getHost
      (host, (url, t._2))
    })
    // Collect the distinct hosts; one partition per host
    val ints = rdd3.map(_._1).distinct().collect()
    val hostPartitioner = new HostPartitioner(ints)
    // val rdd4 = rdd3.partitionBy(new HashPartitioner(ints.length))

    // Within each partition, keep the top 2 URLs by count
    val rdd4 = rdd3.partitionBy(hostPartitioner).mapPartitions(it => {
      it.toList.sortBy(_._2._2).reverse.take(2).iterator
    })
    rdd4.saveAsTextFile("c://out4")
    // println(rdd4.collect().toBuffer)
    sc.stop()
  }
}

/**
 * Decides which partition a given key (host) belongs to
 * @param ins the distinct hosts, one partition each
 */
class HostPartitioner(ins: Array[String]) extends Partitioner {

  // Map each host to a partition number
  val parMap = new mutable.HashMap[String, Int]()
  var count = 0
  for (i <- ins) {
    parMap += (i -> count)
    count += 1
  }

  override def numPartitions: Int = ins.length

  override def getPartition(key: Any): Int = {
    parMap.getOrElse(key.toString, 0)
  }
}
This carries over directly from the Hadoop version, so the details are not repeated here.
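To verify that HostPartitioner really groups records by host, a small sketch like the one below can print each partition's contents. (The sample data and variable names here are made up for illustration; it assumes the sc and HostPartitioner defined above.)

// In-memory sample in the same (host, (url, count)) shape as rdd3
val sample = sc.parallelize(List(
  ("java.itcast.cn", ("http://java.itcast.cn/a", 3)),
  ("php.itcast.cn", ("http://php.itcast.cn/b", 5)),
  ("java.itcast.cn", ("http://java.itcast.cn/c", 1))
))

val hosts = sample.map(_._1).distinct().collect()
val byHost = sample.partitionBy(new HostPartitioner(hosts))

// glom() turns each partition into an Array, so each printed line
// below corresponds to exactly one partition (i.e. one host)
byHost.glom().collect().foreach(arr => println(arr.mkString(", ")))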
II. Custom Sorting
This essentially combines the implicit conversions covered earlier. (A case class is used here so that instances can be created without new, and it also supports pattern matching.)
package cn.itcast.spark.day3

import org.apache.spark.{SparkConf, SparkContext}

object OrderContext {
  // Implicit Ordering: larger faceValue compares greater; for equal
  // faceValue, the younger girl compares greater
  implicit val girlOrdering = new Ordering[Girl] {
    override def compare(x: Girl, y: Girl): Int = {
      if (x.faceValue > y.faceValue) 1
      else if (x.faceValue == y.faceValue) {
        if (x.age > y.age) -1 else 1
      } else -1
    }
  }
}

/**
 * Created by root on 2016/5/18.
 */
// Sort rule: first by faceValue, then compare age
// Fields: name, faceValue, age
object CustomSort {

  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("CustomSort").setMaster("local[2]")
    val sc = new SparkContext(conf)
    val rdd1 = sc.parallelize(List(("yuihatano", 90, 28, 1), ("angelababy", 90, 27, 2), ("JuJingYi", 95, 22, 3)))
    import OrderContext._
    // sortBy with ascending = false picks up the implicit girlOrdering
    val rdd2 = rdd1.sortBy(x => Girl(x._2, x._3), false)
    println(rdd2.collect().toBuffer)
    sc.stop()
  }
}

/**
 * First way: extend Ordered directly
 * @param faceValue
 * @param age
case class Girl(val faceValue: Int, val age: Int) extends Ordered[Girl] with Serializable {
  override def compare(that: Girl): Int = {
    if (this.faceValue == that.faceValue) {
      that.age - this.age
    } else {
      this.faceValue - that.faceValue
    }
  }
}
*/

/**
 * Second way: complete the sorting via an implicit conversion (Ordering)
 * @param faceValue
 * @param age
 */
case class Girl(faceValue: Int, age: Int) extends Serializable
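One more variant worth noting (not in the original post): the case class can be skipped entirely by sorting on a composite tuple key, since Scala already supplies an Ordering for tuples that compares element by element. Negating faceValue makes that field sort descending while age stays ascending, matching the rule above. A sketch, assuming the rdd1 from the example:

// faceValue descending (negated), then age ascending
val rdd3 = rdd1.sortBy(x => (-x._2, x._3))
println(rdd3.collect().toBuffer)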