Sometimes your business logic requires implementing Spark's partitioning function yourself.
The following code is a demo of a custom Spark partitioner.
What it implements is writing records to different output files according to the last digit of the key value.
For example (illustrated in the sketch after this list):
key 10 is written to part-00000
key 11 is written to part-00001
.
.
.
key 19 is written to part-00009
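As a quick illustration of this mapping, here is a minimal sketch of the last-digit rule on its own, outside of Spark. The object name, helper method, and sample keys are only for demonstration and are not part of the original demo.

// Minimal sketch of the last-digit rule used by the partitioner below.
// A key's partition index is simply key % 10, so the last decimal digit
// decides which output file the record lands in.
object LastDigitDemo {
  def partitionFor(key: Int): Int = key % 10

  def main(args: Array[String]): Unit = {
    Seq(10, 11, 19, 27).foreach { k =>
      println(f"key $k -> part-${partitionFor(k)}%05d")
    }
    // key 10 -> part-00000
    // key 11 -> part-00001
    // key 19 -> part-00009
    // key 27 -> part-00007
  }
}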
The code below gives the reader an idea of how to write a custom partitioner.
import org.apache.spark.{Partitioner, SparkConf, SparkContext}

// Custom partitioner class, extending Spark's Partitioner
class UsridPartitioner(numParts: Int) extends Partitioner {
  // Override the number of partitions
  override def numPartitions: Int = numParts

  // Override the partition-index lookup: route each key by its last digit
  override def getPartition(key: Any): Int = {
    key.toString.toInt % 10
  }
}

object Test {
  def main(args: Array[String]) {
    val conf = new SparkConf()
    val sc = new SparkContext(conf)

    // Simulate data spread across 5 partitions
    val data = sc.parallelize(1 to 10, 5)

    // Repartition by last digit into 10 partitions, saved as 10 output files
    data.map((_, 1)).partitionBy(new UsridPartitioner(10)).saveAsTextFile("/chenm/partition")
  }
}
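To confirm which partition each key lands in without inspecting the output files, a check like the one below can be used. This is only a sketch, assuming the UsridPartitioner class above is on the classpath; the local master setting, the object name PartitionCheck, and the inspection step are additions for illustration, not part of the original demo.

import org.apache.spark.{SparkConf, SparkContext}

object PartitionCheck {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("PartitionCheck").setMaster("local[*]"))

    // Reuse the UsridPartitioner defined above, but print (partitionIndex, record)
    // pairs instead of saving to a file, to verify the last-digit routing.
    val data = sc.parallelize(1 to 10, 5)
    data.map((_, 1))
      .partitionBy(new UsridPartitioner(10))
      .mapPartitionsWithIndex { (idx, iter) => iter.map(rec => (idx, rec)) }
      .collect()
      .foreach(println)   // e.g. (1,(1,1)), (2,(2,1)), ..., (0,(10,1))

    sc.stop()
  }
}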