The Scala code behind PySpark: the PythonRDD object


PySpark's JVM-side Scala code: PythonRDD

The code below is from Spark 2.2.0.

1. The PythonRDD object

This singleton object, whose methods are exposed to the Python side as static calls, is a basic entry point for PySpark.

This does not cover the entire content of the class, because most of it consists of static interfaces called from the PySpark Python code; only the main functions are shown here, and three short sketches follow the listing to make them concrete. The collectAndServe method, which underlies every action on a PySpark RDD, is also defined in this object.

private[spark] object PythonRDD extends Logging {

  // Called by pyspark.SparkContext.runJob.
  // Provides rdd.collect over a subset of partitions and submits the job.
  def runJob(
      sc: SparkContext,
      rdd: JavaRDD[Array[Byte]],
      partitions: JArrayList[Int]): Int = {
    type ByteArray = Array[Byte]
    type UnrolledPartition = Array[ByteArray]
    val allPartitions: Array[UnrolledPartition] =
      sc.runJob(rdd, (x: Iterator[ByteArray]) => x.toArray, partitions.asScala)
    val flattenedPartition: UnrolledPartition = Array.concat(allPartitions: _*)
    serveIterator(flattenedPartition.iterator,
      s"serve RDD ${rdd.id} with partitions ${partitions.asScala.mkString(",")}")
  }

  // Every pyspark.RDD action is ultimately triggered through this function:
  // pyspark.RDD.collect calls it to execute the RDD and submit the tasks.
  def collectAndServe[T](rdd: RDD[T]): Int = {
    // The rdd parameter is the _jrdd of the PySpark RDD, i.e. the Scala-side
    // source RDD or a PythonRDD. rdd.collect() here triggers the actual run.
    serveIterator(rdd.collect().iterator, s"serve RDD ${rdd.id}")
  }

  // Writes the computed result to a local socket; the PySpark side then reads
  // that socket to obtain the result.
  def serveIterator[T](items: Iterator[T], threadName: String): Int = {
    // The socket is bound to a random port on localhost.
    val serverSocket = new ServerSocket(0, 1, InetAddress.getByName("localhost"))
    // Close the socket if no connection in 3 seconds.
    serverSocket.setSoTimeout(3000)

    // Start a thread responsible for writing the results to the socket.
    new Thread(threadName) {
      setDaemon(true)
      override def run() {
        try {
          val sock = serverSocket.accept()
          val out = new DataOutputStream(new BufferedOutputStream(sock.getOutputStream))
          Utils.tryWithSafeFinally {
            // writeIteratorToStream does the actual writing; it handles the
            // element types and the corresponding serialization.
            writeIteratorToStream(items, out)
          } {
            out.close()
          }
        } catch {
          case NonFatal(e) =>
            logError(s"Error while sending iterator", e)
        } finally {
          serverSocket.close()
        }
      }
    }.start()

    // Finally, return the socket's port so that PySpark can read the data
    // through it.
    serverSocket.getLocalPort
  }

  // Writes the result data, doing some type checks and the corresponding
  // serialization. PythonRunner's WriterThread also uses this function to
  // write data.
  def writeIteratorToStream[T](iter: Iterator[T], dataOut: DataOutputStream) {
    def write(obj: Any): Unit = obj match {
      case null =>
        dataOut.writeInt(SpecialLengths.NULL)
      case arr: Array[Byte] =>
        dataOut.writeInt(arr.length)
        dataOut.write(arr)
      case str: String =>
        writeUTF(str, dataOut)
      case stream: PortableDataStream =>
        write(stream.toArray())
      case (key, value) =>
        write(key)
        write(value)
      case other =>
        throw new SparkException("Unexpected element type " + other.getClass)
    }

    iter.foreach(write)
  }
}
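To make the partitions argument of runJob concrete, here is a minimal, self-contained sketch (not Spark's own code; the object name RunJobSketch and the sample data are invented) of the same SparkContext.runJob pattern: run a job over only a subset of partitions, unroll each partition to an array, and flatten the results. It assumes a Spark 2.2.x dependency on the classpath and a local master.

import org.apache.spark.{SparkConf, SparkContext}

object RunJobSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("runJob sketch")
    val sc = new SparkContext(conf)

    // Four partitions of integers stand in for the pickled byte-array
    // partitions that PySpark actually ships around.
    val rdd = sc.parallelize(1 to 100, numSlices = 4)

    // Run a job over partitions 0 and 2 only, unrolling each to an array,
    // just as PythonRDD.runJob does with its partitions argument.
    val allPartitions: Array[Array[Int]] =
      sc.runJob(rdd, (it: Iterator[Int]) => it.toArray, Seq(0, 2))

    // Flatten the per-partition arrays into one result, like flattenedPartition.
    val flattened: Array[Int] = Array.concat(allPartitions: _*)
    println(flattened.mkString(","))
    sc.stop()
  }
}

This partition-subset path is what lets an operation like pyspark.RDD.take avoid computing every partition: the Python side passes only the partition ids it actually needs.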
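The serveIterator mechanics can be reproduced without Spark at all. Below is a minimal sketch assuming only the JDK and the Scala standard library; serveBytes and the demo strings are invented for this example. It binds an ephemeral port on localhost, writes length-prefixed byte arrays from a daemon thread (the same framing as the Array[Byte] case of writeIteratorToStream), returns the port immediately, and then the main method plays the role of the Python side by connecting to that port and reading the records back.

import java.io.{BufferedOutputStream, DataInputStream, DataOutputStream, EOFException}
import java.net.{InetAddress, ServerSocket, Socket}
import scala.util.control.NonFatal

object ServeSketch {

  // Mirrors the shape of PythonRDD.serveIterator: bind an ephemeral port on
  // localhost, hand the data to a daemon thread, and return the port at once.
  def serveBytes(items: Iterator[Array[Byte]], threadName: String): Int = {
    val serverSocket = new ServerSocket(0, 1, InetAddress.getByName("localhost"))
    serverSocket.setSoTimeout(3000) // give up if nobody connects within 3 seconds

    new Thread(threadName) {
      setDaemon(true)
      override def run(): Unit = {
        try {
          val sock = serverSocket.accept()
          val out = new DataOutputStream(new BufferedOutputStream(sock.getOutputStream))
          try {
            // Same framing as the Array[Byte] case of writeIteratorToStream:
            // a 4-byte big-endian length followed by the payload.
            items.foreach { arr =>
              out.writeInt(arr.length)
              out.write(arr)
            }
          } finally {
            out.close()
          }
        } catch {
          case NonFatal(e) => e.printStackTrace()
        } finally {
          serverSocket.close()
        }
      }
    }.start()

    serverSocket.getLocalPort
  }

  def main(args: Array[String]): Unit = {
    val port = serveBytes(
      Iterator("foo".getBytes("UTF-8"), "bar".getBytes("UTF-8")), "serve demo")

    // Play the role of the Python side: connect to the returned port and read
    // length-prefixed records until the writer closes the stream.
    val sock = new Socket(InetAddress.getByName("localhost"), port)
    val in = new DataInputStream(sock.getInputStream)
    try {
      while (true) {
        val length = in.readInt() // throws EOFException once the stream ends
        val buf = new Array[Byte](length)
        in.readFully(buf)
        println(new String(buf, "UTF-8"))
      }
    } catch {
      case _: EOFException => // normal end of data
    } finally {
      sock.close()
    }
  }
}

Binding to localhost with a backlog of 1 and a 3-second accept timeout is a deliberately narrow contract: only one local client (the Python driver process) is expected, and a client that never connects cannot leak the thread forever. The returned port number is the only handle the Python side needs.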
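The wire format produced by writeIteratorToStream is simply a 4-byte big-endian length followed by the payload bytes. Strings go through writeUTF, which in this object writes the UTF-8 byte length and then the bytes (not Java's modified-UTF encoding), and special cases are signalled with negative sentinel lengths from SpecialLengths. The sketch below encodes and decodes a few records in memory to show that framing; it assumes SpecialLengths.NULL is -5 in this Spark version (worth verifying against the source), and all names in it are illustrative.

import java.io.{ByteArrayInputStream, ByteArrayOutputStream, DataInputStream, DataOutputStream}
import scala.collection.mutable.ArrayBuffer

object FrameSketch {

  // Assumed to match SpecialLengths.NULL in this Spark version.
  val NullSentinel = -5

  // Frame one element the way writeIteratorToStream's write does: byte arrays
  // and strings get a length prefix, nulls a negative sentinel, and pairs are
  // written as two frames back to back.
  def writeOne(out: DataOutputStream, obj: Any): Unit = obj match {
    case null =>
      out.writeInt(NullSentinel)
    case arr: Array[Byte] =>
      out.writeInt(arr.length)
      out.write(arr)
    case str: String => // like writeUTF: UTF-8 byte length, then the bytes
      val utf = str.getBytes("UTF-8")
      out.writeInt(utf.length)
      out.write(utf)
    case (key, value) =>
      writeOne(out, key)
      writeOne(out, value)
    case other =>
      throw new IllegalArgumentException(s"unexpected element type ${other.getClass}")
  }

  // Decode frames back into Option[Array[Byte]] (None for the null sentinel).
  def decode(data: Array[Byte]): Seq[Option[Array[Byte]]] = {
    val in = new DataInputStream(new ByteArrayInputStream(data))
    val result = ArrayBuffer[Option[Array[Byte]]]()
    while (in.available() > 0) {
      val length = in.readInt()
      if (length == NullSentinel) {
        result += None
      } else {
        val buf = new Array[Byte](length)
        in.readFully(buf)
        result += Some(buf)
      }
    }
    result.toSeq
  }

  def main(args: Array[String]): Unit = {
    val bytes = new ByteArrayOutputStream()
    val out = new DataOutputStream(bytes)
    Seq("hello".getBytes("UTF-8"), "world", null, ("key", "value"))
      .foreach(writeOne(out, _))
    out.flush()

    decode(bytes.toByteArray).foreach {
      case Some(b) => println(new String(b, "UTF-8"))
      case None    => println("<null>")
    }
  }
}

The length prefix is what lets the Python side consume the stream with nothing more than a loop of "read 4 bytes, then read that many bytes" calls.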
