Lesson 43: Spark 1.6 RPC internals: operating mechanism, source details, Netty and Akka


Spark is a distributed computing framework, so its machines must communicate with one another. Early versions of Spark implemented this communication with Akka. An RpcEnv abstraction has since been layered on top of it; the RpcEnv is responsible for managing communication between machines.

RpcEnv has three core components:

    • RpcEndpoint: the message loop body, responsible for receiving and processing messages. Both the Master and the Worker in Spark are RpcEndpoints.

    • RpcEndpointRef: a reference to an RpcEndpoint. To communicate with an RpcEndpoint, you must obtain its RpcEndpointRef and send messages through that ref.

    • Dispatcher: the message dispatcher, responsible for routing RPC messages to the appropriate RpcEndpoint.


Once an RpcEnv has been created, RpcEndpoints can be registered with it, and each registered RpcEndpoint gets a corresponding RpcEndpointRef that references it. To send a message to an RpcEndpoint, you look up its RpcEndpointRef by the endpoint's name in the RpcEnv and then send the message through that RpcEndpointRef, as the sketch below illustrates.
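
As a concrete illustration of that flow, here is a minimal sketch. It assumes access to Spark 1.6's private[spark] org.apache.spark.rpc API (so it would have to live under the org.apache.spark package tree); the package, the EchoEndpoint class, and the endpoint name "demo-endpoint" are made up for this example.

package org.apache.spark.demo  // hypothetical package: the rpc API is private[spark]

import org.apache.spark.{SecurityManager, SparkConf}
import org.apache.spark.rpc.{RpcCallContext, RpcEndpoint, RpcEndpointRef, RpcEnv}

// A made-up endpoint that echoes back whatever String it receives
class EchoEndpoint(override val rpcEnv: RpcEnv) extends RpcEndpoint {
  override def receiveAndReply(context: RpcCallContext): PartialFunction[Any, Unit] = {
    case msg: String => context.reply("echo: " + msg)
  }
}

object EchoDemo {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
    val rpcEnv = RpcEnv.create("demo", "localhost", 52345, conf, new SecurityManager(conf))
    // Registration returns the RpcEndpointRef used to talk to the endpoint
    val ref: RpcEndpointRef = rpcEnv.setupEndpoint("demo-endpoint", new EchoEndpoint(rpcEnv))
    // askWithRetry sends a message and blocks until the typed reply arrives
    val reply = ref.askWithRetry[String]("hello")
    println(reply)  // prints: echo: hello
    rpcEnv.shutdown()
  }
}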


RpcEnv manages the entire life cycle of an RpcEndpoint (see the fragment after this list):

    • Registering the RpcEndpoint, by name or by URI

    • Routing messages sent to the RpcEndpoint

    • Stopping the RpcEndpoint
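
Expressed against the Spark 1.6 RpcEnv API, those three responsibilities look roughly like this (a fragment, assuming rpcEnv and endpoint already exist; the name, host, and port are placeholders):

// Register by name; the URI form spark://<name>@<host>:<port> can be looked up too
val ref = rpcEnv.setupEndpoint("worker", endpoint)
val sameRef = rpcEnv.setupEndpointRefByURI("spark://worker@node1:7078")
// Messages sent through a ref are routed by the RpcEnv to the endpoint
ref.send("a fire-and-forget message")
// Stop the endpoint; its onStop() hook runs
rpcEnv.stop(ref)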


Note: an RpcEndpoint can be registered with only one RpcEnv.


RpcAddress: the logical address of an RpcEnv, represented by a host name and a port.

RpcEndpointAddress: the address of an RpcEndpoint registered with an RpcEnv, consisting of an RpcAddress plus a name.
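
For intuition, the two address types compose as follows (a sketch; in Spark 1.6 these classes live in Spark's private rpc packages, and the spark:// string is how the Netty implementation renders an endpoint address):

val addr = RpcAddress("node1", 7077)                   // logical RpcEnv address: node1:7077
val endpointAddr = RpcEndpointAddress(addr, "Master")  // rendered as spark://Master@node1:7077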


This shows that an RpcEnv and the RpcEndpoints registered with it are on the same machine (the same JVM). To send a message to a remote machine, you obtain an RpcEndpointRef for the remote endpoint; the remote RpcEndpoint itself is never registered in the local RpcEnv.
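
In code, talking to a remote endpoint therefore begins with a ref lookup rather than a registration. A hedged fragment (the host and port are placeholders; "sparkMaster" and "Master" are the Master's actual SYSTEM_NAME and ENDPOINT_NAME in Spark 1.6):

// Obtain a ref to the remote Master from the local RpcEnv; nothing is registered locally
val masterRef = rpcEnv.setupEndpointRef(
  "sparkMaster", RpcAddress("master-host", 7077), "Master")
// Messages sent through masterRef now travel over the network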



In Spark 1.6, Netty is used by default:

private def getRpcEnvFactory(conf: SparkConf): RpcEnvFactory = {
  val rpcEnvNames = Map(
    "akka" -> "org.apache.spark.rpc.akka.AkkaRpcEnvFactory",
    "netty" -> "org.apache.spark.rpc.netty.NettyRpcEnvFactory")
  val rpcEnvName = conf.get("spark.rpc", "netty")
  val rpcEnvFactoryClassName = rpcEnvNames.getOrElse(rpcEnvName.toLowerCase, rpcEnvName)
  Utils.classForName(rpcEnvFactoryClassName).newInstance().asInstanceOf[RpcEnvFactory]
}
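
Because the factory lookup is driven by the spark.rpc setting, the implementation can be switched in configuration, either by short name or by a fully qualified RpcEnvFactory class name, for example:

val conf = new SparkConf().set("spark.rpc", "akka")  // short name; "netty" is the default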


RpcEndpoint is a message loop body with a life cycle:

construction (constructor), start (onStart), message receipt (receive & receiveAndReply), stop (onStop)

receive(): runs continuously, processing messages sent by clients.

receiveAndReply(): processes a message and replies to the sender.
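
Putting the hooks together, a hypothetical endpoint exercising the whole life cycle might look like this (a sketch against the Spark 1.6 RpcEndpoint trait; the Tick and Query messages are invented for illustration):

import org.apache.spark.rpc.{RpcCallContext, RpcEndpoint, RpcEnv}

case object Tick           // invented fire-and-forget message
case class Query(id: Int)  // invented request message

class LifecycleEndpoint(override val rpcEnv: RpcEnv) extends RpcEndpoint {
  override def onStart(): Unit = println("started")  // runs after construction, before any message
  override def receive: PartialFunction[Any, Unit] = {
    case Tick => println("tick")                     // handles ref.send(...)
  }
  override def receiveAndReply(context: RpcCallContext): PartialFunction[Any, Unit] = {
    case Query(id) => context.reply("result for " + id)  // handles ref.ask(...)
  }
  override def onStop(): Unit = println("stopped")   // cleanup when the endpoint is stopped
}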


Let's take a look at the Master code:

def main(argStrings: Array[String]) {
  SignalLogger.register(log)
  val conf = new SparkConf
  val args = new MasterArguments(argStrings, conf)
  // The host name specified must be the local machine name where start-master.sh runs
  val (rpcEnv, _, _) = startRpcEnvAndEndpoint(args.host, args.port, args.webUiPort, conf)
  rpcEnv.awaitTermination()
}

/**
 * Start the Master and return a three tuple of:
 *   (1) The Master RpcEnv
 *   (2) The web UI bound port
 *   (3) The REST server bound port, if any
 */
def startRpcEnvAndEndpoint(
    host: String,
    port: Int,
    webUiPort: Int,
    conf: SparkConf): (RpcEnv, Int, Option[Int]) = {
  val securityMgr = new SecurityManager(conf)
  // Creates the RPC environment; the host name and port form the access address of the
  // standalone cluster. SYSTEM_NAME = "sparkMaster"
  val rpcEnv = RpcEnv.create(SYSTEM_NAME, host, port, conf, securityMgr)
  // Register the Master instance with the RpcEnv
  val masterEndpoint = rpcEnv.setupEndpoint(ENDPOINT_NAME,
    new Master(rpcEnv, rpcEnv.address, webUiPort, securityMgr, conf))
  val portsResponse = masterEndpoint.askWithRetry[BoundPortsResponse](BoundPortsRequest)
  (rpcEnv, portsResponse.webUIPort, portsResponse.restPort)
}

The RpcEnv is created in the main method; the Master instance is then instantiated and registered with the RpcEnv.

An RpcEndpoint is actually registered with the Dispatcher; the Netty implementation is as follows:

override def setupEndpoint(name: String, endpoint: RpcEndpoint): RpcEndpointRef = {
  dispatcher.registerRpcEndpoint(name, endpoint)
}

Note: line 135 of NettyRpcEnv.scala
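
Inside the Dispatcher, registerRpcEndpoint roughly does the following (a simplified sketch of the Spark 1.6 source, error handling trimmed): it builds a NettyRpcEndpointRef from the env's address and the name, records the pair in the two maps shown below, and queues the endpoint so its OnStart message gets processed.

// Simplified sketch of Dispatcher.registerRpcEndpoint in Spark 1.6; not verbatim
def registerRpcEndpoint(name: String, endpoint: RpcEndpoint): NettyRpcEndpointRef = {
  val addr = RpcEndpointAddress(nettyEnv.address, name)
  val endpointRef = new NettyRpcEndpointRef(nettyEnv.conf, addr, nettyEnv)
  synchronized {
    if (endpoints.putIfAbsent(name, new EndpointData(name, endpoint, endpointRef)) != null) {
      throw new IllegalArgumentException("There is already an RpcEndpoint called " + name)
    }
    val data = endpoints.get(name)
    endpointRefs.put(data.endpoint, data.ref)
    receivers.offer(data)  // the new inbox already holds an OnStart message
  }
  endpointRef
}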


The Dispatcher uses the following data structures to store RpcEndpoints and RpcEndpointRefs:

private val endpoints = new ConcurrentHashMap[String, EndpointData]
private val endpointRefs = new ConcurrentHashMap[RpcEndpoint, RpcEndpointRef]


EndpointData is a simple wrapper class:

private class EndpointData(
    val name: String,
    val endpoint: RpcEndpoint,
    val ref: NettyRpcEndpointRef) {
  val inbox = new Inbox(ref, endpoint)
}
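
Each EndpointData thus owns an Inbox that queues that endpoint's pending messages, and the Dispatcher's thread pool drains those inboxes. A simplified sketch of that message loop (not verbatim Spark 1.6 source; the real loop also handles a shutdown sentinel and errors):

// Simplified sketch of the Dispatcher's MessageLoop in Spark 1.6; not verbatim
private class MessageLoop extends Runnable {
  override def run(): Unit = {
    while (true) {
      val data = receivers.take()           // blocks until some inbox has work
      data.inbox.process(Dispatcher.this)   // invokes the endpoint's receive/receiveAndReply
    }
  }
}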


In the Master, the WorkerInfo data structure holds the information for each Worker, including each Worker's RpcEndpointRef.
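
For reference, WorkerInfo in Spark 1.6 carries the Worker's endpoint ref alongside its resources, along these lines (abbreviated from the source):

// Abbreviated from Spark 1.6's WorkerInfo; the Master keeps a collection of these
private[spark] class WorkerInfo(
    val id: String,
    val host: String,
    val port: Int,
    val cores: Int,
    val memory: Int,
    val endpoint: RpcEndpointRef,   // the ref the Master uses to message this Worker
    val webUiAddress: String)
  extends Serializable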




Note:

1. DT Big Data Dream Factory WeChat public account: DT_Spark
2. Big data hands-on practice at 8 p.m. on YY live channel: 68917580
3. Sina Weibo: http://www.weibo.com/ilovepains


This article is from the "Ding Dong" blog; please be sure to keep the source: http://lqding.blog.51cto.com/9123978/1770549

