Spark is a distributed computing framework, so its machines must communicate with each other. Earlier versions of Spark implemented this communication with Akka. An RpcEnv abstraction has since been layered on top of Akka; RpcEnv is responsible for managing communication between machines.
RpcEnv has three main components:

RpcEndpoint: the message loop body, responsible for receiving and processing messages. Both the Master and the Worker in Spark are RpcEndpoints.

RpcEndpointRef: a reference to an RpcEndpoint. To communicate with an RpcEndpoint, you must obtain its RpcEndpointRef and send messages through that reference.

Dispatcher: the message dispatcher, responsible for routing RPC messages to the appropriate RpcEndpoint.
After an RpcEnv is created, RpcEndpoints can be registered with it; registering an RpcEndpoint produces a corresponding RpcEndpointRef that references it. To send a message to an RpcEndpoint, you look up its RpcEndpointRef by the endpoint's name in the RpcEnv, and then send the message through that RpcEndpointRef.

RpcEnv is responsible for managing the entire life cycle of its RpcEndpoints.

Note: an RpcEndpoint can only be registered with one RpcEnv.
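The register-then-reference flow can be sketched with a toy in-process model. This is plain Scala with no Spark dependency; `ToyRpcEnv`, `ToyEndpoint`, and `ToyEndpointRef` are invented names for illustration, and a real RpcEndpointRef delivers messages over the network rather than by a direct method call:

```scala
import scala.collection.concurrent.TrieMap

// Hypothetical stand-ins for RpcEndpoint / RpcEndpointRef (illustration only).
trait ToyEndpoint { def receive(msg: Any): Unit }

class ToyEndpointRef(val name: String, endpoint: ToyEndpoint) {
  // Local delivery only; Spark's real ref sends over the wire.
  def send(msg: Any): Unit = endpoint.receive(msg)
}

class ToyRpcEnv {
  private val endpoints = TrieMap.empty[String, ToyEndpointRef]

  // Registering an endpoint yields the ref used to talk to it.
  def setupEndpoint(name: String, ep: ToyEndpoint): ToyEndpointRef = {
    val ref = new ToyEndpointRef(name, ep)
    endpoints.put(name, ref)
    ref
  }

  // Look up a ref by endpoint name, as RpcEnv does for registered endpoints.
  def endpointRef(name: String): Option[ToyEndpointRef] = endpoints.get(name)
}
```

The key property the sketch shows: callers never hold the endpoint itself, only a ref obtained by name from the env.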
RpcAddress: the logical address of an RpcEnv, represented by a host name and port.

RpcEndpointAddress: the address of an RpcEndpoint registered with an RpcEnv, consisting of an RpcAddress and a name.

This shows that an RpcEnv and the RpcEndpoints registered with it are on the same machine (the same JVM). To send a message to a remote machine, you obtain an RpcEndpointRef for the remote endpoint; you do not register the remote RpcEndpoint in the local RpcEnv.
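A minimal sketch of how the two address types compose. The case-class shapes follow the names above; the `spark://name@host:port` rendering is an assumption based on how Spark prints endpoint URIs:

```scala
// Logical address of an RpcEnv: host plus port.
case class RpcAddress(host: String, port: Int) {
  override def toString: String = s"$host:$port"
}

// Address of a registered RpcEndpoint: the env's RpcAddress plus the endpoint name.
case class RpcEndpointAddress(rpcAddress: RpcAddress, name: String) {
  override def toString: String =
    s"spark://$name@${rpcAddress.host}:${rpcAddress.port}"
}
```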
In Spark 1.6, Netty is used by default:
```scala
private def getRpcEnvFactory(conf: SparkConf): RpcEnvFactory = {
  val rpcEnvNames = Map(
    "akka" -> "org.apache.spark.rpc.akka.AkkaRpcEnvFactory",
    "netty" -> "org.apache.spark.rpc.netty.NettyRpcEnvFactory")
  val rpcEnvName = conf.get("spark.rpc", "netty")
  val rpcEnvFactoryClassName = rpcEnvNames.getOrElse(rpcEnvName.toLowerCase, rpcEnvName)
  Utils.classForName(rpcEnvFactoryClassName).newInstance().asInstanceOf[RpcEnvFactory]
}
```
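The lookup pattern in `getRpcEnvFactory` can be exercised in isolation: a short config value ("akka" or "netty") maps to a factory class name, and any unrecognized value is treated as a fully qualified class name itself. A plain-Scala sketch of the same `getOrElse` fallback (`resolveFactoryClassName` is an invented helper name):

```scala
// Same pattern as getRpcEnvFactory: map a short config value to a class name,
// falling back to treating the value itself as a fully qualified class name.
val rpcEnvNames = Map(
  "akka"  -> "org.apache.spark.rpc.akka.AkkaRpcEnvFactory",
  "netty" -> "org.apache.spark.rpc.netty.NettyRpcEnvFactory")

def resolveFactoryClassName(configured: String): String =
  rpcEnvNames.getOrElse(configured.toLowerCase, configured)
```

This is why `spark.rpc` can be set either to a built-in short name or to the class name of a custom RpcEnvFactory.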
RpcEndpoint is a message loop body with its own life cycle: construction (constructor), start (onStart), message handling (receive / receiveAndReply), and stop (onStop).

receive(): runs continuously, processing one-way messages sent by clients.

receiveAndReply(): processes a message and sends a reply back to the sender.
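The two message paths can be illustrated with a toy counter endpoint. This mirrors the shape of RpcEndpoint's API but is a plain-Scala sketch, not Spark's actual trait (in the real API, receiveAndReply takes an RpcCallContext and replies through it, not a plain callback):

```scala
// Toy endpoint showing the two message paths: fire-and-forget (receive)
// and request/response (receiveAndReply).
class CounterEndpoint {
  private var count = 0

  // One-way messages: no reply expected.
  def receive: PartialFunction[Any, Unit] = {
    case "increment" => count += 1
  }

  // Request/response: the caller gets an answer back via the reply callback.
  def receiveAndReply(reply: Any => Unit): PartialFunction[Any, Unit] = {
    case "get" => reply(count)
  }
}
```

Spark chooses the path per message: `send` on a ref drives `receive`, while `ask` drives `receiveAndReply`.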
Let's take a look at the Master code:
```scala
def main(argStrings: Array[String]) {
  SignalLogger.register(log)
  val conf = new SparkConf
  val args = new MasterArguments(argStrings, conf)
  // The hostname specified must be the local machine name where start-master.sh runs
  val (rpcEnv, _, _) = startRpcEnvAndEndpoint(args.host, args.port, args.webUiPort, conf)
  rpcEnv.awaitTermination()
}

/**
 * Start the Master and return a three tuple of:
 *   (1) The Master RpcEnv
 *   (2) The web UI bound port
 *   (3) The REST server bound port, if any
 */
def startRpcEnvAndEndpoint(
    host: String,
    port: Int,
    webUiPort: Int,
    conf: SparkConf): (RpcEnv, Int, Option[Int]) = {
  val securityMgr = new SecurityManager(conf)
  // Create the RPC environment; the hostname and port are the access address
  // of the standalone cluster. SYSTEM_NAME = "sparkMaster"
  val rpcEnv = RpcEnv.create(SYSTEM_NAME, host, port, conf, securityMgr)
  // Register the Master instance with the RpcEnv
  val masterEndpoint = rpcEnv.setupEndpoint(ENDPOINT_NAME,
    new Master(rpcEnv, rpcEnv.address, webUiPort, securityMgr, conf))
  val portsResponse = masterEndpoint.askWithRetry[BoundPortsResponse](BoundPortsRequest)
  (rpcEnv, portsResponse.webUIPort, portsResponse.restPort)
}
```
The main method creates the RpcEnv, instantiates the Master, and registers the Master instance with the RpcEnv.

The RpcEndpoint is actually registered with the Dispatcher; the Netty implementation looks like this:
```scala
override def setupEndpoint(name: String, endpoint: RpcEndpoint): RpcEndpointRef = {
  dispatcher.registerRpcEndpoint(name, endpoint)
}
```
Note: see line 135 of NettyRpcEnv.scala.
Inside the Dispatcher, the following data structures store the RpcEndpoints and RpcEndpointRefs:
```scala
private val endpoints = new ConcurrentHashMap[String, EndpointData]
private val endpointRefs = new ConcurrentHashMap[RpcEndpoint, RpcEndpointRef]
```
EndpointData is a small wrapper class that pairs an endpoint with its inbox:
```scala
private class EndpointData(
    val name: String,
    val endpoint: RpcEndpoint,
    val ref: NettyRpcEndpointRef) {
  val inbox = new Inbox(ref, endpoint)
}
```
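A simplified model of the Dispatcher's bookkeeping: registering an endpoint creates an EndpointData whose Inbox queues incoming messages, keyed by name in a ConcurrentHashMap. Class names mirror NettyRpcEnv's, but these are stripped-down stand-ins (the real Inbox also handles lifecycle messages like OnStart/OnStop and is drained by the Dispatcher's thread pool):

```scala
import java.util.concurrent.ConcurrentHashMap
import scala.collection.mutable

// Simplified Inbox: just a message queue for one endpoint.
class Inbox(val name: String) {
  private val messages = mutable.Queue.empty[Any]
  def post(msg: Any): Unit = messages.enqueue(msg)
  def size: Int = messages.size
}

// Simplified EndpointData: pairs a name with its Inbox.
class EndpointData(val name: String) {
  val inbox = new Inbox(name)
}

class Dispatcher {
  private val endpoints = new ConcurrentHashMap[String, EndpointData]

  // Registering twice under the same name is an error, as in the real Dispatcher.
  def registerRpcEndpoint(name: String): EndpointData = {
    val data = new EndpointData(name)
    if (endpoints.putIfAbsent(name, data) != null) {
      throw new IllegalArgumentException(s"There is already an RpcEndpoint called $name")
    }
    data
  }

  // Route a message to the named endpoint's inbox.
  def postMessage(name: String, msg: Any): Unit =
    Option(endpoints.get(name)).foreach(_.inbox.post(msg))
}
```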
The Master uses the WorkerInfo data structure to hold information about each worker, including each worker's RpcEndpointRef.
This article is from the "Ding Dong" blog; please be sure to keep this source: http://lqding.blog.51cto.com/9123978/1770549
Lesson 43: Spark 1.6 RPC internals: operating mechanism, source code details, Netty and Akka, etc.