Spark is a distributed computing framework, so its machines must communicate with each other. Earlier versions of Spark implemented this communication with Akka. An RpcEnv abstraction has since been layered on top of Akka; RpcEnv is responsible for managing communication between machines.
RpcEnv has three main components:

RpcEndpoint: the message loop body, responsible for receiving and processing messages. Both the Master and the Worker in Spark are RpcEndpoints.

RpcEndpointRef: a reference to an RpcEndpoint. To communicate with an RpcEndpoint, you must obtain its RpcEndpointRef and send messages through that reference.

Dispatcher: the message dispatcher, responsible for routing RPC messages to the appropriate RpcEndpoint.
After an RpcEnv is created, RpcEndpoints can be registered with it; registering an RpcEndpoint produces a corresponding RpcEndpointRef that references it. To send a message to an RpcEndpoint, you look up its RpcEndpointRef by the endpoint's name in the RpcEnv, and then send the message through that RpcEndpointRef.

RpcEnv is responsible for managing the entire life cycle of its RpcEndpoints.

Note: an RpcEndpoint can only be registered with one RpcEnv.
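The register-then-reference flow can be sketched with a toy in-process model. This is plain Scala with no Spark dependency; `ToyRpcEnv`, `ToyEndpoint`, and `ToyEndpointRef` are invented names for illustration, and a real RpcEndpointRef delivers messages over the network rather than by a direct method call:

```scala
import scala.collection.concurrent.TrieMap

// Hypothetical stand-ins for RpcEndpoint / RpcEndpointRef (illustration only).
trait ToyEndpoint { def receive(msg: Any): Unit }

class ToyEndpointRef(val name: String, endpoint: ToyEndpoint) {
  // Local delivery only; Spark's real ref sends over the wire.
  def send(msg: Any): Unit = endpoint.receive(msg)
}

class ToyRpcEnv {
  private val endpoints = TrieMap.empty[String, ToyEndpointRef]

  // Registering an endpoint yields the ref used to talk to it.
  def setupEndpoint(name: String, ep: ToyEndpoint): ToyEndpointRef = {
    val ref = new ToyEndpointRef(name, ep)
    endpoints.put(name, ref)
    ref
  }

  // Look up a ref by endpoint name, as RpcEnv does for registered endpoints.
  def endpointRef(name: String): Option[ToyEndpointRef] = endpoints.get(name)
}
```

The key property the sketch shows: callers never hold the endpoint itself, only a ref obtained by name from the env.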
RpcAddress: the logical address of an RpcEnv, represented by a host name and port.

RpcEndpointAddress: the address of an RpcEndpoint registered with an RpcEnv, consisting of an RpcAddress and a name.

This shows that an RpcEnv and the RpcEndpoints registered with it are on the same machine (the same JVM). To send a message to a remote machine, you obtain an RpcEndpointRef for the remote endpoint; you do not register the remote RpcEndpoint in the local RpcEnv.
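A minimal sketch of how the two address types compose. The case-class shapes follow the names above; the `spark://name@host:port` rendering is an assumption based on how Spark prints endpoint URIs:

```scala
// Logical address of an RpcEnv: host plus port.
case class RpcAddress(host: String, port: Int) {
  override def toString: String = s"$host:$port"
}

// Address of a registered RpcEndpoint: the env's RpcAddress plus the endpoint name.
case class RpcEndpointAddress(rpcAddress: RpcAddress, name: String) {
  override def toString: String =
    s"spark://$name@${rpcAddress.host}:${rpcAddress.port}"
}
```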
In Spark 1.6, Netty is used by default:
```scala
private def getRpcEnvFactory(conf: SparkConf): RpcEnvFactory = {
  val rpcEnvNames = Map(
    "akka" -> "org.apache.spark.rpc.akka.AkkaRpcEnvFactory",
    "netty" -> "org.apache.spark.rpc.netty.NettyRpcEnvFactory")
  val rpcEnvName = conf.get("spark.rpc", "netty")
  val rpcEnvFactoryClassName = rpcEnvNames.getOrElse(rpcEnvName.toLowerCase, rpcEnvName)
  Utils.classForName(rpcEnvFactoryClassName).newInstance().asInstanceOf[RpcEnvFactory]
}
```
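The lookup pattern in `getRpcEnvFactory` can be exercised in isolation: a short config value ("akka" or "netty") maps to a factory class name, and any unrecognized value is treated as a fully qualified class name itself. A plain-Scala sketch of the same `getOrElse` fallback (`resolveFactoryClassName` is an invented helper name):

```scala
// Same pattern as getRpcEnvFactory: map a short config value to a class name,
// falling back to treating the value itself as a fully qualified class name.
val rpcEnvNames = Map(
  "akka"  -> "org.apache.spark.rpc.akka.AkkaRpcEnvFactory",
  "netty" -> "org.apache.spark.rpc.netty.NettyRpcEnvFactory")

def resolveFactoryClassName(configured: String): String =
  rpcEnvNames.getOrElse(configured.toLowerCase, configured)
```

This is why `spark.rpc` can be set either to a built-in short name or to the class name of a custom RpcEnvFactory.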
RpcEndpoint is a message loop body with its own life cycle: construction (constructor), start (onStart), message handling (receive / receiveAndReply), and stop (onStop).

receive(): runs continuously, processing one-way messages sent by clients.

receiveAndReply(): processes a message and sends a reply back to the sender.
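The two message paths can be illustrated with a toy counter endpoint. This mirrors the shape of RpcEndpoint's API but is a plain-Scala sketch, not Spark's actual trait (in the real API, receiveAndReply takes an RpcCallContext and replies through it, not a plain callback):

```scala
// Toy endpoint showing the two message paths: fire-and-forget (receive)
// and request/response (receiveAndReply).
class CounterEndpoint {
  private var count = 0

  // One-way messages: no reply expected.
  def receive: PartialFunction[Any, Unit] = {
    case "increment" => count += 1
  }

  // Request/response: the caller gets an answer back via the reply callback.
  def receiveAndReply(reply: Any => Unit): PartialFunction[Any, Unit] = {
    case "get" => reply(count)
  }
}
```

Spark chooses the path per message: `send` on a ref drives `receive`, while `ask` drives `receiveAndReply`.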
Let's take a look at the Master code:
```scala
def main(argStrings: Array[String]) {
  SignalLogger.register(log)
  val conf = new SparkConf
  val args = new MasterArguments(argStrings, conf)
  // The hostname specified must be the local machine name where start-master.sh runs
  val (rpcEnv, _, _) = startRpcEnvAndEndpoint(args.host, args.port, args.webUiPort, conf)
  rpcEnv.awaitTermination()
}

/**
 * Start the Master and return a three tuple of:
 *   (1) The Master RpcEnv
 *   (2) The web UI bound port
 *   (3) The REST server bound port, if any
 */
def startRpcEnvAndEndpoint(
    host: String,
    port: Int,
    webUiPort: Int,
    conf: SparkConf): (RpcEnv, Int, Option[Int]) = {
  val securityMgr = new SecurityManager(conf)
  // Create the RPC environment; the hostname and port are the access address
  // of the standalone cluster. SYSTEM_NAME = "sparkMaster"
  val rpcEnv = RpcEnv.create(SYSTEM_NAME, host, port, conf, securityMgr)
  // Register the Master instance with the RpcEnv
  val masterEndpoint = rpcEnv.setupEndpoint(ENDPOINT_NAME,
    new Master(rpcEnv, rpcEnv.address, webUiPort, securityMgr, conf))
  val portsResponse = masterEndpoint.askWithRetry[BoundPortsResponse](BoundPortsRequest)
  (rpcEnv, portsResponse.webUIPort, portsResponse.restPort)
}
```
The main method creates the RpcEnv, instantiates the Master, and registers the Master instance with the RpcEnv.

The RpcEndpoint is actually registered with the Dispatcher; the Netty implementation looks like this:
```scala
override def setupEndpoint(name: String, endpoint: RpcEndpoint): RpcEndpointRef = {
  dispatcher.registerRpcEndpoint(name, endpoint)
}
```
Note: see line 135 of NettyRpcEnv.scala.
Inside the Dispatcher, the following data structures store the RpcEndpoints and RpcEndpointRefs:
```scala
private val endpoints = new ConcurrentHashMap[String, EndpointData]
private val endpointRefs = new ConcurrentHashMap[RpcEndpoint, RpcEndpointRef]
```
EndpointData is a small wrapper class that pairs an endpoint with its inbox:
```scala
private class EndpointData(
    val name: String,
    val endpoint: RpcEndpoint,
    val ref: NettyRpcEndpointRef) {
  val inbox = new Inbox(ref, endpoint)
}
```
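A simplified model of the Dispatcher's bookkeeping: registering an endpoint creates an EndpointData whose Inbox queues incoming messages, keyed by name in a ConcurrentHashMap. Class names mirror NettyRpcEnv's, but these are stripped-down stand-ins (the real Inbox also handles lifecycle messages like OnStart/OnStop and is drained by the Dispatcher's thread pool):

```scala
import java.util.concurrent.ConcurrentHashMap
import scala.collection.mutable

// Simplified Inbox: just a message queue for one endpoint.
class Inbox(val name: String) {
  private val messages = mutable.Queue.empty[Any]
  def post(msg: Any): Unit = messages.enqueue(msg)
  def size: Int = messages.size
}

// Simplified EndpointData: pairs a name with its Inbox.
class EndpointData(val name: String) {
  val inbox = new Inbox(name)
}

class Dispatcher {
  private val endpoints = new ConcurrentHashMap[String, EndpointData]

  // Registering twice under the same name is an error, as in the real Dispatcher.
  def registerRpcEndpoint(name: String): EndpointData = {
    val data = new EndpointData(name)
    if (endpoints.putIfAbsent(name, data) != null) {
      throw new IllegalArgumentException(s"There is already an RpcEndpoint called $name")
    }
    data
  }

  // Route a message to the named endpoint's inbox.
  def postMessage(name: String, msg: Any): Unit =
    Option(endpoints.get(name)).foreach(_.inbox.post(msg))
}
```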
The Master uses the WorkerInfo data structure to hold information about each worker, including each worker's RpcEndpointRef.
This article is from the "Ding Dong" blog; please be sure to keep this source: http://lqding.blog.51cto.com/9123978/1770549
Lesson 43: Spark 1.6 RPC internals: operating mechanism, source code details, Netty and Akka, etc.