Reprints are welcome; when reprinting, please credit the source. Author: huihu yilang (徽沪一郎).
Overview
The web UI and the metrics subsystem are the windows through which Spark's internal workings can be observed from the outside. This article takes a brief look at how they are implemented in the code.
WEB UI
Let's first get a feel for the Spark web UI. Assuming you are running a standalone cluster on your local machine, entering http://127.0.0.1:8080 in a browser brings up the following page.
By default the driver application listens for HTTP requests on port 4040, where application-level details can be viewed.
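If port 4040 is already taken, Spark simply tries the next port (as the connect function shown later makes clear), but the port can also be pinned explicitly. A minimal sketch of a driver program doing this, assuming the standard spark.ui.port configuration key; the object name and port value are only illustrative:

import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical driver program that pins the web UI to a specific port.
// spark.ui.port controls which port SparkUI binds to (4040 by default).
object UiPortExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setMaster("local[2]")
      .setAppName("ui-port-example")
      .set("spark.ui.port", "4041") // instead of the default 4040
    val sc = new SparkContext(conf)
    // ... run jobs; the UI is now served at http://127.0.0.1:4041 ...
    sc.stop()
  }
}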
Detailed information is shown for each stage.
Startup process
The focus of this section is how the HTTP server is started and where the data shown on the pages comes from. The HTTP server used in Spark is Jetty. Jetty is written in Java and is a very lightweight servlet engine and HTTP server; it can be embedded directly in a user program and does not require a separate JVM process the way Tomcat or JBoss does.
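To make "embeddable" concrete, here is a bare-bones standalone sketch of embedding Jetty in a Scala program. This is plain Jetty usage, not Spark code; the object name, handler and port are illustrative, and the API shown is the Jetty 8/9-style AbstractHandler:

import javax.servlet.http.{HttpServletRequest, HttpServletResponse}
import org.eclipse.jetty.server.{Request, Server}
import org.eclipse.jetty.server.handler.AbstractHandler

// A minimal embedded Jetty server: no separate container process
// (Tomcat/JBoss) is needed, the server runs inside this JVM.
object EmbeddedJettyExample {
  def main(args: Array[String]): Unit = {
    val server = new Server(8080) // port chosen for illustration
    server.setHandler(new AbstractHandler {
      override def handle(target: String, baseRequest: Request,
                          request: HttpServletRequest,
                          response: HttpServletResponse): Unit = {
        response.setContentType("text/plain")
        response.getWriter.println("hello from embedded jetty")
        baseRequest.setHandled(true)
      }
    })
    server.start()
    server.join() // block until the server is stopped
  }
}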
SparkUI is created while SparkContext is being initialized:
// Initialize the Spark UI, registering all associated listeners
private[spark] val ui = new SparkUI(this)
ui.bind()
The main job of initialize is to register the handlers for each page; subclasses of WebUI need to provide their own initialize implementation.
bind is what actually starts the Jetty server.
def bind() {
  assert(!serverInfo.isDefined, "Attempted to bind %s more than once!".format(className))
  try {
    // Start the Jetty server
    serverInfo = Some(startJettyServer("0.0.0.0", port, handlers, conf))
    logInfo("Started %s at http://%s:%d".format(className, publicHostName, boundPort))
  } catch {
    case e: Exception =>
      logError("Failed to bind %s".format(className), e)
      System.exit(1)
  }
}
The key function inside startJettyServer that actually brings up the Jetty server is connect.
def connect(currentPort: Int): (Server, Int) = {
  val server = new Server(new InetSocketAddress(hostName, currentPort))
  val pool = new QueuedThreadPool
  pool.setDaemon(true)
  server.setThreadPool(pool)
  server.setHandler(collection)
  Try {
    server.start()
  } match {
    case s: Success[_] =>
      (server, server.getConnectors.head.getLocalPort)
    case f: Failure[_] =>
      val nextPort = (currentPort + 1) % 65536
      server.stop()
      pool.stop()
      val msg = s"Failed to create UI on port $currentPort. Trying again on port $nextPort."
      if (f.toString.contains("Address already in use")) {
        logWarning(s"$msg - $f")
      } else {
        logError(msg, f.exception)
      }
      connect(nextPort)
  }
}

val (server, boundPort) = connect(port)
ServerInfo(server, boundPort, collection)
Data acquisition
How is the data on the pages obtained? The answer is SparkListener, a typical application of the observer design pattern. Whenever events related to stages or tasks occur, the registered listeners are notified and the data is updated.
Note that although the underlying data is updated automatically, the page itself is not; it has to be refreshed manually to show the latest data.
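Because an application can register its own listener on the same bus, the mechanism is easy to observe from user code. A minimal sketch, assuming the Spark 1.x-style SparkListener API and an already-created SparkContext sc; the class name and printed messages are illustrative:

import org.apache.spark.scheduler.{SparkListener, SparkListenerStageCompleted, SparkListenerTaskEnd}

// Illustrative listener: counts finished tasks and logs stage completion.
class TaskCountListener extends SparkListener {
  @volatile var tasksFinished = 0L

  override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
    tasksFinished += 1
  }

  override def onStageCompleted(stageCompleted: SparkListenerStageCompleted): Unit = {
    println(s"stage ${stageCompleted.stageInfo.stageId} completed, " +
      s"$tasksFinished tasks finished so far")
  }
}

// Registration: the listener then receives the same events as the UI listeners.
// sc.addSparkListener(new TaskCountListener)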
Which SparkListener subclasses are registered with the SparkUI, and when are they registered? To find out, look at the SparkUI.initialize function.
def initialize() {
  listenerBus.addListener(storageStatusListener)
  val jobProgressTab = new JobProgressTab(this)
  attachTab(jobProgressTab)
  attachTab(new StorageTab(this))
  attachTab(new EnvironmentTab(this))
  attachTab(new ExecutorsTab(this))
  attachHandler(createStaticHandler(SparkUI.STATIC_RESOURCE_DIR, "/static"))
  attachHandler(createRedirectHandler("/", "/stages", basePath = basePath))
  attachHandler(
    createRedirectHandler("/stages/stage/kill", "/stages", jobProgressTab.handleKillRequest))
  if (live) {
    sc.env.metricsSystem.getServletHandlers.foreach(attachHandler)
  }
}
As a concrete example, consider the moment an event is posted, e.g. when a task is submitted: resourceOffer -> taskStarted -> handleBeginEvent.
private[scheduler] def handleBeginEvent(task: Task[_], taskInfo: TaskInfo) {
  listenerBus.post(SparkListenerTaskStart(task.stageId, taskInfo))
  submitWaitingStages()
}
post merely appends the message to the listenerBus event queue; actually delivering the message is the job of a separate processing thread, listenerThread.
override def run(): Unit = Utils.logUncaughtExceptions {
  while (true) {
    eventLock.acquire()
    // Atomically remove and process this event
    LiveListenerBus.this.synchronized {
      val event = eventQueue.poll
      if (event == SparkListenerShutdown) {
        // Get out of the while loop and shutdown the daemon thread
        return
      }
      Option(event).foreach(postToAll)
    }
  }
}
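The mechanism here is a classic producer/consumer queue guarded by a semaphore: post releases one permit per enqueued event, and the daemon thread blocks in acquire until something arrives. A stripped-down, standalone sketch of the same pattern (the names are illustrative, not Spark's):

import java.util.concurrent.{ConcurrentLinkedQueue, Semaphore}

// Standalone illustration of the post / listener-thread pattern.
class TinyEventBus[E](stop: E) {
  private val queue = new ConcurrentLinkedQueue[E]()
  private val eventLock = new Semaphore(0)

  // Producer side: what post() does - enqueue and release a permit.
  def post(event: E): Unit = {
    queue.offer(event)
    eventLock.release()
  }

  // Consumer side: what the listener thread does - block, dequeue, dispatch.
  def start(handle: E => Unit): Thread = {
    val t = new Thread {
      override def run(): Unit = {
        while (true) {
          eventLock.acquire()
          val event = queue.poll()
          if (event == stop) return
          Option(event).foreach(handle)
        }
      }
    }
    t.setDaemon(true)
    t.start()
    t
  }
}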
Option(event).foreach(postToAll) is what notifies each individual observer. The implementation of postToAll is as follows:
def postToAll(event: SparkListenerEvent) {
  event match {
    case stageSubmitted: SparkListenerStageSubmitted =>
      foreachListener(_.onStageSubmitted(stageSubmitted))
    case stageCompleted: SparkListenerStageCompleted =>
      foreachListener(_.onStageCompleted(stageCompleted))
    case jobStart: SparkListenerJobStart =>
      foreachListener(_.onJobStart(jobStart))
    case jobEnd: SparkListenerJobEnd =>
      foreachListener(_.onJobEnd(jobEnd))
    case taskStart: SparkListenerTaskStart =>
      foreachListener(_.onTaskStart(taskStart))
    case taskGettingResult: SparkListenerTaskGettingResult =>
      foreachListener(_.onTaskGettingResult(taskGettingResult))
    case taskEnd: SparkListenerTaskEnd =>
      foreachListener(_.onTaskEnd(taskEnd))
    case environmentUpdate: SparkListenerEnvironmentUpdate =>
      foreachListener(_.onEnvironmentUpdate(environmentUpdate))
    case blockManagerAdded: SparkListenerBlockManagerAdded =>
      foreachListener(_.onBlockManagerAdded(blockManagerAdded))
    case blockManagerRemoved: SparkListenerBlockManagerRemoved =>
      foreachListener(_.onBlockManagerRemoved(blockManagerRemoved))
    case unpersistRDD: SparkListenerUnpersistRDD =>
      foreachListener(_.onUnpersistRDD(unpersistRDD))
    case applicationStart: SparkListenerApplicationStart =>
      foreachListener(_.onApplicationStart(applicationStart))
    case applicationEnd: SparkListenerApplicationEnd =>
      foreachListener(_.onApplicationEnd(applicationEnd))
    case SparkListenerShutdown =>
  }
}
Metrics
A metrics module is an indispensable part of any system design; metrics data is what lets us perceive how the system is actually running.
In Spark, the metrics module is managed by MetricsSystem. There are three important concepts in MetricsSystem, described below.
- Instance: who is using the metrics system. Currently the master, worker, executor and client driver are known to create a metrics system for measurement.
- Source: the data source, i.e. where the data comes from (a sketch of a source appears after the next list).
- Sink: the destination of the data, i.e. where the data obtained from the sources is sent.
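To make the Source concept concrete, here is a sketch in the style of DAGSchedulerSource: a source is essentially a name plus a Codahale MetricRegistry. Note that in the Spark code of this era the Source trait is package-private, so this illustrates the internal shape rather than a public extension point; the class, metric names and queue parameter are made up for illustration:

import com.codahale.metrics.{Gauge, MetricRegistry}
import org.apache.spark.metrics.source.Source

// Sketch of a data source: exposes one gauge through a MetricRegistry.
class QueueLengthSource(queue: java.util.Queue[_]) extends Source {
  override val sourceName = "exampleQueue"
  override val metricRegistry = new MetricRegistry()

  metricRegistry.register(MetricRegistry.name("queue", "length"), new Gauge[Int] {
    override def getValue: Int = queue.size()
  })
}

// Registration mirrors what SparkContext does for its driver sources:
// SparkEnv.get.metricsSystem.registerSource(new QueueLengthSource(someQueue))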
Spark currently supports saving or sending metrics data to the following destinations:
- ConsoleSink: outputs to the console
- CsvSink: periodically saves the data as CSV files
- JmxSink: registers the metrics with JMX so they can be viewed through a JMX console
- MetricsServlet: adds a MetricsServlet to the SparkUI so the metrics can be viewed at task runtime
- GraphiteSink: sends the data to Graphite, allowing the whole system (not just Spark) to be monitored
The following traces the source code through several aspects: the creation of MetricsSystem, the registration of data sources, and how the data is updated and sent.
Initialization process
MetricsSystem relies on the third-party Metrics library from Codahale; more detail can be found at metrics.codahale.com.
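To get a quick feel for the Codahale library itself, independent of Spark, here is a minimal sketch using a registry, a counter and a console reporter. The package and class names are those of the Codahale metrics-core library; the object name, metric name and timings are illustrative:

import java.util.concurrent.TimeUnit
import com.codahale.metrics.{ConsoleReporter, MetricRegistry}

object CodahaleMetricsExample {
  def main(args: Array[String]): Unit = {
    val registry = new MetricRegistry()
    val requests = registry.counter("requests") // a simple counter metric

    // A reporter periodically reads everything in the registry and emits it;
    // this is the same role CsvReporter plays inside CsvSink.
    val reporter = ConsoleReporter.forRegistry(registry)
      .convertRatesTo(TimeUnit.SECONDS)
      .convertDurationsTo(TimeUnit.MILLISECONDS)
      .build()
    reporter.start(1, TimeUnit.SECONDS)

    for (_ <- 1 to 10) {
      requests.inc()
      Thread.sleep(300)
    }
    reporter.stop()
  }
}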
Take the driver application as an example. The driver application first initializes SparkContext, and MetricsSystem is created during that initialization; the call chain is SparkContext.init -> SparkEnv.init -> MetricsSystem.createMetricsSystem.
Registering the data sources, continuing with SparkContext as the example:
private val dagSchedulerSource = new DAGSchedulerSource(this.dagScheduler, this)
private val blockManagerSource = new BlockManagerSource(SparkEnv.get.blockManager, this)

private def initDriverMetrics() {
  SparkEnv.get.metricsSystem.registerSource(dagSchedulerSource)
  SparkEnv.get.metricsSystem.registerSource(blockManagerSource)
}

initDriverMetrics()
Data reading
Reading the data is the job of the sinks; Spark provides several Sink subclasses for this purpose.
To read out the latest data, take CsvSink as an example: the most important step is creating a CsvReporter, which, once started, periodically writes the latest data to CSV files. Different kinds of sinks use different reporters.
val reporter: CsvReporter = CsvReporter.forRegistry(registry)
  .formatFor(Locale.US)
  .convertDurationsTo(TimeUnit.MILLISECONDS)
  .convertRatesTo(TimeUnit.SECONDS)
  .build(new File(pollDir))

override def start() {
  reporter.start(pollPeriod, pollUnit)
}
The metrics subsystem in Spark is configured through conf/metrics.properties. The default sink is MetricsServlet: after a job has been submitted for execution, entering http://127.0.0.1:4040/metrics/json returns the collected metrics in JSON format.
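As a concrete illustration, here is a metrics.properties sketch that keeps the default MetricsServlet behaviour and additionally enables the console and CSV sinks. The property names follow Spark's conf/metrics.properties.template; the period, unit and directory values are only examples and should be adapted (and checked against the template for your Spark version):

# Enable ConsoleSink for all instances (master, worker, driver, executor)
*.sink.console.class=org.apache.spark.metrics.sink.ConsoleSink
*.sink.console.period=10
*.sink.console.unit=seconds

# Enable CsvSink and write the CSV files under /tmp/spark-metrics
*.sink.csv.class=org.apache.spark.metrics.sink.CsvSink
*.sink.csv.period=1
*.sink.csv.unit=minutes
*.sink.csv.directory=/tmp/spark-metrics

# Expose JVM metrics for the driver and the executors
driver.source.jvm.class=org.apache.spark.metrics.source.JvmSource
executor.source.jvm.class=org.apache.spark.metrics.source.JvmSource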