Two ways to submit Spark on YARN

Source: Internet
Author: User


As in yarn-cluster mode, the whole program is submitted through the spark-submit script. In yarn-client mode, however, the job is not wrapped and launched by the Client class; instead, the job's main function is invoked directly through reflection. The analysis is as follows:
1. The launch function of the SparkSubmit class invokes the job's main function directly through reflection; in cluster mode it would call the Client class's main function instead.
2. The application's main function must create a SparkContext and initialize it.
3. During initialization, SparkContext does the following: it sets the relevant configuration, registers MapOutputTracker, BlockManagerMaster, and BlockManager, and creates the TaskScheduler and DAGScheduler. Creating the TaskScheduler and DAGScheduler is the most important part. When the TaskScheduler is created, the scheduler and SchedulerBackend are chosen according to the master that was passed in. Because we chose yarn-client mode, the program selects YarnClientClusterScheduler and YarnClientSchedulerBackend, and initializes the YarnClientClusterScheduler with the YarnClientSchedulerBackend instance. Both instances are obtained through reflection. YarnClientSchedulerBackend is a subclass of CoarseGrainedSchedulerBackend, and YarnClientClusterScheduler is a subclass of TaskSchedulerImpl that only overrides the getRackForHost method.
4. After the TaskScheduler is initialized, the DAGScheduler is created, and the TaskScheduler is then started through taskScheduler.start(). Starting the TaskScheduler also calls the start method of the SchedulerBackend. During SchedulerBackend startup, the parameters are initialized and wrapped in a ClientArguments object, the ClientArguments object is passed to the Client class, and the Client.runApp() method is called to obtain the application ID.
5. What Client.runApp() does is similar to the client-side operations described earlier for yarn-cluster mode; the difference is that it starts ExecutorLauncher (in yarn-cluster mode it starts ApplicationMaster).
6. ExecutorLauncher initializes and starts amClient, and then registers the application with YARN as its ApplicationMaster. After registration it waits for the driver to start. Once the driver is up, it creates a MonitorActor object to communicate with CoarseGrainedSchedulerBackend (only the AddWebUIFilter event is exchanged this way; task status is not reported to CoarseGrainedSchedulerBackend through it). It then sets addAmIpFilter. When the job completes, ExecutorLauncher sets the application status to FinalApplicationStatus.SUCCEEDED through amClient.
7. Executors are then allocated. The allocation logic is similar to that of yarn-cluster mode and is not repeated here.
8. Finally, the tasks run in CoarseGrainedExecutorBackend, which reports their status to CoarseGrainedSchedulerBackend through Akka until the job finishes.
9. While the job is running, YarnClientSchedulerBackend queries the job's status through the Client every 1 second and prints the corresponding information. When the application's state is FINISHED, FAILED, or KILLED, the program stops waiting.
10. Finally, a thread re-checks the application's state. When the state is one of FINISHED, FAILED, or KILLED, the program is complete and the SparkContext is stopped. The whole process is then over.
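
Two of the mechanisms above can be illustrated with a small sketch: invoking a job's main function through reflection (step 1), and polling an application's state at a fixed interval until it is FINISHED, FAILED, or KILLED (steps 9 and 10). This is plain Java rather than Spark's own Scala source, and it only models the idea. The names ClientModeSketch, DemoJob, invokeMain, and waitForTerminalState are hypothetical and do not exist in Spark, and the state supplier is a stand-in for the report the Client would fetch from YARN.

```java
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.Iterator;
import java.util.function.Supplier;

// Hypothetical sketch class; not part of Spark.
class ClientModeSketch {

    // Step 1 (modelled): load a class by name and call its static main(String[]) via reflection.
    static void invokeMain(String className, String[] args) {
        try {
            Class<?> cls = Class.forName(className);
            Method main = cls.getMethod("main", String[].class);
            // Cast to Object so the array is passed as a single argument, not spread as varargs.
            main.invoke(null, (Object) args);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("could not invoke main of " + className, e);
        }
    }

    // Steps 9-10 (modelled): ask for the state repeatedly, sleeping between checks,
    // and return as soon as one of the three terminal states is seen.
    static String waitForTerminalState(Supplier<String> report, long intervalMs) {
        while (true) {
            String state = report.get();
            if (state.equals("FINISHED") || state.equals("FAILED") || state.equals("KILLED")) {
                return state;
            }
            try {
                Thread.sleep(intervalMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
    }

    public static void main(String[] args) {
        invokeMain("DemoJob", new String[] {"hello"});
        System.out.println("job main received: " + DemoJob.lastArg);

        // Canned sequence of states standing in for successive application reports.
        Iterator<String> states = Arrays.asList("ACCEPTED", "RUNNING", "FINISHED").iterator();
        // The article describes a 1-second interval; a short one keeps the demo fast.
        String finalState = waitForTerminalState(states::next, 10);
        System.out.println("final state: " + finalState);
    }
}

// Hypothetical stand-in for a user job; a real job would create a SparkContext here.
class DemoJob {
    static String lastArg = null;

    public static void main(String[] args) {
        lastArg = args.length > 0 ? args[0] : "";
    }
}
```

In the sketch the caller never references DemoJob's main at compile time; only the class name is needed, which is what lets a generic launcher start an arbitrary user job in the same JVM.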
