I. Introduction to Azkaban
Azkaban is an open-source job scheduling system that is widely used in big data platforms. It consists of three main parts: the Azkaban webserver, the Azkaban executor, and a database.
Figure 1 Azkaban Architecture
Figure 1 shows the basic architecture of Azkaban. The webserver is mainly responsible for authorization, project management, job scheduling, and so on. The executor is mainly responsible for executing job flows and jobs and for collecting execution logs. MySQL stores the execution state of jobs and job flows. The diagram shows a single-executor deployment, but most projects in practice use multiple executors. The rest of this article describes the Azkaban scheduling process in the multi-executor scenario.
II. The job flow execution process
Figure 2 Job Flow execution process
Figure 2 shows the execution process of the Azkaban job flow:
1. First, the webserver selects an executor and dispatches the job flow to it. The choice is based on the resource state of each executor held in an in-memory cache (a webserver thread iterates over every active executor and sends an HTTP request to fetch its resource state, which is then cached in memory) and on the selection strategy (executor resource status, number of recently executed flows, and so on);
2. The executor then checks whether job-granularity allocation is set. If it is not set, all jobs run on the current executor;
3. If job-granularity allocation is set, the current node becomes the decision maker for job assignment, that is, the allocation node;
4. The allocation node obtains the resource status of each executor from ZooKeeper and selects an executor for each job according to the policy;
5. The executor that is assigned a job becomes the execution node; it executes the job and then updates the database.
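The selection in step 1 can be illustrated with a minimal Java sketch. Everything in it is an assumption made for illustration: the class ExecutorSelectionSketch, the ExecutorState fields, and the ranking rule (most free memory first, then fewest recent flows) are hypothetical and are not taken from the Azkaban source.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

/** Hypothetical sketch: choose an executor from cached resource snapshots. */
final class ExecutorSelectionSketch {

    /** One cached snapshot; the fields are illustrative, not Azkaban's. */
    static final class ExecutorState {
        final String host;
        final boolean active;
        final double freeMemoryGb;
        final int recentFlows;

        ExecutorState(String host, boolean active, double freeMemoryGb, int recentFlows) {
            this.host = host;
            this.active = active;
            this.freeMemoryGb = freeMemoryGb;
            this.recentFlows = recentFlows;
        }
    }

    /** Ranks by free memory; on a tie, fewer recent flows ranks higher. */
    static final Comparator<ExecutorState> BY_PREFERENCE = (a, b) -> {
        int byMemory = Double.compare(a.freeMemoryGb, b.freeMemoryGb);
        return byMemory != 0 ? byMemory : Integer.compare(b.recentFlows, a.recentFlows);
    };

    /** Returns the preferred active executor, or empty if none is active. */
    static Optional<ExecutorState> select(List<ExecutorState> cached) {
        return cached.stream().filter(e -> e.active).max(BY_PREFERENCE);
    }

    public static void main(String[] args) {
        List<ExecutorState> cached = Arrays.asList(
                new ExecutorState("exec-1", true, 4.0, 3),
                new ExecutorState("exec-2", true, 8.0, 1),
                new ExecutorState("exec-3", false, 16.0, 0));
        System.out.println(select(cached).map(e -> e.host).orElse("none"));
    }
}
```

The point of the sketch is that selection reads only the cached snapshots, so dispatching a flow does not have to wait for a round trip to every executor.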
III. The job flow execution process from the source code
First, the webserver side:
1. The ExecutorServlet class dispatches on the ajax parameter of the request. If ajax=executeFlow, it calls the ajaxAttemptExecuteFlow(req, resp, ret, session.getUser()) method.
2. The ajaxAttemptExecuteFlow method first uses getProjectAjaxByPermission to check whether the user has execute permission. If the permission check passes and both the project and the flow exist, it calls the ajaxExecuteFlow method.
3. The ajaxExecuteFlow method constructs the ExecutableFlow object, sets the execution parameters (notification mechanism, concurrency, failure policy), and then calls ExecutorManager.submitExecutableFlow.
4. ExecutorManager.submitExecutableFlow determines the concurrency strategy (pipeline, skip, or concurrent). In multi-executor mode the job flow is submitted to the execution queue; in single-executor mode the only executor is selected and the job flow is dispatched to it.
5. The ExecutorManager.submitExecutableFlow() method holds the main webserver-side logic for submitting a job flow. Its steps are as follows:
5.1 Get the job flow's flowId (the name of the job flow) from the exflow instance and write a log entry ("Start submitting flow xxx by").
5.2 Check whether queuedFlows is full. If it is, log the failure ("Submission failed, Azkaban too saturated") and return; otherwise continue.
5.3 Get the IDs of all running instances of the job flow: List<Integer> running.
5.4 Get the execution options.
5.5 Get the flow's execution parameters from the execution options (whether they are enabled and whether they are valid).
5.6 Check whether running is empty. If it is, no concurrent instance is running.
5.7 If running is not empty, get the concurrency setting with getConcurrentOption():
5.7.1 Pipeline: set pipelineExecutionId to the ID of the last instance in running.
5.7.2 Skip: throw an exception ("flow is already executing, ignore this execution").
5.7.3 Concurrent: only the log message changes.
5.8 Decide whether to enable the memory check, based on the whitelist.
5.9 executorLoader.uploadExecutableFlow(exflow) writes the flow to the database table execution_flows with a status of PREPARING.
5.10 Construct the concrete execution reference, an ExecutionReference.
5.11 Check whether multi-executor mode is enabled. If not, mark the flow as active, that is, write it to the database table active_executing_flows, and dispatch the flow to the only executor.
5.12 In multi-executor mode, mark the flow as active and place it in the execution queue queuedFlows.
6. In multi-executor mode, the ExecutorManager constructor calls setupMultiExecutorMode(), which starts a thread that keeps consuming the job flow at the head of the queue through the processQueuedFlows method. processQueuedFlows does two things: refreshExecutors refreshes the executors' resource information according to certain rules, and selectExecutorAndDispatchFlow selects an executor from activeExecutors according to the policy and dispatches the job flow to it. The refreshExecutors() method iterates through every active executor and sends each one a request for its state information, rather than going through ZooKeeper.
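The submission decisions in steps 5.2, 5.6, 5.7, and 5.12 can be sketched in Java as follows. This is a simplified model under stated assumptions, not the ExecutorManager code: the class SubmitFlowSketch, the Submission type, the ConcurrentOption enum, the exception messages, and the rule that the "last" running instance is the one with the highest ID are all hypothetical. Database writes and single-executor dispatch are left out.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.List;
import java.util.Queue;

/** Hypothetical model of the webserver-side submission decisions. */
final class SubmitFlowSketch {

    enum ConcurrentOption { PIPELINE, SKIP, CONCURRENT }

    /** Result of a submission; pipelineExecId stays null unless pipelining applies. */
    static final class Submission {
        final int execId;
        final Integer pipelineExecId;

        Submission(int execId, Integer pipelineExecId) {
            this.execId = execId;
            this.pipelineExecId = pipelineExecId;
        }
    }

    private final Queue<Submission> queuedFlows = new ArrayDeque<>();
    private final int queueCapacity;
    private int nextExecId = 1;

    SubmitFlowSketch(int queueCapacity) {
        this.queueCapacity = queueCapacity;
    }

    /** running holds the exec IDs of instances of the same flow that are still running. */
    Submission submit(List<Integer> running, ConcurrentOption option) {
        // Step 5.2: refuse new work when the queue is full.
        if (queuedFlows.size() >= queueCapacity) {
            throw new IllegalStateException("Submission failed: queue is full");
        }
        Integer pipelineExecId = null;
        // Steps 5.6 and 5.7: the concurrency option matters only if something is running.
        if (!running.isEmpty()) {
            switch (option) {
                case PIPELINE:
                    // Assumption: the "last" instance is the one with the highest ID.
                    pipelineExecId = Collections.max(running);
                    break;
                case SKIP:
                    throw new IllegalStateException("Flow is already running; skipping");
                default:
                    break; // CONCURRENT: run alongside the existing instances.
            }
        }
        // Step 5.12: place the flow in the execution queue.
        Submission submission = new Submission(nextExecId++, pipelineExecId);
        queuedFlows.add(submission);
        return submission;
    }

    int queued() {
        return queuedFlows.size();
    }
}
```

For example, submitting with running IDs 7, 12, and 9 under PIPELINE yields a submission whose pipelineExecId is 12, while the same call under SKIP throws and leaves the queue unchanged.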
At this point, the work on the webserver side is complete.
Then the executor side:
1. The execution flow reaches the executor; at this point its state in the database is already PREPARING.
2. The ExecutorServlet class dispatches on the action parameter of the request. If action=execute, it calls the handleAjaxExecute(req, respMap, execId) method.
3. handleAjaxExecute calls flowRunnerManager.submitFlow(execId) to submit the execution flow.
4. FlowRunnerManager has two important data structures:
4.1 Map<Future<?>, Integer> submittedFlows = new ConcurrentHashMap<Future<?>, Integer>();
4.2 Map<Integer, FlowRunner> runningFlows = new ConcurrentHashMap<Integer, FlowRunner>();
submittedFlows tracks all flows on the current executor that are in the PREPARING state. runningFlows holds all flows currently executing on the current executor, so that they can be found when they need to be cancelled or killed.
5. The flowRunnerManager.submitFlow(execId) method holds the main executor-side logic for running a job flow. Its steps are as follows:
5.1 Check whether runningFlows already contains an instance for this execId. If it does, throw an exception.
5.2 Get the execution instance (ExecutableFlow) flow for this execId from executorLoader.
5.3 Call setupFlow(flow) to configure the flow: create the project and execution directories, and so on.
5.4 Get the execution options, executionOptions.
5.5 Check whether pipelineExecId is null. If it is not null, check whether the FlowRunner for that pipelineExecId is in runningFlows. If it is, create a LocalFlowWatcher to monitor the execution status of each job in that flow.
5.6 If it is not in runningFlows, create a RemoteFlowWatcher instead, which monitors the state of each job in the flow by reading the database records at a fixed interval (60 seconds by default).
5.7 Check whether the execution parameters contain flow.num.job.threads. If the parameter is present and less than the default value of 10, use it. This value is the number of job threads the flow can run concurrently.
5.8 Construct a new FlowRunner instance, runner.
5.9 Call configureFlowLevelMetrics(runner) to configure the runner.
5.10 Check again whether runningFlows contains an instance for this execId. If it does, throw an exception.
5.11 Add runner to the runningFlows map.
5.12 Submit runner to the TrackingThreadPool (the worker thread pool).
5.13 Add the returned future to the submittedFlows map.
6. At this point we have a FlowRunner instance. Next, let's look at what FlowRunner does.
FlowRunner is actually a thread; its run() method does the following:
6.1 Executors.newFixedThreadPool(numJobThreads) creates the flow's internal job thread pool.
6.2 setupFlowExecution()
6.3 updateFlowReference()
6.4 updateFlow() updates the flow's status information and writes it to the database table execution_flows.
6.5 loadAllProperties() loads the job parameters and shared parameters.
6.6 Check whether the input parameters contain job.dispatch (job-granularity allocation). If the parameter is present and true, start a new thread, JobEventUpdaterThread, to track the execution status of each job in the flow.
6.7 Execute runFlow().
6.8 runFlow() executes the jobs in DAG order. Starting from the start node of the flow, it recursively calls runReadyJob() to execute jobs and then calls updateFlow(). If the flow has not finished, it decides whether to rerun failed jobs based on the retry settings.
6.9 runReadyJob() calls runExecutableNode(node), which checks the job.dispatch parameter: if it is false, the job runs locally through LocalJobRunner; if it is true, the job is submitted through JobRunnerManager.
6.10 JobRunnerManager constructs a RemoteJobRunner; RemoteJobRunner, through the submitExecutableNode method, selects a node to execute the job based on the resource state of each execution node (including the current node).
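The DAG-ordered execution described in step 6.8 can be sketched as below. This is a deliberately simplified, single-threaded model: DagRunSketch, its runFlow and runReadyJobs methods, and the dependency map are hypothetical stand-ins, and "running" a job only records its name. Retries, status updates, and remote dispatch are omitted.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.SortedMap;
import java.util.TreeMap;

/** Hypothetical sketch: run jobs as soon as all of their dependencies are done. */
final class DagRunSketch {

    /** Maps each job to the jobs it depends on; returns the start order. */
    static List<String> runFlow(Map<String, List<String>> dependsOn) {
        Set<String> done = new HashSet<>();
        List<String> order = new ArrayList<>();
        // A sorted copy keeps the order deterministic for the example.
        runReadyJobs(new TreeMap<>(dependsOn), done, order);
        if (done.size() != dependsOn.size()) {
            throw new IllegalStateException("Cycle or missing dependency in flow");
        }
        return order;
    }

    /** Runs every job whose dependencies are finished, then recurses until no progress. */
    private static void runReadyJobs(SortedMap<String, List<String>> dependsOn,
                                     Set<String> done, List<String> order) {
        boolean progressed = false;
        for (Map.Entry<String, List<String>> entry : dependsOn.entrySet()) {
            String job = entry.getKey();
            if (!done.contains(job) && done.containsAll(entry.getValue())) {
                order.add(job); // Stand-in for actually executing the job.
                done.add(job);
                progressed = true;
            }
        }
        if (progressed) {
            runReadyJobs(dependsOn, done, order);
        }
    }
}
```

In a real flow runner the ready jobs would be handed to the job thread pool and the next round would be triggered by job-finished events; the recursion here only mirrors the "run whatever is ready, then look again" shape of the process.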
Finally, the entire process can be summarized in a single diagram, shown in Figure 3:
Figure 3 Job flow execution process from the source-code view
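As a complement to the figure, the executor-side bookkeeping from steps 4 and 5 above (the two maps, the duplicate check, and the flow.num.job.threads rule) can be sketched in Java. This is an illustration under assumptions, not the FlowRunnerManager code: FlowSubmitSketch is a hypothetical class, a plain Runnable stands in for FlowRunner, a fixed thread pool stands in for the worker pool, and rejecting non-positive thread counts is an added assumption.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical sketch of tracking submitted and running flows on an executor. */
final class FlowSubmitSketch {

    static final int DEFAULT_JOB_THREADS = 10; // Default value named in the text.

    private final Map<Future<?>, Integer> submittedFlows = new ConcurrentHashMap<>();
    private final Map<Integer, Runnable> runningFlows = new ConcurrentHashMap<>();
    private final ExecutorService pool = Executors.newFixedThreadPool(2);

    /** An override is used only if it is below the default (and, by assumption, positive). */
    static int resolveJobThreads(Map<String, String> flowParams) {
        String raw = flowParams.get("flow.num.job.threads");
        if (raw == null) {
            return DEFAULT_JOB_THREADS;
        }
        int requested = Integer.parseInt(raw);
        return (requested > 0 && requested < DEFAULT_JOB_THREADS) ? requested : DEFAULT_JOB_THREADS;
    }

    /** Registers the runner, hands it to the pool, and remembers the returned future. */
    Future<?> submitFlow(int execId, Runnable runner) {
        // Check and insert under one lock so two submissions cannot both pass the check.
        synchronized (runningFlows) {
            if (runningFlows.containsKey(execId)) {
                throw new IllegalStateException("Execution " + execId + " is already running");
            }
            runningFlows.put(execId, runner);
        }
        Future<?> future = pool.submit(runner);
        submittedFlows.put(future, execId);
        return future;
    }

    int submittedCount() {
        return submittedFlows.size();
    }

    void shutdown() {
        pool.shutdown();
    }
}
```

Keying one map by Future and the other by execution ID serves two different lookups: the first answers "which execution does this finished task belong to", the second answers "which runner do I signal to cancel execution N".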
Conclusion:
This was my first time doing backend testing, and I found that backend testing and front-end testing are almost completely different kinds of testing: backend testing requires testers to be reasonably familiar with the code under development. If you do not understand the implementation, you cannot design comprehensive test cases or accurately estimate the risk of releasing the project. Take exception testing: it usually covers network failures, process failures (a hung process, recovery from a hang), server failures (downtime or a forcibly killed process), and so on. But even with these cases, coverage is incomplete unless they are combined with the full business flow. Consider recovery from a hang: when an executor hangs, the webserver needs a certain period to detect that the executor is down, and the result differs depending on whether the executor recovers within that period or after it. Some cases involve steps that run so briefly that they are hard to reproduce; for those you need to set breakpoints in the code through remote debugging to make sure the test cases are covered. Finally, when the workload is heavy and time is tight, testers cannot achieve full coverage; they must identify the key areas to test and work out an effective testing strategy so that the project goes live reliably.
Azkaban job flow execution as seen from the source code: notes on my first white-box test