Brief Introduction of the JStorm Supervisor


I. Introduction

The supervisor is the worker node of JStorm, similar to the TaskTracker (TT) in MapReduce (MR). It subscribes to the task-scheduling result data on ZooKeeper and starts/stops worker processes according to those assignments. At the same time, the supervisor must periodically write its active port information to ZooKeeper so that Nimbus can monitor it. The supervisor does not perform any concrete processing itself; all computation is handed to the workers. In the overall architecture, the supervisor sits in the middle tier of JStorm's three-level management architecture, assisting with task scheduling and resource management.
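As a rough illustration of this subscription pattern, the sketch below watches an assignments path with the plain ZooKeeper client. This is a minimal sketch, not JStorm's actual code (which wraps ZooKeeper access in its own cluster-state helpers); the connect string, the /jstorm/assignments path, and the re-registration logic are assumptions for demonstration.

import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

public class AssignmentWatchSketch {
    public static void main(String[] args) throws Exception {
        final ZooKeeper zk = new ZooKeeper("localhost:2181", 30000, null);
        final String path = "/jstorm/assignments"; // hypothetical root

        Watcher watcher = new Watcher() {
            @Override
            public void process(WatchedEvent event) {
                try {
                    // ZK watches fire only once, so re-register on every
                    // event; a real supervisor would then diff the new
                    // assignments against its running workers
                    System.out.println("Assignments changed: "
                            + zk.getChildren(path, this));
                } catch (Exception e) {
                    e.printStackTrace();
                }
            }
        };
        System.out.println("Initial assignments: "
                + zk.getChildren(path, watcher));
        Thread.sleep(Long.MAX_VALUE); // keep the watch alive
    }
}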

II. Architecture

1. Supervisor


The supervisor's single-node architecture is as shown above. At initialization the supervisor process starts; based on the tasks assigned by Nimbus, it triggers the start/stop of worker JVM processes. Each worker process starts one or more task threads, and each task must belong to a single topology. The whole supervisor node therefore runs multiple JVM processes: one supervisor process and one or more worker processes.
The different roles maintain their states in different ways. A task writes its heartbeat, containing the current time and the task's statistics, directly to ZooKeeper. A worker periodically writes a heartbeat containing the topology ID, the port, the task ID set, and the current time to local disk. The supervisor periodically writes the current time and its node resources (the port set) to ZooKeeper, while reading the task-scheduling results from ZooKeeper and starting/stopping worker processes based on those results.
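To make the worker-side record concrete, here is a minimal sketch of such a heartbeat, assuming a plain Java-serialized file format (JStorm's actual LocalState storage differs); the class and field names are hypothetical, but the fields match the description above: topology ID, port, task ID set, and current time.

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Set;

public class WorkerHeartbeatSketch implements Serializable {
    final String topologyId;
    final int port;
    final Set<Integer> taskIds;
    final int timeSecs; // heartbeat timestamp in seconds

    WorkerHeartbeatSketch(String topologyId, int port, Set<Integer> taskIds) {
        this.topologyId = topologyId;
        this.port = port;
        this.taskIds = taskIds;
        this.timeSecs = (int) (System.currentTimeMillis() / 1000);
    }

    // persist to a local file the supervisor can read, e.g. somewhere under
    // $jstorm-local-dir/worker/ (the exact layout is JStorm-internal)
    void writeTo(File file) throws IOException {
        try (ObjectOutputStream out =
                new ObjectOutputStream(new FileOutputStream(file))) {
            out.writeObject(this);
        }
    }
}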

2. Worker


Within the worker JVM process, besides the individual task threads, the tasks share common worker-level resources such as data transfer and inter-node connection management (see the sketch after this list). These components are:
VirtualPort: the data-receiving thread;
KryoTupleSerializer: tuple data serialization;
TransferQueue: the data-sending pipeline;
DrainerRunnable: the data-sending thread;
RefreshConnections: the thread that manages connections between nodes.
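The following minimal sketch shows the shared send pipeline these components form: several task threads offer tuples to one queue, and a single drainer thread takes them off and sends them. The names are hypothetical and this is not JStorm's implementation, only the pattern that TransferQueue and DrainerRunnable realize.

import java.util.concurrent.LinkedBlockingQueue;

public class SendPipelineSketch {
    public static void main(String[] args) throws InterruptedException {
        final LinkedBlockingQueue<String> transferQueue =
                new LinkedBlockingQueue<String>();

        // the drainer: one thread sends everything the tasks produce
        Thread drainer = new Thread(new Runnable() {
            @Override
            public void run() {
                try {
                    while (true) {
                        String tuple = transferQueue.take();
                        System.out.println("send over connection: " + tuple);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });
        drainer.setDaemon(true);
        drainer.start();

        // several task threads share the same queue
        for (int i = 0; i < 3; i++) {
            final int taskId = i;
            new Thread(new Runnable() {
                @Override
                public void run() {
                    transferQueue.offer("tuple-from-task-" + taskId);
                }
            }).start();
        }
        Thread.sleep(500); // give the demo time to drain
    }
}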

III. Implementation and Code Analysis

1. Supervisor

In jstorm-0.7.1, the Supervisor daemon is implemented in the com.alipay.dw.jstorm.daemon.supervisor package under the jstorm-server/src/main/java directory. Supervisor.java is the entry point of the Supervisor daemon. The supervisor process mainly does the following things.

Initialization

1. Clean up the local temporary directory $jstorm-local-dir/supervisor/tmp;
2. Create the ZK operation instance;
3. Create the local state file $jstorm-local-dir/supervisor/localstate;
4. Generate the supervisor-id and write it to LocalState under key="supervisor-id"; if the supervisor is restarting, first check whether a supervisor-id already exists and, if so, read it directly;
5. Initialize and start the heartbeat thread;
6. Initialize and start the SyncProcessEvent thread;
7. Initialize and start the SyncSupervisorEvent thread;
8. Register the data-cleanup hook (SupervisorManger) that runs when the main process exits.

@SuppressWarnings ("Rawtypes") public Supervisormanger mksupervisor (Map conf, Mqcontext sharedcontext) throws Excepti    On {log.info ("Starting Supervisor with conf" + conf);    active = new Atomicboolean (true); /* * Step 1:cleanup All Files in/storm-local-dir/supervisor/tmp */String Path = Stormconfig.supervisortmpdir    (conf);    Fileutils.cleandirectory (The new File (path)); /* * Step 2:create ZK Operation instance * stromclusterstate * * stormclusterstate stormclusterstate = Clus    ter. mk_storm_cluster_state (conf); /* * Step 3, create Localstat * localstat is one KV database * 4.1 Create localstate instance * 4.2 Get Su    Pervisorid, if no supervisorid, create one */localstate localstate = stormconfig.supervisorstate (conf);    String Supervisorid = (string) localstate.get (common.ls_id);        if (Supervisorid = = null) {Supervisorid = Uuid.randomuuid (). toString (); Localstate.put (common.ls_id, Supervisorid);    Vector threads = new vector (); Step 5 Create HeartBeat//Every supervisor.heartbeat.frequency.secs, write supervisorinfo to ZK String Myhostnam    E = Networkutils.hostname ();    int starttimestamp = Timeutils.current_time_secs ();    Heartbeat HB = new Heartbeat (conf, stormclusterstate, Supervisorid, Myhostname, Starttimestamp, active);    Hb.update ();    Asyncloopthread heartbeat = new Asyncloopthread (HB, FALSE, NULL, thread.min_priority, true);    Threads.add (Heartbeat);    Step 6 Create and start sync Supervisor thread//Every supervisor.monitor.frequency.secs second run Syncsupervisor    EventManager Processeventmanager = new Eventmanagerimp (false);    Concurrenthashmap workerthreadpids = new Concurrenthashmap ();    Reads the value of key=local-assignments in $jstorm-local-dir/supervior/localstate, performing workers kill/start based on that value Syncprocessevent syncprocessevent = new Syncprocessevent (Supervisorid, Conf, Localstate, Workerthreadpids, SharEdcontext);    EventManager Syncsupeventmanager = new Eventmanagerimp (false); By comparing $zkroot/assignments/{topologyid} full-amount data and local Storm-local-dir/supervisor/stormdist/{topologyid}://1. Download the jar and configuration data//2 for topology that have tasks assigned to this node from Nimbus. Remove the jar and configuration data from the defunct topology locally syncsupervisorevent syncsupervisorevent = new S  Yncsupervisorevent (Supervisorid, Conf, Processeventmanager, Syncsupeventmanager, Stormclusterstate,    Localstate, syncprocessevent);    int syncfrequence = (Integer) conf. get (CONFIG.SUPERVISOR_MONITOR_FREQUENCY_SECS); Eventmanagerpusher syncsupervisorpusher = new Eventmanagerpusher (syncsupeventmanager, syncSupervisorEvent, act    Ive, syncfrequence);    Asyncloopthread syncsupervisorthread = new Asyncloopthread (syncsupervisorpusher);    Threads.add (Syncsupervisorthread);    Log.info ("Starting Supervisor with ID" + Supervisorid + "at host" + myhostname); Supervisormanger which can shutdown all supervisor and workers retUrn New Supervisormanger (conf, Supervisorid, active, threads, Syncsupeventmanager, Processeventmanager, STORMCL Usterstate, workerthreadpids);}
Heartbeat Thread

1. Every 60s by default, report the supervisor's information to ZooKeeper. The report is packaged as a SupervisorInfo object containing the hostname, worker ports, current time, uptime, and other information;

@SuppressWarnings ("unchecked") public void Update () {    Supervisorinfo sinfo = new Supervisorinfo (            Timeutils.current_time_secs (), Myhostname,            (List) conf.get (Config.supervisor_slots_ports), (            int) ( Timeutils.current_time_secs ()-startTime));    try {        stormclusterstate.supervisor_heartbeat (Supervisorid, sinfo);    } catch (Exception e) {        log.error (" Failed to update Supervisorinfo to ZK ", e);    }}
SyncProcessEvent Thread

1. Periodically read the key="local-assignments" data from the local file $jstorm-local-dir/supervisor/localstate; this data is written periodically by the SyncSupervisorEvent thread;
2. Read the worker status data under the local $jstorm-local-dir/worker/ids/heartbeat;
3. Compare local-assignments with the worker status data and start/kill worker processes accordingly. The workers and the supervisor are separate JVM processes; the supervisor starts a worker with a shell command:

nohup java -server -Djava.library.path="$JAVA_LIBRARY_PATH" \
    -Dlogfile.name="$topologyid-worker-$port.log" \
    -Dlog4j.configuration=jstorm.log4j.properties \
    -Djstorm.home="$JSTORM_HOME" \
    -cp $JAVA_CLASSPATH:$JSTORM_CLASSPATH \
    com.alipay.dw.jstorm.daemon.worker.Worker \
    topologyid supervisorid port workerid
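For illustration, a supervisor-like launcher could assemble and spawn an equivalent command with ProcessBuilder. This is a hedged sketch, not JStorm's actual launcher (as the text above notes, JStorm uses a shell command); the method signature and parameter names are hypothetical.

import java.io.File;
import java.util.Arrays;
import java.util.List;

public class WorkerLauncherSketch {
    public static Process launch(String jstormHome, String classpath,
            String topologyId, String supervisorId, int port, String workerId)
            throws Exception {
        List<String> cmd = Arrays.asList(
                "java", "-server",
                "-Dlogfile.name=" + topologyId + "-worker-" + port + ".log",
                "-Dlog4j.configuration=jstorm.log4j.properties",
                "-Djstorm.home=" + jstormHome,
                "-cp", classpath,
                "com.alipay.dw.jstorm.daemon.worker.Worker",
                topologyId, supervisorId, String.valueOf(port), workerId);
        // merge stderr into stdout so the worker's output can be captured
        // or redirected to its log file
        return new ProcessBuilder(cmd)
                .redirectErrorStream(true)
                .directory(new File(jstormHome))
                .start();
    }
}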

The SyncProcessEvent thread's execution flow is as follows:

@SuppressWarnings ("unchecked") @Overridepublic void Run () {log.debug ("Syncing processes"); try {/** * Step 1:get assigned tasks from Localstat Map *///1. From a local file $jstorm-local-dir/supe        Rvisor/localstate Read key= "local-assignments" data Map localassignments = null;        try {localassignments = (Map) localstate. Get (common.ls_local_assignments);            } catch (IOException e) {log.error ("Failed to get local_assignments from Localstate", e);        Throw e;        } if (localassignments = = null) {localassignments = new HashMap ();        } log.debug ("Assigned tasks:" + localassignments); /** * Step 2:get local workerstats from Local_dir/worker/ids/heartbeat * Map *///2. According to Loca        The lassignments and workers HB ratios result in a workers state Map localworkerstats = null;            try {localworkerstats = getlocalworkerstats (conf, localstate,        localassignments);            } catch (IOException e) {log.error ("Failed to get Local worker stats");        Throw e;        } log.debug ("Allocated:" + localworkerstats); /** * Step 3:kill Invalid Workers and remove killed worker from * localworkerstats *///3.        Start/deactivate related worker Set keepports = killuselessworkers (localworkerstats) based on the status value of workers;    Start new workers startnewworkers (Keepports, localassignments);        } catch (Exception e) {log.error ("Failed Sync Process", e); Throw e}}
SyncSupervisorEvent Thread

1. Download all task-scheduling results from $zk-root/assignments/{topologyid} and filter out the task set assigned to the current supervisor. After verifying that each port is assigned only a single topology, write that task set to the local file $jstorm-local-dir/supervisor/localstate for SyncProcessEvent to read and act on;
2. Compare the task-assignment results with the existing topologies: download newly assigned topologies from Nimbus and delete expired ones locally.
The SyncSupervisorEvent thread's execution flow is as follows:

@Override
public void run() {
    LOG.debug("Synchronizing supervisor");
    try {
        RunnableCallback syncCallback = new EventManagerZkPusher(this,
                syncSupEventManager);

        /**
         * Step 1: get all assignments and register a watch on
         * /zk-dir/assignment and on every assignment
         */
        // 1. get the full task-assignment set from ZK:
        //    (topologyId -> Assignment), under $zkroot/assignments/{topologyid}
        Map assignments = Cluster.get_all_assignment(stormClusterState,
                syncCallback);
        LOG.debug("Get all assignments " + assignments);

        /**
         * Step 2: get the topologyIds list from
         * storm-local-dir/supervisor/stormdist/
         */
        // 2. the set of topologies already downloaded locally:
        //    $jstorm-local-dir/supervisor/stormdist/{topologyid}
        List downloadedTopologyIds = StormConfig
                .get_supervisor_toplogy_list(conf);
        LOG.debug("Downloaded storm ids: " + downloadedTopologyIds);

        /**
         * Step 3: get this node's assignment from ZK
         */
        // 3. filter out the task set assigned to the current supervisor
        Map localAssignment = getLocalAssign(stormClusterState,
                supervisorId, assignments);

        /**
         * Step 4: write the local assignment to LocalState
         */
        // 4. write the result of step 3 to the local file
        //    $jstorm-local-dir/supervisor/localstate
        try {
            LOG.debug("Writing local assignment " + localAssignment);
            localState.put(Common.LS_LOCAL_ASSIGNMENTS, localAssignment);
        } catch (IOException e) {
            LOG.error("put LS_LOCAL_ASSIGNMENTS " + localAssignment
                    + " of LocalState failed");
            throw e;
        }

        // Step 5: download code from ZK
        // 5. download the topologies of newly assigned tasks
        Map topologyCodes = getTopologyCodeLocations(assignments);
        downloadTopology(topologyCodes, downloadedTopologyIds);

        /**
         * Step 6: remove any downloaded useless topology
         */
        // 6. remove the topologies of expired tasks
        removeUselessTopology(topologyCodes, downloadedTopologyIds);

        /**
         * Step 7: push the SyncProcesses event
         */
        processEventManager.add(syncProcesses);
    } catch (Exception e) {
        LOG.error("Failed to sync supervisor", e);
        throw new RuntimeException(e);
    }
}
2. Worker

In jstorm-0.7.1, the Worker daemon is implemented in the com.alipay.dw.jstorm.daemon.worker package under the jstorm-server/src/main/java directory, where Worker.java is the entry point of the Worker daemon. The life cycle of a worker process is:
1. Initialize the tuple serializer and the data-sending pipeline;
2. Create the tasks assigned to the current worker;
3. Initialize and start the tuple-receiving dispatcher;
4. Initialize and start RefreshConnections, the thread that maintains inter-worker connections, including creating/maintaining/destroying connections between nodes;
5. Initialize and start the heartbeat thread WorkerHeartbeatRunable, which updates the local directory $jstorm_local_dir/worker/{workerid}/heartbeats/{workerid};
6. Initialize and start the tuple-sending thread DrainerRunable;
7. Register the data-cleanup hook that runs when the main thread exits.
The Worker daemon initialization process is as follows:

public WorkerShutdown execute() throws Exception {
    // 1. tuple serialization + sending pipeline (LinkedBlockingQueue)
    WorkerTransfer workerTransfer = getSendingTransfer();

    // 2. initialize the task threads; shutdown task callbacks
    List shutdownTasks = createTasks(workerTransfer);
    workerData.setShutdownTasks(shutdownTasks);

    // 3. WorkerVirtualPort: the tuple-receiving dispatcher; when the worker
    //    receives tuples, dispatch to the target task according to task_id
    //    (conf, supervisorId, topologyId, port, mqContext, taskids)
    WorkerVirtualPort virtual_port = new WorkerVirtualPort(workerData);
    Shutdownable virtual_port_shutdown = virtual_port.launch();

    // 4. RefreshConnections: maintains connections between nodes:
    //    create new connections | maintain established ones | destroy useless ones
    RefreshConnections refreshConn = makeRefreshConnections();
    AsyncLoopThread refreshconn = new AsyncLoopThread(refreshConn);

    // refresh ZK active status
    RefreshActive refreshZkActive = new RefreshActive(workerData);
    AsyncLoopThread refreshzk = new AsyncLoopThread(refreshZkActive);

    // 5. WorkerHeartbeatRunable: the heartbeat thread; on each heartbeat,
    //    update the local directory
    //    $LOCAL_PATH/workers/{worker-id}/heartbeats/{worker-id}
    RunnableCallback heartbeat_fn = new WorkerHeartbeatRunable(workerData);
    AsyncLoopThread hb = new AsyncLoopThread(heartbeat_fn, false, null,
            Thread.NORM_PRIORITY, true);

    // 6. DrainerRunable: the tuple-sending thread
    //    (transferQueue, nodeportSocket, taskNodeport)
    DrainerRunable drainer = new DrainerRunable(workerData);
    AsyncLoopThread dr = new AsyncLoopThread(drainer, false, null,
            Thread.MAX_PRIORITY, true);

    AsyncLoopThread[] threads = { refreshconn, refreshzk, hb, dr };

    // 7. register the main-thread-exit data cleanup hook
    return new WorkerShutdown(workerData, shutdownTasks,
            virtual_port_shutdown, threads);
}
3. Task

Depending on a task's node role in the topology, tasks are divided into spout tasks and bolt tasks. Apart from sharing the same task heartbeat and common data initialization, they have independent processing logic; the core implementations are in SpoutExecutors.java and BoltExecutors.java.

SpoutExecutors mainly does two things:
1. As the starting point of the DAG, it is responsible for emitting the original tuple data;
2. If the topology defines ackers, SpoutExecutors starts an ack-receiving thread and decides, based on the acks received, whether to re-send a tuple (a sketch follows this list).
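A minimal sketch of this resend decision, assuming a simplified pending map keyed by message ID (SpoutExecutors' real logic also involves timeouts and coordination with the acker tasks):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class SpoutAckSketch {
    private final Map<Long, Object> pending =
            new ConcurrentHashMap<Long, Object>();

    public void emit(long msgId, Object tuple) {
        pending.put(msgId, tuple); // remember until acked or failed
        send(tuple);
    }

    public void onAck(long msgId) {
        pending.remove(msgId);     // fully processed downstream; forget it
    }

    public void onFail(long msgId) {
        Object tuple = pending.get(msgId);
        if (tuple != null) {
            send(tuple);           // re-send the original tuple
        }
    }

    private void send(Object tuple) {
        // hand the tuple to the worker's transfer queue (omitted here)
    }
}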

BoltExecutor is slightly more complex than SpoutExecutor:
1. It receives tuples sent from upstream and processes them according to the logic defined in the topology;
2. If the bolt has downstream nodes, it must send the newly generated tuples downstream;
3. If ackers are defined in the topology, the bolt returns a simply computed ack to the root spout (see the sketch after this list).
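In the standard Storm-style ack design, which JStorm follows conceptually, this "simply computed ack" is an XOR over tuple IDs: the bolt XORs the input tuple's ID with the IDs of any tuples it emitted anchored to that input, and the acker's running XOR per root tuple reaches zero exactly when the whole tuple tree has been acked. The self-contained sketch below illustrates the arithmetic; it is not BoltExecutors' actual code.

public class BoltAckSketch {
    // value a bolt reports to the acker for one processed input tuple
    public static long ackValue(long inputId, long[] emittedIds) {
        long v = inputId;
        for (long id : emittedIds) {
            v ^= id;
        }
        return v;
    }

    public static void main(String[] args) {
        long input = 0xABCDL;
        long[] emitted = { 0x1111L, 0x2222L };

        // the acker tracks the XOR of all reported values for one root tuple;
        // every ID appears exactly twice (once on emit, once on ack), so the
        // running value cancels to zero when the tree is complete
        long ackerState = input;                 // spout emitted the root
        ackerState ^= ackValue(input, emitted);  // bolt acked input, emitted two
        ackerState ^= 0x1111L;                   // child 1 acked downstream
        ackerState ^= 0x2222L;                   // child 2 acked downstream
        System.out.println(ackerState == 0);     // true: tuple tree complete
    }
}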

IV. Conclusion

This article has introduced the roles of the supervisor, worker, and task within JStorm, along with a source-code analysis of their implementation logic and key processes. Deficiencies and errors are inevitable; comments and corrections are welcome.

