[Repost] A Brief Introduction to JStorm Nimbus

Source: Internet
Author: User
Tags: zookeeper

I. Introduction

A JStorm cluster consists of two types of nodes: the master node (Nimbus) and the worker nodes (Supervisors). Their roles are as follows:
1. The master node runs the Nimbus daemon. Nimbus receives topologies submitted by clients, distributes code, assigns tasks to the worker nodes, and monitors the status of the tasks running in the cluster. Its role is similar to that of the JobTracker in Hadoop.
2. Each worker node runs a Supervisor daemon. The Supervisor subscribes to the relevant data in ZooKeeper to learn which tasks Nimbus has assigned to it, and starts or stops worker processes accordingly. Each worker process executes a subset of a topology's tasks; the tasks of a single topology are processed cooperatively by worker processes distributed across multiple worker nodes.

Coordination between Nimbus and the Supervisors is achieved through ZooKeeper. Moreover, Nimbus and the Supervisors are themselves stateless, fail-fast processes: the state of the JStorm cluster is either stored in ZooKeeper or persisted locally, which means that even if a Nimbus or Supervisor goes down, it can continue working after a restart. This design gives the JStorm cluster very good stability.

Node state information in JStorm is kept in ZooKeeper. Nimbus assigns tasks by writing state information to ZooKeeper, and the Supervisors pick up their tasks by subscribing to that data. The Supervisors also send heartbeat information to ZooKeeper periodically, so that Nimbus knows the status of the whole cluster and can schedule tasks or rebalance load accordingly. ZooKeeper makes the whole JStorm cluster very robust: the failure of any single node does not affect the running tasks, as long as the node can be restarted.
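To make this coordination pattern concrete, the sketch below shows how a process can write periodic heartbeat data into ZooKeeper with the plain ZooKeeper client API. It is only an illustration of the mechanism described above, not JStorm's actual code: the connect string, the znode path, and the use of an ephemeral node are assumptions made for the example.

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class HeartbeatSketch {
    public static void main(String[] args) throws Exception {
        // Connect to the ZooKeeper ensemble used by the cluster (address is an example);
        // a real client would wait for the connection to be established before writing.
        ZooKeeper zk = new ZooKeeper("127.0.0.1:2181", 30000, new Watcher() {
            public void process(WatchedEvent event) {
                // No-op watcher; a real client would react to connection state changes here.
            }
        });

        // Hypothetical heartbeat znode; the real layout is the one summarized in Figure 1.
        // Parent znodes are assumed to exist already.
        String path = "/jstorm/supervisors/supervisor-01";
        byte[] heartbeat = String.valueOf(System.currentTimeMillis()).getBytes();

        if (zk.exists(path, false) == null) {
            // An ephemeral node disappears automatically when the session ends,
            // so a crashed process becomes visible to whoever reads this path.
            zk.create(path, heartbeat, ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
        } else {
            // Periodically overwrite the heartbeat data (-1 means ignore the znode version).
            zk.setData(path, heartbeat, -1);
        }
        zk.close();
    }
}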

The state data stored in ZooKeeper and the data persisted locally by Nimbus/Supervisor are referenced in many places below, so before going into the details of Nimbus, the storage structure of this data is briefly described as follows (note: adapted from [5] http://xumingming.sinaapp.com/).

Figure 1. Description of the data JStorm stores in ZooKeeper

Figure 2. Description of Nimbus local data

Figure 3. Description of Supervisor local data

II. System Architecture and Principles

Nimbus does three things:
1. receives topologies submitted by clients;
2. schedules tasks;
3. monitors the running status of tasks in the cluster.

As mentioned above, Nimbus completes task assignment by writing data to ZooKeeper and monitors the running tasks in the cluster by reading the relevant state information from ZooKeeper, so the only components that interact with Nimbus directly are the client and ZooKeeper, as shown in the figure.

III. Implementation Logic and Code Analysis

Taking jstorm-0.7.1 as an example, the Nimbus implementation lives in the com.alipay.dw.jstorm.daemon.nimbus package under the jstorm-server/src/main/java directory. The entry point of the Nimbus daemon is NimbusServer.java.

1. Nimbus Startup

The Nimbus daemon starts up as follows:
1. initialize the context data according to the configuration file;
2. synchronize data with ZooKeeper;
3. initialize ServiceHandler, the RPC service handler;
4. start the task-assignment policy thread;
5. start the task heartbeat-monitoring thread;
6. start the RPC service;
7. other initialization work.
The detailed startup logic of Nimbus is as follows:

@SuppressWarnings ("rawtypes") private void Launchserver (Map conf) throws Exception {    log.info ("Begin to start Nimbus With conf "+ conf);    1. Check whether the configuration file is configured as distributed Mode    stormconfig.validate_distributed_mode (conf);    2. Register the main thread exit hook field Cleanup (Close thread + cleanup data)    Initshutdownhook ();    3. New Nimbusdata data, record 30s time-out upload download channel channel/bufferfileinputstream    data = createnimbusdata (conf);    4.nimbus Stormids data that does not exist locally is deleted if it exists on ZK, where the delete operation includes/zk/{assignments,tasks,storms} related data    Nimbusutils.cleanupcorrupttopologies (data);    5. Start topology allocation Strategy    inittopologyassign ();    6. Initialize all topology with the status of Startup    Inittopologystatus ();    7. Monitor all task heartbeat, once found TaskID lost heartbeat will be set to Needreassign 1 times/10s    initmonitor (conf);    8. Start The Cleaner thread, the default 600s scan once, the default delete 3600s did not read and write Jar file    initcleaner (conf);    9. Initialize Servicehandler    servicehandler = new Servicehandler (data);    10. Start RPC Server    initthrift (conf);}
2. Topology Submission

After the JStorm cluster has started, clients can submit topologies to it. In the jstorm-0.7.1 source tree, the backtype.storm package under jstorm-client/src/main/java provides the StormSubmitter.submitTopology method for users to submit a topology to the cluster. Both the client side and the Nimbus side do part of the processing of a submission.

Submitting a topology on the client side is done in two steps:
1) package the topology's compute-logic code as a jar and upload it to Nimbus, into the directory $jstorm_local_dir/nimbus/inbox/stormjar-{$randomid}.jar, where randomid is a random UUID generated by Nimbus;
2) submit the topology DAG and the configuration information to Nimbus via RPC.

public static void submitTopology(String name, Map stormConf, StormTopology topology)
        throws AlreadyAliveException, InvalidTopologyException {
    if (!Utils.isValidConf(stormConf)) {
        throw new IllegalArgumentException("Storm conf is not valid.");
    }
    stormConf = new HashMap(stormConf);
    stormConf.putAll(Utils.readCommandLineOpts());
    Map conf = Utils.readStormConfig();
    conf.putAll(stormConf);
    try {
        String serConf = JSONValue.toJSONString(stormConf);
        if (localNimbus != null) {
            LOG.info("Submitting topology " + name + " in local mode");
            localNimbus.submitTopology(name, null, serConf, topology);
        } else {
            // 1. Submit the jar package to Nimbus
            submitJar(conf);
            NimbusClient client = NimbusClient.getConfiguredClient(conf);
            try {
                LOG.info("Submitting topology " + name + " in distributed mode with conf " + serConf);
                // 2. Submit the topology DAG and the serialized configuration serConf (JSON)
                client.getClient().submitTopology(name, submittedJar, serConf, topology);
            } finally {
                client.close();
            }
        }
        LOG.info("Finished submitting topology: " + name);
    } catch (TException e) {
        throw new RuntimeException(e);
    }
}

RPC and data serialization are implemented with Thrift (http://wiki.apache.org/thrift/), a cross-language service framework. The JStorm service is defined in Other/storm.thrift.
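For orientation, the Thrift definition generates a Java service interface that both sides program against: the Nimbus-side ServiceHandler implements it, and the client stub returned by NimbusClient.getClient() calls it remotely. The following is only a sketch of the relevant method, assuming the generated interface mirrors the submitTopology signature that appears in the ServiceHandler code further down:

// Sketch of the generated Thrift service interface (assumed from the submitTopology signature below).
public interface Iface {
    // name: topology name; uploadedJarLocation: jar previously uploaded via submitJar;
    // jsonConf: JSON-serialized configuration; topology: the serialized topology DAG.
    void submitTopology(String name, String uploadedJarLocation, String jsonConf,
            StormTopology topology)
            throws AlreadyAliveException, InvalidTopologyException, TException;
}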

The Nimbus side receives the jar package containing the topology's compute-logic code submitted by the client; as described earlier, the jar is stored temporarily as $jstorm_local_dir/nimbus/inbox/stormjar-{$randomid}.jar.
After the Nimbus side receives the topology DAG and configuration information submitted by the client, it:
1) performs a simple validity check, mainly whether a topology with the same topologyName already exists; if so, the submission is rejected;
2) generates the topologyId, using the rule topologyName-counter-currentTime;
3) serializes the configuration and the topology code;
4) prepares the data required for operation locally on Nimbus;
5) registers the topology and its tasks with ZooKeeper;
6) pushes the task-assignment event into the allocation queue, where it waits for TopologyAssign to assign it.

@SuppressWarnings ("unchecked") @Overridepublic void Submittopology (String topologyname, String uploadedjarlocation,     String jsonconf, Stormtopology topology) throws Alreadyaliveexception, Invalidtopologyexception, texception {...    try {//1. Detects if the topologyname already exists, and if topology with the same name refuses to commit checktopologyactive (data, topologyname, false); } ...//2. Based on the Topologyname construct Topologyid (=topologyname-$counter-$ctime) int counter = Data.getsubmittedcount (). Increme    Ntandget ();    String Topologyid = topologyname + "-" + Counter + "-" + timeutils.current_time_secs ();    3. jsonconf recombination configuration data according to input parameters Map serializedconf = (map) Jstormutils.from_json (jsonconf);    if (serializedconf = = null) {throw new Invalidtopologyexception ("");    } serializedconf.put (config.storm_id, Topologyid);    Map stormconf;    try {stormconf = nimbusutils.normalizeconf (conf, serializedconf, topology);    } catch (Exception E1) {throw new texception (errmsg); } MaP totalstormconf = new HashMap (conf);    Totalstormconf.putall (stormconf);    Stormtopology newtopology = new Stormtopology (topology); 4. Check the legality of the topology, including ComponentID inspection and Spout/bolt cannot be empty check//This validates the structure of the topology Common.validate_    Basic (Newtopology, totalstormconf, Topologyid);        try {stormclusterstate stormclusterstate = Data.getstormclusterstate (); 5. Prepare all topology related data locally in Nimbus//including $storm-local-dir/nimbus/stormdist/topologyid/{tormjar.jar,stormcode.ser, Stormconf.ser}//Create $storm-local-dir/nimbus/topologyid/xxxx files setupstormcode (conf, Topologyid, Uplo        Adedjarlocation, stormconf, newtopology); 6. Write task information to ZK//6.1 new directory $zkroot/taskbeats/topologyid//6.2 write file $zkroot/tasks/topologyid/taskid content as task of corresponding task        info[content: ComponentID]//Generate taskinfo for every bolts or spout in ZK//$zkroot/tasks/topoologyid/xxx Setupzktaskinfo (conf, Topologyid, stormclusterstate);        7. Task assignment events pressed into queue to be allocated//make assignments for a topology topologyassignevent assignevent = new Topologya        Ssignevent ();        Assignevent.settopologyid (Topologyid);        Assignevent.setscratch (FALSE);        Assignevent.settopologyname (Topologyname);    Topologyassign.push (assignevent); } ......}
3. Task Scheduling

After a topology is submitted successfully, it is pushed into the TopologyAssign FIFO queue inside Nimbus, and the background task-scheduling thread assigns tasks to the queued topologies one by one.
Starting with version 0.9.0, JStorm provides a very powerful scheduling function that satisfies most requirements and also supports custom task-scheduling strategies. Resources in JStorm are no longer just worker ports; they are considered along the four dimensions of CPU, memory, disk, and network.
The task-scheduling policy of jstorm-0.7.1, by contrast, still schedules mostly along the single dimension of worker port/network.

The problem task scheduling needs to solve is how to map the compute nodes of the topology DAG onto the cluster's resources so that processing is efficient. The strategy of 0.7.1 is:
1. sort the resources in the cluster: order the nodes by their number of idle workers, from fewest to most, and order the ports within each node by port number;
2. map the tasks that need to be assigned in the topology (when a topology is reassigned, most tasks no longer need to be allocated) onto the resources sorted above.
The core logic of task scheduling is as follows:

public static List<NodePort> sortSlots(Set<NodePort> allSlots, int needSlotNum) {
    Map<String, List<NodePort>> nodeMap = new HashMap<String, List<NodePort>>();
    // group by first
    // organize by node: nodeId -> ports
    for (NodePort np : allSlots) {
        String node = np.getNode();
        List<NodePort> list = nodeMap.get(node);
        if (list == null) {
            list = new ArrayList<NodePort>();
            nodeMap.put(node, list);
        }
        list.add(np);
    }
    // within each nodeId, sort by port
    for (Entry<String, List<NodePort>> entry : nodeMap.entrySet()) {
        List<NodePort> ports = entry.getValue();
        Collections.sort(ports);
    }
    // collect all workers
    List<List<NodePort>> splitup = new ArrayList<List<NodePort>>(nodeMap.values());
    // sort nodes by their number of available workers, from small to large
    // 1. assignTasks - Map supInfos
    // 2. availSlots: splitup / List<List<NodePort>>
    Collections.sort(splitup, new Comparator<List<NodePort>>() {
        public int compare(List<NodePort> o1, List<NodePort> o2) {
            return o1.size() - o2.size();
        }
    });
    /*
     * splitup current status (A-F represent nodes, 1-h represent ports)
     * | A | | B | | C | | D | | E | | F |
     * --|---|---|---|---|---|---|---|--
     * | 1 | | 2 | | 3 | | 4 | | 5 | | 6 |
     * | 7 | | 8 |       | 9 | | 0 | | a |
     * | b | | c |       | d |       | e |
     * | f |             | g |
     * | h |
     *
     * sortedFreeSlots collected by interleave_all is:
     * 1-2-3-4-5-6-7-8-9-0-a-b-c-d-e-f-g-h
     */
    List<NodePort> sortedFreeSlots = JStormUtils.interleave_all(splitup);
    // compare sortedFreeSlots.size and needSlotNum, and allocate the workers
    if (sortedFreeSlots.size() <= needSlotNum) {
        return sortedFreeSlots;
    }
    return sortedFreeSlots.subList(0, needSlotNum);
}
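The JStormUtils.interleave_all call above is what produces the node-interleaved order shown in the comment: it walks the per-node port lists round-robin, taking the first port of every node, then the second, and so on, so that consecutive workers land on different nodes. The following is a minimal illustrative reimplementation of that round-robin collection, written only to show the idea; it is not the actual JStormUtils code, and the node/port strings in the example are made up.

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class InterleaveSketch {
    // Round-robin collection: take the first element of every list, then the second, and so on.
    public static <T> List<T> interleaveAll(List<List<T>> splitup) {
        List<T> result = new ArrayList<T>();
        int maxSize = 0;
        for (List<T> list : splitup) {
            maxSize = Math.max(maxSize, list.size());
        }
        for (int i = 0; i < maxSize; i++) {
            for (List<T> list : splitup) {
                if (i < list.size()) {
                    result.add(list.get(i));
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Lists already sorted by size, as splitup is in sortSlots.
        List<List<String>> splitup = new ArrayList<List<String>>();
        splitup.add(Arrays.asList("c:6800"));
        splitup.add(Arrays.asList("b:6800", "b:6801"));
        splitup.add(Arrays.asList("a:6800", "a:6801", "a:6802"));
        // Prints [c:6800, b:6800, a:6800, b:6801, a:6801, a:6802]
        System.out.println(interleaveAll(splitup));
    }
}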
4. Task Monitoring

When Nimbus initializes, it starts a background thread called MonitorRunnable, whose role is to periodically check all running topologies for dead tasks. Once MonitorRunnable finds that a topology contains a dead task, it sets that topology's status to StatusType.monitor and waits for the task-assignment thread to reassign the dead tasks in it.
By default the MonitorRunnable thread performs a check every 10s; its main logic is as follows:

@Override
public void run() {
    // 1. Get the interface through which JStorm operates on ZK
    StormClusterState clusterState = data.getStormClusterState();
    try {
        // attention, here don't check /zk-dir/taskbeats to get active topology list
        // 2. Get all active topologies that need to be checked from $zkroot/assignments/
        List<String> active_topologys = clusterState.assignments(null);
        if (active_topologys == null) {
            LOG.info("Failed to get active topologies");
            return;
        }
        for (String topologyid : active_topologys) {
            LOG.debug("Check tasks " + topologyid);
            // attention, here don't check /zk-dir/taskbeats/topologyid to get task ids
            // 3. Get all tasks that make up the topology from $zkroot/tasks/topologyid
            List<Integer> taskIds = clusterState.task_ids(topologyid);
            if (taskIds == null) {
                LOG.info("Failed to get task ids of " + topologyid);
                continue;
            }
            boolean needReassign = false;
            for (Integer task : taskIds) {
                // 4. Check whether the task is dead, mainly whether its heartbeat has timed out
                boolean isTaskDead = NimbusUtils.isTaskDead(data, topologyid, task);
                if (isTaskDead == true) {
                    needReassign = true;
                    break;
                }
            }
            if (needReassign == true) {
                // 5. If the topology contains a dead task, set the topology status to monitor
                //    and wait for the task-assignment thread to reassign it
                NimbusUtils.transition(data, topologyid, false, StatusType.monitor);
            }
        }
    } catch (Exception e) {
        // TODO Auto-generated catch block
        LOG.error(e.getCause(), e);
    }
}
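The NimbusUtils.isTaskDead check in step 4 relies on the task heartbeat stored in ZooKeeper: a task is considered dead when its last heartbeat is older than the configured timeout. The helper below is only a sketch of such a timeout test; the method and parameter names are assumptions made for illustration, not JStorm's actual implementation.

// Illustrative heartbeat-timeout test; the timeout value would come from the cluster configuration.
public static boolean isHeartbeatTimedOut(long lastHeartbeatSecs, long nowSecs, long timeoutSecs) {
    // A task that has never reported a heartbeat is also treated as dead.
    if (lastHeartbeatSecs <= 0) {
        return true;
    }
    return (nowSecs - lastHeartbeatSecs) > timeoutSecs;
}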
IV. Conclusion

This article has briefly introduced the role Nimbus plays in the JStorm system and analyzed the source code of its logic and key processes, in the hope of being helpful to readers who are new to JStorm. There are inevitably shortcomings and errors in the text; comments and corrections are welcome.

V. References

[1] Storm community. http://Storm.incubator.apache.org/
[2] JStorm source code. https://github.com/alibaba/jStorm/
[3] Storm source code. https://github.com/nathanmarz/Storm/
[4] Jonathan Leibiusky, Gabriel Eisbruch, et al. Getting Started with Storm. O'Reilly Media, Inc. http://shop.oreilly.com/product/0636920024835.do
[5] Xumingming's blog. http://xumingming.sinaapp.com/
[6] Quantum Hengdao official blog. http://blog.linezing.com/
