---Storm pitfalls: the OutputCollector synchronization problem

Source: Internet
Author: User

Recently I have been building a monitoring system to track the number of calls to, and the processing time of, each business function on our site, so that problems can be spotted and handled promptly. For a real-time statistics system like this, Storm was the natural first choice, so I am learning it as I use it. Naturally I have hit some pitfalls, many of which are hard to find answers for online. Here I record the error that troubled me the most.

My business logic counts the number of calls minute by minute, so I ran a timer inside a bolt that periodically sends the statistics to the next bolt for storage. Inside the timer's callback I called the OutputCollector to emit to the next bolt. Local debugging showed no problem, so I deployed to the external test environment. Usually nothing went wrong, but occasionally the error below appeared, and nothing is more annoying to a developer than an error with low reproducibility.

Here is the error log:

5675 [Thread-7-disruptor-executor[2 2]-send-queue] ERROR backtype.storm.daemon.executor - java.lang.RuntimeException: java.lang.NullPointerException
	at backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:+) ~[storm-core-0.9.3.jar:0.9.3]
	at backtype.storm.utils.DisruptorQueue.consumeBatchWhenAvailable(DisruptorQueue.java:99) ~[storm-core-0.9.3.jar:0.9.3]
	at backtype.storm.disruptor$consume_batch_when_available.invoke(disruptor.clj:80) ~[storm-core-0.9.3.jar:0.9.3]
	at backtype.storm.disruptor$consume_loop_star_$fn__1460.invoke(disruptor.clj:94) ~[storm-core-0.9.3.jar:0.9.3]
	at backtype.storm.util$async_loop$fn__464.invoke(util.clj:463) ~[storm-core-0.9.3.jar:0.9.3]
	at clojure.lang.AFn.run(AFn.java:24) [clojure-1.5.1.jar:na]
	at java.lang.Thread.run(Thread.java:722) [na:1.7.0_15]
Caused by: java.lang.NullPointerException: null
	at clojure.lang.RT.intCast(RT.java:1087) ~[clojure-1.5.1.jar:na]
	at backtype.storm.daemon.worker$mk_transfer_fn$fn__3549.invoke(worker.clj:129) ~[storm-core-0.9.3.jar:0.9.3]
	at backtype.storm.daemon.executor$start_batch_transfer__gt_worker_handler_bang_$fn__3283.invoke(executor.clj:258) ~[storm-core-0.9.3.jar:0.9.3]
	at backtype.storm.disruptor$clojure_handler$reify__1447.onEvent(disruptor.clj:58) ~[storm-core-0.9.3.jar:0.9.3]
	at backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:125) ~[storm-core-0.9.3.jar:0.9.3]
	... 6 common frames omitted
5697 [Thread-7-disruptor-executor[2 2]-send-queue] ERROR backtype.storm.util - Halting process: ("Worker died")
java.lang.RuntimeException: ("Worker died")
	at backtype.storm.util$exit_process_bang_.doInvoke(util.clj:325) [storm-core-0.9.3.jar:0.9.3]
	at clojure.lang.RestFn.invoke(RestFn.java:423) [clojure-1.5.1.jar:na]
	at backtype.storm.daemon.worker$fn__3808$fn__3809.invoke(worker.clj:452) [storm-core-0.9.3.jar:0.9.3]
	at backtype.storm.daemon.executor$mk_executor_data$fn__3274$fn__3275.invoke(executor.clj:240) [storm-core-0.9.3.jar:0.9.3]
	at backtype.storm.util$async_loop$fn__464.invoke(util.clj:473) [storm-core-0.9.3.jar:0.9.3]
	at clojure.lang.AFn.run(AFn.java:24) [clojure-1.5.1.jar:na]
	at java.lang.Thread.run(Thread.java:722) [na:1.7.0_15]

If you encounter this problem, seeing this error for the first time is bound to be painful: nothing in the log relates to your own business code, so the log alone cannot tell you where the problem is. Worse still, it is not easy to reproduce.

After trying many guesses, I finally pinned down the cause. First, here is example code that reproduces the error:

import backtype.storm.Config;
import backtype.storm.LocalCluster;
import backtype.storm.topology.TopologyBuilder;
import backtype.storm.tuple.Fields;

public class Main {
    public static void main(String[] args) {
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("spout", new TestWordSpout());
        builder.setBolt("Dispatch", new WordDispatchBolt()).shuffleGrouping("spout");
        builder.setBolt("Print", new PrintBolt()).fieldsGrouping("Dispatch", new Fields("word"));

        Config conf = new Config();
        conf.setDebug(false);
        conf.setNumWorkers(1);
        // conf.put(Config.TOPOLOGY_MAX_SPOUT_PENDING, 1);

        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology("test-kafka-1", conf, builder.createTopology());
    }
}

  

import java.util.Map;
import java.util.Random;

import backtype.storm.spout.SpoutOutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichSpout;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Values;
import backtype.storm.utils.Utils;

public class TestWordSpout extends BaseRichSpout {
    private static final long serialVersionUID = 1L;
    boolean _isDistributed;
    SpoutOutputCollector _collector;
    String[] words = new String[] {"Nathan", "Mike", "Jackson", "Golda", "Bertels"};

    public TestWordSpout() {
        this(true);
    }

    public TestWordSpout(boolean isDistributed) {
        _isDistributed = isDistributed;
    }

    public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
        _collector = collector;
    }

    public void close() {
    }

    public void nextTuple() {
        Utils.sleep(1000);
        final Random rand = new Random();
        final String word = words[rand.nextInt(words.length)];
        _collector.emit(new Values(word), word + new Random().nextDouble());
    }

    public void ack(Object msgId) {
        System.out.println("### ack: " + msgId);
    }

    public void fail(Object msgId) {
        System.out.println("### fail: " + msgId);
    }

    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word"));
    }
}

  

import java.util.Map;
import java.util.Random;

import backtype.storm.task.OutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichBolt;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values;

public class WordDispatchBolt extends BaseRichBolt {
    private OutputCollector collector;

    @Override
    public void prepare(Map stormConf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
        new Thread(new Runnable() {
            @Override
            public void run() {
                while (true) {
                    send(); // deliberately no sleep here, otherwise the exception is too rare to observe
                }
            }
        }).start();
    }

    public void send() {
        this.collector.emit(new Values(new Random().nextDouble()));
    }

    @Override
    public void execute(Tuple input) {
        String word = input.getStringByField("word");
        this.collector.emit(new Values(word));
        this.collector.ack(input);
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word"));
    }
}

  

import java.util.Map;

import backtype.storm.task.OutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichBolt;
import backtype.storm.tuple.Tuple;

public class PrintBolt extends BaseRichBolt {
    private static final long serialVersionUID = 1L;

    @Override
    public void prepare(Map stormConf, TopologyContext context, OutputCollector collector) {
    }

    @Override
    public void execute(Tuple input) {
        System.out.println(input.getValue(0));
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
    }
}

This code is very simple, so I will not walk through it in detail. In the WordDispatchBolt class I start a separate thread that emits data to the next bolt, which is exactly what my business code does with a timer (a timer is, after all, just another thread). Because my timer fires only once a minute, the problem occurred only rarely; here I deliberately loop with no pause at all, so the exception has many more chances to occur.
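The point that a timer is just another thread can be verified with a tiny stand-alone example (plain Java, no Storm; the class and method names are mine, not from the original code): a java.util.Timer always runs its tasks on its own background thread, never on the thread that scheduled them.

```java
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicReference;

public class TimerThreadDemo {
    // Schedules a one-shot timer task and returns the name of the thread it ran on.
    public static String timerThreadName() {
        final AtomicReference<String> name = new AtomicReference<>();
        final CountDownLatch done = new CountDownLatch(1);
        Timer timer = new Timer();
        timer.schedule(new TimerTask() {
            @Override
            public void run() {
                // This executes on the Timer's own worker thread.
                name.set(Thread.currentThread().getName());
                done.countDown();
            }
        }, 0);
        try {
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        timer.cancel();
        return name.get();
    }

    public static void main(String[] args) {
        System.out.println("scheduling thread: " + Thread.currentThread().getName());
        System.out.println("timer task thread: " + timerThreadName());
    }
}
```

So any OutputCollector call made inside a TimerTask is, by construction, a call from a thread other than the bolt's executor thread.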

If you run the example code above, you will sooner or later hit the error posted earlier. If you do not know it is an OutputCollector synchronization problem, finding a fix is bound to be painful. Once you know it is a synchronization problem, the fix is straightforward: either avoid calling the collector from another thread altogether, or make the calls synchronized. Below is the solution I came up with off the top of my head. (If anyone has a better one, please leave a message.)

Make the following modification to the WordDispatchBolt class:

import java.util.List;
import java.util.Map;
import java.util.Random;

import backtype.storm.task.OutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichBolt;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values;

public class WordDispatchBolt extends BaseRichBolt {
    private OutputCollector collector;

    @Override
    public void prepare(Map stormConf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
        new Thread(new Runnable() {
            @Override
            public void run() {
                while (true) {
                    // deliberately no sleep here, otherwise the exception is too rare to observe
                    send(new Values(new Random().nextDouble()));
                }
            }
        }).start();
    }

    // All emits, from any thread, now go through this one synchronized method.
    public synchronized void send(List<Object> tuple) {
        this.collector.emit(tuple);
    }

    @Override
    public void execute(Tuple input) {
        String word = input.getStringByField("word");
        send(new Values(word));
        this.collector.ack(input);
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word"));
    }
}
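To see why funneling every emit through a single synchronized method works, here is a minimal plain-Java sketch with no Storm dependency (SyncSendDemo and its methods are hypothetical names of mine). An unsynchronized ArrayList stands in for the collector's non-thread-safe internal state; because both the "executor" thread and the "timer" thread go through the same synchronized send, their updates are serialized and none are lost.

```java
import java.util.ArrayList;
import java.util.List;

public class SyncSendDemo {
    // Stands in for the collector's internal, non-thread-safe state.
    private final List<Double> emitted = new ArrayList<>();

    // Every "emit" funnels through this one synchronized method, as in the fixed bolt.
    public synchronized void send(double value) {
        emitted.add(value);
    }

    public synchronized int emittedCount() {
        return emitted.size();
    }

    // Two threads (modeling the execute() path and the timer path) each emit n values.
    public static int demo(int n) {
        final SyncSendDemo bolt = new SyncSendDemo();
        Runnable producer = () -> {
            for (int i = 0; i < n; i++) {
                bolt.send(Math.random());
            }
        };
        Thread timerPath = new Thread(producer);
        Thread executePath = new Thread(producer);
        timerPath.start();
        executePath.start();
        try {
            timerPath.join();
            executePath.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return bolt.emittedCount();
    }
}
```

With the synchronized keyword removed from send, the two threads would race on the ArrayList and the final count would usually come up short (or throw), which mirrors the corrupted state inside the unsynchronized collector.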

With that, this pitfall is basically solved. I expect to use Storm a great deal from now on, and I will keep recording the pitfalls one by one.
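The other option mentioned above, never touching the collector from a second thread at all, can be sketched as a queue handoff (again plain Java with hypothetical names, not the original code): the timer thread only enqueues into a thread-safe queue, and the bolt's own thread drains the queue and does all the emitting.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

public class QueueHandoffDemo {
    // Thread-safe buffer between the timer thread and the bolt's own thread.
    private final Queue<Double> pending = new ConcurrentLinkedQueue<>();

    // Called from the timer thread: it never touches the collector, only the queue.
    public void enqueueFromTimer(double value) {
        pending.offer(value);
    }

    // Called from the bolt's own thread: drain everything pending and "emit" it.
    public List<Double> drainAndEmit() {
        List<Double> emitted = new ArrayList<>();
        Double value;
        while ((value = pending.poll()) != null) {
            emitted.add(value); // a real bolt would call collector.emit(...) here
        }
        return emitted;
    }

    // A timer-like thread enqueues n values; after it finishes, the drain sees all of them.
    public static int demo(int n) {
        final QueueHandoffDemo bolt = new QueueHandoffDemo();
        Thread timer = new Thread(() -> {
            for (int i = 0; i < n; i++) {
                bolt.enqueueFromTimer(Math.random());
            }
        });
        timer.start();
        try {
            timer.join(); // the sketch waits so the single drain below sees all n values
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return bolt.drainAndEmit().size();
    }
}
```

In a real bolt the drain would happen inside execute(), so the collector is only ever called from the executor thread; the timer merely signals work to do.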

"Record the pitfalls you encounter, so that the next person who hits them has more resources to search, and spends less time troubleshooting and agonizing over the problem."

  
