Cannot exit immediately when scheduler cannot get the URL

Source: Internet
Author: User

There is a troublesome problem in WebMagic's multi-threaded crawling: when the Scheduler has no URL to hand out, the spider cannot simply quit right away. It has to wait until every thread that is still crawling has finished and no new URLs have been produced, and only then exit. Previously this was implemented with Thread.sleep: when no URL was available, the main thread slept for a while and then tried again, and it exited only after determining that no thread was still running.
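For comparison, the sleep-based strategy can be sketched as below. Everything here is hypothetical (the class name, integer "URLs" where page n links to page n - 1, the 10 ms back-off, the 4-thread pool); it only illustrates the polling idea and is not WebMagic's pre-0.4.0 code.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of exiting via sleep-polling; not WebMagic's code.
public class SleepPollingSpider {
    private static final long EMPTY_SLEEP_MS = 10; // assumed back-off interval

    private final ConcurrentLinkedQueue<Integer> queue = new ConcurrentLinkedQueue<>();
    private final AtomicInteger threadAlive = new AtomicInteger();
    private final AtomicInteger processed = new AtomicInteger();
    private final ExecutorService executor = Executors.newFixedThreadPool(4);

    public int run(int seed) throws InterruptedException {
        queue.add(seed);
        while (true) {
            // Read the counter before polling: if it is 0 here, no worker is
            // running, so nothing can be added to the queue after this point.
            int alive = threadAlive.get();
            Integer url = queue.poll();
            if (url == null) {
                if (alive == 0) {
                    break; // queue empty and no worker left to add more
                }
                Thread.sleep(EMPTY_SLEEP_MS); // back off, then poll again
                continue;
            }
            threadAlive.incrementAndGet();
            executor.execute(() -> {
                try {
                    processed.incrementAndGet();
                    if (url > 0) {
                        queue.add(url - 1); // "page" n links to page n - 1
                    }
                } finally {
                    threadAlive.decrementAndGet();
                }
            });
        }
        executor.shutdown();
        executor.awaitTermination(1, TimeUnit.SECONDS);
        return processed.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(new SleepPollingSpider().run(5)); // prints 6
    }
}
```

It works, but the main thread wakes up every EMPTY_SLEEP_MS whether or not anything changed, and always reacts up to one interval late.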

But this approach is never quite elegant. Java has a wait/notify mechanism for exactly this kind of synchronization problem, so WebMagic 0.4.0 replaced the previous Thread.sleep mechanism with wait/notify. The code is as follows:

    while (!Thread.currentThread().isInterrupted() && stat.get() == STAT_RUNNING) {
        Request request = scheduler.poll(this);
        if (request == null) {
            if (threadAlive.get() == 0 && exitWhenComplete) {
                break;
            }
            // wait until new URL added
            waitNewUrl();
        } else {
            final Request requestFinal = request;
            threadAlive.incrementAndGet();
            executorService.execute(new Runnable() {
                @Override
                public void run() {
                    try {
                        processRequest(requestFinal);
                    } catch (Exception e) {
                        logger.error("download " + requestFinal + " error", e);
                    } finally {
                        threadAlive.decrementAndGet();
                        signalNewUrl();
                    }
                }
            });
        }
    }

    private void waitNewUrl() {
        try {
            newUrlLock.lock();
            try {
                newUrlCondition.await();
            } catch (InterruptedException e) {
            }
        } finally {
            newUrlLock.unlock();
        }
    }

Here, when a worker thread finishes, it calls signalNewUrl() to notify the main thread to stop waiting!
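The article does not show signalNewUrl(). A plausible counterpart, assumed here to call signalAll() on newUrlCondition while holding newUrlLock, is sketched below with a tiny driver; hasWaiter() and demo() are hypothetical test helpers, not WebMagic methods.

```java
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Sketch of the wait/signal pair; the body of signalNewUrl() is assumed.
public class NewUrlSignal {
    private final ReentrantLock newUrlLock = new ReentrantLock();
    private final Condition newUrlCondition = newUrlLock.newCondition();

    // Main thread: block until some worker signals (0.4.0 shape, no re-check).
    void waitNewUrl() throws InterruptedException {
        newUrlLock.lock();
        try {
            newUrlCondition.await();
        } finally {
            newUrlLock.unlock();
        }
    }

    // Worker: called from the finally block after threadAlive.decrementAndGet().
    void signalNewUrl() {
        newUrlLock.lock();
        try {
            newUrlCondition.signalAll(); // wake every thread blocked in await()
        } finally {
            newUrlLock.unlock();
        }
    }

    // Hypothetical test helper: true once a thread is parked on newUrlCondition.
    boolean hasWaiter() {
        newUrlLock.lock();
        try {
            return newUrlLock.hasWaiters(newUrlCondition);
        } finally {
            newUrlLock.unlock();
        }
    }

    // Start a waiter, signal only after it is waiting, report whether it woke up.
    static boolean demo() throws InterruptedException {
        NewUrlSignal s = new NewUrlSignal();
        Thread mainLoop = new Thread(() -> {
            try {
                s.waitNewUrl();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        mainLoop.start();
        while (!s.hasWaiter()) {
            Thread.sleep(1);
        }
        s.signalNewUrl();
        mainLoop.join(1000);
        return !mainLoop.isAlive();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(demo()); // prints true
    }
}
```

The driver only signals after hasWaiters(...) reports a parked thread, so the wakeup always lands. Nothing in the real crawl loop guarantees that ordering.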

After 0.4.0 was released, a user asked me why the spider sometimes got stuck and never exited. I began to suspect a thread-safety problem, but I couldn't reproduce it.

Thinking it over, the implementation allows the following sequence:

    1. threadAlive > 0, so the check if (threadAlive.get() == 0 && exitWhenComplete) fails and the main thread gets ready to enter waitNewUrl();
    2. At this moment the last worker thread finishes, executing threadAlive.decrementAndGet(); and then signalNewUrl();
    3. Only now does the main thread enter waitNewUrl() and call await(). The signal has already been sent and nobody will ever send another, so the main thread waits forever...
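The three steps can be replayed deterministically on a single thread by forcing that order. This is a hypothetical sketch: the timed await(200, TimeUnit.MILLISECONDS) replaces the original untimed await() only so the demo terminates, and a false return means no signal ever arrived.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Replays steps 1-3 in that fixed order on one thread (hypothetical demo).
public class LostWakeupReplay {
    static boolean replay() throws InterruptedException {
        ReentrantLock newUrlLock = new ReentrantLock();
        Condition newUrlCondition = newUrlLock.newCondition();
        AtomicInteger threadAlive = new AtomicInteger(1); // one worker still running
        boolean exitWhenComplete = true;

        // Step 1: the main thread sees threadAlive > 0, so it does not exit.
        boolean mustWait = !(threadAlive.get() == 0 && exitWhenComplete);

        // Step 2: the last worker finishes and signals while nobody is waiting.
        threadAlive.decrementAndGet();
        newUrlLock.lock();
        try {
            newUrlCondition.signalAll(); // no waiter yet: this signal is lost
        } finally {
            newUrlLock.unlock();
        }

        // Step 3: only now does the main thread enter waitNewUrl().
        boolean woken = false;
        if (mustWait) {
            newUrlLock.lock();
            try {
                // Timed wait so the demo ends; returns false on timeout.
                woken = newUrlCondition.await(200, TimeUnit.MILLISECONDS);
            } finally {
                newUrlLock.unlock();
            }
        }
        return woken;
    }

    public static void main(String[] args) throws InterruptedException {
        // false: the wait timed out; the untimed await() would block forever.
        System.out.println(replay());
    }
}
```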

So it seems that adding a double-check inside the lock would fix it? But today I read the article http://coolshell.cn/articles/4576.html, whose gist is: when something goes wrong, don't rely on guessing! Always reproduce and test!

So I decided to simulate it by hand: start 10 threads, mock all the other parts, and run the loop 10,000 times. I won't paste the code here; it is at https://github.com/code4craft/webmagic/blob/master/webmagic-core/src/test/java/us/codecraft/webmagic/SpiderTest.java. Sure enough, on the 13th run it got stuck! A jstack dump confirmed that it was blocked right at newUrlCondition.await();!

Then I added the double-check:

    private void waitNewUrl() {
        try {
            newUrlLock.lock();
            //double check
            if (threadAlive.get() == 0 && exitWhenComplete) {
                return;
            }
            try {
                newUrlCondition.await();
            } catch (InterruptedException e) {
            }
        } finally {
            newUrlLock.unlock();
        }
    }

This time the test ran through successfully. Problem solved!
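The author's actual test (linked above) is not reproduced here. As a rough, self-contained stand-in, the harness below follows the same idea: a 10-thread pool, a mocked scheduler (a queue seeded with 20 integers), mocked processing, and many repeated crawls using the double-checked waitNewUrl(). Class and method names are hypothetical, and all URLs are seeded up front so the only thing under test is whether each crawl exits.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stress harness modelled on the crawl loop; not SpiderTest.java.
public class DoubleCheckStress {
    private final ConcurrentLinkedQueue<Integer> scheduler = new ConcurrentLinkedQueue<>();
    private final AtomicInteger threadAlive = new AtomicInteger();
    private final AtomicInteger processed = new AtomicInteger();
    private final ReentrantLock newUrlLock = new ReentrantLock();
    private final Condition newUrlCondition = newUrlLock.newCondition();
    private final boolean exitWhenComplete = true;

    int crawl(ExecutorService executor, int urls) {
        for (int i = 0; i < urls; i++) {
            scheduler.add(i); // mocked scheduler: every "URL" is seeded up front
        }
        while (!Thread.currentThread().isInterrupted()) {
            Integer request = scheduler.poll();
            if (request == null) {
                if (threadAlive.get() == 0 && exitWhenComplete) {
                    break;
                }
                waitNewUrl();
            } else {
                threadAlive.incrementAndGet();
                executor.execute(() -> {
                    try {
                        processed.incrementAndGet(); // mocked download and processing
                    } finally {
                        threadAlive.decrementAndGet();
                        signalNewUrl();
                    }
                });
            }
        }
        return processed.get();
    }

    private void waitNewUrl() {
        newUrlLock.lock();
        try {
            // Double check under the lock: if the last worker already
            // decremented, return instead of waiting for a signal that is gone.
            if (threadAlive.get() == 0 && exitWhenComplete) {
                return;
            }
            newUrlCondition.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // ends the crawl loop
        } finally {
            newUrlLock.unlock();
        }
    }

    private void signalNewUrl() {
        newUrlLock.lock();
        try {
            newUrlCondition.signalAll();
        } finally {
            newUrlLock.unlock();
        }
    }

    // Runs many small crawls on one 10-thread pool; false if any run lost work.
    static boolean stress(int runs) throws InterruptedException {
        ExecutorService executor = Executors.newFixedThreadPool(10);
        try {
            for (int run = 0; run < runs; run++) {
                if (new DoubleCheckStress().crawl(executor, 20) != 20) {
                    return false;
                }
            }
            return true;
        } finally {
            executor.shutdown();
            executor.awaitTermination(1, TimeUnit.SECONDS);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(stress(10_000)); // prints true once every crawl has exited
    }
}
```

Swapping in the 0.4.0 waitNewUrl() (no double check) reopens the lost-wakeup window, but whether and when a run hangs depends on thread scheduling, so that case is not asserted here.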

This example also gave me a general understanding of why wait/notify must always be called while holding the lock. Why is that? When I get the chance, I'll write an article summarizing it!
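The JDK enforces this rule rather than leaving it to convention: calling await() or signalAll() on a Condition, or notifyAll() on an object, from a thread that does not hold the corresponding lock or monitor throws IllegalMonitorStateException. The small check below illustrates that; the class and helper names are made up for this sketch.

```java
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical demo: wait/notify-style calls are rejected without the lock.
public class LockRequiredDemo {
    // True if running the action without the lock throws IllegalMonitorStateException.
    static boolean rejectedWithoutLock(Runnable action) {
        try {
            action.run();
            return false;
        } catch (IllegalMonitorStateException e) {
            return true;
        }
    }

    static boolean allRejected() {
        ReentrantLock newUrlLock = new ReentrantLock();
        Condition newUrlCondition = newUrlLock.newCondition();
        Object monitor = new Object();

        boolean signalRejected = rejectedWithoutLock(newUrlCondition::signalAll);
        boolean awaitRejected = rejectedWithoutLock(() -> {
            try {
                newUrlCondition.await(); // throws before blocking: lock not held
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        boolean notifyRejected = rejectedWithoutLock(monitor::notifyAll);
        return signalRejected && awaitRejected && notifyRejected;
    }

    public static void main(String[] args) {
        System.out.println(allRejected()); // prints true
    }
}
```

In terms of this bug, holding newUrlLock is what makes "check threadAlive, then await()" atomic with respect to a worker's signalNewUrl(): the worker cannot deliver its signal between the check and the wait, which is exactly the gap the double-check closes.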

Simple, isn't it? In fact, this article is meant to make just one point: when a bug shows up, don't rely on guessing! Always reproduce it and confirm the fix!
