Retrying exceptions when a Python Celery asynchronous task crashes

Source: Internet
Author: User
Tags: base64, bind, callback, redis, python

In Python, Celery is an asynchronous task framework. Back when I wrote a project for our alarm platform, I also needed task execution to scale out across machines; at the time I never felt Celery was all that reliable, so I wrote my own distributed task distribution system instead.


Today I chatted with a friend about distributed crawlers. He mentioned that tasks sometimes crash, and that Celery's retry mechanism is actually fairly interesting. So I finally read the documentation, studied the retry parameters, and here I share some of my own practices.

The code is as follows:
# xiaorui.cc
from celery import Celery

# app instance; the broker address matches the Redis instance shown later in this post
celery = Celery('tasks', broker='redis://127.0.0.1:6379/0')

@celery.task(bind=True, max_retries=3, default_retry_delay=1 * 6)
def sum(self, num):
    try:
        f = open('plog', 'a')
        f.write('retry\n')
        f.close()
        num = num + 1
        return num
    except Exception as exc:
        raise self.retry(exc=exc, countdown=60)



The most important parameters are explained well enough on the official site, but here is a quick rundown. Haha ~


bind=True passes the task instance itself into the function as self, which is what lets you call self.retry() inside the task.

max_retries is the maximum number of retries.

default_retry_delay is the default interval, in seconds, between retries.


The try/except in the code above should be easy to follow: catch the exception yourself and hand it to self.retry().

countdown is also a delay in seconds, and it takes priority over default_retry_delay above: if countdown is passed to self.retry(), the default delay is ignored.
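To make that priority concrete, here is a minimal sketch using the same app instance as the code above (the task name sum_with_default_delay is made up for illustration): because self.retry() is called without countdown, Celery falls back to default_retry_delay.

@celery.task(bind=True, max_retries=3, default_retry_delay=6)
def sum_with_default_delay(self, num):
    try:
        with open('plog', 'a') as f:
            f.write('retry\n')
        return num + 1
    except Exception as exc:
        # no countdown here, so Celery waits default_retry_delay
        # (6 seconds) between attempts; passing countdown would
        # override that value, as in the task above
        raise self.retry(exc=exc)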



With the settings I just made, after hitting an exception the task gets re-executed up to three more times.


Note that the exception is raised by the task itself through self.retry(); if that is not clear, look back at the code above. Another point is that Celery sleeps for the delay you give it between attempts; I set 60 seconds.
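The snippets above do not show what happens once the last retry also fails. As a hedged sketch (not from the original post): when max_retries is exhausted and no exc is passed to self.retry(), Celery raises MaxRetriesExceededError, which you can catch to record a final failure instead of looping forever.

from celery.exceptions import MaxRetriesExceededError

@celery.task(bind=True, max_retries=3)
def sum_with_fallback(self, num):
    try:
        with open('plog', 'a') as f:
            f.write('retry\n')
        return num + 1
    except Exception:
        try:
            # without exc=..., Celery raises MaxRetriesExceededError
            # once the retry budget is used up
            raise self.retry(countdown=60)
        except MaxRetriesExceededError:
            # all retries spent: log the failure and give up
            with open('plog', 'a') as f:
                f.write('giving up\n')
            raise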



Then I tested restarting Celery itself: the tasks keep running fine, because they sit in the queue. When Celery starts, it simply pulls tasks from the queue again, so when I push work into Celery I only have to make sure the broker queue (Redis here) does not go down.

The code is as follows:
redis 127.0.0.1:6379> lrange celery 0 -1
1) "{"body": "Response =", "headers": {"redelivered": true}, "content-type": "application/x-python-serialize", "properties": {"body_encoding": "base64", "delivery_info": {"priority": 0, "routing_key": "celery", "exchange": "celery"}, "delivery_mode": 2, "correlation_id": "a6d12de3-b538-4f31-ab73-611540b696fd", "reply_to": "delimiter", "delivery_tag": "bd4480dd-d04a-4401-876b-831b30b55f4e"}, "content-encoding": "binary"}"
2) "{"body": "Response =", "headers": {"redelivered": true}, "content-type": "application/x-python-serialize", "properties": {"body_encoding": "base64", "delivery_info": {"priority": 0, "routing_key": "celery", "exchange": "celery"}, "delivery_mode": 2, "correlation_id": "success", "reply_to": "success", "delivery_tag": "9fa3c120-0bfd-4453-9539-1465e6e820ff"}, "content-encoding": "binary"}"
redis 127.0.0.1:6379>
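The same check can be scripted. Here is a minimal sketch (assuming the redis-py client and Celery's default queue name, "celery") that just reports how many serialized tasks are still sitting in the broker:

import redis

# the broker shown in the redis-cli session above
r = redis.StrictRedis(host='127.0.0.1', port=6379, db=0)

# Celery's default queue is a Redis list named "celery"
print('pending tasks:', r.llen('celery'))
for raw in r.lrange('celery', 0, -1):
    print(raw[:120])  # peek at the start of each payload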


Actually, what I care about more is handling real crashes. For example, our Celery setup has been scaled out across machines; a node picks up a task and then suddenly dies, say from running out of memory (OOM). I had assumed Celery handled this through the broker's ack mechanism (RabbitMQ-style acknowledgements), but my tests showed that Celery's retry mechanism only works locally, inside the task itself. In fact, even without the retry decorator, you could get the same effect with a for loop that filters out exceptions.
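Depending on the Celery version, late acknowledgement can hand this problem back to the broker. The following is only a sketch under Celery 4+ option names (task_acks_late and task_reject_on_worker_lost), not the setup from this post: with it, a message grabbed by a worker that dies mid-task is redelivered instead of being lost.

from celery import Celery

app = Celery('tasks', broker='redis://127.0.0.1:6379/0')

# acknowledge the message only after the task finishes,
# not when the worker first receives it
app.conf.task_acks_late = True
# put the message back on the queue if the worker process is
# killed mid-task (for example by the OOM killer)
app.conf.task_reject_on_worker_lost = True

@app.task(bind=True, acks_late=True)
def crawl(self, url):
    # the actual crawling work goes here
    pass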

My current practice is this: every time a worker takes a task, it calls back to an interface to say what it is about to do, pushes the work it plans to run, and sets a tag meaning "I am working on this". If the tag still has not been deleted ten minutes later, the task gets put back into the queue.
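As a sketch of that idea (the working: key prefix, the queue name and the helper functions are hypothetical, not the author's actual code): the worker tags each task in Redis when it starts, deletes the tag when it finishes, and a separate monitor re-queues anything whose tag is older than ten minutes.

import json
import time

import redis

r = redis.StrictRedis(host='127.0.0.1', port=6379, db=0)

def mark_working(task_id, payload):
    # tag the task as in progress, together with its start time
    r.hset('working:' + task_id, mapping={
        'started': time.time(),
        'payload': json.dumps(payload),
    })

def mark_done(task_id):
    # the worker removes its own tag once the task finishes
    r.delete('working:' + task_id)

def requeue_stale(max_age=600):
    # anything still tagged after ten minutes goes back to the queue
    for key in r.keys('working:*'):
        info = r.hgetall(key)
        if time.time() - float(info[b'started']) > max_age:
            r.lpush('my_task_queue', info[b'payload'])
            r.delete(key)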

Of course this approach is still a bit crude, but it has been running online for a while without major problems. When there were too many tasks, though, the thread monitoring them did crash a few times; a gevent pool can be used instead to poll whether each task has finished.
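A minimal sketch of that polling with gevent (assuming a result backend is configured so AsyncResult.ready() works; the pool size and poll interval are arbitrary):

from gevent import monkey
monkey.patch_all()

import gevent
from gevent.pool import Pool

def wait_for_result(async_result, poll=1.0):
    # poll a Celery AsyncResult until it finishes, yielding to the
    # gevent hub between checks so many waits share one process
    while not async_result.ready():
        gevent.sleep(poll)
    return async_result.result

pool = Pool(100)  # cap the number of concurrent watchers
results = [sum.delay(i) for i in range(1000)]
greenlets = [pool.spawn(wait_for_result, res) for res in results]
gevent.joinall(greenlets)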

If you publish tasks from a platform-style page, a job stuck in the loading... state for a long time is at least easy to spot and analyze.

But we are rarely that unlucky ~ as long as exceptions are handled properly.
