In Python, Celery is an asynchronous task framework. A while back I wrote a project for an alarm platform that also needed to distribute tasks across machines, and at the time I never quite trusted Celery, so I ended up writing my own distributed task distribution system.
Today I was chatting with a friend about distributed crawlers. He mentioned that tasks sometimes crash, and that Celery's retry mechanism is actually somewhat interesting. So I went back to the documentation, studied the retry parameters, and I'm sharing some of my notes and practices here.
The code is as follows:
# xiaorui.cc
@celery.task(bind=True, max_retries=3, default_retry_delay=1 * 6)
def sum(self, num):
    try:
        f = open('plog', 'a')
        f.write('retry\n')
        f.close()
        num = num + 1
        return num
    except Exception as exc:
        raise self.retry(exc=exc, countdown=60)
The most important parameters are actually explained quite clearly on the official site, but here is a quick rundown anyway. Haha ~
bind=True passes the task instance in as self, which is what lets you call self.retry().
max_retries is the maximum number of retries.
default_retry_delay is the default interval, in seconds, between retries.
The rest of the code should be easy to follow: catch the exception, then retry.
countdown is also a delay in seconds, and it takes priority over default_retry_delay above.
With the settings above you can see that, after hitting an exception, the task gets re-executed up to three times.
Note that the exception here is raised by me on purpose; if that is not clear, go back to the Python code above. One more point: Celery itself sleeps out the retry delay before running the task again, and I set it to 60 s.
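As a side note, if you want to see from inside the task which attempt is currently running, Celery exposes a retry counter on the task request (self.request.retries). The following is just a minimal sketch along the lines of the task above, with a made-up task name and log line:

@celery.task(bind=True, max_retries=3, default_retry_delay=1 * 6)
def sum_with_log(self, num):
    try:
        with open('plog', 'a') as f:
            # self.request.retries is 0 on the first run, then 1, 2, 3 on retries
            f.write('attempt %d\n' % self.request.retries)
        return num + 1
    except Exception as exc:
        # countdown=60 here overrides default_retry_delay for this task
        raise self.retry(exc=exc, countdown=60)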
Then I tested restarting Celery, and the tasks kept running normally. After all, they live in the queue: when Celery starts up, it simply pulls tasks from the queue again. When I push data to Celery, I only need to make sure the backend queue (the broker) doesn't go down.
The code is as follows:
redis 127.0.0.1:6379> lrange celery 0 -1
1) "{\"body\": \"...\", \"headers\": {\"redelivered\": true}, \"content-type\": \"application/x-python-serialize\", \"properties\": {\"body_encoding\": \"base64\", \"delivery_info\": {\"priority\": 0, \"routing_key\": \"celery\", \"exchange\": \"celery\"}, \"delivery_mode\": 2, \"correlation_id\": \"a6d12de3-b538-4f31-ab73-611540b696fd\", \"reply_to\": \"...\", \"delivery_tag\": \"bd4480dd-d04a-4401-876b-831b30b55f4e\"}, \"content-encoding\": \"binary\"}"
2) "{\"body\": \"...\", \"headers\": {\"redelivered\": true}, \"content-type\": \"application/x-python-serialize\", \"properties\": {\"body_encoding\": \"base64\", \"delivery_info\": {\"priority\": 0, \"routing_key\": \"celery\", \"exchange\": \"celery\"}, \"delivery_mode\": 2, \"correlation_id\": \"...\", \"reply_to\": \"...\", \"delivery_tag\": \"9fa3c120-0bfd-4453-9539-1465e6e820ff\"}, \"content-encoding\": \"binary\"}"
redis 127.0.0.1:6379>
What I actually care more about is crash handling. For example, say our Celery workers are scaled out across machines, and a node has already picked up a task but then hits an out-of-memory (OOM) condition and dies. I originally assumed Celery covered this case with RabbitMQ's ack mechanism, but my tests showed that Celery's retry mechanism only plays out locally on the worker. In fact, even without its retry decorator, we could just write a for loop ourselves and catch the exceptions, as in the sketch below.
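Here is a minimal sketch of that kind of hand-rolled retry loop; run_with_retry, the attempt count and the sleep interval are just names and numbers I made up for illustration:

import time

def run_with_retry(func, num, max_attempts=3, delay=60):
    # Call func up to max_attempts times, sleeping between attempts.
    for attempt in range(1, max_attempts + 1):
        try:
            return func(num)
        except Exception:
            if attempt == max_attempts:
                raise          # out of attempts, let the exception propagate
            time.sleep(delay)  # wait a bit before the next attempt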
My current practice is this: every time a worker grabs a task and starts doing something, it calls back an interface, pushes the task it is about to run, and sets a tag saying "this one is being worked on". If the tag still hasn't been deleted ten minutes later, the task gets pushed back into the queue.
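A rough sketch of that idea, assuming a Redis backend and key names I made up for this example (a 'working' hash plus the 'celery' list), and pushing back a plain JSON payload rather than Celery's real message format:

import json
import time
import redis

r = redis.StrictRedis(host='127.0.0.1', port=6379)

def mark_working(task_id, payload):
    # Tag the task as "in progress", remembering when we picked it up.
    r.hset('working', task_id, json.dumps({'ts': time.time(), 'payload': payload}))

def mark_done(task_id):
    # Remove the tag once the task finishes normally.
    r.hdel('working', task_id)

def requeue_stale(timeout=600):
    # Any task whose tag is still there after `timeout` seconds (10 minutes)
    # is considered lost and gets pushed back onto the queue.
    for task_id, raw in r.hgetall('working').items():
        info = json.loads(raw)
        if time.time() - info['ts'] > timeout:
            r.lpush('celery', json.dumps(info['payload']))
            r.hdel('working', task_id)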
Of course, this approach is still a bit crude, but it has been running online for a while without any big problems. That said, when there are too many tasks, the threads monitoring them seem to have crashed quite a few times; later on you could switch to a gevent pool to poll whether each task has finished.
If you are publishing tasks from a platform-style page and the page sits in the loading... state for a long time, at least that is easy to spot and analyze.
But we haven't been that unlucky ~ as long as exceptions are handled properly.