As a heavy-duty Celery user, I couldn't resist the article Celery Best Practices, so here is a translation of it, interleaved with practical Celery experience from our own project.
For more background on Celery itself, see the Celery documentation.
When using Django, you may need to run some long background tasks, and perhaps some sort of task queue to manage them; Celery is a good choice for this.
After using Celery as a task queue across many projects, the author has accumulated some best practices: how to use Celery appropriately, and some features Celery provides that are rarely used to their full potential.
1. Do not use the database as your AMQP Broker
A database is not designed to do the work of an AMQP broker, and in a production environment it is quite likely to fall over one day (PS: I don't think any system using a database this way can run properly!!!).
The author guesses that the main reason many people use a database as their broker is that they already have a database providing data storage for their web app, so they simply reuse it: setting it as Celery's broker is easy and requires no extra components (such as RabbitMQ).
Suppose you have four backend workers obtaining and processing tasks placed in the database. That means four processes polling the database frequently for the latest tasks, and each worker may also run multiple concurrent threads doing the same.
One day you find that so many tasks are being produced that four workers are not enough, and task processing falls far behind task production, so you keep increasing the number of workers. Suddenly your database becomes slow to respond because of all the polling processes, disk IO sits at its peak, and your web application is affected too. All of this because the workers are effectively running a DDoS attack against your database.
None of this happens when you use a proper AMQP broker such as RabbitMQ. Taking RabbitMQ as an example: first, it keeps the task queue in memory, so there is no need to hit the disk. Second, consumers (the workers above) do not need to poll frequently, because RabbitMQ pushes new tasks to them. And if RabbitMQ does run into trouble, at least it will not drag your web application down with it.
This is why the author says a database should not be used as a broker. Besides, pre-built RabbitMQ images are available in many places and can be used directly.
I agree with this. Our system uses Celery to process a large number of asynchronous tasks, several million per day on average. With MySQL, which we used previously, there was always a problem of serious task-processing latency, and adding workers did not help. So we switched to Redis, and performance improved a great deal. As for why MySQL was so slow, we did not dig into it; perhaps it was exactly this DDoS problem.
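For reference, switching the broker is a one-line configuration change. Below is a minimal sketch of a Celery app pointed at Redis; the module name proj and the local Redis URL are illustrative assumptions, not from the article:

from celery import Celery

# 'proj' and the Redis URL below are illustrative assumptions.
# Tasks are queued in Redis rather than in a relational database.
app = Celery('proj', broker='redis://localhost:6379/0')

Pointing the same app at RabbitMQ would only change the URL scheme (amqp://...).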
2. Use more queues (not only the default one)
Celery is very easy to set up, and by default it stores tasks in a single default queue (unless you specify another queue). The usual pattern looks like this:
@app.task()
def my_taskA(a, b, c):
    print("doing something here...")

@app.task()
def my_taskB(x, y):
    print("doing something here...")
Both tasks end up executing in the same queue, which is certainly attractive: a single decorator gives you an asynchronous task. What concerns the author is that taskA and taskB may be totally different things, or one may be much more important than the other, so why put them in one basket? (Don't put all your eggs in one basket, right?) Perhaps taskB is not very important but its volume is so large that the important taskA cannot be processed promptly by the workers. Adding workers does not solve this, because taskA and taskB are still executed in the same queue.
3. Use workers with priority
To solve the problem in section 2, let taskA go to one queue, Q1, and taskB to another queue, Q2, then assign x workers to handle Q1 and have the remaining workers handle Q2. This way taskB still gets enough workers, while a few dedicated priority workers process taskA promptly without long waits.
First, define the queues manually:
from kombu import Queue, Exchange

CELERY_QUEUES = (
    Queue('default', Exchange('default'), routing_key='default'),
    Queue('for_task_A', Exchange('for_task_A'), routing_key='for_task_A'),
    Queue('for_task_B', Exchange('for_task_B'), routing_key='for_task_B'),
)
Then define routes that decide which queue each task goes to:
CELERY_ROUTES = {
    'my_taskA': {'queue': 'for_task_A', 'routing_key': 'for_task_A'},
    'my_taskB': {'queue': 'for_task_B', 'routing_key': 'for_task_B'},
}
Finally, start a separate worker for each queue:

celery worker -E -l INFO -n workerA -Q for_task_A
celery worker -E -l INFO -n workerB -Q for_task_B
In our project there is a large amount of file-conversion work: a huge number of conversions of files under 1 MB, plus a small number of conversions of files close to 20 MB. Small-file conversions have the highest priority and do not take much time, while large files take a long time to convert. If both kinds of conversion go into the same queue, the time spent on large files is very likely to delay the small-file conversions.
Therefore, we set up three priority queues by file size and assigned different workers to each queue, which solved the file-conversion latency problem nicely.
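As an illustration of this setup, the dispatch side can pick a queue at call time based on file size. A minimal sketch, where the queue names, size thresholds and the convert_file task are all hypothetical, not our actual code:

import os
from celery import Celery

app = Celery('proj', broker='redis://localhost:6379/0')  # illustrative broker

@app.task()
def convert_file(file_id):
    print("converting file %s..." % file_id)  # placeholder for the real conversion

def enqueue_conversion(file_id, path):
    """Route the conversion to one of three priority queues by file size."""
    size = os.path.getsize(path)
    if size < 1 * 1024 * 1024:        # small files: highest priority, most traffic
        queue = 'convert_small'
    elif size < 10 * 1024 * 1024:     # medium files
        queue = 'convert_medium'
    else:                             # large, slow conversions
        queue = 'convert_large'
    # apply_async lets the caller override the destination queue per call
    convert_file.apply_async(args=[file_id], queue=queue)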
4. Use the Celery Error Handling Mechanism
Most tasks have no error handling at all: if a task fails, it just fails. In some cases that is fine, but most failed tasks the author has seen were calling third-party APIs and hit a network error or a resource-unavailability error. For those errors, the simplest thing is to retry, because the third-party service or the network may only be down temporarily and could be fine a moment later. Why not try again?
@app.task(bind=True, default_retry_delay=300, max_retries=5)
def my_task_A(self):
    try:
        print("doing stuff here...")
    except SomeNetworkException as e:
        print("maybe do some cleanup here....")
        raise self.retry(exc=e)
The author prefers to define a default retry delay and a maximum number of retries for each task (the default_retry_delay and max_retries arguments above). There are more detailed parameter settings; read the documentation yourself. One of them is sketched below.
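For example, newer Celery versions can retry automatically without the explicit try/except; autoretry_for and retry_backoff are standard task options. A minimal sketch, where the exception class and the task body are placeholders:

@app.task(autoretry_for=(SomeNetworkException,),  # retry automatically on these errors
          retry_backoff=True,                     # exponential delay between attempts
          max_retries=5)
def my_task_with_autoretry():
    call_third_party_api()  # placeholder for the real work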
We use error handling only where it fits the scenario. For example, when a file fails to convert, it will keep failing no matter how many times it is retried, so we add no retry mechanism there.
5. Use Flower
Flower is a very powerful tool for monitoring Celery tasks and workers.
We didn't use it; most of the time we connect directly to Redis to check Celery information. That looks pretty clumsy, doesn't it? Especially since the data Celery stores in Redis is not convenient to pull out by hand.
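For the record, starting Flower takes a single command, and it serves a web dashboard on port 5555 by default. Assuming the Celery app lives in a module named proj:

celery -A proj flower
# then open http://localhost:5555 to browse tasks, queues and workers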
6. Don't worry too much about the task exit status.
A task's exit status records whether the task succeeded or failed at the end of its run. It may be useful in some statistical scenarios. But note that the exit status is not the result of the task's execution: any results that affect your application are usually written to the database during execution (for example, updating a user's friend list).
Most projects the author has seen store these task exit states in sqlite or their regular database. But is it really necessary to save them? Doing so may affect your web service, so the author usually sets CELERY_IGNORE_RESULT = True and discards the results.
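The same can also be done per task rather than globally; ignore_result is a standard task option (in newer Celery the global setting is spelled task_ignore_result). A minimal sketch:

# Per-task version of the same idea: this task's return value is never stored
@app.task(ignore_result=True)
def my_taskC(x):
    print("doing something here...")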
For us, since these are asynchronous tasks whose completion status is of no use afterwards, we discard it without hesitation.
7. Do not pass database/ORM objects to tasks.
What this really means is: do not pass database objects (such as a user instance) to a task, because the serialized data may already be stale by the time the task runs. It is best to pass the user id instead and fetch the user from the database fresh while the task executes.
We do the same: only ids are passed to tasks. For file conversion, for example, we pass only the file id, and all other file information is fetched from the database using that id.
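A minimal before/after sketch, assuming a Django-style File model and a do_conversion helper (both illustrative, not our actual code):

# Bad: the task receives a serialized snapshot that may be stale when it finally runs
@app.task()
def convert_file_obj(file_obj):
    do_conversion(file_obj)

# Good: pass only the id, and load fresh state from the database at execution time
@app.task()
def convert_file_by_id(file_id):
    file_obj = File.objects.get(pk=file_id)
    do_conversion(file_obj)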
Finally
These are just our own impressions. The practices above have served our use of Celery well; at least we haven't hit any big problems with it, though there are still some pitfalls to watch for. As for RabbitMQ, we have never used it, so we can't say how it performs, other than that it must beat MySQL as a broker.
Finally, here is the author's Celery talk: https://denibertovic.com/talks/celery-best-practices/.