Celery + RabbitMQ + MySQL + Flower


  1. Flower

    1. http://docs.celeryproject.org/en/latest/getting-started/index.html

    2. http://flower.readthedocs.org/en/latest/config.html

    3. https://denibertovic.com/posts/celery-best-practices/

    4. http://daimin.github.io/posts/celery-shi-yong.html

    5. http://ju.outofmemory.cn/entry/221884

    6. https://linfan1.gitbooks.io/kubernetes-chinese-docs/content/098-Distributed%20Task%20Queue.html

    7. http://gangtao.is-programmer.com/posts/83922.html

    8. http://www.vimer.cn/2014/07/%E5%88%86%E5%B8%83%E5%BC%8F%E6%B6%88%E6%81%AF%E7%B3%BB%E7%BB%9F%E5%B0%9D%E8%AF%95rabbitmq-celery-redis.html

    9. http://flower-docs-cn.readthedocs.org/zh/latest/config.html

    10. http://dongweiming.github.io/blog/archives/how-to-use-celery/

Celery Best Practices

As a heavy user of Celery, reading the Celery Best Practices article struck a nerve. Below is a rough translation of it, combined with hands-on Celery experience from our own project.

When using Django, you often need to run some long background tasks, and for that you probably want some sort of task queue; Celery is a good choice.

Having used Celery as the task queue in a number of projects, the author has accumulated some best practices: how to use Celery properly, and some features Celery provides that are not yet widely exploited.
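Before the individual points, here is a minimal sketch of what a Celery app backed by RabbitMQ looks like (the module name, broker URL, and the task itself are illustrative, not from the original article):

from celery import Celery

# RabbitMQ's default guest account on localhost (illustrative URL).
app = Celery('proj', broker='amqp://guest:guest@localhost:5672//')

@app.task()
def add(x, y):
    # A trivial task; callers enqueue it without blocking the web
    # request, e.g. add.delay(2, 3)
    return x + y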

1. Don't use the database as your AMQP broker

A database is not designed to act as an AMQP broker. In production it is very likely to go down at some point (PS: as for going down, I don't think any system can guarantee it never will!).

The author suspects many people use a database as the broker mainly because they already have one providing storage for their web app, so it is convenient to reuse it as the Celery broker without installing extra components (such as RabbitMQ).

Suppose the following scenario: you have 4 backend workers fetching and processing tasks placed in the database. That means 4 processes polling the database frequently for the latest tasks, and each worker may also run several of its own concurrent threads doing the same thing.

One day you find that the tasks are too many and 4 workers are not enough; task processing falls far behind task production, so you keep adding workers. Suddenly your database becomes slow to respond because of all the polling processes, disk IO stays at its peak, and your web app starts to suffer. All of this because the workers are effectively mounting a DDoS attack on the database.

None of this happens when you use a proper AMQP broker such as RabbitMQ. Taking RabbitMQ as the example: first, it holds the task queue in memory, so you do not need to touch the disk at all. Second, the consumers (the workers above) do not need to poll, because RabbitMQ pushes new tasks to them. And if RabbitMQ really does run into trouble, at least it will not drag your web app down with it.

This is why the author says the database should not be used as a broker. Many places even provide prebuilt RabbitMQ images that you can use directly, such as these.

I agree with this completely. Our system uses Celery heavily, handling on average millions of asynchronous tasks a day. We used MySQL at first, and there were constant problems with tasks being processed far too late, even when the workers were not busy. After switching to Redis we got a large performance improvement. As for why MySQL was so slow, we never got to the bottom of it; perhaps we really were DDoSing it too.

2. Use more queues (don't use only the default one)

Celery is very easy to set up, and by default it uses a single default queue to hold all tasks (unless you explicitly specify another queue). The usual pattern looks like this:

@app.task()
def my_task_a(a, b, c):
    print("doing something here...")

@app.task()
def my_task_b(x, y):
    print("doing something here...")

Both of these tasks will execute on the same queue, which is certainly appealing, since a single decorator gives you an asynchronous task. The author's concern is that my_task_a and my_task_b may be two entirely different things, or one may be far more important than the other, so why put them in the same basket? (You shouldn't put all your eggs in one basket, right?) Maybe my_task_b is not that important but very numerous, so numerous that the important my_task_a cannot be processed promptly by the workers. Adding workers does not solve this problem either, because my_task_a and my_task_b are still executed from the same queue.
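Concretely (a sketch; Celery's default queue is named "celery" unless configured otherwise), every call below lands on that single shared queue:

my_task_a.delay(1, 2, 3)   # important, but queued behind...
my_task_b.delay(4, 5)      # ...a potential flood of these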

3. Use priority workers

To solve the problem from point 2, we let my_task_a execute in one queue, Q1, and my_task_b in another queue, Q2, then assign x workers to handle Q1 and use the remaining workers to handle Q2. That way my_task_b still gets enough workers to keep up, while a few dedicated priority workers process my_task_a without long waits.

First, define the queues manually:

CELERY_QUEUES = (
    Queue('default', Exchange('default'), routing_key='default'),
    Queue('for_task_a', Exchange('for_task_a'), routing_key='for_task_a'),
    Queue('for_task_b', Exchange('for_task_b'), routing_key='for_task_b'),
)

Then define routes to decide which queue to use for different tasks.

CELERY_ROUTES = {
    'my_task_a': {'queue': 'for_task_a', 'routing_key': 'for_task_a'},
    'my_task_b': {'queue': 'for_task_b', 'routing_key': 'for_task_b'},
}

Finally, start a different worker for each queue:

celery worker -E -l info -n workerA -Q for_task_a
celery worker -E -l info -n workerB -Q for_task_b

Our project involves a large number of file conversions: a huge number of files under 1 MB, plus a small number of files approaching 20 MB. Converting the small files has the highest priority and does not take much time, while converting the large files is very time-consuming. If every conversion task went into one queue, the time spent on large files would very likely delay the small-file conversions.

So we set up 3 priority queues by file size, each with its own workers, and that solved our file-conversion problem nicely.
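As an illustration of that arrangement, here is a hypothetical sketch (the queue names, size thresholds, and convert_file task are invented for this example, not our production code) of choosing a queue by file size at enqueue time:

from celery import Celery

app = Celery('proj', broker='amqp://guest:guest@localhost:5672//')

@app.task()
def convert_file(file_id):
    print("converting file %s ..." % file_id)

SMALL = 1 * 1024 * 1024    # under 1 MB: highest priority, fast
LARGE = 10 * 1024 * 1024   # above 10 MB: slow conversions

def enqueue_conversion(file_id, file_size):
    # apply_async can override routing per call, so each conversion
    # goes to the queue whose dedicated workers match its size class.
    if file_size < SMALL:
        queue = 'convert_small'
    elif file_size < LARGE:
        queue = 'convert_medium'
    else:
        queue = 'convert_large'
    convert_file.apply_async(args=(file_id,), queue=queue)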

4. Use Celery's error handling mechanisms

Most tasks have no error handling at all: if a task fails, it just fails. That is fine in some cases, but most failures the author sees come from calling a third-party API and hitting a network error, or from a resource being temporarily unavailable. For errors like these the simplest remedy is to retry; the third-party service or the network may recover a moment later, so why not just try again?

@app.task(bind=True, default_retry_delay=300, max_retries=5)
def my_task_a(self):
    try:
        print("doing stuff here...")
    except SomeNetworkException as e:
        print("maybe do some cleanup here....")
        self.retry(exc=e)

The author likes to define, for each task, its default retry delay and maximum number of retries. There are more detailed parameter settings as well; go read the documentation.
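For example, retry() accepts per-call overrides such as countdown and max_retries (a sketch reusing the placeholder exception from the example above):

@app.task(bind=True, default_retry_delay=300, max_retries=5)
def my_task_b(self, x, y):
    try:
        print("doing stuff here...")
    except SomeNetworkException as e:
        # Override the task defaults for this retry only:
        # wait 60 seconds, allow up to 10 attempts.
        self.retry(exc=e, countdown=60, max_retries=10)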

As for error handling, our particular use case led us not to add a retry mechanism: when a file conversion fails, it will fail no matter how many times we retry.

5. Use Flower

Flower is a very powerful tool for monitoring Celery tasks and workers.
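For reference, one way to launch it (assuming a project module named proj; exact flags vary between Flower versions):

celery flower -A proj --port=5555

It then serves a monitoring dashboard, by default at http://localhost:5555.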

We do not actually use it much, because most of the time we connect directly to Redis to inspect Celery's state. That sounds pretty dumb, doesn't it? Especially since the data Celery stores in Redis is not easy to pull out by hand.

6. Don't worry too much about task exit status

A task's exit status is just a success-or-failure message recorded when the task ends, and it is useful in some statistical situations. But note that the exit status is not the result of the task: the results of a task's execution that actually affect the program (such as updating a user's friends list) are usually written to the database by the task itself.

Most projects the author has seen store the task exit state in SQLite or in their own database, but is that really necessary? It may even affect your web service, so the author usually sets CELERY_IGNORE_RESULT = True and discards it.
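Both the global switch and its per-task equivalent look like this (setting names from the Celery 3.x era this article describes; newer releases use task_ignore_result for the global option):

# Global switch in the Celery config: never store task results.
CELERY_IGNORE_RESULT = True

# Or per task, for tasks whose return value nobody reads:
@app.task(ignore_result=True)
def my_task_a(a, b, c):
    print("doing something here...")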

For us, since these are asynchronous fire-and-forget tasks, the state after completion really is useless, so we discarded it without hesitation.

7. Don't pass database/ORM objects to tasks

The point here is not to pass database objects (for example, a user instance) to a task, because the serialized data will already be stale by the time the task runs. It is best to pass the user ID instead and fetch the user from the database, fresh, when the task executes.

We do the same: we pass tasks only the relevant IDs. For file conversion, for example, we pass only the file ID; all other information about the file is fetched from the database by that ID when the task runs.
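A sketch of the difference (File is a hypothetical Django model and run_conversion a hypothetical helper):

# Bad: the serialized object is a stale snapshot by the time the worker runs.
# convert.delay(file_obj)

# Good: pass only the primary key and re-read fresh state inside the task.
@app.task()
def convert(file_id):
    f = File.objects.get(pk=file_id)  # current row, not a stale copy
    run_conversion(f)                 # hypothetical conversion helper

convert.delay(42)  # enqueue with just the ID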

Finally

The project notes above are our own impressions. The Celery practices the author describes really can be counted as very good practice; at least our Celery deployment has not had any big problems so far, though there are still some small pitfalls. As for RabbitMQ, we honestly have never used it, so we do not know how well it works, but it should at least be better than MySQL.

Finally, here is a Celery talk by the author: https://denibertovic.com/talks/celery-best-practices/.



This article is from the "Mr_computer" blog; please retain the source: http://caochun.blog.51cto.com/4497308/1747382
