Celery is a distributed task queue written in Python. It uses task queues to schedule work across distributed machines, processes, and threads.
Architecture Design
The Celery architecture consists of three parts: the message middleware (broker), the task execution unit (worker), and the task result store.
Message middleware
Celery itself does not provide messaging services, but it integrates easily with third-party message middleware, including RabbitMQ, Redis, MongoDB (experimental), Amazon SQS (experimental), CouchDB (experimental), SQLAlchemy (experimental), Django ORM (experimental), and IronMQ.
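As an illustration of how a broker is selected: in code, the broker is just a transport URL passed to the Celery app. A minimal sketch (the hostnames, ports, and credentials below are placeholder assumptions, not values from this article):

```python
from celery import Celery

# The broker is chosen purely by its transport URL; only one is used at a time.
app_rabbit = Celery('tasks', broker='amqp://guest:guest@localhost:5672//')
app_redis = Celery('tasks', broker='redis://localhost:6379/0')
```

Switching brokers is then a configuration change rather than a code change.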
Task Execution Unit
A worker is the task execution unit provided by Celery; workers run concurrently on the nodes of a distributed system.
Task Result Store
The task result store holds the results of tasks performed by workers. Celery supports several ways to store task results, including AMQP, Redis, Memcached, MongoDB, SQLAlchemy, Django ORM, Apache Cassandra, and IronCache.
In addition, Celery supports multiple concurrency models and serialization formats.
Concurrency
prefork, Eventlet, gevent, threads/single-threaded
Serialization
pickle, JSON, YAML, msgpack; zlib or bzip2 compression; cryptographic message signing, etc.
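The choice of serializer matters for interoperability and payload size. Here is a quick standalone comparison of pickle and JSON on a task-like payload (plain Python standard library, no Celery required):

```python
import json
import pickle

# A payload shaped like a task invocation.
payload = {"task": "tasks.add", "args": [4, 4], "kwargs": {}}

as_json = json.dumps(payload)
as_pickle = pickle.dumps(payload)

# JSON is human-readable and language-neutral; pickle is Python-only
# but can carry arbitrary Python objects. Both round-trip this payload.
print(as_json)
print(json.loads(as_json) == payload)      # True
print(pickle.loads(as_pickle) == payload)  # True
```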
Installation and operation
The installation process for Celery is somewhat involved. The following steps are based on my AWS EC2 Linux instance; the process may vary depending on your system, so please refer to the official documentation.
First, I chose RabbitMQ as the message broker, so RabbitMQ must be installed first. To prepare for the installation, update yum.
RabbitMQ is written in Erlang, so install Erlang first:
```
# Add and enable relevant application repositories:
# Note: We are also enabling third party remi package repositories.
wget http://dl.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm
wget http://rpms.famillecollet.com/enterprise/remi-release-6.rpm
sudo rpm -Uvh remi-release-6*.rpm epel-release-6*.rpm

# Finally, download and install Erlang:
yum install -y erlang
```
Then install RabbitMQ:
```
# Download the latest RabbitMQ package using wget:
wget

# Add the necessary keys for verification:
rpm --import

# Install the .RPM package using YUM:
yum install rabbitmq-server-3.2.2-1.noarch.rpm
```
Start the RabbitMQ service.
With the RabbitMQ service ready, install Celery itself, assuming you use pip to manage your Python packages.
To test whether Celery is working, we run one of the simplest possible tasks. Write tasks.py:
```python
from celery import Celery

app = Celery('tasks', backend='amqp', broker='amqp://guest@localhost//')
app.conf.CELERY_RESULT_BACKEND = 'db+sqlite:///results.sqlite'

@app.task
def add(x, y):
    return x + y
```
Run a worker in the current directory to execute this addition task:
```
celery -A tasks worker --loglevel=info
```
The -A parameter specifies the name of the Celery app. Note that I am using SQLAlchemy as the result store here; the corresponding Python package must be installed beforehand.
In the worker log, we will see the following information:
```
- ** ---------- [config]
- ** ---------- .> app:         tasks:0x1e68d50
- ** ---------- .> transport:   amqp://guest:**@localhost:5672//
- ** ---------- .> results:     db+sqlite:///results.sqlite
- *** --- * --- .> concurrency: 8 (prefork)
```
Here we can see that the worker uses the prefork pool for concurrency by default, with the concurrency level set to 8.
The client code that submits the task looks like this:
```python
from tasks import add
import time

result = add.delay(4, 4)

while not result.ready():
    print("not ready yet")
    time.sleep(5)

print(result.get())
```
Executing this client code with Python prints the task result on the client once the worker has finished.
The worker log shows:
```
[2015-03-12 02:54:07,973: INFO/MainProcess] Received task: tasks.add[34c4210f-1bc5-420f-a421-1500361b914f]
[2015-03-12 02:54:08,006: INFO/MainProcess] Task tasks.add[34c4210f-1bc5-420f-a421-1500361b914f] succeeded in 0.0309705100954s: 8
```
Here we can see that each task has a unique ID, and that tasks are executed asynchronously on the worker.
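Because each task ID is unique and the result is kept in the backend, any process configured with the same backend can look the result up later. A sketch, assuming the tasks.py above plus a running broker and worker (so it is not runnable standalone):

```python
from celery.result import AsyncResult

from tasks import add, app

# Dispatch a task and keep only its id, e.g. to hand to another process.
task_id = add.delay(4, 4).id

# Later -- possibly in a different process -- rebuild a result handle from
# the id and block (with a timeout) until the backend has the result.
result = AsyncResult(task_id, app=app)
print(result.get(timeout=10))
```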
Note that if you run the example from the official documentation as-is, you cannot get the result on the client, which is why I use SQLAlchemy to store task execution results. The official example uses AMQP as the result backend. The worker does print the task's result in its log, but AMQP delivers results through a message queue: once the result message is consumed, it is gone from the queue, so the client can never retrieve the task's result again. I don't understand why the official documentation overlooks this pitfall.
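One last note on dispatching: delay() is a shortcut for apply_async(), which exposes more scheduling control. A sketch using the add task above (the option values are arbitrary examples, and running this requires the broker and worker to be up):

```python
from tasks import add

# delay(4, 4) is equivalent to apply_async((4, 4)) with default options.
result = add.apply_async((4, 4))

# apply_async also accepts scheduling options: start no earlier than
# 10 seconds from now, and discard the task if it has not run within 60.
delayed = add.apply_async((4, 4), countdown=10, expires=60)
```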
If you want to learn more about Celery, please refer to the official documentation.
Python Parallel distributed framework: celery