Celery is a distributed task queue written in Python. It uses task queues to schedule work across distributed machines, processes, and threads.
Architecture Design
The Celery architecture consists of three parts: the message middleware (broker), the task execution unit (worker), and the task result store.
Message middleware
Celery itself does not provide messaging services, but it integrates easily with third-party message middleware, including RabbitMQ, Redis, MongoDB (experimental), Amazon SQS (experimental), CouchDB (experimental), SQLAlchemy (experimental), Django ORM (experimental), and IronMQ.
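As an illustration of how a broker is selected: in code, the broker is just a transport URL passed to the Celery app. A minimal sketch (the hostnames, ports, and credentials below are placeholder assumptions, not values from this article):

```python
from celery import Celery

# The broker is chosen purely by its transport URL; only one is used at a time.
app_rabbit = Celery('tasks', broker='amqp://guest:guest@localhost:5672//')
app_redis = Celery('tasks', broker='redis://localhost:6379/0')
```

Switching brokers is then a configuration change rather than a code change.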
Task Execution Unit
A worker is the task execution unit provided by Celery; workers run concurrently on the nodes of a distributed system.
Task Result Store
The task result store holds the results of tasks performed by workers. Celery supports several ways to store task results, including AMQP, Redis, Memcached, MongoDB, SQLAlchemy, Django ORM, Apache Cassandra, and IronCache.
In addition, Celery supports multiple concurrency models and serialization formats.
Concurrency
prefork, Eventlet, gevent, threads/single-threaded
Serialization
pickle, JSON, YAML, msgpack; zlib or bzip2 compression; cryptographic message signing, etc.
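The choice of serializer matters for interoperability and payload size. Here is a quick standalone comparison of pickle and JSON on a task-like payload (plain Python standard library, no Celery required):

```python
import json
import pickle

# A payload shaped like a task invocation.
payload = {"task": "tasks.add", "args": [4, 4], "kwargs": {}}

as_json = json.dumps(payload)
as_pickle = pickle.dumps(payload)

# JSON is human-readable and language-neutral; pickle is Python-only
# but can carry arbitrary Python objects. Both round-trip this payload.
print(as_json)
print(json.loads(as_json) == payload)      # True
print(pickle.loads(as_pickle) == payload)  # True
```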
Installation and operation
The installation process for Celery is somewhat involved. The following steps are based on my AWS EC2 Linux instance; the process may vary depending on your system, so please refer to the official documentation.
First, I chose RabbitMQ as the message broker, so RabbitMQ must be installed first. To prepare for the installation, update yum.
RabbitMQ is written in Erlang, so install Erlang first:
```
# Add and enable relevant application repositories:
# Note: We are also enabling third party remi package repositories.
wget http://dl.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm
wget http://rpms.famillecollet.com/enterprise/remi-release-6.rpm
sudo rpm -Uvh remi-release-6*.rpm epel-release-6*.rpm

# Finally, download and install Erlang:
yum install -y erlang
```
Then install RabbitMQ:
```
# Download the latest RabbitMQ package using wget:
wget

# Add the necessary keys for verification:
rpm --import

# Install the .RPM package using YUM:
yum install rabbitmq-server-3.2.2-1.noarch.rpm
```
Start the RabbitMQ service.
With the RabbitMQ service ready, install Celery itself, assuming you use pip to manage your Python packages.
To test whether Celery is working, we run one of the simplest possible tasks. Write tasks.py:
```python
from celery import Celery

app = Celery('tasks', backend='amqp', broker='amqp://guest@localhost//')
app.conf.CELERY_RESULT_BACKEND = 'db+sqlite:///results.sqlite'

@app.task
def add(x, y):
    return x + y
```
Run a worker in the current directory to execute this addition task:
```
celery -A tasks worker --loglevel=info
```
The -A parameter specifies the name of the Celery app. Note that I am using SQLAlchemy as the result store here; the corresponding Python package must be installed beforehand.
In the worker log, we will see the following information:
```
- ** ---------- [config]
- ** ---------- .> app:         tasks:0x1e68d50
- ** ---------- .> transport:   amqp://guest:**@localhost:5672//
- ** ---------- .> results:     db+sqlite:///results.sqlite
- *** --- * --- .> concurrency: 8 (prefork)
```
Here we can see that the worker uses the prefork pool for concurrency by default, with the concurrency level set to 8.
The client code that submits the task looks like this:
```python
from tasks import add
import time

result = add.delay(4, 4)

while not result.ready():
    print("not ready yet")
    time.sleep(5)

print(result.get())
```
Executing this client code with Python prints the task result on the client once the worker has finished.
The worker log shows:
```
[2015-03-12 02:54:07,973: INFO/MainProcess] Received task: tasks.add[34c4210f-1bc5-420f-a421-1500361b914f]
[2015-03-12 02:54:08,006: INFO/MainProcess] Task tasks.add[34c4210f-1bc5-420f-a421-1500361b914f] succeeded in 0.0309705100954s: 8
```
Here we can see that each task has a unique ID, and that tasks are executed asynchronously on the worker.
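Because each task ID is unique and the result is kept in the backend, any process configured with the same backend can look the result up later. A sketch, assuming the tasks.py above plus a running broker and worker (so it is not runnable standalone):

```python
from celery.result import AsyncResult

from tasks import add, app

# Dispatch a task and keep only its id, e.g. to hand to another process.
task_id = add.delay(4, 4).id

# Later -- possibly in a different process -- rebuild a result handle from
# the id and block (with a timeout) until the backend has the result.
result = AsyncResult(task_id, app=app)
print(result.get(timeout=10))
```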
Note that if you run the example from the official documentation as-is, you cannot get the result on the client, which is why I use SQLAlchemy to store task execution results. The official example uses AMQP as the result backend. The worker does print the task's result in its log, but AMQP delivers results through a message queue: once the result message is consumed, it is gone from the queue, so the client can never retrieve the task's result again. I don't understand why the official documentation overlooks this pitfall.
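One last note on dispatching: delay() is a shortcut for apply_async(), which exposes more scheduling control. A sketch using the add task above (the option values are arbitrary examples, and running this requires the broker and worker to be up):

```python
from tasks import add

# delay(4, 4) is equivalent to apply_async((4, 4)) with default options.
result = add.apply_async((4, 4))

# apply_async also accepts scheduling options: start no earlier than
# 10 seconds from now, and discard the task if it has not run within 60.
delayed = add.apply_async((4, 4), countdown=10, expires=60)
```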
If you want to learn more about Celery, please refer to the official documentation.
Python Parallel distributed framework: celery