Python parallel distributed framework: Celery

Source: Internet
Author: User
Tags: message queue, unique id, RabbitMQ

Celery is a distributed task queue written in Python. It schedules tasks across distributed machines, processes, and threads through the task queue.

Architecture Design

The architecture of Celery consists of three parts: the message middleware (broker), the task execution unit (worker), and the task result store. A small sketch after the list below shows how these parts map onto code.

    • Message middleware

      Celery itself does not provide messaging services, but it integrates easily with third-party message middleware, including RabbitMQ, Redis, MongoDB (experimental), Amazon SQS (experimental), CouchDB (experimental), SQLAlchemy (experimental), Django ORM (experimental), and IronMQ.

    • Task Execution Unit

      The worker is the task execution unit provided by Celery; workers run concurrently on the nodes of a distributed system.

    • Task Result Store

      The task result store holds the results of tasks executed by workers. Celery supports several result backends, including AMQP, Redis, memcached, MongoDB, SQLAlchemy, Django ORM, Apache Cassandra, and IronCache.
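
As a rough sketch of how these three parts appear in code (the broker and backend URLs below are placeholders, not values taken from this article), a Celery application is configured with a broker for the message middleware and a backend for the result store, while the worker is started separately against that application:

    # Minimal sketch: the three architectural parts of a Celery app
    from celery import Celery

    app = Celery(
        'demo',                                 # application name (hypothetical)
        broker='amqp://guest@localhost//',      # message middleware (RabbitMQ here)
        backend='db+sqlite:///results.sqlite',  # task result store
    )
    # The task execution unit (worker) is started separately, e.g.:
    #   celery -A demo worker --loglevel=info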

In addition, Celery supports several concurrency models and serialization options (a configuration sketch follows this list).

    • Concurrency

      prefork, eventlet, gevent, threads/single-threaded

    • Serialization

      pickle, json, yaml, msgpack; zlib and bzip2 compression; cryptographic message signing; and so on
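
As a rough illustration of how serialization and compression are selected (the setting names below are the old-style uppercase Celery 3.x settings also used later in this article; the values are only examples, not recommendations from the original author):

    # Sketch: choosing serialization and compression via the app configuration
    from celery import Celery

    app = Celery('tasks')
    app.conf.CELERY_TASK_SERIALIZER = 'json'      # serialize task messages as JSON
    app.conf.CELERY_RESULT_SERIALIZER = 'json'    # serialize task results as JSON
    app.conf.CELERY_ACCEPT_CONTENT = ['json']     # only accept JSON-encoded messages
    app.conf.CELERY_MESSAGE_COMPRESSION = 'zlib'  # compress task messages with zlib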

Installation and operation

The installation process for Celery is somewhat involved. The steps below reflect my installation on an AWS EC2 Linux instance and may vary depending on your system; please refer to the official documentation.

I chose RabbitMQ as the message middleware, so RabbitMQ has to be installed first. To prepare for the installation, update yum:

    sudo yum -y update

RabbitMQ runs on Erlang, so install Erlang first:

    # Add and enable relevant application repositories:
    # Note: We are also enabling third party remi package repositories.
    wget http://dl.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm
    wget http://rpms.famillecollet.com/enterprise/remi-release-6.rpm
    sudo rpm -Uvh remi-release-6*.rpm epel-release-6*.rpm

    # Finally, download and install Erlang:
    yum install -y erlang

Then install RabbitMQ:

    # Download the latest RabbitMQ package using wget:
    wget <rabbitmq-server RPM URL>

    # Add the necessary keys for verification:
    rpm --import <RabbitMQ signing key URL>

    # Install the .RPM package using YUM:
    yum install rabbitmq-server-3.2.2-1.noarch.rpm

Start the RabbitMQ service:

    sudo service rabbitmq-server start

With the RabbitMQ service ready, install Celery itself. Assuming you use pip to manage your Python packages:

    pip install celery

To test whether Celery is working, run one of the simplest possible tasks. Write tasks.py:

    from celery import Celery

    app = Celery('tasks', backend='amqp', broker='amqp://guest@localhost//')

    app.conf.CELERY_RESULT_BACKEND = 'db+sqlite:///results.sqlite'

    @app.task
    def add(x, y):
        return x + y

Run a worker in the same directory to execute this addition task:

    celery -A tasks worker --loglevel=info

The -A parameter gives the name of the Celery app. Notice that I am using SQLAlchemy as the result store, so the corresponding Python package has to be installed beforehand.
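
For example, the SQLAlchemy package can be installed from PyPI under its usual name (this command is added here for clarity and is not part of the original article):

    pip install sqlalchemy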

In the worker log we will see information like this:

    - ** ---------- [config]
    - ** ---------- .> app:         tasks:0x1e68d50
    - ** ---------- .> transport:   amqp://guest:**@localhost:5672//
    - ** ---------- .> results:     db+sqlite:///results.sqlite
    - *** --- * --- .> concurrency: 8 (prefork)

Here we can see that the worker uses the prefork pool for concurrency by default, with the concurrency level set to 8.
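
The pool implementation and the number of worker processes can be overridden when starting the worker; for instance (illustrative values, not taken from the original article):

    # Use 4 prefork processes instead of the default of one per CPU core
    celery -A tasks worker --loglevel=info --concurrency=4

    # Or switch the pool implementation, e.g. to gevent (requires the gevent package)
    celery -A tasks worker --loglevel=info --pool=gevent --concurrency=100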

The following client code executes the task:

    from tasks import add
    import time

    result = add.delay(4, 4)

    while not result.ready():
        print "not ready yet"
        time.sleep(5)

    print result.get()

Executing this client code with Python produces output like the following on the client side:

    not ready yet
    8

The worker log shows:

    [2015-03-12 02:54:07,973: INFO/MainProcess] Received task: tasks.add[34c4210f-1bc5-420f-a421-1500361b914f]
    [2015-03-12 02:54:08,006: INFO/MainProcess] Task tasks.add[34c4210f-1bc5-420f-a421-1500361b914f] succeeded in 0.0309705100954s: 8

Here we can see that each task has a unique id, and that tasks are executed asynchronously on the worker.
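
Because each task has a unique id, a result can also be looked up later from the result backend without keeping the original result object around. A minimal sketch, reusing the task id from the log above purely for illustration:

    from tasks import app  # the Celery app defined in tasks.py above

    # Look up a previously submitted task by its id in the result backend
    result = app.AsyncResult('34c4210f-1bc5-420f-a421-1500361b914f')
    print(result.state)      # e.g. PENDING, STARTED or SUCCESS
    if result.ready():
        print(result.get())  # 8 for the add(4, 4) call above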

It is important to note that if you run the example from the official documentation as written, you cannot get the result on the client side, which is why I use SQLAlchemy to store task execution results. The official example uses AMQP as the result backend: the worker can still print the task's result in its own log, but because AMQP is a message queue, once the result message has been consumed it is gone from the queue, so the client can never retrieve the result. I do not know why the official documentation overlooks this problem.

If you want to learn more about Celery, please refer to the official documentation.

