Using Protocol Buffers' C++ extension to speed up Python programs


As of the latest release (2.3) of Protocol Buffers (a data description language similar in purpose to XML), the protoc --py_out command generates only pure Python code. While PB (Protocol Buffers) can generate fast parsing and serialization code for C++, that code doesn't help Python programs, and hand-wrapping the generated C++ yourself requires substantial maintenance effort. This is a commonly requested feature on the discussion group; generating pure Python code was given higher priority because of a prominent client requirement, App Engine (named by the team itself), which can't run native code.

Luckily, native-code support has been slated for the PB 2.4 release and can already be obtained from the SVN branch, so you can have fast PBs today. (We have been using the r352 revision for some time without any problems.) The PB team has been unwilling to commit to any release date, but when pressed, Kenton Varda mentioned that the date is tentatively set for the beginning of 2011.


I haven't seen this documented anywhere else, so I hope it helps others.

How to get it working

After installing the new PB library and regenerating your PB code with protoc --py_out=..., set the environment variable PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=cpp before running your Python program; this is what selects the C++ implementation instead of PB's default pure Python implementation.
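The variable must be visible before any protobuf module is imported, since the implementation is chosen at import time. As a minimal sketch (the generated module name myproto_pb2 here is hypothetical), you can also set it from inside the program itself instead of in the shell:

import os

# Must run before google.protobuf (or any generated module) is imported;
# otherwise the default pure Python implementation has already been selected.
os.environ['PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION'] = 'cpp'

import myproto_pb2  # hypothetical protoc-generated module, now backed by the C++ runtime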

That's it! This alone gets you the generic C++ code paths in the PB runtime library, which parse/serialize messages dynamically via reflection. (Note that we haven't generated any C++ code yet.)
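If you want to verify which implementation was actually selected, the PB runtime records it in an internal module. This quick check relies on google.protobuf.internal.api_implementation, which is an internal API and may change between releases:

from google.protobuf.internal import api_implementation

# Prints 'cpp' if the C++ implementation is active, 'python' otherwise.
print api_implementation.Type()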

How fast is it? I wrote a simple benchmark to get a sense of the speedup our application would see:

import timeit

nruns = 1000
nwarmups = 100

xs = ...  # your protobufs (pb below is your generated module)

def ser(): return [x.SerializeToString() for x in xs]

def parse(ys):
    for y in ys: pb.Email().ParseFromString(y)

t = timeit.Timer(lambda: None)
t.timeit(nwarmups)
print 'noop:', t.timeit(nruns) / nruns

t = timeit.Timer(ser)
t.timeit(nwarmups)
print 'ser:', t.timeit(nruns) / nruns / len(xs)

ys = ser()
t = timeit.Timer(lambda: parse(ys))
t.timeit(nwarmups)
print 'parse:', t.timeit(nruns) / nruns / len(xs)
print 'msg size:', sum(len(y) for y in ys) / len(ys)

In seconds, this gives the following timings on my desktop:

$ python sandbox/pbbench.py out.ini
ser: 0.000434461673101
parse: 0.000602062404156
msg size: 10730
 
$ PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=cpp \
> python sandbox/pbbench.py out.ini
ser: 2.86788344383e-05
parse: 7.63910810153e-05
msg size: 10730

That's a 15x speedup for serialization and an 8x speedup for parsing. Not bad! But it can get faster.

How to make it faster

Now we'll actually generate a C++ implementation for your PBs, so that run-time reflection is never used. First, add a C extension to your Python project by modifying setup.py as follows:

setup(
    ...
    ext_modules=[Extension('podpb',
                           sources=['cpp/podpb.c', 'cpp/main.pb.cc'],
                           libraries=['protobuf'])],
    ...
)

Generate main.pb.cc into the cpp/ directory with protoc --cpp_out=cpp, and create podpb.c to set up an empty Python C module, as shown below:

#include <Python.h>

static PyMethodDef PodMethods[] = {
  {NULL, NULL, 0, NULL}        /* Sentinel */
};

PyMODINIT_FUNC
initpodpb(void)
{
  PyObject *m;

  m = Py_InitModule("podpb", PodMethods);
  if (m == NULL)
    return;
}

Running python setup.py build now builds everything. As long as the C module (here, podpb) gets imported somewhere in your project, the PB runtime library will automatically use the C++ implementation.
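A minimal sketch of that import (assuming your .proto file is main.proto, so that protoc --py_out generates main_pb2.py, and using the Email message from the benchmark above):

import podpb           # importing the extension registers the generated C++ message classes
import main_pb2 as pb  # the generated Python module now delegates to the C++ code

msg = pb.Email()
data = msg.SerializeToString()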

Now we get 68x and 13x speedups, respectively. Ho ho.

$ PYTHONPATH=build/lib.linux-x86_64-2.6/:$PYTHONPATH \
> PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=cpp \
> python sandbox/pbbench.py out.ini
ser: 6.39575719833e-06
parse: 4.55250144005e-05
msg size: 10730


I posted this in a few places a while back and then completely forgot about it. In the meantime, Connex.io and Greplin released their own native Python implementations, cyPB and fast-python-pb. cyPB was announced on the PB mailing list and can run, but it still needs work to reach a usable state. fast-python-pb currently supports only string, int32, int64, double, and sub-message fields. I don't know of any other projects besides these. You can also check out my original thread on the PB mailing list for more about this.
