Using the Protocol Buffers Reference guide in Python

Last Update:2018-07-23 Source: Internet

Author: User

Tags reflection in python


Protocol Buffer Basics: Python

This tutorial provides a basic introduction to working with protocol buffers for Python programmers. It walks through the creation of a simple example application and shows you how to:

* Define message formats in a .proto file.

* Use the protocol buffer compiler.

* Read and write the message using the Python protocol buffer API.

This is not a comprehensive guide to using protocol buffers in Python. For more detailed reference information, see the Protocol Buffer Language Guide, the Python API Reference, the Python Generated Code Guide, and the Encoding Reference.

Why Use Protocol Buffers?

The example we will use is a very simple "address book" application that can read people's contact details from a file and write them back. Each person in the address book has a name, an ID, an email address, and a contact phone number.

How do you serialize and retrieve structured data like this? There are a few ways to solve the problem:

* Use Python pickling. This is the default approach because it is built into the language. However, it does not deal well with schema evolution, and it does not work well if you need to share data with applications written in C++ or Java.

* Invent an ad-hoc way to encode the data items into a single string, such as encoding 4 ints as "12:3:23:67". This approach is simple and flexible. It does require writing one-off encoding and parsing code, though, and parsing adds a small run-time cost. It works best for encoding very simple data.

* Serialize the data to XML. This approach is attractive because XML is (sort of) human readable and there are libraries for many languages. It can be a good choice if you want to share data with other applications or projects. However, XML is notoriously space intensive, and encoding and decoding it can impose a large performance penalty on applications. Navigating an XML DOM tree is also considerably more complicated than accessing simple fields in a class.
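The first two options are easy to try. The sketch below shows the ad-hoc "12:3:23:67" string encoding round trip and a pickle round trip for comparison. The helper names `encode_ints` and `decode_ints` are hypothetical; they are not part of any library.

```python
import pickle


def encode_ints(values):
    """Ad-hoc encoding: join integers with ':' into one string."""
    return ":".join(str(v) for v in values)


def decode_ints(text):
    """Parse the ad-hoc format back into a list of ints."""
    # An empty string would otherwise split into [''] and break int().
    if not text:
        return []
    return [int(part) for part in text.split(":")]


record = [12, 3, 23, 67]
encoded = encode_ints(record)
print(encoded)  # 12:3:23:67
assert decode_ints(encoded) == record

# Pickle round-trips arbitrary Python objects, but the byte stream is
# Python-specific, so a C++ or Java program cannot easily read it.
blob = pickle.dumps({"name": "John Doe", "id": 1234})
assert pickle.loads(blob) == {"name": "John Doe", "id": 1234}
```

Both formats lack a shared schema. Adding a fifth integer, or renaming a dictionary key, silently changes what every reader has to expect.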

Protocol buffers are a flexible, efficient, automated solution to exactly this problem. With protocol buffers, you write a .proto description of the data structure you want to store. From that description, the protocol buffer compiler creates a class that automatically encodes and parses the protocol buffer data in an efficient binary format. The generated class provides getters and setters for the fields that make up a protocol buffer. It also handles the details of reading and writing the protocol buffer as a unit. Importantly, the protocol buffer format supports extending the format over time in a way that still lets your code read data encoded in the old format.

Where to Find the Example Code

The example code is included in the source code package, in the "examples" directory.

Defining Your Protocol Format

To create your address book application, you start with a .proto file. The definitions in a .proto file are simple: you add a message for each data structure you want to serialize, then specify a name and a type for each field in the message. Here is the .proto file that defines your messages, addressbook.proto:

syntax = "proto2";

package tutorial;

message Person {
  required string name = 1;
  required int32 id = 2;
  optional string email = 3;

  enum PhoneType {
    MOBILE = 0;
    HOME = 1;
    WORK = 2;
  }

  message PhoneNumber {
    required string number = 1;
    optional PhoneType type = 2 [default = HOME];
  }

  repeated PhoneNumber phone = 4;
}

message AddressBook {
  repeated Person person = 1;
}

As you can see, the syntax is similar to C++ or Java. Let's go through each part of the file and see what it does.

The .proto file starts with a package declaration, which helps prevent naming conflicts between different projects. In Python, packages are normally determined by directory structure, so the package you define in your .proto file has no effect on the generated code. You should still declare one, though. It prevents name collisions in the protocol buffers namespace and in languages other than Python.

Next come your message definitions. A message is an aggregate containing a set of typed fields. Many standard simple data types are available as field types, including bool, int32, float, double, and string. You can add further structure to your messages by using other message types as field types. In the example above, the Person message contains PhoneNumber messages, and the AddressBook message contains Person messages. You can also nest message definitions inside other messages: the PhoneNumber type is defined inside Person. Finally, you can define enum types when a field must take one of a predefined list of values. Here, a phone number's type can be MOBILE, HOME, or WORK.
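For readers who think in plain Python, the sketch below mirrors the shape of addressbook.proto with dataclasses and an `IntEnum`. It is only an analogy: it is not the code protoc generates. Using `None` to mean "email not set" is an assumption for illustration; a real generated message returns the empty string for an unset string field.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import List, Optional


class PhoneType(IntEnum):
    # Mirrors enum PhoneType nested in Person.
    MOBILE = 0
    HOME = 1
    WORK = 2


@dataclass
class PhoneNumber:
    number: str                        # required string number = 1
    type: PhoneType = PhoneType.HOME   # optional, [default = HOME]


@dataclass
class Person:
    name: str                          # required string name = 1
    id: int                            # required int32 id = 2
    email: Optional[str] = None        # optional; None stands for "not set"
    phone: List[PhoneNumber] = field(default_factory=list)  # repeated


@dataclass
class AddressBook:
    person: List[Person] = field(default_factory=list)      # repeated


book = AddressBook()
book.person.append(Person(name="John Doe", id=1234,
                          phone=[PhoneNumber("555-4321")]))
print(book.person[0].phone[0].type.name)  # HOME
```

The nesting (AddressBook holds Persons, Person holds PhoneNumbers) and the repeated-as-list idea carry over directly. Generated classes add serialization and parsing on top of this.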

The " = 1", " = 2" markers on each element identify the unique tag that the field uses in the binary encoding. Tag numbers 1-15 take one byte less to encode than higher numbers. As an optimization, you can therefore reserve those tags for commonly used or repeated elements, leaving tags 16 and higher for less commonly used optional elements. Each element in a repeated field re-encodes the tag number, so repeated fields benefit most from this optimization.
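The one-byte saving comes from how keys are written on the wire. Per the Encoding Reference, each field is preceded by the varint encoding of `(field_number << 3) | wire_type`. A varint stores 7 bits per byte, and 3 of those bits go to the wire type, so field numbers up to 15 fit in one byte. The stdlib-only sketch below checks this; `encode_varint` and `field_key` are illustrative helpers, not protobuf APIs.

```python
def encode_varint(value):
    """Encode a non-negative int as a base-128 varint (7 bits per byte)."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def field_key(field_number, wire_type):
    """Return the encoded key that precedes each field value on the wire."""
    return encode_varint((field_number << 3) | wire_type)


WIRE_TYPE_LEN = 2  # length-delimited: strings, bytes, embedded messages

for n in (1, 15, 16, 2047, 2048):
    print(n, len(field_key(n, WIRE_TYPE_LEN)))
# prints:
# 1 1
# 15 1
# 16 2
# 2047 2
# 2048 3
```

A repeated field with 100 elements pays for its key 100 times. Moving such a field from tag 16 to tag 4 therefore saves 100 bytes.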

Each field must be annotated with one of the following modifiers:

* required: A value for the field must be provided; otherwise the message is considered "uninitialized". Serializing an uninitialized message raises an exception, and parsing an uninitialized message fails. Other than that, a required field behaves exactly like an optional field.

* optional: The field may or may not be set. If an optional field value is not set, a default value is used. For simple types, you can specify your own default, as the example does for the phone number type. Otherwise, a system default is used: zero for numeric types, the empty string for strings, and false for bools. For embedded messages, the default value is always the "default instance" or "prototype" of the message, which has none of its fields set. Calling the accessor for an optional (or required) field that has not been explicitly set always returns that field's default value.

* repeated: The field may be repeated any number of times (including zero). The order of the repeated values is preserved in the protocol buffer. Think of repeated fields as dynamically sized arrays.
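The required and optional rules can be modelled in a few lines. The sketch below uses a hypothetical `ToyMessage` class and field table, not the protobuf runtime. Generated classes expose comparable ideas through methods such as `HasField()` and `IsInitialized()`. Here an unset field reads back as its default, and a missing required field leaves the message uninitialized.

```python
# Hypothetical field table for PhoneNumber: name -> (label, default).
PHONE_NUMBER_FIELDS = {
    "number": ("required", ""),
    "type": ("optional", 1),  # 1 == HOME, the explicit [default = HOME]
}


class ToyMessage:
    """Toy model of proto2 field rules; not the protobuf runtime."""

    def __init__(self, fields):
        self._fields = fields
        self._values = {}

    def set(self, name, value):
        if name not in self._fields:
            raise KeyError(f"unknown field: {name}")
        self._values[name] = value

    def get(self, name):
        # An unset field reads back as its (explicit or system) default.
        return self._values.get(name, self._fields[name][1])

    def has(self, name):
        # Distinguishes "explicitly set" from "reading the default".
        return name in self._values

    def missing_required(self):
        return [name for name, (label, _) in self._fields.items()
                if label == "required" and name not in self._values]

    def is_initialized(self):
        return not self.missing_required()


msg = ToyMessage(PHONE_NUMBER_FIELDS)
print(msg.get("type"), msg.has("type"))  # 1 False
print(msg.missing_required())            # ['number']
msg.set("number", "555-4321")
print(msg.is_initialized())              # True
```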

Required Is Forever

You should be very careful about marking fields as required. Suppose you later want to stop writing or sending a required field. Changing it to an optional field is then problematic: old readers will consider messages without the field incomplete and may unintentionally reject or drop them. Consider writing application-specific custom validation routines for your buffers instead. Some engineers at Google have concluded that using required does more harm than good and prefer to use only optional and repeated. However, this view is not universal.
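One way to follow this advice is a small validation function that the application runs itself, instead of relying on `required` in the schema. The sketch below is hypothetical: the function name, the positive-ID rule, and the email pattern are illustrative assumptions. It accepts any object exposing `name`, `id`, and `email` attributes, such as a generated Person message.

```python
import re
from types import SimpleNamespace

# Deliberately loose, hypothetical email check: one "@", no whitespace.
_EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+$")


def validate_person(person):
    """Return a list of problems; an empty list means the record is acceptable."""
    problems = []
    if not person.name:
        problems.append("name is empty")
    if person.id <= 0:
        problems.append("id must be positive")
    if person.email and not _EMAIL_RE.match(person.email):
        problems.append("email looks malformed")
    return problems


ok = SimpleNamespace(name="John Doe", id=1234, email="jdoe@example.com")
bad = SimpleNamespace(name="", id=0, email="not-an-email")
print(validate_person(ok))   # []
print(validate_person(bad))  # ['name is empty', 'id must be positive', 'email looks malformed']
```

Because these rules live in application code, they can be relaxed later without breaking old readers of the wire format.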

You'll find a complete guide to writing .proto files, including all the possible field types, in the Protocol Buffer Language Guide. Don't look for facilities similar to class inheritance, though: protocol buffers don't support it.

Compiling Your Protocol Buffers

Now that you have a .proto file, the next step is to generate the classes you need to read and write AddressBook (and hence Person and PhoneNumber) messages. To do this, run the protocol buffer compiler, protoc, on your .proto file:

1. If you haven't installed the compiler, download the package and follow the instructions in the README.

2. Now run the compiler and pass it three things. First, the source directory: where your application's source code lives. The current directory is used if you don't provide a value. Second, the destination directory: where you want the generated code to go, often the same as $SRC_DIR. Third, the path to your .proto file. In this case, you would run:

protoc -I=$SRC_DIR --python_out=$DST_DIR $SRC_DIR/addressbook.proto

Because you want Python classes, you use the --python_out option; similar options are provided for other supported languages.

This generates addressbook_pb2.py in the destination directory you specified.
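If you prefer to drive the compiler from a Python build script, the sketch below assembles the same protoc command line and runs it with `subprocess.run`. It assumes protoc is installed and on your PATH. The helper names are hypothetical. The output-path helper assumes the .proto file sits directly in the source directory, as addressbook.proto does here.

```python
import os
import subprocess


def protoc_command(src_dir, dst_dir, proto_name):
    """Build the argument list for: protoc -I=SRC --python_out=DST SRC/name.proto"""
    return [
        "protoc",
        f"-I={src_dir}",
        f"--python_out={dst_dir}",
        os.path.join(src_dir, proto_name),
    ]


def generated_module_path(dst_dir, proto_name):
    """addressbook.proto -> <dst_dir>/addressbook_pb2.py (top-level .proto only)."""
    stem, _ = os.path.splitext(os.path.basename(proto_name))
    return os.path.join(dst_dir, stem + "_pb2.py")


def compile_proto(src_dir, dst_dir, proto_name):
    """Run protoc; check=True raises CalledProcessError if compilation fails."""
    subprocess.run(protoc_command(src_dir, dst_dir, proto_name), check=True)
    return generated_module_path(dst_dir, proto_name)
```

For example, calling `compile_proto(".", ".", "addressbook.proto")` from the directory containing addressbook.proto should return `./addressbook_pb2.py` once protoc succeeds.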

The Protocol Buffer API

Unlike the Java and C++ protocol buffer compilers, the Python protocol buffer compiler doesn't generate your data access code directly. Instead, as you'll see if you look at addressbook_pb2.py, it generates special descriptors for all your messages, enums, and fields, plus some mysteriously empty classes, one for each message type:

# Excerpt from the generated addressbook_pb2.py
class Person(message.Message):
  __metaclass__ = reflection.GeneratedProtocolMessageType

  class PhoneNumber(message.Message):
    __metaclass__ = reflection.GeneratedProtocolMessageType
    DESCRIPTOR = _PERSON_PHONENUMBER
  DESCRIPTOR = _PERSON

class AddressBook(message.Message):
  __metaclass__ = reflection.GeneratedProtocolMessageType
  DESCRIPTOR = _ADDRESSBOOK

The important line in each class is __metaclass__ = reflection.GeneratedProtocolMessageType. The details of how Python metaclasses work are beyond the scope of this tutorial, but you can think of them as templates for creating classes. At load time, the GeneratedProtocolMessageType metaclass uses the specified descriptors to create all the Python methods you need for each message type and adds them to the relevant classes. You can then use the fully populated classes in your code.
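To build intuition for "a metaclass that adds methods from a descriptor", here is a toy version. It is a deliberately simplified sketch, not how protobuf's GeneratedProtocolMessageType is implemented. A hypothetical `FIELDS` dict stands in for the descriptor. Note also that the `__metaclass__` attribute in the generated excerpt is Python 2 syntax; Python 3 uses the `metaclass=` keyword, as below.

```python
class ToyMessageType(type):
    """Toy metaclass: reads FIELDS at class-creation time and adds properties."""

    def __new__(mcls, name, bases, namespace):
        # FIELDS plays the role of a descriptor: field name -> default value.
        for field_name, default in namespace.get("FIELDS", {}).items():
            namespace[field_name] = mcls._make_property(field_name, default)
        return super().__new__(mcls, name, bases, namespace)

    @staticmethod
    def _make_property(field_name, default):
        def getter(self):
            # Unset fields read back as their default value.
            return self._values.get(field_name, default)

        def setter(self, value):
            self._values[field_name] = value

        return property(getter, setter)


class ToyBase(metaclass=ToyMessageType):
    def __init__(self):
        self._values = {}


class Person(ToyBase):
    # Hypothetical stand-in for the _PERSON descriptor.
    FIELDS = {"name": "", "id": 0, "email": ""}


p = Person()
print(repr(p.email))  # ''
p.name = "John Doe"
print(p.name)         # John Doe
```

The class body of Person only declares data, just like the "empty" generated classes. The accessors appear because the metaclass runs when the class statement executes.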

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
