Brief analysis of Protobuf (to be continued)

Source: Internet
Author: User

First, let's look at what Protobuf is and what it is for.
The official documentation states it clearly:

Protocol buffers are a flexible, efficient, automated mechanism for serializing structured data – think XML, but smaller, faster, and simpler. You define how you want your data to be structured once, then you can use special generated source code to easily write and read your structured data to and from a variety of data streams and using a variety of languages. You can even update your data structure without breaking deployed programs that are compiled against the "old" format.

In other words, Protobuf is a flexible, efficient, automated mechanism for serializing structured data, similar to XML but smaller, faster, and easier to use. Once you define how the data is structured, you can use the source code generated for various languages to read and write that data from and to different data streams. Even if you update the data structure, programs that were compiled against the old format keep working.

Basic data types and modifiers for Protobuf

Data types supported by Protobuf

Data type   Notes
double
float
int32       Uses variable-length encoding. Inefficient for encoding negative numbers; if the field is likely to hold negative values, use sint32 instead.
int64       Uses variable-length encoding. Inefficient for encoding negative numbers; if the field is likely to hold negative values, use sint64 instead.
uint32      Uses variable-length encoding.
uint64      Uses variable-length encoding.
sint32      Uses variable-length encoding. Encodes negative numbers more efficiently than int32.
sint64      Uses variable-length encoding. Encodes negative numbers more efficiently than int64.
fixed32     Always 4 bytes. More efficient than uint32 when values are often greater than 2^28.
fixed64     Always 8 bytes. More efficient than uint64 when values are often greater than 2^56.
sfixed32    Always 4 bytes.
sfixed64    Always 8 bytes.
bool
string      Must contain UTF-8 encoded or 7-bit ASCII text.
bytes       May contain any sequence of bytes.
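
To make the size notes in the table concrete, here is a minimal sketch that counts how many bytes a varint takes. It assumes the usual wire-format rules (7 payload bits per byte, an int32 sign-extended to 64 bits before encoding, ZigZag mapping for sint32). The names VarintSize, Int32Size, ZigZag32 and Sint32Size are made up for this illustration and are not part of the Protobuf API.

```cpp
#include <cassert>
#include <cstdint>

// Bytes needed to varint-encode a value: each byte carries 7 payload bits.
constexpr int VarintSize(std::uint64_t value) {
    int bytes = 1;
    while (value >= 0x80) {
        value >>= 7;
        ++bytes;
    }
    return bytes;
}

// An int32 is sign-extended to 64 bits before it is varint-encoded.
constexpr int Int32Size(std::int32_t v) {
    return VarintSize(static_cast<std::uint64_t>(static_cast<std::int64_t>(v)));
}

// ZigZag maps 0, -1, 1, -2, 2, ... to 0, 1, 2, 3, 4, ...
constexpr std::uint32_t ZigZag32(std::int32_t v) {
    return (static_cast<std::uint32_t>(v) << 1) ^ (v < 0 ? 0xFFFFFFFFu : 0u);
}

// A sint32 is ZigZag-mapped first, so small negative values stay small.
constexpr int Sint32Size(std::int32_t v) {
    return VarintSize(ZigZag32(v));
}

static_assert(Int32Size(-1) == 10, "a negative int32 takes 10 bytes");
static_assert(Sint32Size(-1) == 1, "sint32 -1 takes 1 byte");
static_assert(VarintSize(1u << 28) == 5, "2^28 no longer fits in 4 varint bytes");
```

Under these rules, -1 costs 10 bytes as an int32 but 1 byte as a sint32, and a value of 2^28 or more needs 5 varint bytes where fixed32 always needs 4.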

Modifiers
- required: a well-formed message must contain exactly one of this field.
- optional: a well-formed message may contain zero or one of this field, but not more than one.
- repeated: the field can be repeated any number of times (including zero), and the order of the repeated values is preserved.

Protobuf API analysis

First, use the .proto file from the official tutorial to generate the .pb.h and .pb.cc files.

.proto

syntax = "proto2";

package tutorial;

message Person {
  required string name = 1;
  required int32 id = 2;
  optional string email = 3;

  enum PhoneType {
    MOBILE = 0;
    HOME = 1;
    WORK = 2;
  }

  message PhoneNumber {
    required string number = 1;
    optional PhoneType type = 2 [default = HOME];
  }

  repeated PhoneNumber phones = 4;
}

message AddressBook {
  repeated Person people = 1;
}

CopyFrom

The generated Person class provides many APIs. Here we analyze the CopyFrom interface.
It is defined as follows:

void Person::CopyFrom(const ::google::protobuf::Message& from) {
    if (&from == this) return;
    Clear();
    MergeFrom(from);
}

void Person::CopyFrom(const Person& from) {
    if (&from == this) return;
    Clear();
    MergeFrom(from);
}

As you can see, when from and this refer to the same object, CopyFrom returns immediately. Otherwise it calls Clear to reset all fields to their unset state and then uses MergeFrom to merge the values of from into this. That completes the analysis of CopyFrom.

MergeFrom (I)

Let's set Clear aside for now and see how MergeFrom is implemented.

void Person::MergeFrom(const Person& from) {
    ...
    phones_.MergeFrom(from.phones_);
    cached_has_bits = from._has_bits_[0];
    if (cached_has_bits & 7u) {
        if (cached_has_bits & 0x00000001u) {
            set_has_name();
            name_.AssignWithDefault(&::google::protobuf::internal::GetEmptyStringAlreadyInited(), from.name_);
        }

        if (cached_has_bits & 0x00000002u) {
            set_has_email();
            email_.AssignWithDefault(&::google::protobuf::internal::GetEmptyStringAlreadyInited(), from.email_);
        }

        if (cached_has_bits & 0x00000004u) {
            id_ = from.id_;
        }
        _has_bits_[0] |= cached_has_bits;
    }
}

The first thing you see is the merge of the phones field.

phones_.MergeFrom(from.phones_);

Because phones is a repeated field whose elements are themselves messages, the merge is carried out by the phones_ object itself.
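
As a rough sketch of what delegating the merge to the container means, the toy type below appends the other side's elements to its own, which is the behavior assumed here for a repeated field. ToyRepeated is a hypothetical stand-in built on std::vector, not the container type used in the generated code.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a repeated field; elements are plain strings here.
struct ToyRepeated {
    std::vector<std::string> items;

    // Merging appends the other side's elements after ours, keeping their order.
    void MergeFrom(const ToyRepeated& from) {
        assert(&from != this);  // self-merge is not supported in this sketch
        items.insert(items.end(), from.items.begin(), from.items.end());
    }
};
```

After the merge, the target holds its original elements followed by the source's elements, and the source is left unchanged.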

Next, all the remaining checks use cached_has_bits, which is a copy of from._has_bits_[0], in bitwise operations. So what is from._has_bits_?

You can find it in the header file.

class Person : public ::google::protobuf::Message {
    ...
private:
    ::google::protobuf::internal::HasBits<1> _has_bits_;
};

So _has_bits_ is a HasBits<1> object. What is HasBits? The Protobuf source defines it as follows:

template<size_t doublewords>
class HasBits {
public:
    ...

    ::google::protobuf::uint32& operator[](int index) GOOGLE_PROTOBUF_ATTRIBUTE_ALWAYS_INLINE {
        return has_bits_[index];
    }

    const ::google::protobuf::uint32& operator[](int index) const GOOGLE_PROTOBUF_ATTRIBUTE_ALWAYS_INLINE {
        return has_bits_[index];
    }

    bool operator==(const HasBits<doublewords>& rhs) const {
        return memcmp(has_bits_, rhs.has_bits_, sizeof(has_bits_)) == 0;
    }

    bool operator!=(const HasBits<doublewords>& rhs) const {
        return !(*this == rhs);
    }

    bool empty() const;

private:
    ::google::protobuf::uint32 has_bits_[doublewords];
};

As the definition shows, HasBits is just a wrapper around a uint32 array. You can therefore think of _has_bits_ simply as a uint32 array of length 1.

Summary

After reading so many scattered definitions you may be feeling a little dizzy, so let's pause for a summary and a prediction.

To summarize: first, we used the .proto file to generate the header file and the source file. Reading those two files, we found that CopyFrom is implemented by calling Clear and then MergeFrom. MergeFrom performs many operations on _has_bits_, so to understand MergeFrom further we first need to understand what _has_bits_ does.
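
The CopyFrom structure summarized here can be sketched with a toy type. ToyMessage, its plain bool flags, and its setters are assumptions made for illustration only; the generated Person class tracks field presence differently, as the rest of the article explores.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a generated message; not the real Person class.
struct ToyMessage {
    bool has_name = false;
    std::string name;
    bool has_id = false;
    int id = 0;

    void set_name(const std::string& v) { has_name = true; name = v; }
    void set_id(int v) { has_id = true; id = v; }

    // Reset every field to its unset state.
    void Clear() {
        has_name = false;
        name.clear();
        has_id = false;
        id = 0;
    }

    // Fields set in `from` overwrite ours; fields unset in `from` are left alone.
    void MergeFrom(const ToyMessage& from) {
        if (from.has_name) set_name(from.name);
        if (from.has_id) set_id(from.id);
    }

    // Same three steps as the generated CopyFrom: guard, Clear, MergeFrom.
    void CopyFrom(const ToyMessage& from) {
        if (&from == this) return;  // without this guard, Clear() would wipe `from` too
        Clear();
        MergeFrom(from);
    }
};
```

The difference shows when the source has only some fields set: MergeFrom keeps the target's other fields, while CopyFrom leaves the target with exactly the source's fields.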

So what is _has_bits_ for? My prediction is that _has_bits_, being a uint32 array, marks whether each field has been set. Each field takes 1 bit as its flag; in other words, it is a bitmap.

Of course, this is only my speculation, so let's go and verify it.

MergeFrom (II)

Now look back at MergeFrom.

if (cached_has_bits & 7u) {...}

According to our guess above, the binary representation of 7 is 0111, so this checks whether any one of the three fields has been set.

if (cached_has_bits & 0x00000001u) {
    set_has_name();
    ...
}

This checks whether the lowest bit is 1 and, if so, calls set_has_name(). Here is the definition of set_has_name:

inline void Person::set_has_name() {
    _has_bits_[0] |= 0x00000001u;
}

The guess appears to be right. In fact, each message also has a has_ API for its fields. Let's look at the definition of has_name:

inline bool Person::has_name() const {
    return (_has_bits_[0] & 0x00000001u) != 0;
}

The truth is out: Protobuf automatically assigns each field a bit that records whether the field has been set. These bits are stored in the _has_bits_ array, and which bit belongs to which field can be seen in the generated header file.
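
A minimal sketch of such a per-field presence bitmap follows. ToyHasBits is a hypothetical class; the bit positions copy the masks seen in the generated code quoted earlier (name 0x1, email 0x2, id 0x4), while clear_has_name and raw are additions made only for this illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in: one 32-bit word holds one presence flag per field.
class ToyHasBits {
public:
    void set_has_name()  { bits_[0] |= 0x00000001u; }
    void set_has_email() { bits_[0] |= 0x00000002u; }
    void set_has_id()    { bits_[0] |= 0x00000004u; }

    bool has_name()  const { return (bits_[0] & 0x00000001u) != 0; }
    bool has_email() const { return (bits_[0] & 0x00000002u) != 0; }
    bool has_id()    const { return (bits_[0] & 0x00000004u) != 0; }

    // Clearing a flag masks its bit out and leaves the others untouched.
    void clear_has_name() { bits_[0] &= ~0x00000001u; }

    // Expose the raw word so the bit pattern can be inspected.
    std::uint32_t raw() const { return bits_[0]; }

private:
    std::uint32_t bits_[1] = {0};  // a single word is enough for up to 32 fields
};
```

Setting name and id yields the word 0b101 (5): has_name() and has_id() report true while has_email() stays false.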

Now MergeFrom makes sense. It first tests a group of consecutive bits at once to find out whether any field in the group has been set. Only if one has does it test each bit individually, and only the fields that have been set are merged into this.
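
The two-level check can be sketched end to end with a toy type. ToyPerson, its constants, and its setters are hypothetical and leave out the phones field; the sketch only mirrors the shape of the generated MergeFrom quoted earlier (group test with 7u, then per-bit tests, then OR-ing the bits in).

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Bit assignments copied from the masks in the generated code shown earlier.
constexpr std::uint32_t kNameBit = 0x00000001u;
constexpr std::uint32_t kEmailBit = 0x00000002u;
constexpr std::uint32_t kIdBit = 0x00000004u;
static_assert((kNameBit | kEmailBit | kIdBit) == 7u, "7u covers all three fields");

// Hypothetical stand-in for the generated Person class (phones omitted).
struct ToyPerson {
    std::uint32_t has_bits = 0;
    std::string name;
    std::string email;
    std::int32_t id = 0;

    void set_name(const std::string& v) { has_bits |= kNameBit; name = v; }
    void set_email(const std::string& v) { has_bits |= kEmailBit; email = v; }
    void set_id(std::int32_t v) { has_bits |= kIdBit; id = v; }

    void MergeFrom(const ToyPerson& from) {
        const std::uint32_t cached_has_bits = from.has_bits;
        // Group test: skip everything if none of the three fields is set.
        if (cached_has_bits & 7u) {
            // Per-bit tests: copy only the fields that `from` has set.
            if (cached_has_bits & kNameBit) name = from.name;
            if (cached_has_bits & kEmailBit) email = from.email;
            if (cached_has_bits & kIdBit) id = from.id;
            // Record the newly merged fields as set in `this`.
            has_bits |= cached_has_bits;
        }
    }
};
```

Merging a source that has only email and id set leaves the target's name untouched, and merging an empty source changes nothing because the group test fails immediately.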

(to be continued)
