Using UTF-8 with Google Protocol Buffers in Python

Last Update:2018-07-24 Source: Internet

Author: User

Google Protocol Buffers work well, but the Python implementation has a few small problems. For example, string fields do not accept UTF-8 byte strings; they only accept Unicode. In our system, both stored and transmitted data are UTF-8, so UTF-8 is used uniformly in code (C++, Java, Python). The Python behavior is therefore inconvenient: UTF-8 data coming from C++ has to be converted to Unicode before Python can interpret it. In fact, you can make Google Protocol Buffers for Python work without transcoding.

I ran into this problem: C++ read a UTF-8 file and sent the contents to Python. What Python received was UTF-8, but the Python library required Unicode, which was very frustrating.
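The transcoding step described above can be sketched as follows. This is only an illustration, written for Python 3, where `bytes` and `str` play the roles of Python 2's `str` and `unicode`. The payload and the helper names `to_text` and `to_wire` are hypothetical and are not part of the protobuf library.

```python
# Hypothetical payload: UTF-8 bytes as a C++ sender might transmit them.
utf8_payload = "协议缓冲区".encode("utf-8")


def to_text(raw: bytes) -> str:
    """Decode UTF-8 bytes into a Unicode text object."""
    return raw.decode("utf-8")


def to_wire(text: str) -> bytes:
    """Encode a Unicode text object back into UTF-8 bytes."""
    return text.encode("utf-8")


# The receiver has to decode before it can assign the value to a string field.
text = to_text(utf8_payload)

# Encoding the text again yields exactly the bytes that were received,
# so the decode/encode pair only adds work when everything is already UTF-8.
round_tripped = to_wire(text)
```

When both ends already agree on UTF-8, the round trip changes nothing, which is why the article looks for a way to skip it.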

The documentation describes the string type as shown below. It says that UTF-8 and ASCII are supported, yet the Python version actually demands Unicode, which is absurd. It goes to show that trying things out tells you more than any description.

string

A string must always contain UTF-8 encoded or 7-bit ASCII text.
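The quoted rule allows two kinds of content. A minimal sketch, assuming Python 3 and hypothetical helper names, shows how they relate: every 7-bit ASCII byte string is also valid UTF-8, while UTF-8 text with non-ASCII characters satisfies the rule without being 7-bit ASCII.

```python
def is_7bit_ascii(data: bytes) -> bool:
    """Return True if every byte is in the 7-bit ASCII range."""
    return all(byte < 0x80 for byte in data)


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Illustrative samples, not taken from the article.
ascii_sample = b"hello"
utf8_sample = "héllo".encode("utf-8")
invalid_sample = b"\xff\xfe"
```

A check that only accepts 7-bit ASCII byte strings is therefore stricter than the documented rule, because it rejects byte strings such as `utf8_sample` that are valid UTF-8.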

Download the Google Protocol Buffers source code, for example from http://code.google.com/p/protobuf/downloads/list; I downloaded version 2.0.3.
Go to the /protobuf-2.0.3/python/google/protobuf/internal directory and edit the following files.
encoder.py

    def AppendString(self, field_number, value):
        """Appends a length-prefixed unicode string, encoded as UTF-8 to our
        buffer, with the length varint-encoded.
        """
        # Original line:
        # self.AppendBytes(field_number, value.encode('utf-8'))
        # The value.encode call is removed: the value is already UTF-8,
        # so there is no reason to encode it again.
        self.AppendBytes(field_number, value)

decoder.py

    def ReadString(self):
        """Reads and returns a length-delimited string."""
        bytes = self.ReadBytes()
        # Original line:
        # return unicode(bytes, 'utf-8')
        # Changed to the line below: do nothing and return the data in
        # whatever encoding it arrived in.
        return bytes

type_checkers.py

    class UnicodeValueChecker(object):
        """Checker used for string fields."""

        def CheckValue(self, proposed_value):
            if not isinstance(proposed_value, (str, unicode)):
                message = ('%.1024r has type %s, but expected one of: %s' %
                           (proposed_value, type(proposed_value), (str, unicode)))
                raise TypeError(message)

            # If the value is of type 'str' make sure that it is in 7-bit ASCII
            # encoding.
            # Commented out: there is no good reason to try converting a str
            # to unicode from ASCII here.
            # if isinstance(proposed_value, str):
            #     try:
            #         unicode(proposed_value, 'ascii')
            #     except UnicodeDecodeError:
            #         raise ValueError('%.1024r isn\'t in 7-bit ASCII encoding.'
            #                          % (proposed_value))
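To make the effect of these changes concrete, here is a minimal sketch of length-delimited framing, where a varint-encoded length is followed by the raw payload bytes. It is written for Python 3 and is not the library's code: all function names are hypothetical, and the field tag that precedes each field on the wire is left out for brevity.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    if value < 0:
        raise ValueError("varint sketch only handles non-negative integers")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # more bytes follow
        else:
            out.append(low7)  # last byte, continuation bit clear
            return bytes(out)


def decode_varint(buf: bytes, pos: int = 0):
    """Decode a varint starting at pos; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def append_length_delimited(payload: bytes) -> bytes:
    """Frame payload as varint length + raw bytes, with no transcoding."""
    return encode_varint(len(payload)) + payload


def read_length_delimited(buf: bytes, pos: int = 0):
    """Read one framed payload; return (raw bytes, next position)."""
    length, pos = decode_varint(buf, pos)
    return buf[pos:pos + length], pos + length


# UTF-8 bytes go in and the same UTF-8 bytes come out.
utf8_value = "中文".encode("utf-8")
framed = append_length_delimited(utf8_value)
decoded, end = read_length_delimited(framed)
```

Because the framing only counts and copies bytes, the payload passes through unchanged, which is the behavior the patched AppendString and ReadString rely on.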
After making the changes, reinstall the package. It will no longer transcode anything: whatever encoding you pass in is the encoding that comes out. I have used the C++ version and did not seem to run into encoding problems there. That said, the proto data type bytes appears to meet the requirement as well, since bytes fields are not processed in any way.

This article is an English version of an article originally published in Chinese on aliyun.com and is provided for information purposes only.
