Using UTF-8 strings with Google Protocol Buffers in Python

Source: Internet
Author: User
Google Protocol Buffers works well, but the Python implementation has a few small problems. For example, its string fields accept only Unicode objects and do not support UTF-8 encoded strings. In our system, everything that is stored or transmitted is UTF-8, so UTF-8 is used uniformly in the code (C++, Java, Python). This causes inconvenience: for example, UTF-8 data produced by C++ has to be converted to Unicode before Python can interpret it. In fact, you can make Google Protocol Buffers for Python work without any transcoding.

I ran into this problem: C++ reads a file (encoded as UTF-8) and sends its contents, so what Python receives from C++ is UTF-8, but Python requires Unicode. Very frustrating.

The documentation describes the string type as follows. It says that UTF-8 and ASCII are supported, yet the Python version actually demands Unicode, which is ridiculous. It goes to show that trying things out yourself is worth more than anything the documentation says.

.proto type: string. Notes: A string must always contain UTF-8 encoded or 7-bit ASCII text. C++ type: string. Java type: String.
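The mismatch can be illustrated outside protobuf. The following is a minimal sketch in Python 3 syntax, where `bytes` plays the role that `str` plays in Python 2. The function name `check_ascii_only` is hypothetical; it only imitates the kind of 7-bit ASCII check the Python library applies to string fields and is not the library's own code.

```python
def check_ascii_only(raw: bytes) -> None:
    """Imitate a 7-bit ASCII check: raise ValueError for non-ASCII bytes."""
    try:
        raw.decode("ascii")
    except UnicodeDecodeError:
        raise ValueError("%.1024r isn't in 7-bit ASCII encoding." % (raw,))


# Two CJK characters, encoded as UTF-8 the way the C++ side would send them.
utf8_text = "\u4e2d\u6587".encode("utf-8")

check_ascii_only(b"hello")  # plain ASCII passes silently

try:
    check_ascii_only(utf8_text)
    rejected = False
except ValueError:
    rejected = True

print(rejected)
```

Valid UTF-8 is rejected here simply because it contains bytes above 0x7F, even though it matches the format the documentation describes.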


Download the Google Protocol Buffers source code, for example from http://code.google.com/p/protobuf/downloads/list; I downloaded version 2.0.3.
Go to the /protobuf-2.0.3/python/google/protobuf/internal directory.
encoder.py

    def AppendString(self, field_number, value):
      """Appends a length-prefixed unicode string, encoded as UTF-8 to our buffer,
      with the length varint-encoded.
      """
      # self.AppendBytes(field_number, value.encode('utf-8'))
      # Removed value.encode: my data is already UTF-8, so why encode it again?
      self.AppendBytes(field_number, value)

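On the wire, a string field is a tag, a varint length, and then the raw bytes, which is why already-encoded UTF-8 can be written as is. Below is a minimal sketch of that layout in Python 3 syntax. The names `encode_varint` and `append_bytes` are hypothetical and stand in for the library's internals; they are not its actual functions.

```python
WIRETYPE_LENGTH_DELIMITED = 2  # wire type used for string and bytes fields


def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        low7 = n & 0x7F
        n >>= 7
        if n:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def append_bytes(buf: bytearray, field_number: int, payload: bytes) -> None:
    """Append the tag, the varint length, then the payload unchanged."""
    buf += encode_varint((field_number << 3) | WIRETYPE_LENGTH_DELIMITED)
    buf += encode_varint(len(payload))
    buf += payload


buf = bytearray()
# "h" followed by e-acute: 3 bytes once encoded as UTF-8.
append_bytes(buf, 1, "h\u00e9".encode("utf-8"))
print(bytes(buf))
```

The length prefix counts bytes, not characters, so the payload has to be in its final encoding before it is appended.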
decoder.py

    def ReadString(self):
      """Reads and returns a length-delimited string."""
      bytes = self.ReadBytes()
      # return unicode(bytes, 'utf-8')
      # Changed to the line below: do nothing, and return the data in whatever
      # encoding it arrived in.
      return bytes

type_checkers.py

    class UnicodeValueChecker(object):

      """Checker used for string fields."""

      def CheckValue(self, proposed_value):
        if not isinstance(proposed_value, (str, unicode)):
          message = ('%.1024r has type %s, but expected one of: %s' %
                     (proposed_value, type(proposed_value), (str, unicode)))
          raise TypeError(message)

        # If the value is of type 'str' make sure that it is in 7-bit ASCII
        # encoding.
        # Inexplicable: why try to convert to Unicode, and from ASCII at that?
        # Commented out.
        # if isinstance(proposed_value, str):
        #   try:
        #     unicode(proposed_value, 'ascii')
        #   except UnicodeDecodeError:
        #     raise ValueError('%.1024r isn\'t in 7-bit ASCII encoding.'
        #                      % (proposed_value))

After making these changes, reinstall the package. It will no longer transcode anything for you: whatever encoding you pass in is the encoding that gets stored. I have used the C++ version, and it does not seem to deal with encodings at all. That said, the proto data type bytes also appears to meet this requirement, since bytes fields are not processed in any way.
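With the checks removed (or with a bytes field), nothing in the library verifies that the data really is UTF-8, so it may be worth validating at the application boundary. Here is a minimal sketch in Python 3 syntax; `is_valid_utf8` is a hypothetical helper and not part of protobuf.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8, False otherwise."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


good = "\u4e2d\u6587".encode("utf-8")  # well-formed UTF-8
bad = b"\xff\xfe"                      # 0xFF never appears in valid UTF-8

print(is_valid_utf8(good))
print(is_valid_utf8(bad))
```

A check like this could run once where data enters the system, so the rest of the code can pass the bytes around untouched.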



