Google Protocol Buffers works well, but the Python version has a few small problems. For example, its string fields do not accept UTF-8 encoded byte strings; non-ASCII text has to be passed as Unicode objects. In our system, everything we store and transmit is UTF-8, so UTF-8 is used uniformly throughout the code (C++, Java, Python). That makes things inconvenient: UTF-8 data coming from C++ has to be converted to Unicode on the Python side before it can go into a message. In fact, you can make Protocol Buffers for Python work without any transcoding.
I ran into exactly this problem: C++ read a file (encoded in UTF-8) and sent its contents, so what Python received was UTF-8, but the Python API demanded Unicode. Extremely frustrating.
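To make the mismatch concrete, here is a small sketch of the round trip the stock Python binding forces on the caller. It is written for current Python 3, where the old `unicode` type is called `str` and the old byte `str` is called `bytes`; the sample text and the variable names are made up for illustration.

```python
# Bytes as they might arrive from the C++ side: UTF-8 encoded text.
# (Sample value chosen for illustration only.)
utf8_from_cpp = "\u4f60\u597d".encode("utf-8")

# The stock binding wants a Unicode object for a string field,
# so the caller has to decode first...
as_unicode = utf8_from_cpp.decode("utf-8")

# ...and the library then encodes it straight back to UTF-8 for the wire.
back_on_wire = as_unicode.encode("utf-8")

# The round trip changes nothing, which is why it feels like wasted work.
round_trip_is_identity = (back_on_wire == utf8_from_cpp)
```

The decode and the encode cancel each other out, so when both ends already agree on UTF-8 the conversion buys nothing.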
The documentation describes the string type as shown below. It says UTF-8 and ASCII are supported, yet the Python version actually demands Unicode, which is ridiculous. It goes to show that trying something yourself beats taking anyone's word for it.
.proto Type | Notes | C++ Type | Java Type
string | A string must always contain UTF-8 encoded or 7-bit ASCII text. | string | String
Download the Google Protocol Buffers source code, for example from http://code.google.com/p/protobuf/downloads/list, and get version 2.0.3.
Go to the /protobuf-2.0.3/python/google/protobuf/internal directory.
encoder.py
    def AppendString(self, field_number, value):
      """Appends a length-prefixed unicode string, encoded as UTF-8 to our buffer,
      with the length varint-encoded.
      """
      # self.AppendBytes(field_number, value.encode('utf-8'))
      # Drop value.encode(): my data is already UTF-8, so why encode it again?
      self.AppendBytes(field_number, value)
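For readers who want to see what this change amounts to on the wire, here is a standalone sketch of length-delimited framing (tag, then varint length, then payload). It does not use the protobuf library; the function names are my own, and only the wire layout follows the documented format. It compares "encode Unicode first" with "pass UTF-8 bytes straight through".

```python
def encode_varint(value):
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


WIRETYPE_LENGTH_DELIMITED = 2  # wire type shared by string and bytes fields


def append_bytes(field_number, payload):
    """Frame raw bytes as tag + varint length + payload."""
    tag = (field_number << 3) | WIRETYPE_LENGTH_DELIMITED
    return encode_varint(tag) + encode_varint(len(payload)) + payload


def append_string_stock(field_number, text):
    """Stock behaviour: take Unicode text and encode it to UTF-8 first."""
    return append_bytes(field_number, text.encode("utf-8"))


def append_string_patched(field_number, utf8_bytes):
    """Patched behaviour: the caller already holds UTF-8, pass it through."""
    return append_bytes(field_number, utf8_bytes)


text = "\u4f60\u597d"
stock = append_string_stock(1, text)
patched = append_string_patched(1, text.encode("utf-8"))
```

Both calls produce the same bytes, so the patch does not change the wire format as long as the caller really does pass valid UTF-8. What it removes is the library's guarantee of that.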
decoder.py
    def ReadString(self):
      """Reads and returns a length-delimited string."""
      bytes = self.ReadBytes()
      # return unicode(bytes, 'utf-8')
      # Changed to the line below: do no conversion, return the data in whatever encoding it arrived in.
      return bytes
type_checkers.py
    class UnicodeValueChecker(object):
      """Checker used for string fields."""

      def CheckValue(self, proposed_value):
        if not isinstance(proposed_value, (str, unicode)):
          message = ('%.1024r has type %s, but expected one of: %s' %
                     (proposed_value, type(proposed_value), (str, unicode)))
          raise TypeError(message)

        # If the value is of type 'str' make sure that it is in 7-bit ASCII
        # encoding.
        # Baffling: why try to convert to unicode, and from ASCII at that? Commented out.
        # if isinstance(proposed_value, str):
        #   try:
        #     unicode(proposed_value, 'ascii')
        #   except UnicodeDecodeError:
        #     raise ValueError('%.1024r isn\'t in 7-bit ASCII encoding.'
        #                      % (proposed_value))
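The effect of commenting out the ASCII test can be sketched in isolation. The class below is a hypothetical stand-in, not library code, written for Python 3 (so `bytes` plays the role of the old `str` and `str` the role of `unicode`): it keeps the type check and drops the encoding check.

```python
class RelaxedStringChecker(object):
    """Hypothetical stand-in for the patched checker (not protobuf code)."""

    def check_value(self, proposed_value):
        # Accept either raw bytes (e.g. UTF-8 from C++) or Unicode text.
        if not isinstance(proposed_value, (bytes, str)):
            raise TypeError(
                "%.1024r has type %s, but expected one of: %s"
                % (proposed_value, type(proposed_value), (bytes, str))
            )
        # Deliberately no 7-bit ASCII test: the caller owns the encoding.
        return proposed_value


checker = RelaxedStringChecker()

# Non-ASCII UTF-8 bytes now pass the check untouched.
accepted = checker.check_value("\u4f60\u597d".encode("utf-8"))

# Values of the wrong type are still rejected.
try:
    checker.check_value(42)
    rejected = False
except TypeError:
    rejected = True
```

The trade-off is that nothing now stops bytes that are not valid UTF-8 from ending up in a string field, so the sender has to be trusted.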
After making these changes, reinstall the package. It will no longer transcode anything for you: whatever encoding you pass in is the encoding that goes through. I have used the C++ version and never seemed to have to deal with encoding issues there. That said, it looks like the proto data type bytes also meets this requirement, since bytes fields are not processed in any way.
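As a rough illustration of what "no processing" means on the receiving side, here is a standalone sketch that reads one length-delimited field and hands back its payload as raw bytes. The function names are my own and the protobuf library is not involved; the sample input is field 1 carrying the UTF-8 bytes of a two-character Chinese string.

```python
def decode_varint(buf, pos):
    """Decode a base-128 varint from buf at pos; return (value, new_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear: this was the last byte
            return result, pos
        shift += 7


def read_length_delimited(buf, pos=0):
    """Read one length-delimited field; return (field_number, payload, new_pos)."""
    tag, pos = decode_varint(buf, pos)
    field_number, wire_type = tag >> 3, tag & 0x07
    if wire_type != 2:
        raise ValueError("not a length-delimited field")
    length, pos = decode_varint(buf, pos)
    payload = bytes(buf[pos:pos + length])  # returned as-is, no decoding
    return field_number, payload, pos + length


wire = b"\x0a\x06\xe4\xbd\xa0\xe5\xa5\xbd"
field_number, payload, end = read_length_delimited(wire)

# Decoding is left to the caller, and only if Unicode is actually wanted.
text = payload.decode("utf-8")
```

On the wire a string field and a bytes field look the same, so declaring the field as bytes gets the pass-through behaviour without touching the library source.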