Google Protocol Buffers is easy to use, but the Python implementation has some minor problems. For example, its string fields only accept Unicode strings. In our system, both storage and transmission use UTF-8, so all of our code (C++, Java, and Python) works with UTF-8 uniformly, and this restriction is inconvenient. For example, the C++ side would have to convert UTF-8 to Unicode just so that Python could parse it. In fact, it would be enough if Google Protocol Buffers for Python simply did no transcoding.
The problem I ran into was this: C++ reads files (whose contents are UTF-8) and sends the data, and Python receives it. C++ sends UTF-8, but Python requires Unicode, which is very frustrating.
According to the documentation, the string type supports both UTF-8 and ASCII. In practice, the Python version requires Unicode, which is really ridiculous. The documentation's entry for the string type is as follows:
| .proto type | Notes | C++ type | Java type |
| --- | --- | --- | --- |
| string | A string must always contain UTF-8 encoded or 7-bit ASCII text. | string | String |
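To make the mismatch concrete, here is a minimal sketch of the kind of 7-bit ASCII check that a byte string runs into. It is written for Python 3 so it can be run today (the library code discussed here targets Python 2), and the function name `is_seven_bit_ascii` is made up for illustration; it is not part of protobuf.

```python
def is_seven_bit_ascii(raw: bytes) -> bool:
    """Return True if raw decodes cleanly as 7-bit ASCII (hypothetical helper)."""
    try:
        raw.decode("ascii")
    except UnicodeDecodeError:
        return False
    return True


# Pure ASCII text is also valid UTF-8, so it passes the check.
ascii_payload = "hello".encode("utf-8")
# Two Chinese characters: each becomes a multi-byte UTF-8 sequence.
utf8_payload = "\u4f60\u597d".encode("utf-8")

print(is_seven_bit_ascii(ascii_payload))
print(is_seven_bit_ascii(utf8_payload))
```

Any UTF-8 payload that contains non-ASCII characters fails a check like this, even though it is exactly what the documentation says a string may contain.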
Download the Google Protocol Buffers source code, for example from http://code.google.com/p/protobuf/downloads/list, and get version 2.0.3.
Go to the protobuf-2.0.3/python/google/protobuf/internal directory.
encoder.py

    def AppendString(self, field_number, value):
      """Appends a length-prefixed unicode string, encoded as UTF-8 to our buffer,
      with the length varint-encoded.
      """
      # Original line:
      # self.AppendBytes(field_number, value.encode('utf-8'))
      # value.encode() is removed: the value is already UTF-8, so there is
      # no reason to encode it again.
      self.AppendBytes(field_number, value)
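The reason the extra `encode` is unnecessary is that a string field on the wire is just a tag, a varint length, and the raw bytes. The following is a rough Python 3 sketch of that layout, not the library's own code; `encode_varint` and `append_length_delimited` are names chosen here for illustration.

```python
WIRETYPE_LENGTH_DELIMITED = 2  # wire type used for string and bytes fields


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    if value < 0:
        raise ValueError("this sketch handles non-negative values only")
    out = bytearray()
    while True:
        low_bits = value & 0x7F
        value >>= 7
        if value:
            out.append(low_bits | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(low_bits)
            return bytes(out)


def append_length_delimited(field_number: int, payload: bytes) -> bytes:
    """Return tag + varint length + payload for one length-delimited field."""
    tag = (field_number << 3) | WIRETYPE_LENGTH_DELIMITED
    return encode_varint(tag) + encode_varint(len(payload)) + payload


# Bytes that are already UTF-8 (as the C++ side sends them) go in unchanged.
utf8_bytes = "caf\u00e9".encode("utf-8")
wire = append_length_delimited(1, utf8_bytes)
print(wire)
```

Because the payload is copied through as-is, bytes that are already valid UTF-8 produce the same wire data that encoding a Unicode string would.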
decoder.py

    def ReadString(self):
      """Reads and returns a length-delimited string."""
      bytes = self.ReadBytes()
      # Original line:
      # return unicode(bytes, 'utf-8')
      # Changed to return the bytes untouched: whatever encoding came in
      # is the encoding that comes out.
      return bytes
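On the reading side, returning the raw bytes leaves the decision to decode with the caller. Here is a small Python 3 sketch of reading one length-delimited field under that approach; the helper names `decode_varint` and `read_length_delimited` are assumptions for this example, and the sample bytes are field 1 carrying the UTF-8 encoding of "café".

```python
def decode_varint(buf: bytes, pos: int):
    """Decode a base-128 varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:  # no continuation bit: this was the last byte
            return result, pos
        shift += 7


def read_length_delimited(buf: bytes, pos: int = 0):
    """Read one length-delimited field; return (field_number, payload, next_pos)."""
    tag, pos = decode_varint(buf, pos)
    if tag & 0x07 != 2:
        raise ValueError("not a length-delimited field")
    length, pos = decode_varint(buf, pos)
    payload = buf[pos:pos + length]
    if len(payload) != length:
        raise ValueError("buffer is truncated")
    return tag >> 3, payload, pos + length


wire = b"\x0a\x05caf\xc3\xa9"
field_number, payload, end = read_length_delimited(wire)
print(field_number, payload)

# The caller decodes only if and when it actually needs text.
text = payload.decode("utf-8")
```

The payload comes back byte-for-byte as it was sent, so a system that keeps everything in UTF-8 never pays for a round trip through Unicode.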
type_checkers.py

    class UnicodeValueChecker(object):
      """Checker used for string fields."""

      def CheckValue(self, proposed_value):
        if not isinstance(proposed_value, (str, unicode)):
          message = ('%.1024r has type %s, but expected one of: %s' %
                     (proposed_value, type(proposed_value), (str, unicode)))
          raise TypeError(message)

        # If the value is of type 'str' make sure that it is in 7-bit ASCII
        # encoding.
        # Commented out: there is no obvious reason to force a str through
        # an ASCII-to-Unicode conversion.
        # if isinstance(proposed_value, str):
        #   try:
        #     unicode(proposed_value, 'ascii')
        #   except UnicodeDecodeError:
        #     raise ValueError('%.1024r isn\'t in 7-bit ASCII encoding.'
        #                      % (proposed_value))
After making these changes, reinstall the package. It will no longer transcode anything for you: whatever encoding you pass in is the encoding that gets transmitted. From my experience with the C++ version, it does not seem to process the encoding at all. Alternatively, the bytes type in the .proto data types also appears to meet this requirement, since bytes fields are not processed in any way.