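Google Protocol Buffers is pleasant to use, but the Python implementation has a few rough edges. One of them is that string fields do not accept UTF-8 byte strings, only unicode strings. In our system everything we store and transmit is UTF-8, so the code (C++, Java, Python) uses UTF-8 throughout. That makes this behavior inconvenient: the C++ side would have to convert from UTF-8 to unicode, and the Python side would then have to parse it back. In fact, you can make Google Protocol Buffers for Python skip the transcoding entirely.
The problem I ran into: C++ reads a file (the file is UTF-8) and Python receives the message. C++ sends UTF-8, but Python insists on unicode, which is really frustrating.
The documentation describes the string type as shown below and says that both UTF-8 and ASCII are supported. The Python implementation, however, demands unicode, which is rather absurd. It goes to show that actually trying something beats any amount of description.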
.proto type | Notes | C++ type | Java type
string | A string must always contain UTF-8 encoded or 7-bit ASCII text. | string | String
Download the Google Protocol Buffers source, for example from http://code.google.com/p/protobuf/downloads/list, and get version 2.0.3.
Go into the /protobuf-2.0.3/python/google/protobuf/internal directory and edit the files listed below.
encoder.py
      def AppendString(self, field_number, value):
        """Appends a length-prefixed unicode string, encoded as UTF-8 to our buffer,
        with the length varint-encoded.
        """
        # Original line:
        # self.AppendBytes(field_number, value.encode('utf-8'))
        # Drop the value.encode call: my data is already UTF-8, so why encode it again?
        self.AppendBytes(field_number, value)
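To make it clearer what the patched AppendString hands over to AppendBytes, here is a minimal sketch of how a length-delimited field is laid out on the wire: a varint tag, a varint length, then the raw bytes. The helper names `encode_varint` and `append_length_delimited` are made up for illustration and are not part of the protobuf library. The sketch is written for Python 3, where `bytes` plays the role that `str` plays in the Python 2 source above.

```python
WIRETYPE_LENGTH_DELIMITED = 2  # wire type used for string and bytes fields


def encode_varint(value):
    """Encode a non-negative integer as a base-128 varint (hypothetical helper)."""
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def append_length_delimited(field_number, payload):
    """Return tag + length + payload for bytes that are already encoded."""
    tag = (field_number << 3) | WIRETYPE_LENGTH_DELIMITED
    return encode_varint(tag) + encode_varint(len(payload)) + payload


# The payload is already UTF-8, so nothing is re-encoded on the way out.
utf8_payload = "中文".encode("utf-8")
wire = append_length_delimited(1, utf8_payload)
```

The point of the sketch is that the wire format only cares about the byte length, so passing UTF-8 bytes straight through yields the same output as encoding a unicode string to UTF-8 first.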
decoder.py
      def ReadString(self):
        """Reads and returns a length-delimited string."""
        bytes = self.ReadBytes()
        # Original line:
        # return unicode(bytes, 'utf-8')
        # Changed to the line below: do nothing and return the data in
        # whatever encoding it arrived in.
        return bytes
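For the reading side, the following sketch shows what "return the bytes untouched" amounts to: the length prefix is read, the payload is sliced out, and decoding to text becomes an optional step the caller performs. `read_varint` and `read_length_delimited` are hypothetical helpers written for this illustration, not functions from the protobuf library, and the code targets Python 3.

```python
def read_varint(buf, pos):
    """Decode a base-128 varint from buf starting at pos; return (value, new_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear: this was the last byte
            return result, pos
        shift += 7


def read_length_delimited(buf, pos):
    """Read a varint length and return (payload_bytes, new_pos) without decoding."""
    length, pos = read_varint(buf, pos)
    end = pos + length
    if end > len(buf):
        raise ValueError("truncated length-delimited field")
    return bytes(buf[pos:end]), end


# Field 1, wire type 2, six bytes of UTF-8 text.
wire = b"\x0a\x06" + "中文".encode("utf-8")
tag, pos = read_varint(wire, 0)
raw, pos = read_length_delimited(wire, pos)

# Decoding is now the caller's decision rather than the library's.
text = raw.decode("utf-8")
```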
type_checkers.py
    class UnicodeValueChecker(object):
      """Checker used for string fields."""
      def CheckValue(self, proposed_value):
        if not isinstance(proposed_value, (str, unicode)):
          message = ('%.1024r has type %s, but expected one of: %s' %
                     (proposed_value, type(proposed_value), (str, unicode)))
          raise TypeError(message)
        # If the value is of type 'str' make sure that it is in 7-bit ASCII
        # encoding.
        # Baffling: why try to convert to unicode, and from ASCII at that?
        # Commented out.
        # if isinstance(proposed_value, str):
        #   try:
        #     unicode(proposed_value, 'ascii')
        #   except UnicodeDecodeError:
        #     raise ValueError('%.1024r isn\'t in 7-bit ASCII encoding.'
        #                      % (proposed_value))
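The snippet below is a small sketch of why that check rejects UTF-8 input: any non-ASCII character encodes to bytes above 0x7F, which the ASCII codec refuses. It also shows a milder alternative to removing the check, namely validating the bytes as UTF-8. That alternative is only a suggestion and is not what the patch above does. Both function names are hypothetical, and the code is Python 3, so `bytes` stands in for the Python 2 `str` type.

```python
def is_7bit_ascii(raw):
    """Mirror the commented-out check: True only if raw decodes as ASCII."""
    try:
        raw.decode("ascii")
    except UnicodeDecodeError:
        return False
    return True


def is_valid_utf8(raw):
    """A looser check (an assumption, not the library's behavior): accept any valid UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


ascii_bytes = b"hello"
utf8_bytes = "中文".encode("utf-8")

ascii_ok = is_7bit_ascii(ascii_bytes)    # plain ASCII passes the original check
utf8_rejected = not is_7bit_ascii(utf8_bytes)  # UTF-8 text trips the original check
utf8_ok = is_valid_utf8(utf8_bytes)      # but it is perfectly valid UTF-8
```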
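After making these changes, reinstall the package. It will no longer transcode anything, so you can pass data in whatever encoding you like. I have used the C++ version, and it does not seem to do any encoding handling. That said, the bytes type in proto also appears to meet this need, since bytes is passed through without any processing.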
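If you go the bytes route instead of patching the library, the encoding work moves into your own code at the boundary. Here is a minimal sketch of that idea, assuming the field is declared as bytes in the .proto file. The helpers `to_wire_bytes` and `from_wire_bytes` are hypothetical names, and the code is written for Python 3.

```python
def to_wire_bytes(value, encoding="utf-8"):
    """Return bytes suitable for a proto bytes field (hypothetical helper)."""
    if isinstance(value, bytes):
        return value  # already encoded, pass through untouched
    if isinstance(value, str):
        return value.encode(encoding)
    raise TypeError("expected bytes or str, got %s" % type(value).__name__)


def from_wire_bytes(raw, encoding="utf-8"):
    """Decode bytes read from a proto bytes field back into text."""
    return raw.decode(encoding)


# Data read from a UTF-8 file can be stored as-is.
from_file = "中文".encode("utf-8")
stored = to_wire_bytes(from_file)

# Text created in Python is encoded once, at the boundary.
stored_from_text = to_wire_bytes("中文")

# The receiver decodes only if it actually needs text.
received = from_wire_bytes(stored)
```

With this approach the installed protobuf package stays unmodified, at the cost of every reader and writer having to agree on the encoding themselves.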