Character encoding issues in Google Protocol Buffers (C++/Java/Python)

Last updated: 2018-12-05 Source: Internet

Uploaded by: User


I previously wrote about the UTF-8 problem in Google Protocol Buffers. According to Kenton Varda, the author of Protocol Buffers:

C++ Protocol Buffers use UTF-8 for all text encoding, regardless of platform. If you want to use some other encoding in your code, you will have to manually convert between that and UTF-8 when interacting with Protocol Buffers.

In Java and Python everything is taken care of automatically, since these languages have built-in unicode support. In Java, protocol buffers uses String object (which are unicode) to represent strings, and in Python you can use the "unicode" builtin type for unicode.

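In short: C++ uses UTF-8 as its text encoding, independent of platform. If you want to use a different encoding, you must convert to and from UTF-8 by hand. In Java and Python the conversion is handled automatically, because both languages have built-in Unicode support: in Java, Protocol Buffers represents strings as (Unicode) String objects, and in Python you can use the built-in unicode type for Unicode strings.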
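To make "UTF-8 regardless of platform" concrete, here is a minimal sketch of how a single `string` field is laid out on the wire, following the length-delimited encoding described in the Protocol Buffers encoding documentation: a tag varint `(field_number << 3) | 2`, a length varint, then the UTF-8 bytes. The function names are hypothetical helpers for illustration, not part of the protobuf library API.

```python
def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as a protobuf base-128 varint."""
    out = bytearray()
    while True:
        low7 = n & 0x7F
        n >>= 7
        if n:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def encode_string_field(field_number: int, text: str) -> bytes:
    """Lay out one `string` field: tag, length, UTF-8 payload."""
    payload = text.encode("utf-8")  # UTF-8 on the wire, whatever the platform
    tag = encode_varint((field_number << 3) | 2)  # wire type 2 = length-delimited
    return tag + encode_varint(len(payload)) + payload


print(encode_string_field(1, "中文").hex())  # 0a06e4b8ade69687
```

Because the payload is simply `text.encode("utf-8")`, a C++ reader on any platform sees the same bytes, which is why C++ code that holds text in another encoding has to convert it before handing it to Protocol Buffers.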

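Explanation: if a field in your .proto file has type string, then in C++ you work with UTF-8, while in Java and Python you must work with Unicode strings. If your data is in some other form, you convert it by hand. For example, in Python: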

Suppose data holds UTF-8 bytes, but the Python version of Protocol Buffers expects unicode. What do you do?

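cont = MsgContent()  # MsgContent: the message class generated from your .proto file
cont.strcont = data.decode('utf-8')  # must decode the UTF-8 bytes into unicode first
buff = cont.SerializeToString()  # the serialized result is a byte string; no further handling needed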
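The decode step above is not specific to UTF-8. As a sketch, assuming the incoming bytes are GBK (a common legacy encoding for Chinese text, chosen here only as an example) and using hypothetical helper names, the same idea yields both the text a Python string field wants and the UTF-8 bytes the C++ side expects:

```python
def to_field_text(raw: bytes, source_encoding: str = "utf-8") -> str:
    """Decode raw bytes into the text type a Python `string` field expects."""
    return raw.decode(source_encoding)


def to_cpp_utf8(raw: bytes, source_encoding: str) -> bytes:
    """Re-encode bytes from another encoding into UTF-8 for the C++ side."""
    return raw.decode(source_encoding).encode("utf-8")


gbk_data = "中文".encode("gbk")  # hypothetical input from a GBK-encoded source
text = to_field_text(gbk_data, "gbk")  # what you would assign to cont.strcont
utf8 = to_cpp_utf8(gbk_data, "gbk")  # what a C++ string field should hold
print(text, utf8)
```

Decoding with the wrong codec either raises `UnicodeDecodeError` or silently produces garbled text, so the source encoding has to be known rather than guessed.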

By the same logic, after parsing, every string field comes back as unicode:

cont = MsgContent()
cont.ParseFromString(buff)
data = cont.strcont  # this is unicode
data = data.encode('utf-8')  # encode the unicode into whatever encoding you need
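The reverse direction can be sketched the same way. Assuming the buffer holds exactly one length-delimited field (a simplification: real messages contain many fields, and the generated `ParseFromString` handles all of them), decoding the payload as UTF-8 gives the text a Python string field exposes, which you then encode into whatever the caller needs. The helper names are hypothetical.

```python
from typing import Tuple


def decode_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Read a base-128 varint starting at pos; return (value, next_pos)."""
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear: last byte of the varint
            return result, pos
        shift += 7


def decode_string_field(buf: bytes) -> Tuple[int, str]:
    """Decode a buffer holding exactly one length-delimited `string` field."""
    key, pos = decode_varint(buf, 0)
    field_number, wire_type = key >> 3, key & 0x7
    if wire_type != 2:
        raise ValueError(f"expected wire type 2, got {wire_type}")
    length, pos = decode_varint(buf, pos)
    payload = buf[pos:pos + length]
    if len(payload) != length:
        raise ValueError("truncated field")
    return field_number, payload.decode("utf-8")  # text, like cont.strcont


field, text = decode_string_field(b"\x0a\x06\xe4\xb8\xad\xe6\x96\x87")
print(field, text)  # 1 中文
print(text.encode("gbk"))  # re-encode into whatever encoding the caller needs
```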

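Most important of all: if this feels like too much trouble (and I suspect many people feel that way), for example when the data in your database, in your files, and even in transit is all UTF-8, so that every time you must decode it to unicode before serializing and then encode it back to UTF-8 after parsing, you can declare the field as bytes instead. A bytes field is passed through without any processing.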
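A minimal sketch of why the bytes route saves work, using hypothetical encoder helpers rather than the protobuf library: `string` and `bytes` fields share the same length-delimited wire encoding, so UTF-8 data stored in a `bytes` field produces exactly the same bytes as the decoded text stored in a `string` field, minus the decode/encode round trip.

```python
def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as a protobuf base-128 varint."""
    out = bytearray()
    while True:
        low7 = n & 0x7F
        n >>= 7
        if n:
            out.append(low7 | 0x80)
        else:
            out.append(low7)
            return bytes(out)


def encode_length_delimited(field_number: int, payload: bytes) -> bytes:
    """Tag (wire type 2), length varint, raw payload."""
    return encode_varint((field_number << 3) | 2) + encode_varint(len(payload)) + payload


def encode_string_field(field_number: int, text: str) -> bytes:
    return encode_length_delimited(field_number, text.encode("utf-8"))  # text -> UTF-8


def encode_bytes_field(field_number: int, data: bytes) -> bytes:
    return encode_length_delimited(field_number, data)  # passed through untouched


utf8_from_db = "中文".encode("utf-8")  # e.g. a value already stored as UTF-8
via_string = encode_string_field(1, utf8_from_db.decode("utf-8"))
via_bytes = encode_bytes_field(1, utf8_from_db)
print(via_string == via_bytes)  # True
print(encode_bytes_field(1, b"\xff\xfe").hex())  # 0a02fffe
```

The trade-off is that the library no longer checks anything for you: the Protocol Buffers language guide requires a `string` field to contain UTF-8 encoded or 7-bit ASCII text, while a `bytes` field accepts arbitrary bytes (such as `b"\xff\xfe"` above), so every reader must agree on the encoding out of band.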

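A final note: I think Kenton Varda is overcautious on this point. In practice, Java and Python strings can carry text in any encoding, not only Unicode. In Python 2, for example, if a file declares # -*- coding: utf-8 -*-, every plain string literal typed in the code is UTF-8, and data read from files, read from the database, or received over the network is UTF-8 as well. Keeping everything uniformly in UTF-8 is the most convenient approach, and the automatic conversion is really superfluous.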

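This article was originally written in Chinese and published on aliyun.com, with an English version also available; it is provided for informational purposes only. The site makes no express or implied representations or warranties as to the accuracy, completeness, or reliability of the article or any translation of it. If you have concerns or complaints about this article, email a detailed description to info-contact@alibabacloud.com; staff will contact you within 5 working days and, once verified, remove the infringing content.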
