Repost: the encode and decode methods of Python's str and unicode objects

Last updated: 2018-12-07 Source: Internet

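A str object in Python is really an "8-bit string", that is, a byte string, essentially like byte[] in Java.
The unicode object in Python is the true counterpart of Java's String, or essentially of Java's char[].
Consider: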

Python code
s = "你好"
u = u"你好"

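1. s.decode and u.encode are the methods used most often.
Put simply, Python represents strings internally as unicode (Python's internal representation differs slightly from real Unicode, but this is almost transparent to us and can be ignored), and uses str objects when interacting with people, that is, for input and output.
s.decode --------> decodes s into unicode; the argument names the encoding s is currently in. This is the same as unicode(s, encodename).
u.encode --------> encodes unicode into a str object; the argument names the encoding to use.
Mnemonic: decode to unicode from parameter
encode to parameter from unicode
Apart from u"..." literals, only the decode method and the unicode constructor produce a unicode object.
The most common use of the above is a scenario like this: we declare the encoding cp936 in a Python source file,
using # coding=cp936 or #-*- coding:cp936 -*- or #coding:cp936 (if nothing is declared, the default is ASCII).
The str objects in that source file are then cp936-encoded, and we need to pass such a string to something that must be stored in another encoding (for example the UTF-8 of XML, or the UTF-16 that Excel needs).
This is usually written as: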
strobj.decode("cp936").encode("utf-16")
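
The decode-then-encode chain above can be sketched as follows. This is only an illustration under a few assumptions: the variable names are made up, the cp936 bytes are produced from an escaped unicode literal (the text "你好") instead of being read from a cp936 source file, and the code avoids Python 2-only syntax so that it also runs on Python 3, where bytes plays the role of the Python 2 str.

```python
# Transcode a byte string from cp936 to UTF-16 by going through unicode.
text = u"\u4f60\u597d"                      # the unicode text from the example above
cp936_bytes = text.encode("cp936")          # stand-in for a str in a cp936 source file

as_unicode = cp936_bytes.decode("cp936")    # bytes -> unicode
utf16_bytes = as_unicode.encode("utf-16")   # unicode -> bytes (a BOM is prepended)

print(repr(cp936_bytes))
print(as_unicode == text)
print(utf16_bytes.decode("utf-16") == text)
```

The intermediate unicode value is the key design point: there is no direct cp936-to-UTF-16 step, so every conversion between two byte encodings passes through unicode.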

You typically encode a unicode string whenever you need to use it for IO, for instance transfer it over the network, or save it to a disk file.
To convert a string of bytes to a unicode string is known as decoding. Use unicode('...', encoding) or '...'.decode(encoding).
You typically decode a string of bytes whenever you receive string data from the network or from a disk file.
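
A small sketch of that "encode on the way out, decode on the way in" rule. It assumes an in-memory io.BytesIO object as a stand-in for a disk file or network socket, and the message text and variable names are purely illustrative.

```python
import io

message = u"\u4f60\u597d, world"

# Encode on the way out: only bytes can be written to a file or socket.
stream = io.BytesIO()                  # in-memory stand-in for a disk file
stream.write(message.encode("utf-8"))

# Decode on the way in: what comes back is bytes until it is decoded.
received = stream.getvalue()
decoded = received.decode("utf-8")

print(len(received))
print(decoded == message)
```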
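2.
Point 1 is already fairly long because it covers the most common case, and it needs little explanation. What I mainly want to discuss is this second point.
It would seem that unicode's encode method and str's decode method are all one needs. Oddly, unicode also has decode and str also has encode. What are these two for?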
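Use 1
A str is already encoded, so it is hard to see what encoding it again would achieve (it usually raises an error).
First, an explanation:
str.encode(e) is the same as unicode(str).encode(e).
This is useful since code that expects Unicode strings should also work when it is passed
ASCII-encoded 8-bit strings (from Guido van Rossum)
What Python's creator is saying here is roughly this: encode is meant to be called on unicode, but if it is accidentally called as a method of a str object, and that str happens to be ASCII-encoded (the ASCII range is identical in Unicode), the call should still succeed. In other words, the str is first decoded implicitly with the default ASCII codec, which is why non-ASCII bytes raise UnicodeDecodeError. That is one use of str.encode (I consider it practically useless).
Similarly, calling decode on a unicode object made up only of ASCII characters follows the same logic, since ASCII appears to be left unchanged in almost every encoding. Such an operation is effectively a no-op.
u"abc".decode("gb2312") and u"abc" are equal.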
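
The implicit step described above can be imitated with a small helper. This is a sketch and not Python 2's actual implementation: py2_style_str_encode is a hypothetical name, and it simply assumes that the default encoding is ASCII, as it is in a stock Python 2 installation.

```python
def py2_style_str_encode(raw, encoding):
    """Hypothetical helper imitating Python 2's str.encode(encoding).

    The byte string is first decoded with the default codec (assumed to
    be ASCII) and the resulting unicode text is then encoded.
    """
    return raw.decode("ascii").encode(encoding)


# Pure-ASCII bytes survive the round trip unchanged.
print(py2_style_str_encode(b"abc", "gb2312") == b"abc")

# Non-ASCII bytes fail at the implicit ASCII decode step.
try:
    py2_style_str_encode(u"\u4f60\u597d".encode("gb2312"), "utf-8")
    failed = False
except UnicodeDecodeError:
    failed = True
print(failed)
```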

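Use 2
Non-character-encoding codecs. These are defined only in Python and mean nothing outside it (this comes from the official Python documentation),
and they are not human languages either.
For example: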

Python code
'\n'.encode('hex') == '0a'
u'\n'.encode('hex') == '0a'
'0a'.decode('hex') == '\n'
u'0a'.decode('hex') == '\n'

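So the codec named hex converts between characters (which must be within ASCII, of course) and their hexadecimal representation.
There are many other fun ones: base64, the e-mail encoding popularly said to keep out honest people but not the determined; zlib (alias zip), which does compression; rot13, which rotates letters by 13 places; and so on. Google the ones you don't know.
The official documentation has a detailed table of these in the Standard Encodings section of http://docs.python.org/library/codecs.html. The first table lists the character-based encodings, and the second table lists the non-character codecs discussed here. The documentation makes one remark about these special codecs: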
For the codecs listed below, the result in the “encoding” direction is always a byte string.
The result of the “decoding” direction is listed as operand type in the table.
That is, encoding always produces a byte str, while the result type of decoding is given in the operand column of the table.
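
A brief sketch of these codecs in action. One assumption to flag: it calls the module-level codecs.encode and codecs.decode functions, not the str and unicode methods shown in the article, because the Python 3 str and bytes methods reject these non-character codecs. The sample inputs are illustrative.

```python
import codecs

# hex: bytes <-> their hexadecimal text form
hex_form = codecs.encode(b"\n", "hex")
print(hex_form == b"0a")
print(codecs.decode(b"0a", "hex") == b"\n")

# base64: round trip through the codec
b64 = codecs.encode(b"hello", "base64")
print(codecs.decode(b64, "base64") == b"hello")

# rot13: each ASCII letter is shifted by 13 places
print(codecs.encode(u"abc", "rot13") == u"nop")
```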

References
Converting Between Unicode and Plain Strings
http://wiki.woodpecker.org.cn/moin/PyCkBk-3-18
what’s the difference between encode/decode? (python 2.x)
http://stackoverflow.com/questions/447107/whats-the-difference-between-encode-decode-python-2-x
http://docs.python.org/library/codecs.html

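What the encoding declaration does
See http://www.python.org/dev/peps/pep-0263/
It declares that non-ASCII characters will appear in the source file.
In more advanced IDEs, the IDE saves your file in the encoding you declared.
It determines the encoding used to decode '哈' into unicode for literals such as u'哈' in the source, which is also a point that easily confuses people.
(Java needs no such declaration because Java defaults to the local encoding whereas Python defaults to ASCII, which makes Python more error-prone;
in addition, the Java compiler accepts an -encoding option for specifying the source encoding.)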
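
The effect of the declaration on a u'...' literal can be sketched by compiling the same bytes under two different coding lines. This assumes Python 3, where compile() accepts a bytes source and honours its coding line; load_u is a hypothetical helper, and the two raw bytes are the GBK encoding of 哈.

```python
# The same two bytes inside a u'...' literal, under two declarations.
literal_bytes = u"\u54c8".encode("gbk")   # b'\xb9\xfe'


def load_u(declared):
    """Hypothetical helper: compile a tiny source with a coding line."""
    source = (b"# coding: " + declared.encode("ascii") + b"\n"
              + b"u = u'" + literal_bytes + b"'\n")
    namespace = {}
    exec(compile(source, "<demo>", "exec"), namespace)
    return namespace["u"]


print(load_u("gbk") == u"\u54c8")         # declaration matches the bytes
print(load_u("latin-1") == u"\xb9\xfe")   # mismatch: two unrelated characters
```

With a mismatched declaration the literal still compiles here, but it silently becomes different text, which is why this point confuses people.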

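The encoding the file is saved in determines the encoding of the strings declared in that source file. For example:

Python code
# -*- coding: utf-8 -*-
# The declaration above must match the encoding the file is saved in.
s = '哈哈'
print repr(s)

a. If the file is saved as UTF-8, the value of s is '\xe5\x93\x88\xe5\x93\x88' (the UTF-8 encoding of 哈哈).
b. If the file is saved as GBK, the value of s is '\xb9\xfe\xb9\xfe' (the GBK encoding of 哈哈).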
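
The two byte sequences listed above can be reproduced without saving a file in either encoding. This is a minimal sketch that starts from an escaped unicode literal (the text 哈哈) and encodes it both ways; the variable names are illustrative.

```python
text = u"\u54c8\u54c8"

utf8_bytes = text.encode("utf-8")   # what a UTF-8 source file would contain
gbk_bytes = text.encode("gbk")      # what a GBK source file would contain

print(repr(utf8_bytes))
print(repr(gbk_bytes))

# Different bytes, same text once each is decoded with its own codec.
print(utf8_bytes.decode("utf-8") == gbk_bytes.decode("gbk"))
```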

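My understanding: once a file is saved, nothing records which encoding it uses, so editors and compilers, clever or not, can only guess. A declaration is more precise.
Keeping the two consistent can never go wrong.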

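Many other languages and applications have similar decode and encode concepts, for example the encoding conversions around String in Java and the JDK tool native2ascii;
JavaScript seems to have something like this too, though I don't remember clearly.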

This article comes from a CSDN blog; please credit the source when reposting: http://blog.csdn.net/suofiya2008/archive/2011/05/12/6415162.aspx
