Python tips: Saving Unicode characters to a text file

Last updated: 2018-12-08. Source: Internet.

Uploaded by: User


Yesterday, while saving some Chinese characters to a text file, I ran into a strange behavior. First, the code (Python 2):

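```python
#coding=utf-8
import os

def write_use_open(filepath):
    try:
        file = open(filepath, 'wb')
        try:
            # With the coding declaration, a plain '' literal is a byte string (str)
            # holding the source file's UTF-8 bytes, not a unicode object.
            content = '中華人民共和國abcd \r\nee ?!>??@@@!！！！！？？？￥@#%@%#xx學校ada\r\n'
            print file.encoding
            print file.newlines
            print file.mode
            print file.closed
            print content          # raw UTF-8 bytes go straight to the console
            file.write(content)    # bytes are written unchanged
        finally:
            file.close()
            print file.closed
    except IOError, e:
        print e

if __name__ == '__main__':
    filepath = os.path.join(os.getcwd(), 'file.txt')
    write_use_open(filepath)
```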

I originally wrote it in IDLE and ran it directly with F5. Nothing looked wrong: the file was saved correctly, and its encoding was UTF-8.

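However, when I ran it from the command line, the printed text came out garbled. Yet when I opened the file, it had been saved correctly, still in UTF-8.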

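My guess is that the command line does not detect the character encoding automatically. IDLE displays the text correctly because it supports UTF-8, whereas the console interprets the raw UTF-8 bytes using its own code page (on a Chinese-locale Windows system this is typically GBK/cp936), which produces garbled output.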
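To illustrate that guess, here is a minimal sketch in Python 3 (an assumption; the article's own code is Python 2) showing that the same UTF-8 bytes read back correctly with UTF-8 but turn into garbage with a different code page. GBK is only an assumed stand-in for a non-UTF-8 console.

```python
# Python 3 sketch: the same bytes read differently depending on the
# encoding used to interpret them. GBK stands in for an assumed
# non-UTF-8 console code page.
text = "中華人民共和國"
raw = text.encode("utf-8")   # the kind of bytes the first script wrote and printed

as_utf8 = raw.decode("utf-8")                 # right encoding: original text back
as_gbk = raw.decode("gbk", errors="replace")  # wrong code page: garbled text

print(as_utf8 == text)  # True
print(as_gbk == text)   # False
```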

So I changed the code and added a `u` prefix to the string, marking `content` as a unicode object:

```python
content = u'中華人民共和國abcd \r\nee ?!>??@@@!！！！！？？？￥@#%@%#xx學校ada\r\n'
```

This time the command line displayed the text correctly, but the write raised an exception (a `UnicodeEncodeError`).

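The cause is clear: `content` contains non-ASCII characters, so it cannot be encoded as ASCII. In Python 2, when a unicode object is passed to `write` on a file opened with the built-in `open`, it is implicitly encoded with the default encoding (`sys.getdefaultencoding()`, normally ASCII), and that encoding fails.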
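The failure can be reproduced without a file. In this Python 3 sketch (Python 3 is an assumption here), encoding the text with the ASCII codec raises `UnicodeEncodeError`, the same kind of error Python 2's implicit conversion produces, while UTF-8 succeeds. Python 3 also refuses to write `str` to a binary stream at all: it raises `TypeError` instead of guessing an encoding. The in-memory `io.BytesIO` stands in for a file opened with `'wb'`.

```python
import io

content = "中華人民共和國abcd\r\n"

# Encoding non-ASCII text with the ASCII codec fails...
try:
    content.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError as exc:
    ascii_failed = True
    print("ascii:", exc.reason)

# ...while UTF-8 can represent every character.
utf8_bytes = content.encode("utf-8")

# Python 3 never encodes implicitly: writing str to a binary stream is a TypeError.
buf = io.BytesIO()
try:
    buf.write(content)
    binary_write_failed = False
except TypeError:
    binary_write_failed = True
```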

The natural fix is to encode the unicode string explicitly before saving it. I chose UTF-8:

```python
#coding=utf-8
import os

def write_use_open(filepath):
    try:
        file = open(filepath, 'wb')
        try:
            # The u prefix makes content a unicode object.
            content = u'中華人民共和國abcd \r\nee ?!>??@@@!！！！！？？？￥@#%@%#xx學校ada\r\n'
            print file.encoding
            print file.newlines
            print file.mode
            print file.closed
            print content
            print content.encode('utf-8')
            # Encode explicitly to UTF-8 bytes before writing.
            file.write(content.encode('utf-8'))
        finally:
            file.close()
            print file.closed
    except IOError, e:
        print e

if __name__ == '__main__':
    filepath = os.path.join(os.getcwd(), 'file.txt')
    write_use_open(filepath)
```

With that change it works, and the saved file opens correctly as well.
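For comparison, a minimal Python 3 sketch of the same fix (Python 3 is an assumption). The helper name `write_utf8` and the temporary directory are hypothetical; the point is that the text is encoded to UTF-8 explicitly and the resulting bytes are written in binary mode, as in the fixed Python 2 listing above.

```python
import os
import tempfile

def write_utf8(filepath, content):
    """Hypothetical helper: encode the text explicitly, then write the bytes."""
    with open(filepath, "wb") as f:        # binary mode accepts only bytes
        f.write(content.encode("utf-8"))   # explicit encode, as in the fixed code

content = "中華人民共和國abcd \r\nee ?!>??@@@!！！！！？？？￥@#%@%#xx學校ada\r\n"
filepath = os.path.join(tempfile.mkdtemp(), "file.txt")
write_utf8(filepath, content)

# Read the raw bytes back to check what actually landed on disk.
with open(filepath, "rb") as f:
    saved = f.read()
```

Because the file is written in binary mode, the `\r\n` sequences are stored exactly as they appear in the string.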

What about reading the file? The same principle applies, except that this time we decode instead of encode:

```python
import os

def read_use_open(filepath):
    try:
        file = open(filepath, 'rb')
        try:
            content = file.read()                       # raw UTF-8 bytes (str)
            content_decode = unicode(content, 'utf-8')  # decode to a unicode object
            print 'original text'
            print content
            print 'decode using utf-8'
            print content_decode
        finally:
            file.close()
    except IOError, e:
        print e

if __name__ == '__main__':
    # write_use_open is the function from the previous listing (same module).
    filepath = os.path.join(os.getcwd(), 'file.txt')
    write_use_open(filepath)
    print 'read file ---------------------------'
    read_use_open(filepath)
```
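The reading side as a Python 3 sketch (Python 3 is an assumption): read the raw bytes in binary mode, then decode them with `bytes.decode`, the counterpart of Python 2's `unicode(content, 'utf-8')`. The helper name `read_utf8`, the temporary file, and the sample text are hypothetical.

```python
import os
import tempfile

def read_utf8(filepath):
    """Hypothetical helper: read raw bytes, then decode them as UTF-8."""
    with open(filepath, "rb") as f:
        raw = f.read()                 # bytes, exactly as stored on disk
    return raw, raw.decode("utf-8")   # (undecoded bytes, decoded text)

path = os.path.join(tempfile.mkdtemp(), "file.txt")
with open(path, "wb") as f:
    f.write("學校ada\r\n".encode("utf-8"))

raw, text = read_utf8(path)
```

Here `raw` is 11 bytes (3 each for the two CJK characters, plus 5 ASCII bytes), while `text` is the 7-character string.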

Why not decode right when the file is opened? You can, using the `open` function from the `codecs` module:

```python
import codecs

def read_use_codecs_open(filepath):
    try:
        # codecs.open decodes while reading, so read() returns unicode
        file = codecs.open(filepath, 'rb', 'utf-8')
        try:
            print 'using codecs.open'
            content = file.read()
            print content
        finally:
            file.close()
    except IOError, e:
        print e
```
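`codecs.open` still exists in Python 3, but the built-in `open` there accepts an `encoding` argument directly, and recent Python 3 releases recommend it over `codecs.open`. A minimal sketch comparing the two on a temporary file (Python 3 and the sample text are assumptions). `newline=""` is passed to the built-in `open` so that `\r\n` is neither translated on write nor on read, which keeps the comparison with `codecs.open` (which does no newline translation) fair.

```python
import codecs
import os
import tempfile

content = "中華人民共和國abcd\r\n學校ada\r\n"
path = os.path.join(tempfile.mkdtemp(), "file.txt")

# Write the text as UTF-8; newline="" disables newline translation on write.
with open(path, "w", encoding="utf-8", newline="") as f:
    f.write(content)

# Decode while opening, as in the codecs.open example.
with codecs.open(path, "rb", "utf-8") as f:
    via_codecs = f.read()

# Python 3's built-in open does the same; newline="" returns "\r\n" unchanged.
with open(path, "r", encoding="utf-8", newline="") as f:
    via_builtin = f.read()
```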

That's all. I hope you find it useful.

Reference: Unicode HOWTO

Technorati tags: Python, unicode, utf-8, ascii, codecs, open

This article was originally written in Chinese and published on aliyun.com.
