Chinese text encoding in Python has always been a major headache, and encoding-conversion exceptions come up all the time. So what exactly are str and unicode in Python (Python 2, which this article uses throughout)? When people say "Unicode" in Python, they generally mean a unicode object; for example, the unicode object for '哈哈' is u'\u54c8\u54c8'. A str, on the other hand, is a byte array: it holds the result of encoding a unicode object in some storage format (UTF-8, GBK, cp936, GB2312). On its own it is just a stream of bytes with no further meaning. To turn that byte stream into meaningful text for display, you must decode it with the correct encoding.
For example:
>>> a = u"你好"
>>> a_utf8 = a.encode("utf-8")
>>> print a_utf8
浣犲ソ
>>> a_gbk = a.encode("gbk")
>>> print a_gbk
你好
>>> a_utf8
'\xe4\xbd\xa0\xe5\xa5\xbd'
>>> a_gbk
'\xc4\xe3\xba\xc3'
Encoding the unicode object "你好" as UTF-8 gives a_utf8, a byte array that stores '\xe4\xbd\xa0\xe5\xa5\xbd'. It is only a byte array, though, so print cannot be relied on to display it as 你好. The print statement works by handing the bytes to the operating system, which interprets the incoming byte stream according to the system encoding. That explains why the UTF-8 string "你好" is displayed as "浣犲ソ": when '\xe4\xbd\xa0\xe5\xa5\xbd' is interpreted as GBK (cp936, a superset of GB2312), it reads as "浣犲ソ". A str records a byte array in the storage format of some encoding; what you see when it is written to a file or printed depends entirely on which encoding is used to decode it. One more note on print: when a unicode object is passed to print, it is converted internally to the local default encoding (this is my guess at how it works).
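The mismatch described above can be reproduced without relying on what the console displays. The following is a minimal sketch written for Python 3, where bytes plays the role of Python 2's str and str plays the role of unicode; the variable names are chosen only for illustration.

```python
# Python 3 sketch: bytes corresponds to Python 2's str, str to unicode.
text = u"\u4f60\u597d"  # the text "你好", written with escapes

utf8_bytes = text.encode("utf-8")  # b'\xe4\xbd\xa0\xe5\xa5\xbd'
gbk_bytes = text.encode("gbk")     # b'\xc4\xe3\xba\xc3'

# Decoding with the encoding that produced the bytes recovers the text.
assert utf8_bytes.decode("utf-8") == text
assert gbk_bytes.decode("gbk") == text

# Decoding the UTF-8 bytes as GBK raises no error for this input;
# it silently produces different characters (the garbled output).
mojibake = utf8_bytes.decode("gbk")
print(ascii(mojibake))
```

The same six bytes yield either the intended text or garbage depending only on the encoding chosen for decoding, which is the situation a GBK console is in when it receives UTF-8 bytes.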
Decode and encode
Internally, Python represents strings as Unicode, so converting between encodings usually uses Unicode as the intermediate form: first decode the string from its current encoding into Unicode, then encode the Unicode into the target encoding. For example, str1.decode('gb2312') converts the GB2312-encoded string str1 to Unicode, and str2.encode('gb2312') converts the Unicode string str2 to GB2312.
To transcode, you must first know which encoding the str is in, then decode it into Unicode, and then encode it into the other encoding. In a UTF-8 source file, string literals are UTF-8 encoded; in a GBK source file, they are GBK encoded. In either case, converting means calling decode first to get Unicode and then calling encode to get the target encoding. When you create a source file without specifying an encoding, the system default encoding is typically used.
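The decode-then-encode sequence can be wrapped in a small helper. This is only a sketch: the function name transcode and its parameters are hypothetical, and the caller is assumed to know the source encoding, since it cannot be read off the bytes themselves.

```python
def transcode(data, source_encoding, target_encoding):
    """Re-encode a byte string by going through Unicode text."""
    text = data.decode(source_encoding)   # bytes -> Unicode text
    return text.encode(target_encoding)   # Unicode text -> bytes


gbk_bytes = b"\xc4\xe3\xba\xc3"  # "你好" stored as GBK
utf8_bytes = transcode(gbk_bytes, "gbk", "utf-8")
```

Passing the wrong source encoding either raises UnicodeDecodeError or produces garbled text, so the first argument pair has to match how the bytes were actually produced.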
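If a string is already a unicode object, calling decode on it raises an error (Python 2 first tries to encode it with the default ASCII codec, which fails on Chinese characters), so it is common to check whether it is unicode first: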
isinstance(s, unicode)  # check whether s is a unicode object
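That check is often folded into a helper that decodes only when needed. The sketch below is one possible shape, not a standard API: to_text and text_type are hypothetical names, and the try/except is there only so the same code runs on Python 3, which has no unicode type.

```python
try:
    text_type = unicode  # Python 2: the unicode type
except NameError:
    text_type = str      # Python 3: str is already Unicode text


def to_text(value, encoding="utf-8"):
    """Return value as Unicode text, decoding only if it is a byte string."""
    if isinstance(value, text_type):
        return value                 # already Unicode, decoding again would fail
    return value.decode(encoding)    # byte string: decode with the given encoding
```

For example, to_text(b'\xc4\xe3\xba\xc3', 'gbk') and to_text(u'\u4f60\u597d') both return the same Unicode text.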
Example: fixing the Chinese file name error when downloading with a Python FTP client on Windows
import logging
import os

# remotepath, zname, creatdir() and ftpconnect() are defined elsewhere in the
# original script and are not shown here.
def downloadfile():
    # Distinct local names avoid shadowing the module-level remotepath.
    remote_file = os.path.join(remotepath, zname).encode('utf-8')  # remote name sent as UTF-8
    local_file = os.path.join(creatdir(), zname).encode("gbk")     # local name kept as GBK
    print "Start connecting to FTP server ..."
    ftp = ftpconnect()
    ftp.set_debuglevel(2)  # enable debug output
    # print ftp.getwelcome()  # show the FTP server's welcome message
    bufsize = 1024  # buffer block size
    try:
        print "Start receiving files on server ..."
        fp = open(local_file.decode('gbk'), 'wb')  # open the local file for writing
        ftp.retrbinary('RETR ' + remote_file, fp.write, bufsize)  # receive the file and write it locally
        logging.debug("Read remote address is %s" % remote_file.decode("utf8").encode("gbk"))
        logging.debug("%s Download success path is: %s" % (zname, local_file))
        print "%s Download success path: %s" % (zname, local_file)
        fp.close()
    except Exception as e:
        print e
        logging.debug("%s download failed to close file, exit FTP server" % zname)
        print "Download Failed"
        os.remove(local_file)
    finally:
        ftp.quit()  # disconnect from the FTP server
This article is from the "left-handed" blog; please be sure to keep this source link: http://mofeihu.blog.51cto.com/1825994/1827563
Python character sets explained: fixing garbled Chinese file names when downloading with an FTP client on Windows