Python character encodings explained: fixing garbled Chinese file names when downloading with an FTP client on Windows

Source: Internet
Author: User

Chinese text encoding in Python 2 has always been a major source of trouble, and it frequently raises encoding conversion exceptions. So what exactly are str and unicode in Python? In Python, "Unicode" generally refers to a unicode object; for example, the unicode object for '哈哈' is u'\u54c8\u54c8'. A str is a byte array: it holds the result of encoding a unicode object in some encoding (UTF-8, GBK, cp936, GB2312, and so on). By itself it is only a stream of bytes with no further meaning. To display that byte stream as meaningful text, you must decode it with the correct encoding.

For example:

>>> a = u"你好"
>>> a_utf8 = a.encode("utf-8")
>>> print a_utf8
浣犲ソ
>>> a_gbk = a.encode("gbk")
>>> print a_gbk
你好
>>> a_utf8
'\xe4\xbd\xa0\xe5\xa5\xbd'
>>> a_gbk
'\xc4\xe3\xba\xc3'

Encoding the unicode object u"你好" as UTF-8 produces a_utf8, a byte array that stores '\xe4\xbd\xa0\xe5\xa5\xbd'. Because it is only a byte array, the print statement cannot be relied on to display it as 你好. print hands the bytes to the operating system, and the operating system decodes the incoming byte stream with the system encoding. This explains why the UTF-8 string "你好" is printed as "浣犲ソ": when '\xe4\xbd\xa0\xe5\xa5\xbd' is interpreted as GBK (cp936, the GB2312-compatible code page of a Simplified Chinese Windows console), that is what it displays. A str records a byte array, which is just the storage format of one particular encoding. What it looks like when written to a file or printed depends entirely on the encoding used to decode it. One more note on print: when a unicode object is passed to print, it is converted internally to the local default encoding first (this is probably how it works).
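The mismatch described above can be reproduced without a console. The following is a minimal sketch written in Python 3 syntax, which is an assumption on my part: the article targets Python 2, where the byte type is str and the text type is unicode, while in Python 3 they are bytes and str. It encodes the same text two ways, then decodes the UTF-8 bytes with the wrong codec, which is roughly what a GBK console does.

```python
# Python 3 sketch: bytes plays the role of Python 2's str,
# and str plays the role of Python 2's unicode.
text = "\u4f60\u597d"  # the two characters from the example, written as escapes

utf8_bytes = text.encode("utf-8")
gbk_bytes = text.encode("gbk")

# Decoding with the codec that produced the bytes recovers the text.
assert utf8_bytes.decode("utf-8") == text
assert gbk_bytes.decode("gbk") == text

# Decoding UTF-8 bytes as GBK does not raise for this input; it silently
# yields different characters (mojibake).
mojibake = utf8_bytes.decode("gbk")

print(utf8_bytes)        # b'\xe4\xbd\xa0\xe5\xa5\xbd'
print(gbk_bytes)         # b'\xc4\xe3\xba\xc3'
print(mojibake != text)  # True
```

The same six bytes are never "wrong" in themselves; only the codec chosen to read them decides whether the text comes out intact.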


Decode and encode

Inside Python, strings are represented as Unicode, so encoding conversions normally use Unicode as the intermediate form: first decode the string from its current encoding into Unicode, then encode the Unicode into the target encoding. For example, str1.decode('gb2312') converts the GB2312-encoded string str1 to Unicode.

str2.encode('gb2312') converts the Unicode string str2 to GB2312.

Before transcoding, you must know which encoding the str is in; then decode it to Unicode, and then encode it into the other encoding. A string literal in a source file saved as UTF-8 is UTF-8 encoded, and one in a source file saved as GBK is GBK encoded. In either case, converting it means first calling decode to turn it into Unicode and then calling encode to turn it into the target encoding. If you create a source file without specifying an encoding, it is typically saved in the system default encoding.
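The decode-then-encode route can be sketched as a small helper. This is an illustration in Python 3 syntax (an assumption; under Python 2 the input would be a str and the intermediate value a unicode object), and the name transcode is hypothetical, not something from the original article.

```python
def transcode(data: bytes, source_encoding: str, target_encoding: str) -> bytes:
    """Re-encode bytes by going through Unicode text as the intermediate form."""
    text = data.decode(source_encoding)   # bytes -> Unicode text
    return text.encode(target_encoding)   # Unicode text -> bytes in the new encoding


gbk_name = b"\xc4\xe3\xba\xc3"            # the GBK bytes from the earlier example
utf8_name = transcode(gbk_name, "gbk", "utf-8")
print(utf8_name)  # b'\xe4\xbd\xa0\xe5\xa5\xbd'

# Naming the wrong source encoding fails at the decode step.
try:
    transcode(gbk_name, "utf-8", "gbk")
except UnicodeDecodeError as exc:
    print("wrong source encoding:", exc.reason)
```

The failure in the second call is the useful part: it shows why the source encoding has to be known rather than guessed.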

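If a string is already Unicode, decoding it again fails: Python 2 first implicitly encodes it with the ASCII codec, which raises UnicodeEncodeError for non-ASCII characters. It is therefore common to check whether a value is unicode first:
isinstance(s, unicode)  # check whether s is a unicode object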
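A guard of this kind is often wrapped in a helper so that callers do not have to repeat the check. The sketch below uses Python 3 syntax (an assumption; in Python 2 the test would be isinstance(value, unicode)), and to_text is a hypothetical name chosen for illustration.

```python
def to_text(value, encoding="utf-8"):
    """Return value as text, decoding only when it is still bytes."""
    if isinstance(value, str):                  # already text; leave it alone
        return value
    if isinstance(value, (bytes, bytearray)):   # raw bytes; decode exactly once
        return value.decode(encoding)
    raise TypeError("expected str or bytes, got %s" % type(value).__name__)


print(to_text("\u54c8\u54c8") == "\u54c8\u54c8")               # True
print(to_text(b"\xc4\xe3\xba\xc3", "gbk") == "\u4f60\u597d")  # True
```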


Example: fixing Chinese file name errors when downloading with a Python FTP client on Windows

import logging
import os

# remotepath, zname, creatdir() and ftpconnect() are not shown in this excerpt;
# they are assumed to be defined elsewhere in the original script.
def downloadfile():
    # Encode the remote path as UTF-8 for the server and the local path as GBK for Windows.
    remote_file = os.path.join(remotepath, zname).encode('utf-8')
    local_file = os.path.join(creatdir(), zname).encode("gbk")
    print "Start connecting to FTP server ..."
    ftp = ftpconnect()
    ftp.set_debuglevel(2)  # enable debug output
    # print ftp.getwelcome()  # show the FTP server welcome message
    bufsize = 1024  # buffer block size
    try:
        print "Start receiving files on server ..."
        fp = open(local_file.decode('gbk'), 'wb')  # open the local file for writing
        # Receive the file from the server and write it to the local file.
        ftp.retrbinary('RETR ' + remote_file, fp.write, bufsize)
        logging.debug("Read remote address is %s" % remote_file.decode("utf8").encode("gbk"))
        logging.debug("%s Download success path is: %s" % (zname, local_file))
        print "%s Download success path: %s" % (zname, local_file)
        fp.close()
    except Exception, e:
        print e
        logging.debug("%s download failed to close file, exit FTP server" % zname)
        print "Download Failed"
        os.remove(local_file)
    finally:
        ftp.quit()  # disconnect from the FTP server
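For comparison, here is a hedged sketch of the same download in Python 3, where file names stay as text and ftplib does the encoding. It is not the author's code. The function name download_file and its parameters are hypothetical, and the choice of UTF-8 for server-side names is an assumption carried over from the example above. It relies only on the encoding attribute and the retrbinary method of ftplib.FTP.

```python
import os
import posixpath


def download_file(ftp, remote_dir, name, local_dir,
                  server_encoding="utf-8", blocksize=1024):
    """Download remote_dir/name into local_dir and return the local path.

    ftp is an already connected, logged-in ftplib.FTP object (or any object
    with the same retrbinary method). server_encoding is an assumption about
    how the server encodes file names; adjust it to match the server.
    """
    # ftplib encodes each command it sends, file name included, with this attribute.
    ftp.encoding = server_encoding
    remote_path = posixpath.join(remote_dir, name)  # FTP paths use forward slashes
    local_path = os.path.join(local_dir, name)      # text path; no manual GBK step
    try:
        with open(local_path, "wb") as fp:
            ftp.retrbinary("RETR " + remote_path, fp.write, blocksize)
    except Exception:
        # The file is closed by now, so a partial download can be removed.
        if os.path.exists(local_path):
            os.remove(local_path)
        raise
    return local_path
```

With a real server it might be called as download_file(FTP("ftp.example.com", "user", "password"), "/pub", name, "downloads") after from ftplib import FTP, where the host, credentials, and directories are placeholders. Closing the file before removing it matters on Windows, where an open file cannot be deleted.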



This article is from the "left-handed" blog; please keep this source attribution when reposting: http://mofeihu.blog.51cto.com/1825994/1827563

