Understanding Python Character Encoding

Source: Internet
Author: User
From: http://www.unixresources.net/linux/clf/python/archive/00/00/42/73/427317.html

Combining the two replies, I ran the following:
>>> A = '中国'  # "China", typed in a cp936 (GBK) console
>>> B = unicode(A, 'cp936')
>>> B
u'\u4e2d\u56fd'
>>> C = B.encode('utf-8')
>>> C
'\xe4\xb8\xad\xe5\x9b\xbd'
>>> D = B.encode('cp936')
>>> D
'\xd6\xd0\xb9\xfa'

Checking against the Unihan database: B holds the hexadecimal code points (equivalent to UTF-16 here), C holds the UTF-8 encoding, and D holds the GB2312 encoding. So Python's Unicode representation (the result of converting with the unicode() function) is each character's unique hexadecimal code point, equivalent to UTF-16. Of course, UTF and Unicode are not the same thing: UTF is a transformation format of Unicode. Is my understanding right?
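The claim that the escapes in B are hexadecimal code points matching UTF-16 can be checked directly. The Python 3 sketch below prints each character's code point with ord() and compares it with the big-endian UTF-16 bytes. The match holds for characters in the Basic Multilingual Plane such as these two; characters above U+FFFF become surrogate pairs in UTF-16, so the equivalence does not extend to them.

```python
text = "中国"

# Code points in hexadecimal: the same numbers that appear in u'\u4e2d\u56fd'.
code_points = [format(ord(ch), "04x") for ch in text]
print(code_points)     # ['4e2d', '56fd']

# Big-endian UTF-16 without a byte-order mark: two bytes per BMP character.
utf16 = text.encode("utf-16-be")
print(utf16.hex())     # 4e2d56fd

# Read the UTF-16 code units back as integers and compare with the code points.
units = [int.from_bytes(utf16[i:i + 2], "big") for i in range(0, len(utf16), 2)]
print(units == [ord(ch) for ch in text])   # True
```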

Also, can "cp936" be used on Linux?

One more question: could someone explain Python's unicode() function and the encode() method of string objects, what they are mainly used for, and which encodings they accept?

Thank you very much; this clarified an important concept for me.

P.S. The book "Nonsense XML" has a chapter devoted to Unicode that explains the relationship between UTF-8, UTF-16, and Unicode very clearly. Together with the two replies in this thread, I think everyone can learn a lot from it, as I did.
======================================

I experimented some more. The unicode() function mainly takes two arguments: the original string and its encoding.

For example, the UTF-8 encoding of the character '中' is '\xe4\xb8\xad'.
So unicode('\xe4\xb8\xad', 'utf-8') gives the Unicode form of '中', which is u'\u4e2d'.
The 'utf-8' argument states the encoding of the preceding string: unicode() interprets the first argument according to that encoding and returns its Unicode form.
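In Python 3 the same interpretation step is spelled bytes.decode(encoding), or str(bytes, encoding) as a two-argument form that mirrors unicode(). A minimal sketch of the example above, assuming Python 3:

```python
raw = b"\xe4\xb8\xad"            # UTF-8 bytes of the character 中

# Interpret the bytes according to the stated encoding.
char = raw.decode("utf-8")
same = str(raw, "utf-8")         # two-argument form, like unicode(raw, 'utf-8')

print(ascii(char))               # '\u4e2d'
print(char == same == "\u4e2d")  # True

# Declaring the wrong encoding yields different text, not the same character.
wrong = raw.decode("cp936", errors="replace")
print(wrong == char)             # False
```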

However, I still don't understand the u"xxx" notation, or what str.encode() does...
==================================
I believe you are right that Python's internal Unicode representation uses UTF-16.

I usually convert a byte string to Unicode with unicode(str, encoding),
and convert a Unicode string to another encoding with unistr.encode(encoding).
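This decode-then-encode pattern is also how one converts between byte encodings: decode to Unicode with the source encoding, then encode with the target encoding. A Python 3 sketch; the helper name transcode is hypothetical, introduced only for illustration.

```python
def transcode(data: bytes, src: str, dst: str) -> bytes:
    """Hypothetical helper: re-encode bytes from one encoding to another via Unicode."""
    text = data.decode(src)   # Python 2 equivalent: unicode(data, src)
    return text.encode(dst)   # Python 2 equivalent: text.encode(dst)


gbk = b"\xd6\xd0\xb9\xfa"     # 中国 in GBK/cp936
utf8 = transcode(gbk, "cp936", "utf-8")
print(utf8)                   # b'\xe4\xb8\xad\xe5\x9b\xbd'
print(transcode(utf8, "utf-8", "gbk") == gbk)  # True
```

Going through Unicode in the middle is what makes the conversion safe: bytes in one encoding cannot be meaningfully reinterpreted as another without first recovering the characters.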

u"xxx" is the literal notation for a Unicode string, that is, Python's internal representation of the characters.
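To make the notation concrete: inside a u"..." literal, \uXXXX names a character by its hexadecimal code point. Python 3.3 and later still accept the u prefix (PEP 414), where it is equivalent to a plain string literal. A small sketch, assuming Python 3:

```python
s = u"\u4e2d\u56fd"            # two characters, written by code point

print(len(s))                  # 2: length counts characters, not bytes
print(s == "中国")              # True: the u prefix is optional in Python 3
print(ascii(s))                # '\u4e2d\u56fd'
print(len(s.encode("utf-8")))  # 6: three bytes per character in UTF-8
print(len(s.encode("gbk")))    # 4: two bytes per character in GBK
```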

cp936 is not available on Linux; I explained this on my blog: http://www.donews.net/limodou/archive/2004/08/13/67432.aspx

So on Linux you need to install a GBK codec module or a CJK codecs module.
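That answer dates from 2004. Later Python versions bundle the CJK codecs in the standard library, and in Python 3 cp936 is registered as an alias of the built-in gbk codec regardless of the operating system. A quick check, assuming Python 3:

```python
import codecs

# cp936 resolves to the same codec object as gbk, on Linux as well as Windows.
info = codecs.lookup("cp936")
print(info.name)                                   # gbk

text = "中国"
print(text.encode("cp936") == text.encode("gbk"))  # True
```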

======================================
Code:

# -*- coding: utf-8 -*-
# Declares the source file encoding as UTF-8 (PEP 263); this is Python 2 code.
import os

# Sample code written without being run; it may need adjusting before it works.
# The example targets Windows XP, because the file-name encoding differs
# between Linux (UTF-8) and Windows (GBK).
# Assume drive D: holds many files with Chinese names.
root = "D:\\"
filelist = os.listdir(root)  # Byte-string names in GBK (see the code page in the cmd window's properties).
for name in filelist:
    path = os.path.join(root, name)  # listdir returns bare names, so prepend the directory.
    if os.path.isdir(path):
        continue
    # Decode with GBK: Windows returns GBK-encoded names, so path.decode("utf-8") would raise an exception.
    with open(path.decode("gbk"), "rb") as fp:
        print len(fp.read())

filepath = "D:\\文.doc"  # Make sure this file exists; its name should contain Chinese.
# Decode with UTF-8: the coding line at the top means this literal is stored as UTF-8 bytes.
with open(filepath.decode("utf-8"), "rb") as fp:
    print len(fp.read())

path2 = u"D:\\中文文件.doc"  # With the u prefix the literal is already Unicode, so no decoding is needed.
with open(path2, "rb") as fp:
    print len(fp.read())
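In Python 3 most of this manual decoding is done by the interpreter: passing a str path to os.listdir() returns str names already decoded with the file-system encoding, while passing bytes returns raw bytes that os.fsdecode() can convert. The sketch below assumes Python 3 and a file system that accepts Unicode names; it uses a temporary directory instead of drive D:, and the file name is only an example.

```python
import os
import tempfile

sizes = []
with tempfile.TemporaryDirectory() as folder:
    name = "中文文件.doc"  # example name with Chinese characters ("Chinese file.doc")
    with open(os.path.join(folder, name), "wb") as fp:
        fp.write(b"hello")

    # A str argument yields str names that are already decoded: no .decode() needed.
    for entry in os.listdir(folder):
        full = os.path.join(folder, entry)
        if os.path.isdir(full):
            continue
        with open(full, "rb") as fp:
            sizes.append(len(fp.read()))
        print(ascii(entry), sizes[-1])

    # A bytes argument yields raw bytes; os.fsdecode() applies the file-system encoding.
    raw_names = os.listdir(os.fsencode(folder))
    decoded = [os.fsdecode(n) for n in raw_names]
```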
