Strings and Unicode in Python (Part 1)

Source: Internet
Author: User

The first thing to understand is that in Python 2, string objects (type str) and Unicode objects (type unicode) are two different types.

A string object is a sequence of bytes that encode characters, while a Unicode object is a sequence of Unicode code units.

The characters in a string can be encoded in many ways: single-byte ASCII, double-byte GB2312, variable-length UTF-8, and so on. Obviously, to interpret a string you first need to know which encoding its characters use, and only then can you decode it.
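Here is a small Python 3 sketch of the point above. Note the renaming in Python 3: bytes plays the role of the Python 2 str discussed in this article, and str plays the role of unicode. The same two characters 你好 produce different bytes under different encodings, and they only read back correctly with the matching codec:

```python
# Python 3 sketch: `bytes` corresponds to the Python 2 `str` discussed here,
# and `str` corresponds to the Python 2 `unicode` type.
text = "你好"  # two characters meaning "hello"

gbk_bytes = text.encode("gbk")     # double-byte encoding: 2 bytes per character
utf8_bytes = text.encode("utf-8")  # variable-length: 3 bytes per character here

print(gbk_bytes)   # b'\xc4\xe3\xba\xc3'
print(utf8_bytes)  # b'\xe4\xbd\xa0\xe5\xa5\xbd'

# Decoding recovers the text only when the matching encoding is used.
print(gbk_bytes.decode("gbk") == text)     # True
print(utf8_bytes.decode("utf-8") == text)  # True

# Single-byte ASCII cannot represent these characters at all.
try:
    text.encode("ascii")
except UnicodeEncodeError as exc:
    print("ascii cannot encode:", exc.reason)
```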

So what is a Unicode code unit? A Unicode code unit is a 16-bit or 32-bit numeric value, each of which represents one Unicode symbol. In Python, 16-bit Unicode corresponds to the UCS-2 encoding and 32-bit Unicode to the UCS-4 encoding. Doesn't that feel no different from the character encodings used in strings? Anyway, this is the mental model I keep: in Python, text encoded as UCS-2 or UCS-4 is called a Unicode object, and text in any other encoding is called a string.
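To see 16-bit and 32-bit code units side by side, a Python 3 sketch can encode the same text as UTF-16 and UTF-32. These are used here as stand-ins for UCS-2 and UCS-4 (they agree for characters in the Basic Multilingual Plane), and ord() shows the numeric value each code unit holds:

```python
text = "你好"

# The numeric (code point) value of each character.
code_points = [hex(ord(ch)) for ch in text]
print(code_points)  # ['0x4f60', '0x597d']

# "-le" variants are used so no byte-order mark is prepended.
utf16 = text.encode("utf-16-le")  # 16-bit units: 2 bytes per BMP character
utf32 = text.encode("utf-32-le")  # 32-bit units: 4 bytes per character
print(len(utf16), len(utf32))     # 4 8

# A character outside the BMP needs two 16-bit units (a surrogate pair)
# but only one 32-bit unit; plain UCS-2 cannot represent it at all.
emoji = "\U0001F600"
print(len(emoji.encode("utf-16-le")) // 2)  # 2
print(len(emoji.encode("utf-32-le")) // 4)  # 1
```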

Whether Python's Unicode uses UCS-2 or UCS-4 can be chosen at compile time. For example, to build Python on Linux with UCS-2 Unicode:
# ./configure --enable-unicode=ucs2
# make
# make install
Precompiled Windows builds available for download are generally UCS-2. To find out whether a Python runtime is UCS-2 or UCS-4, check sys.maxunicode: 65535 means UCS-2, and a larger number (1114111) means UCS-4.
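The sys.maxunicode check can be wrapped in a tiny helper. This is a sketch for Python 3; the function name unicode_build is made up for illustration. Since Python 3.3 (PEP 393) the narrow/wide build distinction no longer exists and sys.maxunicode is always 1114111, so a modern interpreter will always report the UCS-4 case:

```python
import sys

def unicode_build():
    """Hypothetical helper: classify the interpreter by sys.maxunicode."""
    if sys.maxunicode == 0xFFFF:   # 65535: narrow build
        return "UCS-2"
    return "UCS-4"                 # 0x10FFFF == 1114111: wide build

print(sys.maxunicode, unicode_build())
```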

Let's look at the differences between string and Unicode in Python. First, on Simplified Chinese Windows 2003, where the system encoding is GBK (the text 你好 means "hello"):
>>> a = '你好'
>>> a
'\xc4\xe3\xba\xc3'
>>> b = u'你好'
>>> b
u'\u4f60\u597d'
>>> print a
你好
>>> print b
你好
>>> a.__class__
<type 'str'>
>>> b.__class__
<type 'unicode'>
>>> len(a)
4
>>> len(b)
2

Now in a Linux environment where the system encoding is UTF-8:
>>> a = '你好'
>>> a
'\xe4\xbd\xa0\xe5\xa5\xbd'
>>> b = u'你好'
>>> b
u'\u4f60\u597d'
>>> print a
你好
>>> print b
你好
>>> a.__class__
<type 'str'>
>>> b.__class__
<type 'unicode'>
>>> len(a)
6
>>> len(b)
2
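The same comparison can be reproduced in Python 3 on any machine by naming the encoding explicitly instead of relying on the system default (a choice made here so the result does not depend on the console). Encoded bytes report their byte count, while the text reports its character count:

```python
text = "你好"  # Python 3 str: a sequence of characters, like Python 2 unicode

# Encoded bytes (like a Python 2 str) report the number of bytes.
byte_lengths = {enc: len(text.encode(enc)) for enc in ("gbk", "utf-8")}
print(byte_lengths)  # {'gbk': 4, 'utf-8': 6}

# The text itself reports the number of characters.
print(len(text))     # 2
```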

To summarize briefly:
1. A string is written directly in quotation marks; a Unicode literal adds a u before the opening quotation mark.
2. A string constant typed in directly is encoded with the system's default encoding. For example, '你好' is encoded as '\xc4\xe3\xba\xc3' in a GBK environment and as '\xe4\xbd\xa0\xe5\xa5\xbd' in a UTF-8 environment.
3. len(string) returns the number of bytes in the string; len(unicode) returns the number of characters.
4. A very important point: printing a Unicode object does not produce garbled output. Today's common Linux and Windows systems all support Unicode, unless the version is very old. Windows 2003, for example, supports UCS-2, so Chinese Windows 2003 can display UCS-2 encoded text correctly in addition to its default GBK encoding. For example, in the GBK environment of Chinese Windows 2003:
>>> a = '\xe4\xbd\xa0\xe5\xa5\xbd'  # '你好' in UTF-8
>>> print a
浣犲ソ
>>> b = unicode(a, "utf-8")
>>> b
u'\u4f60\u597d'
>>> print b
你好
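A Python 3 sketch of the same situation: UTF-8 bytes read as if they were GBK turn into three unrelated characters, because GBK takes each pair of bytes as one character. Decoding with the correct codec, the Python 3 counterpart of unicode(a, "utf-8"), recovers the text:

```python
utf8_bytes = "你好".encode("utf-8")  # b'\xe4\xbd\xa0\xe5\xa5\xbd'

# Wrong codec: the 6 bytes are read as 3 two-byte GBK characters (mojibake).
garbled = utf8_bytes.decode("gbk")
print(garbled)  # 浣犲ソ

# Right codec: the original two characters come back.
fixed = utf8_bytes.decode("utf-8")
print(fixed)    # 你好
```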

That should make it clear.

Next, let's talk about converting between string and Unicode: unicode(), decode(), encode(), the codecs module, and so on.
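As a brief Python 3 sketch of the idea behind such conversions: moving bytes from one encoding to another goes through a Unicode string in the middle (decode with the source encoding, then encode with the target one). The helper name transcode is hypothetical; codecs.decode and codecs.encode are standard-library functions, and bytes.decode / str.encode would work equally well:

```python
import codecs

def transcode(data, source, target):
    """Hypothetical helper: re-encode bytes via an intermediate Unicode string."""
    text = codecs.decode(data, source)  # bytes -> str (Unicode)
    return codecs.encode(text, target)  # str -> bytes

gbk_bytes = b"\xc4\xe3\xba\xc3"  # '你好' in GBK
print(transcode(gbk_bytes, "gbk", "utf-8"))  # b'\xe4\xbd\xa0\xe5\xa5\xbd'
```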
