1. String or byte string?
In my view, an ordinary Python string is really just a byte string. You can even store an image or a binary executable in one.
import types
f = open('d://hello.jpg', 'rb')  # 'rb': read raw bytes, no newline translation
pic = f.read()
f.close()
print type(pic) == types.StringType  # True: the file contents are an ordinary str
print pic
If the image exists, this code prints True followed by a run of garbled characters. The so-called string is nothing more than a sequence of bytes.
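A minimal sketch of the same idea that does not depend on a real image on disk. The sample bytes and the temporary file are my own stand-ins, not part of the original example, and the code is written so that it should run on both Python 2 and Python 3 (where the byte string type is spelled bytes).

```python
import os
import tempfile

# Hypothetical stand-in for an image file: a few non-text bytes.
sample = b'\xff\xd8\xff\xe0\x00\x10'

# Write the bytes to a temporary file, then read them back in binary mode.
fd, path = tempfile.mkstemp(suffix='.jpg')
try:
    with os.fdopen(fd, 'wb') as f:
        f.write(sample)
    with open(path, 'rb') as f:
        pic = f.read()
finally:
    os.remove(path)

is_byte_string = isinstance(pic, bytes)  # on Python 2, bytes is just another name for str
round_trip_ok = (pic == sample)

print(is_byte_string)  # True
print(round_trip_ok)   # True
print(len(pic))        # 6
```

Whatever the file holds, what comes back from read() is the same sequence of bytes that was written.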
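2. '' and u''
Python has two kinds of string literal: '' and u''.
The former is a byte string and the latter is a Unicode string. Unicode assigns every character a code point; on a narrow Python 2 build (the default on Windows) each character in the Basic Multilingual Plane is stored in two bytes. You can test it like this: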
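As a rough illustration of the two kinds of string, the sketch below builds one of each and checks that they are different types. The b'' prefix is my own choice so that the snippet behaves the same on Python 2 and Python 3, and UTF-16 is used here only to make the "two bytes per character" point visible; the variable names are arbitrary.

```python
byte_s = b'\xc4\xe3'   # a byte string: two raw bytes
uni_s = u'\u4f60'      # a Unicode string: one character (U+4F60)

different_types = type(byte_s) is not type(uni_s)
print(different_types)   # True

# One Basic Multilingual Plane character is exactly one 2-byte UTF-16 code unit.
utf16_bytes = uni_s.encode('utf-16-le')
print(len(utf16_bytes))  # 2
```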
>>> str1 = '你好'
>>> str1
'\xc4\xe3\xba\xc3'
This gives a byte string. The default encoding here is cp936, so these are the cp936 bytes of '你好' ("hello").
>>> str2 = str1.decode('cp936')
>>> str2
u'\u4f60\u597d'
The bytes have been decoded to Unicode, and str2 is a Unicode string.
str1 and str2 are both strings, but they are encoded differently. Both happen to occupy 4 bytes here, yet their lengths differ:
>>> len(str1)
4
>>> len(str2)
2
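To make the difference between counting bytes and counting characters concrete, here is a small sketch that encodes the same two-character Unicode string in two ways. UTF-8 does not appear in the session above; I add it only as a second encoding for comparison.

```python
text = u'\u4f60\u597d'            # two characters

as_cp936 = text.encode('cp936')   # 2 bytes per character for this text
as_utf8 = text.encode('utf-8')    # 3 bytes per character for this text

print(len(text))      # 2
print(len(as_cp936))  # 4
print(len(as_utf8))   # 6
```

The character count stays at 2, while the byte count depends entirely on which encoding produced the byte string.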
str1 is just an ordinary byte string. Python does not know which encoding it uses, so its length equals the number of bytes it occupies. str2 is a Unicode string. Calling decode on the string object tells the system that '\xc4\xe3\xba\xc3' is cp936-encoded; the system then converts it to the Unicode string u'\u4f60\u597d' and can determine that it contains two characters. Conversely, the encode method turns a Unicode string back into an ordinary byte string.
>>> str2.encode('gbk')
'\xc4\xe3\xba\xc3'
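The round trip can be sketched as a small check, together with what happens when the bytes are decoded with the wrong codec. The UTF-8 attempt is my own addition, there to illustrate that the byte string itself carries no record of its encoding.

```python
raw = b'\xc4\xe3\xba\xc3'

# Decode with the codec that produced the bytes, then encode back.
text = raw.decode('cp936')
round_trip_ok = (text.encode('gbk') == raw)
print(round_trip_ok)  # True

# Guessing the wrong codec: these bytes are not valid UTF-8.
try:
    raw.decode('utf-8')
    wrong_codec_failed = False
except UnicodeDecodeError:
    wrong_codec_failed = True
print(wrong_codec_failed)  # True
```

In Python's codec registry cp936 is an alias for gbk, which is why decoding with one name and encoding with the other returns the original bytes.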
Happily, it is just as easy to write a Chinese Unicode string directly. Let's test:
>>> mycity = u'佛山'
>>> mycity
u'\u4f5b\u5c71'
See? Python has done the conversion for us, and mycity is a Unicode string.
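Outside the interactive prompt the same literal can be used in a script. The sketch below assumes the file is saved as UTF-8; the coding declaration is what tells Python 2 how to read the literal, and the code_points name is only for illustration.

```python
# -*- coding: utf-8 -*-
# On Python 2 the coding line above must be one of the first two lines of the file.

mycity = u'佛山'

# Look at the code point behind each character.
code_points = [hex(ord(ch)) for ch in mycity]
print(code_points)                 # ['0x4f5b', '0x5c71']
print(mycity == u'\u4f5b\u5c71')   # True
```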