Reposted from http://www.cnblogs.com/BeginMan/p/3166363.html
One, the differences between the ASCII, Unicode, and UTF-8 character encodings
Click to read: http://www.cnblogs.com/kingstarspe/p/ASCII.html
Another related blog post: http://www.cnblogs.com/huxi/archive/2010/12/05/1897271.html
Two, Unicode and ASCII
Python 2 can handle both Unicode and ASCII strings. To make the two look as similar as possible, the Python string changed from a simple built-in type into a real object: an ASCII string is a StringType (str), and a Unicode string is a UnicodeType (unicode). They are used as follows:
>>> "Hello World"     # ASCII string
'Hello World'
>>> u"Hello World"    # Unicode string
u'Hello World'
1. str() and chr() only accept values in the range 0~255, that is, they only handle ASCII strings. If a Unicode string is passed in, it is automatically converted to ASCII first and then handed to these functions.
Reason: Unicode supports far more characters than ASCII, so an exception is raised if str() or chr() receives a character that does not exist in ASCII.
2. unicode() and unichr() can be regarded as the Unicode versions of str() and chr().
>>> unicode('Hello World')
u'Hello World'
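Point 1 above can be illustrated with a minimal sketch. It does not call the Python 2-only str()/unicode() conversions directly; instead it performs the same ASCII conversion explicitly with encode('ascii'), so that it should also run on Python 3. The helper name to_ascii and the sample text are assumptions made for illustration only.

```python
# Sample data: plain ASCII text and text with two non-ASCII characters.
ascii_text = u'Hello World'
non_ascii_text = u'\u535a\u5ba2'

def to_ascii(value):
    """Hypothetical helper: try the ASCII conversion and report what happened.

    Returns (True, encoded_bytes) on success,
    or (False, name_of_exception) if a character has no ASCII equivalent.
    """
    try:
        return True, value.encode('ascii')
    except UnicodeEncodeError as exc:
        return False, type(exc).__name__

ascii_result = to_ascii(ascii_text)          # every character exists in ASCII
non_ascii_result = to_ascii(non_ascii_text)  # these characters do not

print(ascii_result)
print(non_ascii_result)
```

The second call fails for the reason given above: the characters exist in Unicode but not in ASCII.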
Three, encoding and decoding
Encoding (encode()) and decoding (decode()) exist to solve one problem: keeping text from becoming garbled.
A codec (COder/DECoder) denotes the encoding scheme in use.
"""
Write a Unicode string to a file on disk, then read it back and display it.
The file is written as UTF-8 and read back as UTF-8.
"""
CODEC = 'utf-8'
FILE = 'Demo.txt'

str_in = u"Beginman'll be a great coder"
byte_str_in = str_in.encode(CODEC)    # encode with UTF-8
f = open(FILE, 'w')
f.write(byte_str_in)
f.close()

f = open(FILE, 'r')
data = f.read()                       # raw bytes read from disk
f.close()
str_out = data.decode(CODEC)          # decode with UTF-8
print str_out                         # output: Beginman'll be a great coder
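To see why the same codec must be used for writing and reading, the following sketch repeats the round trip and then decodes the same bytes with a different codec. It is not the author's code: it uses io.open with an encoding argument so that it should run on both Python 2 and Python 3, writes to a temporary directory rather than Demo.txt, and picks latin-1 as an arbitrary "wrong" codec. The sample text is also an assumption.

```python
import io
import os
import tempfile

CODEC = 'utf-8'
text = u'\u535a\u5ba2\u56ed cnblog'   # sample text with three non-ASCII characters

# Hypothetical scratch location, so the sketch does not touch real files.
path = os.path.join(tempfile.mkdtemp(), 'demo.txt')

# Write: the file object encodes the Unicode text to UTF-8 bytes.
with io.open(path, 'w', encoding=CODEC) as f:
    f.write(text)

# Read the raw bytes back without any decoding.
with io.open(path, 'rb') as f:
    raw = f.read()

decoded_ok = raw.decode(CODEC)          # same codec: the original text returns
decoded_wrong = raw.decode('latin-1')   # wrong codec: no error, but garbled text

print(decoded_ok == text)
print(decoded_wrong == text)
```

Decoding with the wrong codec raises no exception here, which is why garbled text often goes unnoticed until it is displayed.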
Attention:
1. Every string literal that appears in the program must be prefixed with u.
'博客园cnblog'        # don't write it like this; it easily ends up as garbled text
s = u'博客园cnblog'   # right
2. Avoid the str() function; use unicode() instead wherever possible.
3. Do not use the outdated string module.
4. There is no need to encode or decode Unicode strings inside the program; encoding and decoding are generally only needed when working with files, databases, the network, and so on.
5. String formatting
>>> '%s %s' % ('Begin', 'Man')
'Begin Man'
# Remember what the earlier post about strings said: "when an ordinary string is combined with a Unicode string, the result is a Unicode string"
>>> u'%s %s' % (u'Begin', u'Man')
u'Begin Man'
>>> u'%s %s' % ('Begin', 'Man')
u'Begin Man'
>>> '%s %s' % (u'Begin', 'Man')
u'Begin Man'
>>> '%s %s' % ('Begin', u'Man')
u'Begin Man'
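Note 4 above (keep text as Unicode inside the program, and encode or decode only at the boundary) can be sketched roughly as follows. The function name shout, its codec parameter, and the sample data are hypothetical, and the code is written so that it should run on both Python 2 and Python 3.

```python
def shout(raw_bytes, codec='utf-8'):
    """Hypothetical boundary function: bytes in, bytes out, Unicode in between."""
    text = raw_bytes.decode(codec)    # decode once, when data enters the program
    result = text.upper() + u'!'      # all processing happens on Unicode text
    return result.encode(codec)       # encode once, when data leaves the program

# Stand-in for bytes arriving from a file, database, or network connection.
incoming = u'blog park \u535a\u5ba2\u56ed'.encode('utf-8')
outgoing = shout(incoming)

print(outgoing.decode('utf-8') == u'BLOG PARK \u535a\u5ba2\u56ed!')
```

Because the conversion happens in exactly two places, changing the external encoding only means passing a different codec value.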