Coming from .NET, I am not very comfortable with str and unicode in Python, and I often run into encoding exceptions in quite common scenarios. For example, when saving a string to a text file, should I encode it first or write it directly? Is the string a str or a unicode? When saving a string to a database, should I store the str directly, or convert it to unicode first?
That is a lot of question marks, and every one of them trips up Python beginners. After several attempts I finally have a clearer picture: Python already handles Unicode well, and I simply was not using it properly.
The key rule to remember: every string in your code should be unicode, not str. That way you always know exactly which string type you are handling. Remember, that means all of them, everywhere.
For example:
>>> s1 = u'%s欢迎您！' % u'北京'
>>> s1
u'\u5317\u4eac\u6b22\u8fce\u60a8\uff01'
>>> print s1
北京欢迎您！
If the format string is a plain str instead, an exception is raised:
>>> s2 = '%s欢迎您！' % u'北京'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe6 in position 2: ordinal not in range(128)
The UnicodeDecodeError gives the cause away. When a str is combined with a unicode, the interpreter first decodes the str with the ascii codec. Here the str '%s欢迎您！' is actually UTF-8 encoded (the default of my system terminal), so decoding it as ascii is bound to fail. The same exception can be reproduced with nothing more than this:
>>> s2 = '%s欢迎您！'.decode('ascii')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe6 in position 2: ordinal not in range(128)
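The failure above, and one way around it, can be sketched as follows. This is only an illustration under some assumptions: the variable names are made up, \u escapes stand in for the Chinese text so the source stays pure ASCII, and the bytes are built with .encode('utf-8') to imitate what a UTF-8 terminal hands to Python 2 as a str. It is written so that it should run on Python 2.7 and Python 3 alike.

```python
# Bytes as a UTF-8 terminal would deliver them (a plain str in Python 2).
template_bytes = u'%s\u6b22\u8fce\u60a8\uff01'.encode('utf-8')

# Decoding with ascii fails on the first non-ASCII byte.
try:
    template_bytes.decode('ascii')
except UnicodeDecodeError as exc:
    failed_at = exc.start                                   # index of the offending byte
    bad_byte = template_bytes[exc.start:exc.start + 1]      # the byte itself

# Decoding with the codec the bytes were really written in succeeds,
# and the resulting unicode template can be formatted safely.
template = template_bytes.decode('utf-8')
s2 = template % u'\u5317\u4eac'
```

The point of the sketch is that the str is decoded explicitly, with the right codec, before it ever meets a unicode value, so the implicit ascii decoding never happens.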
Keep encode and decode straight: str --> decode(c) --> unicode, and unicode --> encode(c) --> str, where the encoding c must be the same in both directions.
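A small round trip makes the rule concrete. This is a sketch with made-up variable names, and the choice of latin-1 and ascii as the "wrong" codecs is mine, picked only because their behaviour is easy to predict: latin-1 accepts any byte and so fails silently, while ascii raises.

```python
text = u'\u5317\u4eac'                 # unicode text

data = text.encode('utf-8')            # unicode --> encode(c) --> str (bytes)
round_trip = data.decode('utf-8')      # str --> decode(c) --> unicode, same c

# A different codec on the way back: latin-1 maps every byte to a character,
# so there is no exception, just the wrong text.
mismatched = data.decode('latin-1')

# ascii cannot represent these bytes at all, so it raises instead.
try:
    data.decode('ascii')
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True
```

So a mismatched codec does not always announce itself with an exception; sometimes the only symptom is garbled text.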
Before writing a unicode string to a file, encode it with a specific encoding (for example unicodestr.encode('utf-8')) to get a str, so that what is written to the file is always str. When reading, the file gives you a str, which you then decode (for example encodestr.decode('utf-8')) to get unicode. The two operations are inverses, so the encoding must be the same in both; otherwise an exception may occur.
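The write-then-read cycle described above might look like this. It is a minimal sketch: the temporary file and the variable names are assumptions for illustration, and the file is opened in binary mode so that only encoded bytes ever touch the disk.

```python
import os
import tempfile

text = u'\u5317\u4eac\u6b22\u8fce\u60a8\uff01'

# Create a throwaway file to write to.
fd, path = tempfile.mkstemp(suffix='.txt')
os.close(fd)
try:
    # Encode on the way out: the file receives UTF-8 bytes.
    with open(path, 'wb') as f:
        f.write(text.encode('utf-8'))

    # Decode on the way in, with the same codec.
    with open(path, 'rb') as f:
        loaded = f.read().decode('utf-8')
finally:
    os.remove(path)
```

As an alternative to encoding by hand, io.open(path, 'w', encoding='utf-8') accepts unicode directly and does the encoding for you; that is a different technique from the one in the text, mentioned only as an option.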
I use unicode throughout my own code, but do the other people on your team? Do the other modules you depend on use unicode? This has to be made clear, otherwise encoding problems will still cause plenty of exceptions.
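One defensive option, when you cannot be sure what another module hands you, is to normalize values at the boundary. The helper below is hypothetical (to_unicode is my own name, not part of any library), and it assumes that incoming byte strings are UTF-8 unless told otherwise.

```python
text_type = type(u'')   # unicode on Python 2, str on Python 3


def to_unicode(value, encoding='utf-8'):
    """Hypothetical helper: return value as unicode text.

    Byte strings are decoded with the given encoding (UTF-8 is an
    assumption here); text is returned unchanged.
    """
    if isinstance(value, text_type):
        return value
    if isinstance(value, bytes):        # bytes is an alias of str on Python 2
        return value.decode(encoding)
    raise TypeError('expected text or bytes, got %r' % type(value))


# A value from some other module that still works with byte strings.
from_other_module = u'\u5317\u4eac'.encode('utf-8')
greeting = u'%s\u6b22\u8fce\u60a8\uff01' % to_unicode(from_other_module)
```

This does not replace agreeing on a convention with the team; it only keeps stray str values from leaking into code that expects unicode.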
Well, it's late, so I'll leave it at these few notes. Good night. 2008.8.1, the first eclipse of this century. Are you going to watch it?