Python tips: Simplified Unicode

Source: Internet
Author: User
Coming from .NET, I am not very comfortable with str and unicode in Python, and I often run into encoding exceptions in common scenarios. For example, when saving a string to a text file, do you need to encode it first, or can you write it directly? Is the string a str or a unicode? When saving a string to a database, do you store the str as it is, or convert it to unicode first?
These questions trip up many Python beginners. After several attempts I finally got a feel for it: unicode is already well implemented in Python; I just was not using it properly.

The key rule to remember: every string in your code should be unicode, not str. That way you always know exactly what type of string you are handling. Remember, all of them, everywhere.
For example (the string reads "Beijing welcomes you!"):
>>> s1 = u'%s欢迎您！' % u'北京'
>>> s1
u'\u5317\u4eac\u6b22\u8fce\u60a8\uff01'
>>> print s1
北京欢迎您！

If the format string is a plain str instead, an exception is thrown:
>>> s2 = '%s欢迎您！' % u'北京'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe6 in position 2: ordinal not in range(128)
The UnicodeDecodeError tells us what happened. Because a str is being combined with a unicode, the interpreter tries to decode the str '%s欢迎您！' using ascii. The str is actually UTF-8 encoded (the default of my system terminal), so decoding it as ascii is bound to fail. The same exception can be reproduced directly:
>>> s2 = '%s欢迎您！'.decode('ascii')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe6 in position 2: ordinal not in range(128)
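
The implicit ascii decode described above can be spelled out explicitly. The following is only a sketch, written with explicit byte strings so that it behaves the same on Python 2 and on Python 3 (where the Python 2 str corresponds to bytes and unicode corresponds to str). The variable names are made up for illustration, and it assumes the terminal encoding is UTF-8, as in the post.

```python
# -*- coding: utf-8 -*-
# Sketch only: reproduce the failing step by hand, then do it correctly.

# The bytes a UTF-8 terminal would give Python 2 for the literal '%s欢迎您！'.
raw_template = u'%s欢迎您！'.encode('utf-8')

# What Python 2 attempts implicitly when a str meets a unicode.
try:
    raw_template.decode('ascii')
    error = None
except UnicodeDecodeError as exc:
    error = exc  # the first non-ascii byte sits right after '%s'

# The fix: decode with the encoding the bytes really use, then format.
template = raw_template.decode('utf-8')
message = template % u'北京'
```

Decoding once, explicitly, and with the right codec is what the "unicode everywhere" rule amounts to in practice.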

Keep encode and decode straight. In Python 2, str holds encoded bytes and unicode holds text: str --> decode(c) --> unicode, and unicode --> encode(c) --> str, where the encoding c must be the same in both directions.
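
As a small illustration of the rule, here is a round trip with a matching encoding, followed by one with a mismatched encoding. This is a sketch with illustrative variable names; utf-8 and gbk are simply two codecs from the standard library chosen for the example, not something the rule depends on.

```python
# -*- coding: utf-8 -*-
# Sketch only: encode and decode must agree on the encoding.

text = u'北京'

# Matching pair: unicode --> encode('utf-8') --> bytes --> decode('utf-8') --> unicode
utf8_bytes = text.encode('utf-8')
round_trip = utf8_bytes.decode('utf-8')

# Mismatched pair: bytes produced with gbk are not valid UTF-8.
gbk_bytes = text.encode('gbk')
try:
    gbk_bytes.decode('utf-8')
    mismatch_failed = False
except UnicodeDecodeError:
    mismatch_failed = True
```

Here the mismatch raises an exception. With other codec pairs a mismatch can also decode without error and produce garbled text, which is harder to notice.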

Before writing a unicode string to a file, encode it with a specific encoding (for example, unicodestr.encode('utf-8')) to get a str, so that what is written to the file is a str. When reading, you get a str from the file and then decode it (for example, encodestr.decode('utf-8')) to get unicode. The two operations are inverses of each other, so the encoding must be the same in both; otherwise an exception may occur.
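
The write and read steps might look like the sketch below. It opens the file in binary mode so that the encode and decode calls stay visible, and it uses a temporary file purely for demonstration; the file and variable names are illustrative assumptions.

```python
# -*- coding: utf-8 -*-
# Sketch only: encode before writing, decode after reading.
import os
import tempfile

text = u'北京欢迎您！'

fd, path = tempfile.mkstemp(suffix='.txt')
os.close(fd)
try:
    # unicode --> encode --> bytes on disk
    with open(path, 'wb') as f:
        f.write(text.encode('utf-8'))

    # bytes from disk --> decode --> unicode
    with open(path, 'rb') as f:
        raw = f.read()
    restored = raw.decode('utf-8')
finally:
    os.remove(path)
```

If you prefer not to call encode and decode by hand, io.open accepts an encoding argument and performs the same conversion when reading and writing.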

You may use unicode consistently yourself, but do the other people on your team? Do the modules you depend on use unicode? Make sure you know, otherwise you will still run into plenty of exceptions caused by encoding problems.
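
One way to cope with code you do not control is to convert whatever it returns to unicode at the boundary. The helper below is hypothetical (to_unicode is not a standard function) and assumes the other side produces UTF-8 unless told otherwise; treat it as a sketch rather than a recommended API.

```python
# -*- coding: utf-8 -*-
# Sketch only: normalize values from other modules to unicode at the boundary.

def to_unicode(value, encoding='utf-8'):
    """Hypothetical helper: decode byte strings, pass text through unchanged."""
    if isinstance(value, bytes):  # bytes is an alias of str on Python 2
        return value.decode(encoding)
    return value

from_bytes = to_unicode(u'北京'.encode('utf-8'))  # a byte string from elsewhere
from_text = to_unicode(u'北京')                   # already unicode
```

Both calls return the same unicode value, so the rest of the code only ever has to deal with one string type.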

Well, it's late, so I will stop here. Good night. 2008.8.1, the first eclipse of this century. Are you going to watch it?

Technorati labels: python, unicode, str
