Python Unicode encoding

Source: Internet
Author: User

Tips for use

In fact, as long as the following rules are followed, you can circumvent the 90% bug caused by Unicode string processing, the remaining 10% can be solved by Python libraries and modules.

    • You must prefix a string when it appears in the program.

    • Do not use the STR () function in place of Unicode ().

    • Don't use outdated string modules--if it's a non-ASCII character, it will screw everything up.

    • Do not decode Unicode characters in your program if necessary. Call the Encode () function only when you are writing to a file or database or network, and then call the Decode () function only when you need to read the data back.


Lessons from the reality

Error #: You have to write a large application in a very limited amount of time and need support from other languages, but the product manager does not clearly define this. You do not consider Unicode compatibility, knowing that the project is nearing the end ... It's almost impossible to add Unicode support at this time, will it?

Result #: The end user's need for other language interfaces was not predicted, and Unicode support was not used when integrating the applications they used for other languages. Updating the entire system makes people feel dull and wasteful of time.


Error #: Use the string module or the str () and CHR () functions everywhere in the source code.

Result # #: Replace str () and Chr () with Unicode () and UNICHR () with global lookup substitution, but it is possible that the Pickle module can no longer be used, and only the data that is to be pickle processed will be stored in binary form. This makes it necessary to modify the structure of the database, and modifying the database structure means pushing it all over again.


Error # #: It is not possible to determine that all secondary systems fully support Unicode.

The result is: you have to patch those systems, and some of these systems may not have the source code at all. Fixing bugs that are supported for Unicode can reduce the reliability of your code and is likely to introduce new bugs.


Summary: Enabling applications to fully support Unicode, compatible with other languages itself is a project. It requires detailed consideration and planning. All software and systems involved need to be checked, including Python's standard library and other third-party extensions that will be used. You may even need to build an experienced team to specialize in internationalization (i18n) issues.


Excerpt from "Python Core programming (second Edition)" P130, P131


Python Unicode encoding

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.