Basic knowledge of Python Unicode encoding

Source: Internet
Author: User

First, what is Unicode?

Before Unicode, most software used ASCII, which stores each English character as a 7-bit binary number. A 7-bit code has 128 values (0 to 127), of which codes 32 to 126 are the 95 printable characters. Extending the code to 8 bits still allows only 256 values, which is a severe limitation for the thousands of characters needed by non-European languages.
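The ASCII figures can be checked with a few lines of Python. This is only a small sketch; the variable names are chosen for illustration.

```python
# Code points 32 through 126 are the printable ASCII characters.
printable = [chr(code) for code in range(32, 127)]

print(len(printable))       # how many printable characters there are
print("".join(printable))   # space, punctuation, digits, letters

# A 7-bit code has 2**7 possible values; an 8-bit code has 2**8.
seven_bit_values = 2 ** 7
eight_bit_values = 2 ** 8
print(seven_bit_values, eight_bit_values)
```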

Unicode removes this limit by assigning every character its own code point, which is stored in one or more bytes depending on the encoding. It can represent more than 90,000 characters.
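To see "one or more bytes" in practice, the sketch below encodes a few characters as UTF-8 and records how many bytes each one needs. UTF-8 is chosen here as an assumption, since the text does not name a specific encoding, and the sample characters are arbitrary. The code is written for Python 3, which still accepts the u prefix.

```python
# Sample characters written with escapes so the source file stays pure ASCII.
samples = [
    (u"A", "Latin letter"),
    (u"\u00e9", "e with acute accent"),
    (u"\u4e2d", "CJK ideograph"),
    (u"\U0001F600", "emoji"),
]

utf8_lengths = {}
for char, label in samples:
    encoded = char.encode("utf-8")      # Unicode text -> bytes
    utf8_lengths[label] = len(encoded)  # bytes needed for this character
    print(label, len(encoded))
```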

Second, how to use Unicode

Unicode string support was introduced in Python 1.6 to handle conversion between encodings and the management of multi-byte characters. To make Unicode and ASCII strings behave as similarly as possible, Python strings changed from a simple data type into real objects: in Python 2, an ASCII string is of type StringType (str) and a Unicode string is of type UnicodeType (unicode).

The built-in str() and chr() functions were not upgraded to handle Unicode; they work only with ordinary ASCII-encoded strings. If a Unicode string is passed to str(), it is first converted to an ASCII string and then handled by str() as usual. If the Unicode string contains any character that ASCII cannot represent, str() raises an exception (UnicodeEncodeError).
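The failure described here can be reproduced with an explicit encode("ascii") call, which is assumed to mirror the implicit ASCII conversion that str() performs in Python 2. In Python 3, str() is already the Unicode type and does not raise, which is why the sketch calls encode() directly. The sample text is arbitrary.

```python
text = u"caf\u00e9"   # the last character is not part of ASCII

try:
    text.encode("ascii")          # same conversion Python 2's str() attempts
    ascii_failed = False
except UnicodeEncodeError as exc:
    ascii_failed = True
    print("cannot encode as ASCII:", exc)

# An encoding that covers the character works without error.
utf8_bytes = text.encode("utf-8")
print(repr(utf8_bytes))
```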

By default, all string literals in Python are ASCII-encoded. You can declare a Unicode string by adding a u prefix to the literal. The built-in unicode() function converts any Python object to a Unicode string, and unichr() returns the Unicode character for a given integer code point. If an object defines a __unicode__() method, unicode() uses it to convert the object to the corresponding Unicode string.
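A minimal sketch of these three ways to obtain a Unicode string follows. It is written to run on both Python 2 and Python 3: the fallback names (to_text, and unichr aliased to chr) and the Greeting class are hypothetical helpers for this example, not part of the original text.

```python
try:
    unichr                # built in on Python 2
except NameError:
    unichr = chr          # Python 3: chr() already returns a Unicode character

try:
    to_text = unicode     # Python 2 conversion function
except NameError:
    to_text = str         # Python 3: str is the Unicode type

# 1. A literal with the u prefix.
literal = u"\u4e2d\u6587"

# 2. Characters built from integer code points.
built = unichr(0x4E2D) + unichr(0x6587)


class Greeting(object):
    """Hypothetical class showing the __unicode__() conversion hook."""

    def __unicode__(self):        # called by unicode(obj) on Python 2
        return u"\u4f60\u597d"

    __str__ = __unicode__         # assumption: lets str(obj) work on Python 3


# 3. Converting an object that defines the hook.
greeting_text = to_text(Greeting())
print(literal == built, len(greeting_text))
```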

Third, points to note when using Unicode in practice

1. Add the u prefix to every string literal that appears in the program;

2. Use the unicode() function instead of str();

3. Avoid the outdated string module;

4. Do not encode Unicode strings inside the program unless necessary. Call encode() only when writing to a file, database, or network, and call decode() only when reading the data back in;

5. The vast majority of modules in the Python standard library are Unicode-compatible. The pickle module is the exception, as it supports only ASCII strings, so use it with caution;

6. If you use third-party modules in your program, make sure that each one handles Unicode consistently;

7. Before starting development on any application, first consider which languages it must support, and then settle on a single, consistent encoding.
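The advice to encode only on output and decode only on input can be sketched as a file round trip. The temporary file stands in for any file, database, or network connection, and UTF-8 is an assumed choice of encoding; neither is specified in the list above.

```python
import io
import os
import tempfile

# Inside the program the data stays a Unicode string.
text = u"Unicode inside the program: \u4e2d\u6587"

# Hypothetical temporary file used as the external destination.
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)

try:
    # Encode only at the boundary, when the data leaves the program.
    with io.open(path, "wb") as handle:
        handle.write(text.encode("utf-8"))

    # Decode as soon as the bytes come back in.
    with io.open(path, "rb") as handle:
        restored = handle.read().decode("utf-8")
finally:
    os.remove(path)

round_trip_ok = (restored == text)
print(round_trip_ok)
```

Keeping the encode and decode calls at the edges means the rest of the program never has to track which encoding a given string is in.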
