encoding conversion, storage and garbage summarization of Python strings

Source: Internet
Author: User
Tags character set

The Python2 string contains str, Unicode two types, and Str's string encoding is determined by the encoding of the source file, which is currently used primarily in the UTF-8 encoding format, so you want to specify the encoding format in the header of the py file:

#-*-Coding:utf-8-*-

Within a Python program, a string literal is a Unicode encoding, and the string character is a memory-encoded format that converts a Unicode-encoded string to a storage-encoding format for a particular character set if the data is stored in a file or log. For example: UTF-8, GBK, and so on, many times Python programmers will encounter garbled problems, I believe that the following some of the processing methods and skills can help you solve garbled problems.

Unicode to Utf-8 conversion method: Unicodestr.encode (' utf-8 '), example:

>>> u ' Chinese test ' encode (' utf-8 ')
' \xe4\xb8\xad\xe6\x96\x87\xe6\xb5\x8b\xe8\xaf\x95 '

Utf-8 to Unicode conversion method: Utf8str.decode (' utf-8 '), example:

>>> ' Chinese test '. Decode (' Utf-8 ')
U ' \u4e2d\u6587\u6d4b\u8bd5 '

In fact, in the process of storing Unicode in text, there is also a way of storing it, instead of converting Unicode to the actual text storage character set, but storing Unicode memory encoding values, and then inverting them back when reading files. is to adopt: Unicode-escape conversion Mode.

Unicode to Unicode-escape conversion method: Unicodestr.encode (' Unicode-escape '), example:

>>> u ' Chinese test ' encode (' Unicode-escape ')
' \\u4e2d\\u6587\\u6d4b\\u8bd5 '

Unicode-escape to Unicode conversion method: Unicodeescapestr.decode (' Unicode-escape '), example:

>>> ' \\u4e2d\\u6587\\u6d4b\\u8bd5 ' decode (' Unicode-escape ')
U ' \u4e2d\u6587\u6d4b\u8bd5 '

For UTF-8 encoded strings, when stored, they are usually stored directly, and there is actually a way of storing utf-8 encoded values, namely: String-escape.
Utf-8 to String-escape conversion method: Utf8str.encode (' String-escape '), example:

>>> ' Chinese test '. Encode (' String-escape ')
' \\xe4\\xb8\\xad\\xe6\\x96\\x87\\xe6\\xb5\\x8b\\xe8\\xaf\\x95 '
>>> print ' \\xe4\\xb8\\xad\\xe6\\x96\\x87\\xe6\\xb5\\x8b\\xe8\\xaf\\x95 '
\xe4\xb8\xad\xe6\x96\x87\xe6\xb5\x8b\xe8\xaf\x95

String-escape to Utf-8 conversion method: Strescapestr.decode (' String-escape '), example:

>>> ' \\xe4\\xb8\\xad\\xe6\\x96\\x87\\xe6\\xb5\\x8b\\xe8\\xaf\\x95 ' decode (' String-escape ')
' \xe4\xb8\xad\xe6\x96\x87\xe6\xb5\x8b\xe8\xaf\x95 '
>>> print ' \xe4\xb8\xad\xe6\x96\x87\xe6\xb5\x8b\xe8\xaf\x95 '
Chinese test

Note:
Unicode also supports encoding conversions including:

Idna
Raw_unicode_escape
Utf_8_sig

Utf-8, GBK and other encoded strings also support coding conversions including:

Base64
Quopri
bz2
Hex
Unicode_internal
Uu
Zlib

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.