A brief analysis of encode and decode of Python

Source: Internet
Author: User
Tags coding standards




If a Python source file contains non-ASCII characters, you need to declare its character encoding at the top of the file (Python 2 assumes ASCII otherwise), as follows:

    # coding: utf-8


Python only looks for a comment on the first or second line that contains "coding:" or "coding=" followed by the encoding name, so for aesthetic reasons you can also write the declaration like this:

    # -*- coding: utf-8 -*-
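To check which encoding Python will actually pick up from such a declaration, the standard library's tokenize.detect_encoding applies the same PEP 263 rules. The sketch below assumes Python 3, where a file with no declaration defaults to UTF-8 (Python 2 defaults to ASCII); the helper name declared_encoding is hypothetical and only used for illustration.

```python
import io
import tokenize


def declared_encoding(source: bytes) -> str:
    """Return the encoding Python would use to read these source bytes.

    tokenize.detect_encoding looks for a 'coding[:=]<name>' comment on the
    first two lines and falls back to UTF-8 (the Python 3 default) otherwise.
    """
    encoding, _lines_read = tokenize.detect_encoding(io.BytesIO(source).readline)
    return encoding


print(declared_encoding(b"# coding: gbk\nprint('hi')\n"))   # gbk
print(declared_encoding(b"# -*- coding: utf-8 -*-\n"))      # utf-8
print(declared_encoding(b"#-*-coding:utf-8-*-\n"))          # utf-8 (compact style also works)
print(declared_encoding(b"print('no declaration')\n"))     # utf-8 (default)
```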


Common encodings:

  • GB2312: a Chinese national character-set standard (GB 2312-80) for simplified Chinese, used to exchange information between Chinese character processing and communication systems.

  • GBK: one of the Chinese character encoding standards, an extension of GB2312-80; it keeps ASCII characters as single bytes and encodes Chinese characters with two bytes.

  • ASCII: a standard mapping between English characters (letters, digits, punctuation and control characters) and the binary values 0-127.

  • Unicode: a character set that assigns a code point to every character in the world's writing systems. By itself it does not define how those code points are stored as bytes.

  • UTF-8: short for Unicode Transformation Format, 8-bit. UTF-8 is one way to store Unicode: a variable-length encoding that uses 1 to 4 bytes per character, depending on the character.
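The differences between these encodings are easiest to see on a single character. The following is a Python 3 sketch (in Python 3, str holds Unicode text and .encode() returns bytes, which corresponds to Python 2's unicode and str types); it shows the code point of 中 and the bytes each encoding stores for it, plus the variable length of UTF-8.

```python
ch = "中"

print(hex(ord(ch)))          # 0x4e2d: the Unicode code point, no storage format implied
print(ch.encode("utf-8"))    # b'\xe4\xb8\xad': UTF-8 needs 3 bytes for this character
print(ch.encode("gb2312"))   # b'\xd6\xd0': GB2312 uses 2 bytes
print(ch.encode("gbk"))      # b'\xd6\xd0': GBK extends GB2312, so the bytes are the same
print("A".encode("ascii"))   # b'A': ASCII only covers code points 0-127

# UTF-8 is variable-length: 1 to 4 bytes depending on the character.
samples = ["A", "\u00e9", "中", "\U0001F600"]   # Latin letter, e-acute, CJK character, emoji
utf8_lengths = [len(s.encode("utf-8")) for s in samples]
print(utf8_lengths)          # [1, 2, 3, 4]
```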


This article mainly discusses one error: UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-1: ordinal not in range(128).
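In Python 2 this error typically appears when a unicode object is converted to str implicitly (for example str(u'中文')), because that conversion uses the default ASCII codec. The Python 3 sketch below triggers the same error explicitly by encoding two Chinese characters as ASCII, and shows that naming a codec that can represent them avoids it.

```python
text = "中文"  # two non-ASCII characters, at positions 0 and 1

error = None
try:
    text.encode("ascii")        # ASCII can only represent code points 0-127
except UnicodeEncodeError as exc:
    error = exc
print(error)  # 'ascii' codec can't encode characters in position 0-1: ordinal not in range(128)

fixed = text.encode("utf-8")    # an encoding that covers these characters succeeds
print(fixed)                    # b'\xe4\xb8\xad\xe6\x96\x87'
```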


(1) What is Unicode encoding


Python represents text internally as Unicode (the unicode type in Python 2), so an encoding conversion usually goes through Unicode as the intermediate form: first decode the string from its original encoding into Unicode, then encode the Unicode string into the target encoding.
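The decode-then-encode pipeline can be sketched as a two-step function. This assumes Python 3, where bytes plays the role of Python 2's encoded str and str plays the role of unicode; the function name transcode is hypothetical.

```python
def transcode(data: bytes, source: str, target: str) -> bytes:
    """Convert bytes from one encoding to another, using Unicode as the intermediate."""
    text = data.decode(source)   # step 1: source-encoded bytes -> Unicode text (str)
    return text.encode(target)   # step 2: Unicode text -> bytes in the target encoding


gbk_bytes = "中文".encode("gbk")
utf8_bytes = transcode(gbk_bytes, "gbk", "utf-8")
print(utf8_bytes)                                          # b'\xe4\xb8\xad\xe6\x96\x87'
print(transcode(utf8_bytes, "utf-8", "gbk") == gbk_bytes)  # True: the round trip is lossless
```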


(2) The roles of decode and encode


decode converts a string in some other encoding into Unicode. For example, str1.decode('gb2312') converts the GB2312-encoded string str1 into Unicode.


encode converts a Unicode string into some other encoding. For example, str2.encode('gb2312') converts the Unicode string str2 into GB2312.

Therefore, to transcode a string you must first know which encoding it is in, then decode it into Unicode, and then encode it into the target encoding.
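Knowing the source encoding matters because decoding with the wrong codec either fails or silently produces garbage. A Python 3 sketch (bytes corresponds to Python 2's encoded str): GBK bytes decoded as UTF-8 raise an error, while UTF-8 bytes decoded as Latin-1 "succeed" but give the wrong text, since Latin-1 accepts every byte value.

```python
gbk_bytes = "中文".encode("gbk")       # bytes that really are GBK
print(gbk_bytes.decode("gbk"))          # correct codec: 中文

decode_error = None
try:
    gbk_bytes.decode("utf-8")           # wrong codec: these bytes are not valid UTF-8
except UnicodeDecodeError as exc:
    decode_error = exc
print("utf-8 decode failed:", decode_error)

utf8_bytes = "中文".encode("utf-8")
mojibake = utf8_bytes.decode("latin-1")  # no error, because Latin-1 maps every byte...
print(mojibake == "中文", len(mojibake))  # ...but the text is wrong: False 6
```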

Note: in Python 2, a plain (non-u'') string literal in the code is stored in the same encoding as the source file itself.


(3) Illustrative examples


First case: s = '中文'

If the file is saved as UTF-8, the string holds UTF-8 bytes; if it is saved as GB2312, the string holds GB2312 bytes. To convert its encoding, first decode it to Unicode with decode, and then convert it to the target encoding with encode. If you don't choose a specific encoding when creating the file, it is usually saved in the system default encoding.


Second case: s = u'中文'

The u prefix makes the string Unicode, Python's internal encoding, regardless of the encoding of the source file. To convert this string, you only need to call encode with the target encoding.
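Both cases can be mimicked in Python 3, with the assumption that a bytes value stands in for Python 2's s = '中文' in a UTF-8 file and a str value stands in for s = u'中文'. The encoded value needs decode followed by encode; the Unicode value needs only encode, and both paths produce the same GB2312 bytes.

```python
s_bytes = "中文".encode("utf-8")  # case 1: bytes in the file's encoding (UTF-8 here)
s_text = "中文"                   # case 2: already Unicode text

gb_from_bytes = s_bytes.decode("utf-8").encode("gb2312")  # decode first, then encode
gb_from_text = s_text.encode("gb2312")                     # encode only
print(gb_from_bytes == gb_from_text)  # True

# A Python 3 str has no decode method at all, so Unicode text cannot be "decoded" again.
print(hasattr(s_text, "decode"))      # False
```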


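Note: if a string is already Unicode, decoding it raises an error. In Python 2, u'中文'.decode('utf-8') first tries to encode the string to str with the default ASCII codec, which raises the UnicodeEncodeError discussed in this article.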


(4) Extension


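    # -*- coding: utf-8 -*-
    # Python 2 example
    s = '中文'
    print type(s)                    # check the type of s: <type 'str'> (UTF-8 bytes)
    print s

    u = s.decode('utf-8')            # decode the UTF-8 bytes into a unicode object
    g1 = s.decode('gbk', 'ignore')   # decode as GBK, dropping bytes that are invalid in GBK
    g2 = s.decode('gbk', 'replace')  # decode as GBK, replacing invalid bytes with U+FFFD
    print type(u)                    # <type 'unicode'>
    print u                          # may raise UnicodeEncodeError if stdout's encoding is ASCII
    print repr(g1), repr(g2)         # wrong codec: expect garbled text, not 中文

    b = u.encode('gb2312')           # encode the unicode object (not s) as GB2312
    print type(b)                    # <type 'str'>
    print repr(b)                    # show the raw GB2312 bytes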


Note:

In addition, decoding bytes that contain invalid sequences raises an error by default; the second argument (errors) changes this behavior. For example:
s.decode("utf-8", "ignore") skips the invalid bytes and keeps only the valid characters.
s.decode("utf-8", "replace") replaces the invalid bytes with the replacement character U+FFFD, which makes it easy to see at a glance where the encoding is faulty.
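The three error-handling modes can be compared on one input. This Python 3 sketch uses hypothetical data: valid UTF-8 for 中, followed by a stray 0xFF byte (never valid in UTF-8), followed by "ok".

```python
data = "中".encode("utf-8") + b"\xff" + b"ok"

strict_error = None
try:
    data.decode("utf-8")                     # default errors="strict": raises
except UnicodeDecodeError as exc:
    strict_error = exc
print(strict_error)

ignored = data.decode("utf-8", "ignore")    # the bad byte is dropped silently
replaced = data.decode("utf-8", "replace")  # the bad byte becomes U+FFFD
print(repr(ignored), repr(replaced))        # '中ok' '中\ufffdok'
```

"ignore" hides the problem entirely, while "replace" leaves a visible marker at the position of the faulty byte.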





This article is from the "Hand of the Paladin Control" blog, please make sure to keep this source http://wutengfei.blog.51cto.com/10942117/1918500
