Python Encoding Introduction: encode and decode

Source: Internet
Author: User

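If a Python source file contains non-ASCII characters, you need to declare its character encoding at the top of the file (on the first or second line). This is required in Python 2, whose default source encoding is ASCII; Python 3 assumes UTF-8 by default. The declaration looks like this: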

    # coding: utf-8

Python only checks whether the comment contains coding: (or coding=) followed by the encoding name, so you may also see the following form, which some people write for aesthetic reasons (it is also the file-variable syntax understood by the Emacs editor):
    # -*- coding: utf-8 -*-
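
The Python language reference specifies the regular expression coding[=:]\s*([-\w.]+), matched against a comment on the first or second line of the file. The sketch below applies that pattern to single comment lines to show which spellings count as declarations. The helper declared_encoding is hypothetical, and it does not check which line of the file the comment is on:

```python
import re

# Pattern quoted from the Python language reference (encoding declarations)
CODING_RE = re.compile(r"coding[=:]\s*([-\w.]+)")


def declared_encoding(line):
    """Return the encoding named by a comment line, or None."""
    stripped = line.lstrip()
    if not stripped.startswith("#"):
        return None  # only a comment can carry the declaration
    match = CODING_RE.search(stripped)
    return match.group(1) if match else None


for line in ["# coding: utf-8", "# -*- coding:utf-8 -*-",
             "# coding =utf-8", "# code : UTF - 8"]:
    print(repr(line), "->", declared_encoding(line))
```

Note that a space between coding and the colon or equals sign is enough to make the declaration invisible to Python.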


Common encodings:

  • GB2312 encoding: a simplified Chinese character set standard, used for exchanging information between Chinese character processing and communication systems.
  • GBK encoding: one of the Chinese character encoding standards; an internal-code extension of the GB2312-80 standard that uses a double-byte encoding for Chinese characters.
  • ASCII encoding: a standard that defines the mapping between English characters and binary values, covering 128 characters with 7 bits.
  • Unicode encoding: a character set that assigns a code point to every character in the world's writing systems. By itself it does not define how those code points are stored as bytes.
  • UTF-8 encoding: short for Unicode Transformation Format, 8-bit; it is one way to implement (store) Unicode. It is a variable-length encoding that uses 1 to 4 bytes per character, depending on the character.
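
To make the differences concrete, here is a small Python 3 sketch (in Python 3, str holds Unicode text and bytes holds encoded data) that encodes a few sample characters with the encodings listed above. The sample characters are arbitrary choices:

```python
# Python 3 sketch: str is Unicode text, .encode() produces bytes
samples = ["A", "é", "中", "😀"]

# UTF-8 is variable-length: these samples need 1, 2, 3 and 4 bytes
for ch in samples:
    utf8 = ch.encode("utf-8")
    print(f"{ch!r} U+{ord(ch):04X} -> UTF-8 {len(utf8)} byte(s): {utf8!r}")

# GB2312/GBK: ASCII stays 1 byte, a Chinese character takes 2 bytes
print("A".encode("gbk"), "中".encode("gb2312"), "中".encode("gbk"))

# ASCII covers only 128 characters, so encoding "中" fails
try:
    "中".encode("ascii")
except UnicodeEncodeError as exc:
    print("ascii:", exc.reason)
```

The same code point, 中 (U+4E2D), becomes three bytes in UTF-8 but two bytes (D6 D0) in both GB2312 and GBK, which is why the encoding must be known before bytes can be interpreted.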

Encoding conversion:

Inside Python, text is represented as Unicode. In Python 2, however, an ordinary string literal in the source code (a str) is stored in the same encoding as the source file itself. Encoding conversion therefore usually goes through Unicode as an intermediate: decode the string from its original encoding into Unicode, then encode the Unicode into the target encoding.

  • decode converts a string in some other encoding into Unicode. For example, name.decode("GB2312") converts the GB2312-encoded string name into Unicode.
  • encode converts Unicode into a string in some other encoding. For example, name.encode("GB2312") converts the Unicode string name into a GB2312-encoded string.
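
The decode-then-encode round trip can be sketched in Python 3 as follows. Here bytes.decode and str.encode play the roles of the Python 2 str.decode and unicode.encode calls above, and the helper name convert is hypothetical:

```python
def convert(data: bytes, src: str, dst: str) -> bytes:
    """Re-encode bytes from encoding src to encoding dst via Unicode."""
    text = data.decode(src)  # decode: src-encoded bytes -> Unicode str
    return text.encode(dst)  # encode: Unicode str -> dst-encoded bytes


gb = "中文".encode("gb2312")           # GB2312-encoded input
utf8 = convert(gb, "gb2312", "utf-8")  # same text, now UTF-8 encoded
print(gb, utf8)
```

The bytes change completely while the text stays the same, because Unicode is the only form in which the two encodings meet.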

Therefore, when converting, you must first know which encoding name uses, then decode it into Unicode, and finally encode it into the target encoding. If name is already Unicode, no decode step is needed; encode it directly into the encoding you need. Note that calling decode on a Unicode string, or encode on a str (byte string), is a mistake: in Python 2 both only work for pure-ASCII data, because Python first converts implicitly with the default ASCII codec, and otherwise raise a UnicodeError.
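
In Python 3 these mistakes are ruled out by the types themselves: str (Unicode) has no decode() method and bytes has no encode() method. The sketch below uses a hypothetical helper, to_unicode, that decodes only when it is given bytes, and shows that decoding GBK bytes with the wrong codec fails:

```python
def to_unicode(data, encoding):
    """Return Unicode text: str passes through, bytes are decoded."""
    if isinstance(data, str):
        return data  # already Unicode, so no decode step is needed
    return data.decode(encoding)


gbk_bytes = "中文".encode("gbk")
print(to_unicode(gbk_bytes, "gbk"))  # bytes decoded with the right codec
print(to_unicode("中文", "gbk"))      # already Unicode, returned as-is

# Decoding these GBK bytes as UTF-8 raises an error; a wrong codec does
# not always fail, so the source encoding still has to be known
try:
    gbk_bytes.decode("utf-8")
except UnicodeDecodeError as exc:
    print("wrong codec:", exc.reason)

# Python 3 removes the confusing methods entirely
print(hasattr("中文", "decode"), hasattr(b"abc", "encode"))
```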

Specifically, in a UTF-8 source file, string literals are UTF-8 encoded, and in a GB2312 source file they are GB2312 encoded: the encoding of a literal depends on the encoding of the file. To output strings of both encodings in the same text, you must convert them: first use decode to turn each string from its original encoding into Unicode, then use encode to turn the Unicode into the target encoding.
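
A minimal Python 3 sketch of combining two differently encoded strings into one output; the two byte strings stand in for literals from a UTF-8 file and a GB2312 file, and the variable names are illustrative:

```python
# Two byte strings produced with different encodings
utf8_part = "你好".encode("utf-8")  # as if taken from a UTF-8 source
gb_part = "世界".encode("gb2312")   # as if taken from a GB2312 source

# Decode each part with its own encoding so both become Unicode text
text = utf8_part.decode("utf-8") + gb_part.decode("gb2312")

# Encode the combined text once, into the single target encoding
output = text.encode("utf-8")
print(output.decode("utf-8"))
```

Concatenating the raw bytes instead would produce a byte string that is valid in neither encoding.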

For example, when a file is opened with the built-in open(), read() returns a str, which must then be decoded with decode() using the file's actual encoding. For write(): if the argument is Unicode, encode() it with the encoding you want to write; if it is a str in some other encoding, first decode() it with that str's encoding to get Unicode, then encode() it. If you pass Unicode directly to write(), Python 2 implicitly encodes it with the default encoding (sys.getdefaultencoding(), normally ASCII), which raises a UnicodeEncodeError for non-ASCII text.

    # coding: utf-8
    # Open in binary mode so read() returns the raw, still-encoded bytes
    with open('test.txt', 'rb') as fp1:
        info1 = fp1.read()
    # The file is known to be GBK encoded: decode it into Unicode
    tmp = info1.decode('gbk')
    # Encode the Unicode text into a UTF-8 encoded byte string
    info2 = tmp.encode('utf-8')
    # Write the bytes back in binary mode, so nothing is re-encoded
    with open('test.txt', 'wb') as fp2:
        fp2.write(info2)
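
In Python 3, open() can do the decoding and encoding itself through its encoding parameter, so the same conversion needs no explicit decode()/encode() calls. A minimal sketch, assuming Python 3; the helper convert_file and the temporary sample file are hypothetical, created only so the example runs on its own:

```python
import os
import tempfile


def convert_file(path, src="gbk", dst="utf-8"):
    """Rewrite a text file from encoding src to encoding dst."""
    # open() decodes the file contents from src into a Unicode str
    with open(path, "r", encoding=src) as f:
        text = f.read()
    # open() encodes the Unicode str into dst while writing
    with open(path, "w", encoding=dst) as f:
        f.write(text)


with tempfile.TemporaryDirectory() as d:
    p = os.path.join(d, "test.txt")
    with open(p, "wb") as f:
        f.write("编码转换".encode("gbk"))  # sample GBK-encoded file
    convert_file(p)
    with open(p, "rb") as f:
        result = f.read()  # raw bytes after conversion

print(result)
```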

How to check the encoding:
To check whether a string s is Unicode in Python 2 (the expression returns True if s is a unicode object, and False otherwise):
    isinstance(s, unicode)
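
Python 3 no longer has a unicode type: text is str and encoded data is bytes. A minimal sketch of the equivalent check, assuming Python 3 (the helper name is_unicode is hypothetical):

```python
def is_unicode(s):
    """Python 3: text is str (Unicode); encoded data is bytes."""
    return isinstance(s, str)


print(is_unicode("中文"))                  # a Unicode str
print(is_unicode("中文".encode("utf-8")))  # encoded bytes
```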


The following code prints the system's default encoding:

    #!/usr/bin/env python
    # coding: utf-8
    import sys
    print(sys.getdefaultencoding())
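
On Python 3 the same call always reports UTF-8, and the encoding a text-mode file gets when no encoding argument is passed is platform dependent. A small sketch, assuming Python 3, that prints both; treating the locale's preferred encoding as the file default is a simplification:

```python
import locale
import sys

# Interpreter default encoding; Python 3 always reports "utf-8"
default = sys.getdefaultencoding()

# The locale's preferred encoding, which text-mode open() typically
# falls back to when no encoding argument is given (platform dependent)
preferred = locale.getpreferredencoding(False)

print(default, preferred)
```

Because the second value varies between machines, passing encoding= explicitly to open() is the safer habit.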
