Python encoding and decoding

Source: Internet
Author: User

Encoding and decoding

First, be clear that all information stored in a computer is binary. Encoding and decoding are essentially a mapping (a correspondence). For example, 'A' has the ASCII code 65, so the computer stores it as 01000001. The screen cannot show 01000001; it has to show 'A'. But how does the computer know that 01000001 means 'A'? It has to decode. When ASCII is chosen for decoding, the computer reads 01000001, looks it up in the ASCII table, finds 'A', and displays 'A'.

Encoding: the correspondence from real characters to binary strings (real characters → binary string).
Decoding: the correspondence from binary strings to real characters (binary string → real characters).

ASCII & UTF-8

ASCII, as is well known, uses 1 byte (8 bits) per character with the first bit always 0, so it covers only 128 characters, which is clearly not enough. Unicode was designed to represent every language. To avoid wasting storage (for example on the part that overlaps ASCII), it is stored with a variable-length encoding, but variable length makes decoding harder, because the decoder must be able to tell how many bytes make up one character.

UTF-8 is a prefix-based variable-length encoding designed for Unicode, in which the leading bits of a byte tell how many bytes represent a character. If the first bit of a byte is 0, the byte is a single character on its own. If the first bit is 1, the number of consecutive leading 1s is the number of bytes the current character occupies, and each following byte of that character starts with 10.

For example, the Unicode code point of "严" is 4E25 (100111000100101). 4E25 falls in the range 0000 0800-0000 FFFF, so the UTF-8 encoding of "严" needs three bytes, in the format "1110xxxx 10xxxxxx 10xxxxxx". Then, starting from the last bit of "严", fill in the x positions from back to front and pad the remaining high positions with 0. The resulting UTF-8 encoding of "严" is "11100100 10111000 10100101".
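The bit-filling procedure for "严" can be checked with a small sketch. It is written for Python 3, where text is str and byte streams are bytes; the function names utf8_encode_code_point and bits are made up for this illustration, and the sketch does not reject surrogate code points the way a real codec would.

```python
def utf8_encode_code_point(cp):
    """Encode one Unicode code point as UTF-8 by filling the bit templates."""
    if cp < 0x80:
        # 0xxxxxxx: plain ASCII, one byte
        return bytes([cp])
    if cp < 0x800:
        # 110xxxxx 10xxxxxx
        return bytes([0b11000000 | (cp >> 6),
                      0b10000000 | (cp & 0b111111)])
    if cp < 0x10000:
        # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0b11100000 | (cp >> 12),
                      0b10000000 | ((cp >> 6) & 0b111111),
                      0b10000000 | (cp & 0b111111)])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0b11110000 | (cp >> 18),
                  0b10000000 | ((cp >> 12) & 0b111111),
                  0b10000000 | ((cp >> 6) & 0b111111),
                  0b10000000 | (cp & 0b111111)])


def bits(data):
    """Render a byte string as space-separated 8-bit groups."""
    return " ".join(format(b, "08b") for b in data)


yan = utf8_encode_code_point(0x4E25)               # code point of "严"
print(bits(yan))                                    # 11100100 10111000 10100101
print(bits(utf8_encode_code_point(ord("A"))))       # 01000001
print(yan == "\u4e25".encode("utf-8"))              # True
```

The last line compares the hand-built bytes with Python's built-in UTF-8 codec, which is what real code should use.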
Decoding and encoding in Python

In Python (the Python 2 model used in this article), encoding and decoding are really conversions between encoding systems, and by default the pivot of the conversion is Unicode: encoding goes unicode → str, and decoding goes str → unicode, where str means a byte stream. str.decode decodes the byte stream str with the given codec and converts it to Unicode form (not UTF-8); u.encode converts the unicode object to a byte stream str with the given encoding.

Note that encode is meant to be called on a unicode object to produce a byte stream, and decode is meant to be called on a str object (a byte stream) to produce a unicode object. If encode is called on a str object, the str is first decoded to a unicode object with the system default encoding and only then encoded. Overlooking this implicit decode often leads to errors. When writing code, just remember: a str byte stream calls decode, and a unicode object calls encode.
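The hidden decode step can be made visible with a rough emulation. This is a sketch in Python 3, where the Python 2 unicode type corresponds to str and the Python 2 byte-stream str corresponds to bytes. The helper py2_style_encode is hypothetical, and it assumes the system default encoding is 'ascii', which is the usual Python 2 default.

```python
def py2_style_encode(byte_string, target_encoding, default_encoding="ascii"):
    """Emulate calling .encode() on a Python 2 byte string (assumption: default is ascii)."""
    text = byte_string.decode(default_encoding)   # the implicit, easily overlooked step
    return text.encode(target_encoding)


utf8_bytes = "\u4e25".encode("utf-8")             # the UTF-8 bytes of "严"

# Pure ASCII bytes survive the implicit decode, so nothing seems wrong.
print(py2_style_encode(b"abc", "utf-8"))          # b'abc'

# Non-ASCII bytes fail in the implicit decode, before any encoding happens.
try:
    py2_style_encode(utf8_bytes, "gbk")
    failed = False
except UnicodeDecodeError as exc:
    failed = True
    print("implicit decode failed:", exc.reason)

# The reliable route: decode explicitly with the right codec, then encode.
gbk_bytes = utf8_bytes.decode("utf-8").encode("gbk")
print(gbk_bytes.decode("gbk") == "\u4e25")        # True
```

Python 3 removes the trap altogether: bytes has no encode method and str has no decode method, so the direction of each call is enforced by the types.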
    s = u'严'
    s
    print type(s), s

The first line defines a unicode object (not UTF-8). The second line outputs u'\u4e25'. The third line outputs <type 'unicode'> 严.
    u = s.encode('utf8')
    u
    print type(u), u

Calling s.encode('utf8') encodes s with UTF-8 and stores the result as a byte stream. The second line outputs '\xe4\xb8\xa5'. The third line outputs <type 'str'> followed by garbled text beginning with 涓.

Note that the terminal's default encoding here is GBK. In Windows CMD the code page can be viewed and changed with chcp, or the default can be changed in the registry (CodePage under HKEY_CURRENT_USER\Console, for CMD or PowerShell): 936 is Simplified Chinese, 65001 is UTF-8. Both can display Chinese, but to make Chinese input easier I keep the default at 936.

When print sends content to the terminal, a unicode object is converted to the terminal's encoding before output, which is why the first print above displays correctly. A UTF-8 byte stream, however, is decoded by the terminal with its default GBK, so the display goes wrong. Here '\xe4\xb8' happens to be "涓" in GBK.
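The garbled terminal output can be reproduced without a GBK terminal by decoding the UTF-8 bytes with the wrong codec. This is a sketch in Python 3 (bytes plays the role of the Python 2 byte-stream str); the variable names are chosen for the illustration, and errors="replace" is used only so that the leftover third byte does not raise an exception.

```python
text = "\u4e25"                          # "严"
utf8_bytes = text.encode("utf-8")        # b'\xe4\xb8\xa5'

# What a GBK terminal effectively does with the UTF-8 byte stream:
wrong = utf8_bytes.decode("gbk", errors="replace")

# What a correct reader does:
right = utf8_bytes.decode("utf-8")

print(ascii(wrong))                      # escaped form of the mojibake
print(right == text)                     # True
print(wrong == text)                     # False

# The first two bytes alone already form one GBK character.
first_gbk_char = utf8_bytes[:2].decode("gbk")
print(wrong.startswith(first_gbk_char))  # True
```

The bytes never change; only the codec used to interpret them does, which is why the same byte stream shows correctly in a UTF-8 (65001) console and incorrectly in a GBK (936) one.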
    t = s.encode('utf8').decode('utf8')
    t

The second line outputs u'\u4e25'.

File encoding

The format in which text is saved is also an encoding. A TXT file, for example, can be saved as ASCII, UTF-8, and so on. A py file can declare its encoding with a comment in one of the first two lines:

    # -*- coding: utf-8 -*-

Reading files in Python
    fr = open('encode.py', 'r')
    fstr = fr.read()

Just remember that fstr is a byte stream; for everything else, see above.

Note: all of the operations above were run in CMD or PowerShell. Python's own interpreter shell behaves differently: after s = u'你好', evaluating s shows a unicode object, but the values inside are the GBK bytes, not the Unicode code points.
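Reading a file as a byte stream and decoding it explicitly might look like the following sketch. It is written for Python 3, where the byte stream is obtained by opening the file in 'rb' mode. The temporary file and its contents are invented for the illustration, and tokenize.detect_encoding from the standard library is used to read the coding declaration.

```python
import os
import tempfile
import tokenize

# Hypothetical file contents: a coding declaration plus one non-ASCII literal.
source_text = "# -*- coding: utf-8 -*-\ns = u'\u4e25'\n"

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "encode.py")

    # Saving text always means choosing an encoding for the bytes on disk.
    with open(path, "wb") as fw:
        fw.write(source_text.encode("utf-8"))

    # Read the declared source encoding from the first lines of the file.
    with open(path, "rb") as fr:
        declared, _ = tokenize.detect_encoding(fr.readline)

    # Binary mode returns the raw byte stream, with no decoding applied.
    with open(path, "rb") as fr:
        fstr = fr.read()

decoded = fstr.decode(declared)          # bytes -> text, with an explicit codec

print(declared)                          # utf-8
print(type(fstr).__name__)               # bytes
print(decoded == source_text)            # True
```

In Python 3 the same result is usually obtained in one step with open(path, encoding='utf-8'); the two-step version is shown because it mirrors the "read a byte stream, then decode" model described above.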
