Encoding and decoding in Python

Last Update:2015-05-14 Source: Internet

Author: User


Encoding and decoding

First, be clear that all information stored in a computer is binary. Encoding and decoding are essentially a mapping (a correspondence relationship). For example, the ASCII code of 'A' is 65, so the computer stores it as 01000001. The screen, however, should not show 01000001; it should show 'A'. How does the computer know that 01000001 is 'A'? That requires decoding: if ASCII is chosen as the decoding, then when the computer reads 01000001 it looks the value up in the ASCII table, finds 'A', and displays 'A'.

Encoding: the mapping from real characters to binary strings (real character → binary string).
Decoding: the mapping from binary strings to real characters (binary string → real character).

ASCII & UTF-8

As is well known, ASCII uses 1 byte (8 bits) to represent a character, and the first bit is always 0. The character set it can represent is clearly not large enough.

The Unicode coding system is designed to express any language. To avoid redundant storage (for example, for the part that corresponds to ASCII), it is stored with a variable-length encoding. Variable length makes decoding harder, however, because the decoder cannot tell how many bytes represent one character.

UTF-8 is a prefix code designed for the variable-length encoding of Unicode, so the leading bits reveal how many bytes represent a character. If the first bit of a byte is 0, that byte is a single character on its own. If the first bit is 1, the number of consecutive leading 1s is the number of bytes the current character occupies.

For example, the Unicode code point of "严" is 4E25 (100111000100101). 4E25 falls in the third range of the UTF-8 table (0000 0800 to 0000 FFFF), so the UTF-8 encoding of "严" needs three bytes, in the format "1110xxxx 10xxxxxx 10xxxxxx". Starting from the last bit of "严", fill in the x positions of the format from back to front and pad the remaining high positions with 0. The resulting UTF-8 encoding of "严" is "11100100 10111000 10100101".
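The bit-filling procedure described above can be sketched in code. This is only an illustrative sketch written for Python 3, not part of the original article: the function name `utf8_encode_code_point` is hypothetical, and it ignores details such as rejecting surrogate code points. It shows the ASCII value of 'A' and the three-byte result for code point 4E25.

```python
def utf8_encode_code_point(cp):
    """Encode one Unicode code point to UTF-8 by filling the bit templates by hand.

    Hypothetical helper for illustration; it does not validate surrogates.
    """
    if cp < 0x80:        # 0xxxxxxx
        return bytes([cp])
    if cp < 0x800:       # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:     # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


# ASCII: 'A' is 65, stored as one byte whose first bit is 0.
a_bits = format(ord("A"), "08b")
print(a_bits)  # 01000001

# Code point 4E25 falls in the three-byte range.
encoded = utf8_encode_code_point(0x4E25)
bits = " ".join(format(b, "08b") for b in encoded)
print(bits)  # 11100100 10111000 10100101
```

The result can be checked against Python's built-in codec, for example by comparing `encoded` with `"\u4e25".encode("utf-8")`.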
Decoding and encoding in Python

In Python (Python 2 here, where str is a byte stream), encoding and decoding are really conversions between different encoding systems, and by default the pivot of the conversion is Unicode: encoding is unicode → str and decoding is str → unicode. str.decode decodes the byte stream str with the given codec and converts it into a unicode object; u.encode converts the unicode object into a byte stream with the given encoding.

Note that calling encode on a unicode object generates a byte stream, and calling decode on a str object (a byte stream) generates a unicode object. If encode is called on a str object, Python first decodes it to a unicode object with the default encoding, and overlooking this implicit default decode often results in an error.

When writing your own code, just remember: call decode on a str byte stream, and call encode on a unicode object.

    s = u'严'
    s
    print type(s), s

The first line defines a unicode object (not UTF-8). The second line outputs u'\u4e25'. The third line outputs <type 'unicode'> 严.
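For readers on Python 3, the same distinction can be sketched as follows. This is an assumption-laden translation of the article's Python 2 session, not the author's code: in Python 3 the text type is named `str` and the byte stream type is named `bytes`, and the variable names here are made up for illustration. The explicit ASCII decode at the end stands in for the implicit default decode that Python 2 performs when encode is called on a byte stream.

```python
# Python 3 naming (assumption): str holds Unicode text, bytes holds the byte stream.
text = "\u4e25"                  # the character with code point 4E25
data = text.encode("utf-8")      # text -> bytes
restored = data.decode("utf-8")  # bytes -> text

print(type(text).__name__, ascii(text))  # str '\u4e25'
print(type(data).__name__, data)         # bytes b'\xe4\xb8\xa5'

# Python 3 removes the wrong-direction methods entirely.
print(hasattr(data, "encode"), hasattr(text, "decode"))  # False False

# Decoding non-ASCII bytes as ASCII fails; this is the kind of error the
# implicit default decode in Python 2 tends to produce.
try:
    data.decode("ascii")
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True
print(ascii_failed)  # True
```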

    u = s.encode('utf8')
    u
    print type(u), u

With s.encode('utf8'), s is encoded with UTF-8 and the encoded result is saved as a byte stream. The second line outputs '\xe4\xb8\xa5'. The third line outputs <type 'str'> 涓.

It is also important to note that the default encoding of the terminal is GBK. In Windows cmd the code page can be viewed and changed with chcp, or the terminal's default encoding can be changed in the registry (the CodePage value under HKEY_CURRENT_USER\Console, for cmd or PowerShell). 936 is Simplified Chinese and 65001 is UTF-8. Both can display Chinese, but for the convenience of Chinese input I set the default to 936.

When print is called to write output to the terminal, a unicode object is converted to the terminal's encoding before output, which is why the result of the first print above is normal. When a UTF-8 byte stream is printed, the terminal decodes it with its default GBK and the display goes wrong. Here '\xe4\xb8' happens to be "涓" under GBK.
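The garbled display can be reproduced without a GBK terminal by decoding the UTF-8 bytes with the wrong codec on purpose. The following is a minimal Python 3 sketch with made-up variable names; it assumes the standard `gbk` codec shipped with Python and uses `errors="replace"` so that the leftover third byte does not abort the decode.

```python
# UTF-8 bytes of the character with code point 4E25.
utf8_bytes = "\u4e25".encode("utf-8")   # b'\xe4\xb8\xa5'

# Decode with the wrong codec, as a GBK terminal would.
# The first two bytes form one GBK character; the trailing byte is incomplete
# and is substituted by the replacement character.
garbled = utf8_bytes.decode("gbk", errors="replace")
print(garbled[0])  # 涓

# Without errors="replace" the incomplete trailing byte raises an error.
try:
    utf8_bytes.decode("gbk")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False
print(strict_ok)  # False

# Decoding with the codec that was used for encoding restores the text.
print(ascii(utf8_bytes.decode("utf-8")))  # '\u4e25'
```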

    t = s.encode('utf8').decode('utf8')
    t

The second line outputs u'\u4e25'.

Files also have an encoding, which is chosen when the text is saved; for example, a TXT file can be saved as ASCII, UTF-8, and so on. To read a file in Python:

    fr = open('encode.py', 'r')
    fstr = fr.read()

Just remember that fstr is a byte stream; the other operations are the same as above.

Note: the operations above were done under cmd or PowerShell. Python's own bundled interpreter has a problem: after s = u'你好' ("Hello"), evaluating s shows a unicode object, but its contents are in GBK encoding rather than Unicode.

References

Introduction to character encoding http://blog.csdn.net/trochiluses/article/details/8782019
chcp http://baike.baidu.com/link?url=_ Qajtlxmrjod5ppv8ykh7om7uhqtucqud5wqawfrtmcmg3ii3f3s7r11xd6rqf6zkzh_ljz-1dwzexyxei2_lq
Python character encoding and decoding http://blog.csdn.net/trochiluses/article/details/16825269


This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
