Content encoding in Python

Last Update:2017-08-30 Source: Internet

Author: User



I. Introduction to Python encoding

1) Introduction to encoding formats

When the Python interpreter loads the code in a .py file, it decodes the content using a source encoding (ASCII by default in Python 2). ASCII (American Standard Code for Information Interchange) is a character encoding based on the Latin alphabet, used mainly for modern English. Each ASCII character is stored in one byte (8 bits), but the standard uses only 7 of those bits, so it defines 2**7 = 128 characters. Even the 8-bit extensions used for other Western European languages can represent at most 2**8 = 256 symbols. Neither is enough to represent all the characters and symbols in the world, so a new standard was needed that can represent all of them, namely Unicode.
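The limit of ASCII can be seen directly from Python. The following is a minimal sketch; the helper name `is_ascii_text` is made up for this illustration and is not part of the original article.

```python
def is_ascii_text(text):
    """Return True if text can be encoded with the ASCII codec."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        # At least one character lies outside the ASCII range.
        return False


ascii_bytes = "ABC".encode("ascii")
print(list(ascii_bytes))          # [65, 66, 67]
print(is_ascii_text("ABC"))       # True
print(is_ascii_text("caf\u00e9")) # False: the accented letter is not ASCII
print(is_ascii_text("明"))        # False
```

Each ASCII character maps to one small integer, while accented or Chinese characters cannot be encoded at all.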

Unicode (also called Unified Code or Universal Code) is a character encoding standard used on computers. It was created to address the limitations of traditional character encoding schemes by assigning a uniform and unique number (a code point) to each character in every language. Early Unicode encodings represented each character with 16 bits (2 bytes), which gives 2**16 = 65536 values. Note that 2 bytes is a minimum: characters outside this range need more.
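To make the "at least 2 bytes, possibly more" point concrete, here is a small sketch. It assumes Python's `utf-16-le` codec as one example of a 2-byte-minimum encoding; the article itself does not name a specific encoding, so treat that choice as an illustration only.

```python
# Three sample characters: a Latin letter, a Chinese character, and an emoji.
samples = ["A", "明", "\U0001F600"]

for ch in samples:
    code_point = ord(ch)                       # the character's Unicode number
    utf16_length = len(ch.encode("utf-16-le")) # bytes needed in UTF-16
    print(f"U+{code_point:04X} needs {utf16_length} bytes in UTF-16")

utf16_lengths = [len(ch.encode("utf-16-le")) for ch in samples]
print(utf16_lengths)  # [2, 2, 4]
```

The emoji has a code point above 65535, so it cannot fit in a single 16-bit unit and takes 4 bytes.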

UTF-8 is a variable-length encoding of Unicode that saves space. Instead of using a minimum of 2 bytes, it sorts characters and symbols into classes: ASCII characters are stored in 1 byte, most European characters in 2 bytes, and most East Asian characters in 3 bytes (less common characters, such as emoji, take 4 bytes). In Python 3.x, the interpreter decodes the contents of a .py file as UTF-8 by default.
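The byte counts described above can be checked with a short sketch. The sample characters are chosen here for illustration and do not come from the original article.

```python
# One sample character per class: ASCII, European, East Asian.
samples = {
    "A": "ASCII letter",
    "\u00e9": "European letter (e with acute accent)",
    "明": "East Asian character",
}

# Map each character to the number of bytes in its UTF-8 encoding.
utf8_lengths = {ch: len(ch.encode("utf-8")) for ch in samples}

for ch, description in samples.items():
    print(f"{description}: {utf8_lengths[ch]} byte(s)")
```

This prints 1, 2, and 3 bytes for the three samples, in that order.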

    a = "明"
    # View the methods available on a
    print(dir(a))
    # Show the data type of a
    print(type(a))     # <class 'str'>
    # Show the code value (code point) of a
    print(ord(a))      # 26126
    # Convert a code value to the corresponding character
    print(chr(26126))  # 明 (Ming)

    a = "明天"  # "tomorrow"
    # len: the length in characters
    print(len(a))                  # 2
    # The length in bytes after UTF-8 encoding
    print(len(a.encode("utf-8")))  # 6

In computer memory, text is handled uniformly as Unicode, and it is converted to UTF-8 when it needs to be saved to the hard disk or transferred. For example, when you edit a file with Notepad, the UTF-8 bytes read from the file are converted to Unicode characters in memory, and when you finish editing, the Unicode characters are converted back to UTF-8 and saved to the file.
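The save-and-load cycle described here can be sketched in Python as follows. The temporary file and the variable names are assumptions made for this illustration.

```python
import os
import tempfile

text = "小明"  # Unicode text held in memory as a str

# Create a temporary file so the example cleans up after itself.
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
try:
    # Saving: the str is converted to UTF-8 bytes on the way to disk.
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)

    # Looking at the file in binary mode shows the stored UTF-8 bytes.
    with open(path, "rb") as f:
        raw = f.read()

    # Loading: the UTF-8 bytes are converted back to a str in memory.
    with open(path, "r", encoding="utf-8") as f:
        restored = f.read()
finally:
    os.remove(path)

print(raw)               # b'\xe5\xb0\x8f\xe6\x98\x8e'
print(restored == text)  # True
```

Passing `encoding="utf-8"` explicitly avoids depending on the platform's default encoding.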

Similarly, when you browse a web page, the server converts the dynamically generated Unicode content to UTF-8 before sending it to the browser. This is why the source of many web pages contains a line such as <meta charset="UTF-8" />, which indicates that the page uses UTF-8 encoding.

2) Bytes

Python's string type is str, which is represented in memory as Unicode, so one character may correspond to several bytes. To send a string over the network or save it to disk, you need to convert the str to bytes with the encode() method; each element of a bytes object occupies exactly one byte. Conversely, data read as a byte stream from the network or from disk is bytes. To turn bytes back into str, use the decode() method:

    # Encoding
    a = "ABC"
    print(a.encode("utf-8"))  # b'ABC'
    b = "小明"  # Xiao Ming
    # This would raise UnicodeEncodeError: Chinese is outside the ASCII range
    # print(b.encode("ascii"))
    print(b.encode("utf-8"))  # b'\xe5\xb0\x8f\xe6\x98\x8e'
    # Decoding
    print(b'\xe5\xb0\x8f\xe6\x98\x8e'.decode("utf-8"))  # 小明
    print(b"ABC".decode("ascii"))  # ABC

In UTF-8, one Chinese character typically takes 3 bytes, while one English character takes only 1 byte. When working with strings, you will often need to convert between str and bytes. To avoid garbled text, always use UTF-8 for these conversions.
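As an illustration of how garbled text arises, the sketch below decodes UTF-8 bytes with the wrong codec. The choice of `latin-1` as the mismatched codec is an assumption for demonstration; any codec other than the one used for encoding can cause similar problems.

```python
data = "小明".encode("utf-8")  # 6 bytes of UTF-8

# Decoding with the wrong codec does not fail here, but every byte
# is turned into a separate, unrelated character.
garbled = data.decode("latin-1")
print(len(garbled))  # 6
print(garbled == "小明")  # False

# Decoding with the codec that was used for encoding restores the text.
correct = data.decode("utf-8")
print(correct)  # 小明

# A stricter codec such as ASCII refuses the bytes outright.
try:
    data.decode("ascii")
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True
print(ascii_failed)  # True
```

Using the same encoding (UTF-8) on both sides of every conversion avoids both the silent corruption and the exception.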


This article is an English version of an article originally published in Chinese on aliyun.com.
