Content encoding in Python

Source: Internet
Author: User

I. Introduction to Python encoding

1) Introduction to encoding formats

When the Python interpreter loads the code in a .py file, it decodes the content using a source encoding (ASCII by default in Python 2). ASCII (American Standard Code for Information Interchange) is a character encoding based on the Latin alphabet, designed mainly for modern English. It uses 7 bits per character, so it defines 2**7 = 128 symbols. Each symbol is stored in one byte (8 bits), and the remaining 128 of the 2**8 = 256 possible byte values were used by various mutually incompatible 8-bit extensions for Western European and other languages. Clearly neither 128 nor 256 values can represent all the characters and symbols in the world, so a new standard was needed that can represent all of them, namely Unicode.
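The limit of ASCII can be checked directly in Python 3. The following is a minimal sketch; the helper name is_ascii is an illustrative assumption, not something from the original article.

```python
# ASCII defines code points 0 through 127 only.
def is_ascii(text: str) -> bool:
    """Return True if every character in text has a code point below 128."""
    return all(ord(ch) < 128 for ch in text)


print(ord("A"))           # 65
print(is_ascii("hello"))  # True
print(is_ascii("明"))     # False

# Encoding a non-ASCII character as ASCII raises UnicodeEncodeError.
try:
    "明".encode("ascii")
except UnicodeEncodeError as exc:
    print("cannot encode:", exc.reason)
```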

Unicode (also called Uniform Code, Universal Code, or Single Code) is a character encoding standard used on computers. It was created to address the limitations of traditional character encoding schemes by assigning a uniform and unique number (a code point) to each character in each language. Early versions of Unicode assumed 16 bits (2 bytes) per character, that is, 2**16 = 65536 values. The current standard covers code points from U+0000 to U+10FFFF, so 2 bytes are not always enough: an encoding such as UTF-16 uses at least 2 bytes per character and sometimes 4.
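A code point is just an integer, which Python exposes through ord() and chr(). The sketch below prints a few code points in the conventional U+XXXX notation; code_point_label is a hypothetical helper introduced only for this illustration.

```python
# Each character maps to one Unicode code point, independent of any byte encoding.
def code_point_label(ch: str) -> str:
    """Format a single character's code point in U+XXXX notation."""
    return f"U+{ord(ch):04X}"


# "\u00e9" is e with acute accent, "\U0001F600" is an emoji outside the 16-bit range.
for ch in ["A", "\u00e9", "明", "\U0001F600"]:
    print(ch, ord(ch), code_point_label(ch))

# chr() is the inverse of ord().
print(chr(0x660E))  # 明
```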

UTF-8 is a variable-length encoding of Unicode that reduces storage for common text. Instead of using at least 2 bytes per character, it groups code points by range: ASCII characters are saved in 1 byte, most European and Middle Eastern characters in 2 bytes, most East Asian characters in 3 bytes, and the remaining characters (such as emoji) in 4 bytes. In Python 3.x, the interpreter decodes the content of a .py file as UTF-8 by default.
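The variable width of UTF-8 can be observed by encoding one character from each group and counting the bytes. This is a small sketch; utf8_length and the sample characters are chosen here for illustration.

```python
def utf8_length(ch: str) -> int:
    """Return the number of bytes UTF-8 uses for the given text."""
    return len(ch.encode("utf-8"))


samples = [
    ("A", "ASCII letter"),
    ("\u00e9", "European letter (e with acute accent)"),
    ("明", "East Asian character"),
    ("\U0001F600", "emoji outside the 16-bit range"),
]

for ch, description in samples:
    print(f"{description}: {utf8_length(ch)} byte(s)")
```

Run as written, the four samples should report 1, 2, 3, and 4 bytes respectively.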

    a = "明"

    # List the methods available on a
    print(dir(a))
    # Show the data type of a
    print(type(a))     # <class 'str'>
    # Show the code point of a
    print(ord(a))      # 26126
    # Convert a code point back to the corresponding character
    print(chr(26126))  # 明

    A = "明天"  # "tomorrow"
    # Length in characters
    print(len(A))                  # 2
    # Length in bytes after UTF-8 encoding
    print(len(A.encode("utf-8")))  # 6

In computer memory, text is handled uniformly as Unicode; it is converted to UTF-8 when it needs to be saved to the hard disk or transferred. For example, when you edit a file with Notepad, the UTF-8 bytes read from the file are decoded into Unicode characters in memory, and when you save, the Unicode characters are encoded back to UTF-8 and written to the file.
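The same save-and-load cycle can be reproduced in Python by passing encoding="utf-8" to open(). The following is a sketch under simple assumptions: the file name note.txt and the sample text are made up, and a temporary directory is used only so the example cleans up after itself.

```python
import os
import tempfile

text = "明天 tomorrow"

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "note.txt")

    # Writing: the str held in memory is encoded to UTF-8 bytes on disk.
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)

    # Reading in binary mode shows the raw bytes stored in the file.
    with open(path, "rb") as f:
        raw = f.read()

    # Reading in text mode decodes those bytes back into str.
    with open(path, "r", encoding="utf-8") as f:
        restored = f.read()

print(len(text), "characters in memory")
print(len(raw), "bytes on disk")
print(restored == text)
```

Passing the encoding explicitly avoids depending on the platform default, which is not UTF-8 on every system.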

When you browse a web page, the server converts dynamically generated Unicode content to UTF-8 before transferring it to the browser. That is why the source of many web pages contains a declaration such as <meta charset="UTF-8" />, indicating that the page uses UTF-8 encoding.
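As a rough sketch of that round trip, the snippet below encodes a small page to bytes as a server might, then finds the declared charset and decodes with it on the receiving side. The variable names and the regular expression are illustrative assumptions; real browsers also consult the HTTP Content-Type header and apply more elaborate detection rules.

```python
import re

html = '<html><head><meta charset="UTF-8" /></head><body>小明</body></html>'

# "Server" side: the str is encoded to bytes before transfer.
payload = html.encode("utf-8")

# "Browser" side: the charset declaration is pure ASCII, so it can be
# located in the raw bytes before the whole page is decoded.
match = re.search(rb'<meta\s+charset="([^"]+)"', payload, re.IGNORECASE)
declared = match.group(1).decode("ascii") if match else "utf-8"

page = payload.decode(declared)
print(declared)
print(page == html)
```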

2) Bytes

Python's string type is str, which is represented in memory as Unicode, so one character may correspond to several bytes. To send a string over the network or save it to disk, you need to convert the str to bytes, a sequence in which each element occupies exactly one byte, using the encode() method. Conversely, data read as a byte stream from the network or disk is of type bytes. To turn bytes into str, use the decode() method:

    # Encoding
    a = "ABC"
    print(a.encode("utf-8"))  # b'ABC'

    b = "小明"
    # This raises UnicodeEncodeError: Chinese characters are outside the ASCII range
    # print(b.encode("ascii"))
    print(b.encode("utf-8"))  # b'\xe5\xb0\x8f\xe6\x98\x8e'

    # Decoding
    print(b'\xe5\xb0\x8f\xe6\x98\x8e'.decode("utf-8"))  # 小明
    print(b"ABC".decode("ascii"))                       # ABC

A Chinese character encoded in UTF-8 typically takes 3 bytes, while an English letter takes only 1 byte. When manipulating strings, you will often convert between str and bytes. To avoid garbled text, use the same encoding, preferably UTF-8, in both directions of the conversion.
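Garbled text typically appears when bytes are decoded with a different codec than the one used to encode them. The sketch below illustrates this with latin-1 as the mismatched codec (an arbitrary choice for the example) and also shows the errors parameter of decode(), which controls what happens when the bytes are not valid UTF-8.

```python
data = "小明".encode("utf-8")

# Decoding with the wrong codec does not fail here; it silently yields garbage.
garbled = data.decode("latin-1")
print(garbled == "小明")  # False
print(len(garbled))       # 6, one character per byte

# Decoding with the matching codec restores the original text.
print(data.decode("utf-8"))  # 小明

# Dropping the last byte leaves an incomplete sequence, which is invalid UTF-8.
broken = data[:-1]
try:
    broken.decode("utf-8")
except UnicodeDecodeError as exc:
    print("strict decoding failed:", exc.reason)

# errors="replace" substitutes U+FFFD for the invalid part instead of raising.
recovered = broken.decode("utf-8", errors="replace")
print(recovered.startswith("小"))
```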
