Recently I found encoding and decoding in Python 2 and Python 3 confusing, and things became much clearer after I consulted an expert (known as "Cloud God").
Here I share my own understanding combined with that guidance. The following represents only my personal view, and it may contain errors!
First, some terminology: encode means encoding and decode means decoding. Encoding converts logical characters into binary data so that they can be stored and transferred; decoding converts the binary data back into characters.
(How the characters are stored before encoding and after decoding is Python's internal implementation, which only Python needs to worry about.
It's like integers: you don't have to care what an integer looks like in Python's memory, but when you save it to a file or send it over the network,
you have to decide whether to convert it to a decimal string, a 32-bit unsigned little-endian representation, a 64-bit signed network-order representation, and so on.)
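The integer analogy can be made concrete with a minimal sketch. It uses the standard library `struct` module; the value `1024` and the variable names are just illustrative choices, not something from the original discussion.

```python
import struct

n = 1024  # illustrative value

# Decimal string representation, encoded as ASCII bytes.
as_decimal = str(n).encode("ascii")

# 32-bit unsigned integer, little-endian ("<I").
as_u32_le = struct.pack("<I", n)

# 64-bit signed integer, network (big-endian) byte order ("!q").
as_i64_net = struct.pack("!q", n)

print(as_decimal)
print(as_u32_le)
print(as_i64_net)

# Each representation decodes back to the same integer.
assert int(as_decimal.decode("ascii")) == n
assert struct.unpack("<I", as_u32_le)[0] == n
assert struct.unpack("!q", as_i64_net)[0] == n
```

The same logical value produces three different byte sequences, which is exactly the situation with characters and their encodings.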
## Python 2.x is special and confusing: the default string (str) holds bytes that are treated as ASCII, so Unicode must be requested explicitly to support non-English text
There are two string types in Python 2.x, str and unicode:
    str1 = 'My name is Ray'    # str: a byte string (ASCII here)
    ustr1 = u'My name is Ray'  # unicode string
Decoding and encoding in Python 2: a non-Unicode str must first be decoded (decode) into unicode, and can then be encoded (encode) into another encoding.
    str1.decode('ascii')                  # ASCII ==> unicode
    str1.decode('ascii').encode('utf-8')  # unicode ==> UTF-8
Understanding ASCII, Unicode, and UTF-8:
1. ASCII and Unicode are two different standards. Unicode is also known as the universal code, and UTF-8 is a variable-length character encoding for Unicode.
2. An ASCII character takes 1 byte (one byte is 8 bits, although ASCII itself defines only 2^7 = 128 characters). A Unicode character is commonly stored in 2 bytes (2^16 values, enough for the Basic Multilingual Plane only), while UTF-8 uses 1 to 4 bytes per character, typically 3 bytes for a Chinese character.
3. ASCII supports only English letters, digits, and some symbols; Unicode supports characters from all over the world, including Chinese and Latin letters.
4. UTF-8, the variable-length encoding widely used in programming, stores plain English characters in 1 byte, exactly as ASCII does, and stores other characters in 2 to 4 bytes (3 bytes for common Chinese characters).
+-----------+----------+-------------------+----------------------------+
| Character | ASCII    | Unicode           | UTF-8                      |
+-----------+----------+-------------------+----------------------------+
| A         | 01000001 | 00000000 01000001 | 01000001                   |
+-----------+----------+-------------------+----------------------------+
| 中        | X        | 01001110 00101101 | 11100100 10111000 10101101 |
+-----------+----------+-------------------+----------------------------+
5. Memory is read and written in units of bytes: a string is made up of characters, and each character is stored as one or more bytes.
+-----------------------------------------+
|                 string                  |
+-------------+-------------+-------------+
|    char     |    char     |    char     |
+------+------+------+------+------+------+
| byte | byte | byte | byte | byte | byte |
+------+------+------+------+------+------+
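The bit patterns for "A" and "中" can be reproduced with a short Python 3 sketch. It assumes that a 2-byte Unicode representation corresponds to the `utf-16-be` codec (a reasonable reading, but an assumption), and the helper `bits` is a hypothetical name introduced only for this illustration.

```python
def bits(data: bytes) -> str:
    """Render bytes as space-separated 8-bit binary groups."""
    return " ".join(f"{byte:08b}" for byte in data)


for ch in ("A", "中"):
    print(ch, "code point:", hex(ord(ch)))
    # Assumption: a 2-byte Unicode form is UTF-16, big-endian.
    print("  2-byte Unicode:", bits(ch.encode("utf-16-be")))
    print("  UTF-8:         ", bits(ch.encode("utf-8")))

# ASCII has no representation for "中", so encoding fails.
try:
    "中".encode("ascii")
except UnicodeEncodeError as exc:
    print("ASCII cannot encode it:", exc.reason)
```

One character maps to a different number of bytes depending on the encoding, which is why the number of characters in a string and the number of bytes it occupies are generally different.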
## Python 3.x: the default string (str) object is Unicode text, and the default encoding, which can be checked with sys.getdefaultencoding(), is UTF-8
Understanding str and bytes in Python 3:
As mentioned above, encode means encoding and decode means decoding; encoding converts logical characters into binary data so that they can be stored and transferred.
In Python 3, str is a text object and cannot be decoded (it has no decode method); likewise, bytes is a binary data object and cannot be encoded.
Calling encode on a str produces an encoded bytes object (binary data), and calling decode on a bytes object converts it back to str.
This is why the io module has both StringIO and BytesIO: use BytesIO to read and write binary data in memory, and StringIO to read and write strings in memory.
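As a rough illustration of the str/bytes split and the two in-memory buffers, here is a minimal Python 3 sketch. The sample text and the variable names are arbitrary choices for this example.

```python
import io

text = "中文 text"  # arbitrary sample text

# str has encode() but no decode(); bytes has decode() but no encode().
print(hasattr(text, "decode"))
print(hasattr(b"raw", "encode"))

data = text.encode("utf-8")          # str -> bytes
assert data.decode("utf-8") == text  # bytes -> str

# StringIO holds text (str) in memory.
text_buffer = io.StringIO()
text_buffer.write(text)
print(text_buffer.getvalue())

# BytesIO holds binary data (bytes) in memory.
byte_buffer = io.BytesIO()
byte_buffer.write(data)
print(byte_buffer.getvalue())

# Writing a str into a BytesIO raises TypeError.
try:
    byte_buffer.write(text)
except TypeError as exc:
    print("BytesIO rejects str:", exc)
```

Keeping text and binary data in separate types means that a mix-up fails immediately with an error instead of silently producing garbled output.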
    # str object
    s = "Example"
    # bytes object
    b = b"Example"
    # str to bytes
    bytes(s, encoding="utf8")
    # bytes to str
    str(b, encoding="utf-8")
    # an alternative method
    # str to bytes
    str.encode(s)
    # bytes to str
    bytes.decode(b)