Exploring garbled output in Python: encoding, repr, encode, and decode
# encoding: utf-8
# Run this script from the command line (cmd)
s = '百度'  # non-ASCII text ("Baidu"); pure ASCII text would never come out garbled
print s  # s holds UTF-8 bytes but the output environment is GBK, so the output is garbled
print s.decode('utf-8')  # decoded string; Python converts it for the GBK output environment automatically
print s.decode('utf-8').encode('utf-8')  # UTF-8 bytes again in a GBK environment, so the output is garbled
print s.decode('utf-8').encode('gbk')  # GBK bytes in a GBK environment, so the output is normal
# s = 0xF21938274ABDS... (the binary data in memory)
# repr(s) turns the memory data into a printable string: print repr(s) shows printable bytes as characters and unprintable bytes as \x escapes.
# repr(s) itself is neither the raw memory data nor that data interpreted byte by byte as characters.
# The meaning of repr(s) is what print repr(s) displays: the literal text you would type to write the memory data as a string.
# If s = 0x0A = '\n', then print repr(s) => '\n', and repr(s) written as a literal is "'\\n'" (a backslash must be escaped in order to display a backslash).
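# A minimal sketch of the repr behaviour described above. It is written for Python 3 (an assumption; the script above is Python 2), where the Python 2 str corresponds to bytes. The variable names are illustrative only.

```python
# Python 3: bytes plays the role of the Python 2 str used in the script above.
s = '百度'.encode('utf-8')      # the raw UTF-8 bytes held in memory

printable = repr(s)             # the text that print(repr(s)) would display
newline = '\n'                  # a single character, code point 0x0A
newline_repr = repr(newline)    # four characters: quote, backslash, n, quote

print(printable)                # unprintable bytes appear as \x escapes
print(len(newline), len(newline_repr))
```

# repr never changes s; it only builds a new, printable description of it.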
#
# The memory data of s yields the correct meaning only when it is interpreted with the right encoding.
# If s holds 'haha' encoded as UTF-8, interpreting the bytes as UTF-8 recovers the meaning 'haha'.
# In Python, interpreting s according to encoding xyz is written s.decode(xyz).
# s.decode(xyz) returns a Python variable whose memory representation we do not need to care about. It might be stored as "data + encoding".
# However it is laid out in memory, this variable carries the abstract meaning, and we can think of it as an abstract 'haha'.
# A decoded string can be re-encoded: the meaning stays the same, but a different encoding produces different binary data.
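# The decode-then-encode idea above can be sketched as follows. This is Python 3 (an assumption; the script above is Python 2), so the raw data is a bytes object and the decoded meaning is a str.

```python
utf8_bytes = '百度'.encode('utf-8')        # binary data in UTF-8

meaning = utf8_bytes.decode('utf-8')       # interpret the bytes: abstract text
gbk_bytes = meaning.encode('gbk')          # same meaning, different binary data

same_meaning = gbk_bytes.decode('gbk') == meaning
same_bytes = gbk_bytes == utf8_bytes

print(utf8_bytes, gbk_bytes)
print(same_meaning, same_bytes)
```

# The two byte strings differ in content and in length, yet both decode to the same text.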
# The output environment of cmd is GBK, so printing UTF-8 encoded binary data directly in cmd produces garbled characters.
# If the UTF-8 data is first decoded as UTF-8 and then encoded as GBK binary data, the output in cmd is normal.
# If the UTF-8 data is decoded as UTF-8 and printed directly, Python detects the output environment
# and automatically encodes the decoded string to match it.
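# A rough simulation of that automatic conversion, assuming Python 3. An in-memory stream declared as GBK stands in for the cmd window; the names raw and console are illustrative, not part of the original script.

```python
import io

raw = io.BytesIO()                               # the bytes the "console" receives
console = io.TextIOWrapper(raw, encoding='gbk')  # stand-in for a GBK cmd window

# Printing decoded text: Python encodes it with the stream's encoding.
print('百度', end='', file=console)
console.flush()
auto_encoded = raw.getvalue()

# Sending UTF-8 bytes unchanged: a GBK reader misinterprets them.
utf8_bytes = '百度'.encode('utf-8')
garbled = utf8_bytes.decode('gbk', errors='replace')

print(auto_encoded)
print(garbled != '百度')
```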
# In Python, the line # encoding: xxx declares the encoding used to decode the source code.
# English (ASCII) characters are generally read the same way under different encodings.
# Editing a file is an interaction with the editor: what we see while editing is the meaning, and the editor uses some encoding to save the meaning of the code.
# The editor stores the meaning of the code as a file of binary data.
# When the Python interpreter runs the file, it looks for # encoding: xxx to determine the encoding of the code.
# If the encoding declared in the code differs from the encoding the file was saved in, everything except English characters may be misread.
# Think of it as the Python interpreter using encoding B to interpret a meaning that the editor saved with encoding A.
# Different Chinese encodings are generally incompatible with each other, so interpreting with the wrong one causes garbled characters.
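# A small sketch of that mismatch, assuming Python 3. It does not run the interpreter on a real file; it only decodes the bytes an editor would have saved, which is the step that goes wrong. The one-line "source file" is hypothetical.

```python
source_text = "s = '百度'"              # what the author sees in the editor
saved = source_text.encode('gbk')        # editor saves with encoding A (GBK)

# Reading the file with encoding B (UTF-8) fails on the non-English part.
try:
    saved.decode('utf-8')
    misread = False
except UnicodeDecodeError:
    misread = True

# Pure English code is byte-for-byte identical under both encodings.
english = "s = 'abc'"
english_safe = english.encode('gbk') == english.encode('utf-8')

print(misread, english_safe)
```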
# General explanation
# What we communicate is meaning. Different encodings are like different written languages, all used to record meaning.
# Given a piece of text, understanding it according to English grammar is decoding it as English: decode in Python.
# Writing an idea down in Chinese is encoding it as Chinese: encode in Python.
# When interacting with input and output, the interaction succeeds if the output is not garbled. Otherwise, it fails.
# An encoding is the language spoken by each piece of software: the interpreter, the file editor, and the cmd command line.
# That is, we need these programs to communicate successfully even though they may use different languages.
# We pass the meaning to the editor, the editor writes it down in language A, and the interpreter reads it in language A.
# The cmd output must use language B, so the interpreter turns the language A text into meaning and then writes that meaning in language B for cmd.
# cmd receives text in language B, translates it back into meaning, and displays it on the screen: output without garbled characters.
# Therefore, if the code above is run in a Python IDE instead of cmd, the results may differ.
# cmd uses the GBK "language", while a Python IDE may use UTF-8 directly, so which lines come out garbled may differ.
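# To illustrate why the results depend on the output environment, here is a hedged Python 3 sketch that models a console as "decode the received bytes with my own encoding". The function readable_on is hypothetical, and a real IDE may behave differently.

```python
text = '百度'
candidates = {
    'utf-8 bytes': text.encode('utf-8'),
    'gbk bytes': text.encode('gbk'),
}

def readable_on(console_encoding):
    """Report which byte strings a console with this encoding shows correctly."""
    result = {}
    for label, raw in candidates.items():
        shown = raw.decode(console_encoding, errors='replace')
        result[label] = (shown == text)
    return result

cmd_view = readable_on('gbk')     # a GBK console such as cmd
ide_view = readable_on('utf-8')   # an environment that uses UTF-8

print(cmd_view)
print(ide_view)
```

# The same bytes are readable in one environment and garbled in the other, which is why the four print lines at the top behave differently outside cmd.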