Exploring garbled text in Python: encoding, repr, encode, and decode

Source: Internet
Author: User
Tags: printable characters

# encoding: utf-8
# Run this from the command line (cmd); the code is Python 2

s = '百度'  # "Baidu"; must be a non-ASCII literal, held as UTF-8 bytes in this file
print s  # output environment is gbk, data is UTF-8: garbled characters
print s.decode('utf-8')  # output environment is gbk; converted automatically, displays correctly
print s.decode('utf-8').encode('utf-8')  # output environment is gbk, data is UTF-8: garbled characters
print s.decode('utf-8').encode('gbk')  # output environment is gbk, data is gbk: displays correctly
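
For readers on Python 3, where `bytes` and `str` are separate types and `str` has no `decode` method, the cases above can be approximated as follows. This is only a sketch: the sample word 百度 and the helper `seen_on_gbk_console` are assumptions made for illustration, and the helper merely simulates a gbk console by interpreting raw bytes as gbk.

```python
# Python 3 sketch; seen_on_gbk_console is a hypothetical helper, not a real API.
def seen_on_gbk_console(data: bytes) -> str:
    """Simulate a gbk console: it interprets whatever bytes arrive as gbk."""
    return data.decode("gbk", errors="replace")


text = "百度"                       # the meaning (a str is already decoded)
utf8_bytes = text.encode("utf-8")   # what the Python 2 str held in a UTF-8 file

# Case 1: UTF-8 bytes sent to the console unchanged.
case1 = seen_on_gbk_console(utf8_bytes)
# Case 3: decode then encode as UTF-8 again, which yields the same bytes.
case3 = seen_on_gbk_console(utf8_bytes.decode("utf-8").encode("utf-8"))
# Case 4: decode, then encode in the console's own encoding.
case4 = seen_on_gbk_console(utf8_bytes.decode("utf-8").encode("gbk"))

print(case1 == text)   # False: garbled
print(case3 == case1)  # True: garbled in exactly the same way
print(case4 == text)   # True: displays correctly
```

Case 2, printing the decoded value directly, ends up equivalent to case 4 whenever the output stream's encoding is gbk.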


# s = 0xF21938274ABDS... (raw binary data in memory)
# repr(s) converts that memory data into a printable string, which is what print repr(s) shows; unprintable characters become \x escapes.
# repr(s) is neither the raw memory data nor that data interpreted directly as printable characters.
# What repr(s) means is what print repr(s) displays: a displayable spelling of the memory data read as chars.
# If s = 0x0A = '\n', then print repr(s) => '\n', and repr(s) itself is "'\\n'" (because the \ has to be displayed as \).
#
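
The distinction between a value and its repr can be checked directly. The sketch below assumes Python 3, where the repr of a `bytes` object carries a `b` prefix that the Python 2 script would not show; the three sample bytes are the UTF-8 encoding of 百.

```python
newline = "\n"
shown = repr(newline)            # four characters: quote, backslash, n, quote
print(shown)                     # prints '\n'
print(len(newline), len(shown))  # prints 1 4

raw = b"\xe7\x99\xbe"            # UTF-8 bytes of 百, none of them printable ASCII
print(repr(raw))                 # prints b'\xe7\x99\xbe'
```
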
# The memory data of s can be interpreted under a particular encoding to recover its intended meaning.
# If s holds the UTF-8 encoding of 'haha', interpreting it as UTF-8 yields the meaning 'haha'.
# Interpreting s under the encoding xyz is written s.decode(xyz) in Python.
# s.decode(xyz) returns a Python variable whose memory representation we need not care about; it might be "data + encoding".
# However it is stored in memory, the variable carries the abstract meaning, so we can treat it as an abstract 'haha'.
# An interpreted string can be re-encoded: the meaning stays the same, but a different encoding produces different binary data.
# The output environment of cmd is gbk, so printing UTF-8 encoded binary data directly in cmd produces garbled characters.
# Decoding the UTF-8 data as UTF-8 and then encoding it as gbk gives binary data that cmd displays correctly.
# Decoding the UTF-8 data as UTF-8 and printing the result directly also works: Python detects the output environment
# and automatically encodes the interpreted string to match it.
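
The automatic step can be observed by writing to a text stream whose encoding is known. The sketch below assumes Python 3 and uses `io.TextIOWrapper` over an in-memory buffer as a stand-in for the cmd console; a real console reports its encoding through `sys.stdout.encoding`.

```python
import io

captured = io.BytesIO()                               # stands in for the console's byte stream
console = io.TextIOWrapper(captured, encoding="gbk")  # text stream whose output environment is gbk

utf8_bytes = "百度".encode("utf-8")
console.write(utf8_bytes.decode("utf-8"))             # write the meaning, not the bytes
console.flush()

sent = captured.getvalue()                            # the bytes the "console" actually received
print(sent == "百度".encode("gbk"))                   # True: the stream encoded the text as gbk
print(len(utf8_bytes), len(sent))                     # 6 4: same meaning, different binary data
```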


# In Python, "# encoding: xxx" declares the encoding the interpreter uses to decode the source code.
# ASCII (English) characters are generally read correctly whichever of these encodings is used.
# Editing a file is an interaction with the editor: what we see while editing is the meaning, and the editor uses some encoding to store it.
# The editor saves that meaning to the file as binary data.
# When the Python interpreter runs the file, it looks for "# encoding: xxx" to determine the encoding of the code.
# If the declared encoding differs from the encoding the file was actually saved in, anything other than ASCII characters may be read incorrectly.
# Think of it as the interpreter using encoding B to interpret meaning that the editor saved with encoding A.
# Different Chinese encodings are generally incompatible with each other, so interpreting under the wrong one produces garbled characters.
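
A mismatch between the declared and the actual file encoding can be reproduced without touching the file system. The sketch below assumes Python 3 and its standard-library `tokenize.detect_encoding`, which reads the declaration from the first lines of a source file; the source text itself is a made-up example.

```python
import io
import tokenize

source_text = "# encoding: utf-8\ns = '百度'\n"
saved_as_gbk = source_text.encode("gbk")   # the editor saved with encoding A = gbk

# Read the declaration the same way a source-reading tool would.
declared, _ = tokenize.detect_encoding(io.BytesIO(saved_as_gbk).readline)
print(declared)                            # utf-8

try:
    saved_as_gbk.decode(declared)          # the reader uses encoding B = utf-8
    mismatch_detected = False
except UnicodeDecodeError:
    mismatch_detected = True               # gbk bytes are not valid UTF-8 here
print(mismatch_detected)                   # True
```

The ASCII part of the file decodes identically under both encodings; only the Chinese literal exposes the mismatch.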


# General explanation
# What we communicate is the meaning of language. Different encodings are like different written scripts: each is a way of recording meaning.
# Taking a piece of text and reading it by the rules of English is decoding it as English: decode in Python.
# Writing an idea down as a Chinese sentence is encoding it as Chinese: encode in Python.
# When interacting with input and output, the interaction succeeds if the output is not garbled and fails otherwise.
# An encoding is the language spoken by each piece of software: the interpreter, the file editor, the cmd command line.
# So the task is to make these programs communicate successfully even though they may speak different languages.
# We give our meaning to the editor; the editor writes it down in language A, and the interpreter reads it in language A.
# cmd output must be in language B, so the interpreter converts the language A text into meaning and then writes that meaning as language B text for cmd.
# cmd receives the language B text, recovers its meaning, and displays it on the screen: output without garbled characters.
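
The relay described above can be sketched as a single function. This assumes Python 3; `relay` and the list of encodings are hypothetical names chosen for illustration. The point is that the meaning survives any pair of languages A and B as long as each hop reads with the encoding the previous hop wrote.

```python
def relay(meaning: str, language_a: str, language_b: str) -> str:
    """Pass a meaning from editor to interpreter to console (illustrative only)."""
    saved = meaning.encode(language_a)      # the editor writes language A
    understood = saved.decode(language_a)   # the interpreter reads language A
    sent = understood.encode(language_b)    # the interpreter writes language B
    return sent.decode(language_b)          # the console reads language B


languages = ["utf-8", "gbk", "gb18030", "utf-16"]
results = [relay("百度", a, b) == "百度" for a in languages for b in languages]
print(all(results))  # True: every A/B combination preserves the meaning
```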



# Therefore, running the code above in a Python IDE rather than in cmd may give different results,
# because cmd uses gbk as its "language" while an IDE may use UTF-8 directly, so which lines come out garbled may differ.
