Python summary: a common misreading of encode and decode
While recently learning Python, I misunderstood how encoding works.
Here is the mistaken understanding I had:
encode(): encoding, which converts an object's encoding into a specified encoding format. Taking the name literally, I had always thought it converted other encoding formats into Unicode.
decode(): decoding, the inverse of encoding, which parses the data and converts Unicode into other formats.
After reading some references and blogs by experienced developers, I arrived at the correct understanding:
decode converts a string in some other encoding into Unicode. For example, str1.decode('gb2312') converts the GB2312-encoded string str1 into Unicode.
encode converts Unicode into a string in some other encoding. For example, str2.encode('gb2312') converts the Unicode string str2 into GB2312.
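A minimal sketch of the two directions above, assuming Python 3, where text is `str` and a byte stream is `bytes`. (Python 2 called these `unicode` and `str`, which is what `str1` and `str2` above refer to.) The sample text "你好" is an arbitrary choice.

```python
# Python 3: str holds Unicode text, bytes holds an encoded byte stream.
text = "你好"

# encode: Unicode text -> GB2312 byte stream
gb_bytes = text.encode("gb2312")
print(len(gb_bytes))               # 4: two bytes per Chinese character

# decode: GB2312 byte stream -> Unicode text
restored = gb_bytes.decode("gb2312")
print(restored == text)            # True

# The direction is fixed by the type: str has no decode(), bytes has no encode().
print(hasattr(str, "decode"), hasattr(bytes, "encode"))  # False False
```

Because Python 3 removed `str.decode` and `bytes.encode`, the misreading described at the start of this post turns into an `AttributeError` instead of a silent mistake.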
Encoding problems come up often in Python, so I am writing down my understanding here.
First, there are several concepts to understand.
* Byte: the unit in which a computer represents data. It is 8 binary bits and can represent the unsigned integers 0-255. Below, a sequence of bytes is called a "byte stream".
* Character: for example the English characters "abc", or the Chinese characters for "you, I, he". A character by itself says nothing about how it is stored in the computer. In the following paragraphs, the word "string" is avoided, and "text" is used to denote a sequence of characters.
* Encode (verb): converts "text" into a "byte stream" according to a rule (that rule is called an encoding, the noun). (In Python 2: unicode becomes str. In Python 3: str becomes bytes.)
* Decode (verb): converts a "byte stream" into "text" according to a rule. (In Python 2: str becomes unicode. In Python 3: bytes becomes str.)
* In fact, anything represented in a computer has to be encoded. For example, a video is encoded before it is saved to a file, and it has to be decoded to be viewed during playback.
Unicode: Unicode defines the mapping between "characters" and "numbers" (code points), but does not specify how those numbers are stored in the computer. (Just as in C an integer can be stored as an int or a short, Unicode does not specify whether an int or a short should represent a "character".)
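In Python the character-to-number mapping is exposed through the built-ins `ord()` and `chr()`. The sketch below (Python 3) also mimics the int-versus-short analogy with `int.to_bytes`: the same number fits in 2 or 4 bytes, and Unicode itself does not choose between them.

```python
# Unicode assigns each character a number (its code point).
cp = ord("你")
print(cp, hex(cp))               # 20320 0x4f60
print(chr(0x4F60))               # 你
print(ord("A"))                  # 65

# The number alone says nothing about storage: the same value fits
# in a 2-byte ("short") or a 4-byte ("int") big-endian representation.
print(cp.to_bytes(2, "big"))     # bytes 0x4F 0x60
print(cp.to_bytes(4, "big"))     # bytes 0x00 0x00 0x4F 0x60
```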
UTF-8: an implementation of Unicode. It uses the character-to-number mapping defined by Unicode and additionally specifies how each number is stored in the computer. UTF-16 and others are also Unicode implementations.
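A small comparison, assuming Python 3, of how two Unicode implementations store the same code points. It uses the `utf-16-be` codec (big-endian) so that no byte-order mark is added; plain `utf-16` would prepend one.

```python
for ch in ["A", "你"]:
    utf8 = ch.encode("utf-8")
    utf16 = ch.encode("utf-16-be")
    # Same character and code point, different byte layouts.
    print(ch, hex(ord(ch)), utf8.hex(), utf16.hex())
    # Each byte stream decodes back only with the codec that produced it.
    assert utf8.decode("utf-8") == utf16.decode("utf-16-be") == ch

# Output:
# A 0x41 41 0041
# 你 0x4f60 e4bda0 4f60
```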
Summary:
Encoding converts text (a string) into a byte stream: from Unicode into another encoding format.
Decoding converts a byte stream into a string (text): from another encoding format into Unicode.
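The summary implies that Unicode acts as the hub: to move a byte stream from one encoding to another, decode it into Unicode using the encoding it was written in, then encode it with the target encoding. `transcode` below is a hypothetical helper written for illustration (Python 3), not a standard library function.

```python
def transcode(data: bytes, src: str, dst: str) -> bytes:
    """Convert a byte stream from encoding `src` to encoding `dst`."""
    text = data.decode(src)   # other encoding -> Unicode text
    return text.encode(dst)   # Unicode text -> other encoding

gb_bytes = "你好".encode("gb2312")
utf8_bytes = transcode(gb_bytes, "gb2312", "utf-8")
print(utf8_bytes.decode("utf-8"))   # 你好

# Decoding with the wrong encoding fails here (in other cases it may
# silently produce garbled text instead).
try:
    gb_bytes.decode("utf-8")
except UnicodeDecodeError as exc:
    print("UnicodeDecodeError:", exc.reason)
```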