When I first started learning programming, my understanding of string encodings was vague, even after reading plenty of material on the subject. While reading Dive Into Python today, I suddenly arrived at a new understanding (I am not sure whether it is correct).
Python has a built-in function, ord(), which returns the Unicode code point of a character. The function takes no other arguments: given a character, there is exactly one value corresponding to it, independent of any particular encoding (UTF-8, UTF-16, GB2312).
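A minimal sketch of this in Python 3, using ord() and its inverse chr() (the sample characters are my own choice):

```python
# ord() maps a character to its Unicode code point; it knows nothing
# about byte encodings such as UTF-8 or GB2312.
print(ord("A"))    # 65
print(ord("中"))   # 20013 (U+4E2D)

# chr() is the inverse: code point back to character.
print(chr(20013))  # 中
```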
My previous mistake was to assume that each encoding had its own independent character table.
It now appears that the Unicode standard covers all the characters in the world, and each character has a corresponding Unicode code point.
What we call an encoding, such as UTF-8, UTF-16, or GB2312, is only a way of serializing characters into binary when saving to disk or transmitting over the network; it has nothing to do with a character's Unicode code point.
UTF-32 stores every character in 4 bytes. Although this wastes space, decoding binary back into characters is certainly faster than with UTF-8, because every four bytes represent exactly one character, so finding the character at a given position in a stream is O(1).
UTF-8 saves space, but finding the character at a given position is harder: it takes O(n) time, because the variable-width bytes must be scanned from the beginning.
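A small sketch of the indexing difference (the sample string is my own): with UTF-32 the i-th character always starts at byte offset 4*i, while with UTF-8 the offset cannot be computed without scanning.

```python
text = "a中b文c"

# UTF-32-LE: fixed 4 bytes per character (the -le variant has no BOM).
utf32 = text.encode("utf-32-le")
i = 3  # we want the 4th character, "文"
ch = utf32[4 * i : 4 * (i + 1)].decode("utf-32-le")  # O(1) slice
print(ch)  # 文

# UTF-8: variable width (here 1 byte for ASCII, 3 bytes for the CJK
# characters), so the byte offset of character i is unknown in advance
# and must be found by scanning from the start: O(n).
utf8 = text.encode("utf-8")
print(len(utf32), len(utf8))  # 20 vs 9 bytes for 5 characters
```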
That is my understanding of Unicode encoding, arrived at through ord().