Deepen your understanding of Python character encoding with the following exercises.
# The 256 byte values \x00-\xff
>>> a = range(256)
>>> b = bytes(a)  # no encoding argument needed
>>> b
b'\x00\x01\x02 ... \xf6\xf7\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff'
>>> b.decode('utf-8')  # error
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 128: invalid start byte
>>> b.decode('unicode-escape')  # works
'\x00\x01\x02 ... \xf6÷\xf8ùú\xfbü\xfd\xfe\xff'
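To see why the UTF-8 decode stops at position 128 while a single-byte codec gets through all 256 values, here is a minimal sketch. The helper name first_bad_offset is made up for illustration, and latin-1 is used here instead of unicode-escape because it maps every byte directly to the code point with the same value.

```python
data = bytes(range(256))

# Bytes 0x00-0x7f are plain ASCII, which is also valid UTF-8,
# so the first half decodes without trouble.
ascii_part = data[:128].decode('utf-8')

# latin-1 maps byte n to code point U+00nn, so it can decode any byte string.
text = data.decode('latin-1')


def first_bad_offset(raw, encoding='utf-8'):
    """Hypothetical helper: offset of the first undecodable byte, or None."""
    try:
        raw.decode(encoding)
    except UnicodeDecodeError as exc:
        return exc.start  # index of the offending byte
    return None


print(first_bad_offset(data))             # 128
print(text.encode('latin-1') == data)     # True: the round trip is lossless
```

Byte 0x80 is the first value that can only appear as a continuation byte in UTF-8, which is why the error reports "invalid start byte" at exactly that offset.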
# Aside: the statement above is equivalent to the following one
>>> ''.join(list(map(chr, range(256))))
'\x00\x01\x02 ... \xf6÷\xf8ùú\xfbü\xfd\xfe\xff'
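The two results match for range(256) because the only backslash in that data (byte 0x5c) is followed by ']', which is not a recognised escape. A short sketch, with sample byte strings chosen here purely for illustration, of where unicode-escape and a plain single-byte decode go separate ways:

```python
# Six ASCII bytes: backslash, 'u', '4', 'e', '2', 'd'
escaped = b'\\u4e2d'

# latin-1 leaves the bytes as six separate characters.
as_latin1 = escaped.decode('latin-1')

# unicode-escape interprets the sequence as one code point, U+4E2D.
as_escape = escaped.decode('unicode-escape')

# Non-ASCII bytes with no backslash involved decode the same way under both.
high = bytes([0xf6, 0xf7])
same_for_high_bytes = high.decode('latin-1') == high.decode('unicode-escape')

print(len(as_latin1), len(as_escape))   # 6 1
print(same_for_high_bytes)              # True
```

So unicode-escape behaves like latin-1 plus escape processing, and it is only a safe stand-in for the chr() version when the data contains no valid escape sequences.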
>>> a = 'abc'
>>> a
'abc'
>>> b = bytes(a, encoding='utf-8')  # Method 1: convert 'abc' to bytes
>>> b
b'abc'
>>> c = a.encode('utf-8')  # Method 2: convert 'abc' to bytes; equivalent to method 1
>>> c
b'abc'

# The 256 byte values \x00-\xff, using bytearray
>>> a = range(256)
>>> b = bytearray(a)
>>> b
bytearray(b'\x00\x01\x02 ... \xf6\xf7\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff')
>>> b.decode('unicode-escape')
'\x00\x01\x02 ... \xf6÷\xf8ùú\xfbü\xfd\xfe\xff'

# Encoding Chinese
>>> a = '中'
>>> a
'中'
>>> b = a.encode('gbk')
>>> b
b'\xd6\xd0'
>>> c = a.encode('utf-8')
>>> c
b'\xe4\xb8\xad'
>>> d = a.encode('unicode-escape')
>>> d
b'\\u4e2d'
>>> e = a.encode('cp936')
>>> e
b'\xd6\xd0'

# Decoding Chinese
>>> a.decode('utf-8')  # str has no decode method in Python 3
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'str' object has no attribute 'decode'
>>> b.decode()  # the default encoding is utf-8, but b holds GBK bytes
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd6 in position 0: invalid continuation byte
>>> b.decode('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd6 in position 0: invalid continuation byte
>>> b.decode('gbk')
'中'
>>> b.decode('cp936')  # GBK-encoded bytes decode with cp936 and vice versa: in Python, cp936 is an alias for the gbk codec
'中'
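The encode and decode exercises above can be tied together in one small script. This is only a sketch: decode_with_fallback is a hypothetical helper, and the order of candidate encodings is an assumption. Trying codecs in sequence is a heuristic, not real encoding detection.

```python
import codecs

text = '\u4e2d'  # the character 中

# The same character yields different byte sequences under each codec.
encoded = {name: text.encode(name) for name in ('gbk', 'utf-8', 'unicode-escape')}

# codecs.lookup resolves aliases to the underlying codec.
cp936_is_gbk = codecs.lookup('cp936').name == codecs.lookup('gbk').name


def decode_with_fallback(raw, encodings=('utf-8', 'gbk')):
    """Hypothetical helper: try each encoding in turn.

    Returns (text, encoding_used). If nothing fits, decodes as UTF-8
    with replacement characters and reports None as the encoding.
    """
    for enc in encodings:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue
    return raw.decode('utf-8', errors='replace'), None


print(cp936_is_gbk)                              # True
print(decode_with_fallback(encoded['gbk']))      # ('中', 'gbk')
print(decode_with_fallback(encoded['utf-8']))    # ('中', 'utf-8')
```

UTF-8 is tried first because it is strict about byte patterns and rejects most GBK data, whereas GBK will accept many byte strings that were never meant to be GBK.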