【About GB18030 encoding】
GB 18030 wiki: https://zh.wikipedia.org/wiki/gb_18030
GB18030 is a variable-length encoding. Each character takes one, two or four bytes:
Single byte: the value is from 0x00 to 0x7F.
Double byte: the first byte is from 0x81 to 0xFE, and the second byte is from 0x40 to 0xFE (excluding 0x7F).
Four bytes: the first byte is from 0x81 to 0xFE, the second from 0x30 to 0x39, the third from 0x81 to 0xFE, and the fourth from 0x30 to 0x39.
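The byte ranges above are enough to check whether a byte string is structurally well-formed GB18030. The sketch below applies those rules directly. `gb18030_seq_len` and `first_invalid_offset` are hypothetical helper names, not part of any library. The sketch checks byte ranges only: a four-byte sequence can be well-formed and still be unassigned, so Python's own `gb18030` codec remains the final authority. Because no sequence can start with 0x80 or 0xFF, a stray 0xFF byte can never be decoded.

```python
def gb18030_seq_len(data: bytes, i: int) -> int:
    """Length (1, 2 or 4) of the GB18030 sequence starting at data[i],
    or 0 if the bytes there do not match any of the three byte patterns."""
    b1 = data[i]
    if b1 <= 0x7F:                       # single byte: 0x00-0x7F
        return 1
    if not (0x81 <= b1 <= 0xFE) or i + 1 >= len(data):
        return 0                         # bad lead byte (e.g. 0x80, 0xFF) or truncated
    b2 = data[i + 1]
    if 0x40 <= b2 <= 0xFE and b2 != 0x7F:
        return 2                         # double byte
    if 0x30 <= b2 <= 0x39 and i + 3 < len(data):
        b3, b4 = data[i + 2], data[i + 3]
        if 0x81 <= b3 <= 0xFE and 0x30 <= b4 <= 0x39:
            return 4                     # four bytes
    return 0


def first_invalid_offset(data: bytes) -> int:
    """Offset of the first structurally invalid sequence, or -1 if none."""
    i = 0
    while i < len(data):
        n = gb18030_seq_len(data, i)
        if n == 0:
            return i
        i += n
    return -1


sample = "你好".encode("gb18030") + b"\xff"  # two 2-byte characters, then 0xFF
print(first_invalid_offset(sample))  # 4
```

A scan like this can help locate the bad byte in a large file before deciding how to handle it.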
" a way to handle decoding errors "
UnicodeDecodeError: 'gb18030' codec can't decode byte 0xff in position 129535: illegal multibyte sequence
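The error can be reproduced on a small, hypothetical input with 0xFF in the middle. The sketch below shows what the exception records (`start`, `end`, `reason`). It also shows Python's built-in `errors` handlers (`replace`, `ignore`, `backslashreplace`), which may be enough when no custom output format is needed. `decode_error_info` is a hypothetical helper name.

```python
def decode_error_info(data: bytes):
    """Return (start, end, reason) of the first GB18030 decoding error, or None."""
    try:
        data.decode("gb18030")
    except UnicodeDecodeError as exc:
        # exc.start/exc.end delimit the offending bytes within data
        return exc.start, exc.end, exc.reason
    return None


sample = b"abc\xffdef"  # hypothetical input: ASCII text with a stray 0xFF byte
print(decode_error_info(sample))  # (3, 4, 'illegal multibyte sequence')

# Built-in handlers that decode without raising
replaced = sample.decode("gb18030", errors="replace")          # "abc" + U+FFFD + "def"
ignored = sample.decode("gb18030", errors="ignore")            # "abcdef"
escaped = sample.decode("gb18030", errors="backslashreplace")  # "abc" + literal "\xff" + "def"
```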
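A custom error handler that prints the error details and replaces each undecodable byte with its hex value:

    import codecs

    # GB18030 garbled-text handler: print the error details and
    # replace each undecodable byte with its hex value, e.g. 0xFF
    def WalkerGB18030ReplaceHandler(exc):
        print('exc.start:%d' % exc.start)
        print('exc.end:%d' % exc.end)
        print('exc.encoding:%s' % exc.encoding)
        print('exc.reason:%s' % exc.reason)
        text = ''
        # exc.object is the bytes being decoded; iterating a bytes slice yields ints
        for ch in exc.object[exc.start:exc.end]:
            print('ch:')
            print(ch)
            text += ('0x%02X' % ch)
        # Return the replacement text and the position to resume decoding from
        return (text, exc.end)

    # Register the custom handler under the name "myreplace";
    # use it via errors='myreplace', e.g. data.decode('gb18030', errors='myreplace')
    codecs.register_error("myreplace", WalkerGB18030ReplaceHandler)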
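A quieter variant of the handler above, without the debug prints, shows the handler in use. This is a minimal sketch. The names `hex_replace_handler` and `"hexreplace"` are hypothetical. It also re-raises anything that is not a decoding error, because a registered handler can be invoked for encoding errors too.

```python
import codecs


def hex_replace_handler(exc):
    # Only handle decoding errors; re-raise anything else (e.g. encoding errors)
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    bad = exc.object[exc.start:exc.end]        # the undecodable bytes
    text = "".join("0x%02X" % b for b in bad)  # e.g. b"\xff" -> "0xFF"
    return text, exc.end                        # resume right after the bad bytes


codecs.register_error("hexreplace", hex_replace_handler)

print(b"abc\xffdef".decode("gb18030", errors="hexreplace"))  # abc0xFFdef
```

The same handler name can be passed to `open(path, encoding="gb18030", errors="hexreplace")` so that a whole file is read without raising.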
" Related reading "
About the Python3 code
Walker's Journal
Python3: handling garbled GB18030 text